Science.gov

Sample records for parallel discrete-event simulation

  1. Synchronization Of Parallel Discrete Event Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S.

    1992-01-01

    Adaptive, parallel, discrete-event-simulation-synchronization algorithm, Breathing Time Buckets, developed in Synchronous Parallel Environment for Emulation and Discrete Event Simulation (SPEEDES) operating system. Algorithm allows parallel simulations to process events optimistically in fluctuating time cycles that naturally adapt while simulation in progress. Combines best of optimistic and conservative synchronization strategies while avoiding major disadvantages of each. Well suited for modeling communication networks, for large-scale war games, for simulated flights of aircraft, for simulations of computer equipment, for mathematical modeling, for interactive engineering simulations, and for depictions of flows of information.
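
    The cycle that makes the time buckets "breathe" can be sketched in a few lines (a toy illustration with hypothetical data structures, not the SPEEDES implementation): each node optimistically processes events up to its local event horizon, defined by the earliest event it newly generates; the minimum over all nodes is the global event horizon, and only events at or below it are committed.

```python
import heapq

def breathing_time_buckets_cycle(node_queues):
    """One synchronization cycle over per-node event heaps (toy sketch).

    Each heap holds (timestamp, handler) pairs with unique timestamps; a
    handler returns the list of new (timestamp, handler) events it generates.
    Messages stay on the generating node and "rollback" simply re-queues the
    event, which is enough to show the shape of the cycle.
    """
    staged = {i: [] for i in range(len(node_queues))}  # optimistically processed work
    local_horizons = []
    for i, queue in enumerate(node_queues):
        horizon = float("inf")
        while queue and queue[0][0] < horizon:         # process up to the local event horizon
            ts, handler = heapq.heappop(queue)
            generated = handler(ts)
            staged[i].append((ts, handler, generated))
            for new_ts, _ in generated:
                horizon = min(horizon, new_ts)         # earliest new event bounds this cycle
        local_horizons.append(horizon)

    gvt = min(local_horizons)                          # global event horizon (a min-reduction)

    for i, work in staged.items():
        for ts, handler, generated in work:
            if ts <= gvt:                              # safe: commit, release generated events
                for event in generated:
                    heapq.heappush(node_queues[i], event)
            else:                                      # unsafe: roll back, retry next cycle
                heapq.heappush(node_queues[i], (ts, handler))
    return gvt
```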

  2. Program For Parallel Discrete-Event Simulation

    NASA Technical Reports Server (NTRS)

    Beckman, Brian C.; Blume, Leo R.; Geiselman, John S.; Presley, Matthew T.; Wedel, John J., Jr.; Bellenot, Steven F.; Diloreto, Michael; Hontalas, Philip J.; Reiher, Peter L.; Weiland, Frederick P.

    1991-01-01

    User does not have to add any special logic to aid in synchronization. Time Warp Operating System (TWOS) computer program is special-purpose operating system designed to support parallel discrete-event simulation. Complete implementation of Time Warp mechanism. Supports only simulations and other computations designed for virtual time. Time Warp Simulator (TWSIM) subdirectory contains sequential simulation engine interface-compatible with TWOS. TWOS and TWSIM written in, and support simulations in, C programming language.
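
    The mechanism TWOS provides under the hood, rollback to an earlier virtual time plus anti-messages cancelling outputs already sent, can be sketched as follows (hypothetical Python classes, not the TWOS code; re-enqueueing of rolled-back events for reprocessing is omitted):

```python
import copy

class LogicalProcess:
    """Toy Time Warp logical process: optimistic execution with rollback."""

    def __init__(self, state):
        self.state = state
        self.lvt = 0.0          # local virtual time
        self.saved = []         # (event timestamp, deep copy of state before the event)
        self.sent = []          # (timestamp, destination, message) already transmitted

    def handle(self, timestamp, event, destination=None, message=None):
        """Apply one event; `event` is a callable that mutates the state."""
        if timestamp < self.lvt:                     # straggler arrived in the virtual past
            self.rollback(timestamp)
        self.saved.append((timestamp, copy.deepcopy(self.state)))
        self.lvt = timestamp
        event(self.state)
        if destination is not None:
            self.sent.append((timestamp, destination, message))

    def rollback(self, to_time):
        """Restore the state as of `to_time` and cancel later outputs."""
        restored = None
        while self.saved and self.saved[-1][0] >= to_time:
            self.lvt, restored = self.saved.pop()
        if restored is not None:
            self.state = restored
        keep, cancel = [], []
        for m in self.sent:
            (cancel if m[0] >= to_time else keep).append(m)
        self.sent = keep
        for ts, dest, msg in cancel:                 # anti-messages chase the originals
            print(f"anti-message -> {dest}: cancel {msg!r} sent at t={ts}")

if __name__ == "__main__":
    lp = LogicalProcess({"count": 0})
    bump = lambda s: s.__setitem__("count", s["count"] + 1)
    lp.handle(5.0, bump, destination="lp2", message="hello")
    lp.handle(9.0, bump)
    lp.handle(7.0, bump)          # straggler: rolls back the event at t=9.0
    print(lp.lvt, lp.state)       # 7.0 {'count': 2}
```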

  3. Running Parallel Discrete Event Simulators on Sierra

    SciTech Connect

    Barnes, P. D.; Jefferson, D. R.

    2015-12-03

    In this proposal we consider porting the ROSS/Charm++ simulator and the discrete event models that run under its control so that they run on the Sierra architecture and make efficient use of the Volta GPUs.

  4. An adaptive synchronization protocol for parallel discrete event simulation

    SciTech Connect

    Bisset, K.R.

    1998-12-01

    Simulation, especially discrete event simulation (DES), is used in a variety of disciplines where numerical methods are difficult or impossible to apply. One problem with this method is that a sufficiently detailed simulation may take hours or days to execute, and multiple runs may be needed in order to generate the desired results. Parallel discrete event simulation (PDES) has been explored for many years as a method to decrease the time taken to execute a simulation. Many protocols have been developed which work well for particular types of simulations, but perform poorly when used for other types of simulations. Often it is difficult to know a priori whether a particular protocol is appropriate for a given problem. In this work, an adaptive synchronization method (ASM) is developed which works well on an entire spectrum of problems. The ASM determines, using an artificial neural network (ANN), the likelihood that a particular event is safe to process.
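
    The paper's ASM feeds features of each event to an artificial neural network that estimates how likely the event is to be safe. As a toy stand-in (a single logistic unit trained online, with invented features; not the paper's network, features, or protocol), the idea looks like this:

```python
import math, random

class SafetyEstimator:
    """Single logistic unit standing in for the ASM's neural network (toy sketch)."""

    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def prob_safe(self, features):
        z = self.b + sum(w * x for w, x in zip(self.w, features))
        return 1.0 / (1.0 + math.exp(-z))

    def update(self, features, was_safe):
        """Online gradient step after observing whether the event was rolled back."""
        err = (1.0 if was_safe else 0.0) - self.prob_safe(features)
        self.w = [w + self.lr * err * x for w, x in zip(self.w, features)]
        self.b += self.lr * err

def dispatch(event_features, estimator, threshold=0.5):
    """Process optimistically when the estimated safety is high, otherwise block."""
    return "optimistic" if estimator.prob_safe(event_features) >= threshold else "conservative"

if __name__ == "__main__":
    est = SafetyEstimator(n_features=2)
    for _ in range(1000):
        # invented features: slack to the next unprocessed remote event, recent rollback rate
        slack, rollback_rate = random.random(), random.random()
        was_safe = slack > rollback_rate             # stand-in ground truth for training
        est.update([slack, rollback_rate], was_safe)
    print(dispatch([0.9, 0.1], est), dispatch([0.1, 0.9], est))   # typically: optimistic conservative
```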

  5. The effects of parallel processing architectures on discrete event simulation

    NASA Astrophysics Data System (ADS)

    Cave, William; Slatt, Edward; Wassmer, Robert E.

    2005-05-01

    As systems become more complex, particularly those containing embedded decision algorithms, mathematical modeling presents a rigid framework that often impedes representation to a sufficient level of detail. Using discrete event simulation, one can build models that more closely represent physical reality, with actual algorithms incorporated in the simulations. Higher levels of detail increase simulation run time. Hardware designers have succeeded in producing parallel and distributed processor computers with theoretical speeds well into the teraflop range. However, the practical use of these machines on all but some very special problems is extremely limited. The inability to use this power is due to great difficulties encountered when trying to translate real world problems into software that makes effective use of highly parallel machines. This paper addresses the application of parallel processing to simulations of real world systems of varying inherent parallelism. It provides a brief background in modeling and simulation validity and describes a parameter that can be used in discrete event simulation to vary opportunities for parallel processing at the expense of absolute time synchronization and is constrained by validity. It focuses on the effects of model architecture, run-time software architecture, and parallel processor architecture on speed, while providing an environment where modelers can achieve sufficient model accuracy to produce valid simulation results. It describes an approach to simulation development that captures subject area expert knowledge to leverage inherent parallelism in systems in the following ways: * Data structures are separated from instructions to track which instruction sets share what data. This is used to determine independence and thus the potential for concurrent processing at run-time. * Model connectivity (independence) can be inspected visually to determine if the inherent parallelism of a physical system is properly represented. Models need not be changed to move from a single processor to parallel processor hardware architectures. * Knowledge of the architectural parallelism is stored within the system and used during run time to allocate processors to processes in a maximally efficient way.

  6. Parallel discrete-event simulation of FCFS stochastic queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1988-01-01

    Physical systems are inherently parallel. Intuition suggests that simulations of these systems may be amenable to parallel execution. The parallel execution of a discrete-event simulation requires careful synchronization of processes in order to ensure the execution's correctness; this synchronization can degrade performance. Largely negative results were recently reported in a study which used a well-known synchronization method on queueing network simulations. Discussed here is a synchronization method (appointments), which has proven itself to be effective on simulations of FCFS queueing networks. The key concept behind appointments is the provision of lookahead. Lookahead is a prediction of a processor's future behavior, based on an analysis of the processor's simulation state. It is shown how lookahead can be computed for FCFS queueing network simulations, performance data are given that demonstrate the method's effectiveness under moderate to heavy loads, and performance tradeoffs between the quality of lookahead and the cost of computing it are discussed.
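
    For a non-preemptive FCFS server, this kind of lookahead can be read directly off the simulation state: no job that has not yet arrived can depart before the work already queued is finished. A minimal sketch (hypothetical names, assuming service times are sampled when jobs arrive):

```python
def fcfs_lookahead(now, busy_until, queued_service_times, min_future_service=0.0):
    """Earliest virtual time at which this FCFS server could emit a departure
    for a job it has not yet received.

    now                  -- current simulation time of the server
    busy_until           -- completion time of the job in service (or `now` if idle)
    queued_service_times -- sampled service times of jobs already waiting
    min_future_service   -- lower bound on any future job's service time
    """
    horizon = max(now, busy_until) + sum(queued_service_times)
    return horizon + min_future_service   # the "appointment" promised downstream
```

    The returned time is the appointment sent to downstream processors, which may then safely simulate up to that time without waiting for further messages from this server.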

  7. Parallel discrete event simulation: A shared memory approach

    NASA Technical Reports Server (NTRS)

    Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

    1987-01-01

    With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to ensure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
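
    The Chandy-Misra algorithm used in these experiments is conservative: a logical process consumes only the input event with the smallest channel clock, and null messages carrying a lower bound (channel clock plus lookahead) keep neighbors from blocking when no real traffic flows. A compressed sketch with hypothetical channel structures (real outputs of the handler are omitted):

```python
import collections

class LP:
    """Minimal logical process for the sketch: input/output channels plus a handler."""
    def __init__(self, in_channels, out_channels):
        self.inputs = {c: collections.deque() for c in in_channels}
        self.outputs = {c: [] for c in out_channels}
    def handle(self, timestamp, payload):
        print(f"t={timestamp}: {payload}")

def chandy_misra_step(lp, lookahead):
    """Consume the earliest input over all channels; emit null messages as promises.
    Assumes every input channel currently holds at least one (possibly null) message."""
    front = {c: q[0][0] for c, q in lp.inputs.items()}
    safe_time = min(front.values())                 # nothing earlier can still arrive
    ch = min(front, key=front.get)
    timestamp, payload = lp.inputs[ch].popleft()
    if payload is not None:
        lp.handle(timestamp, payload)               # real event
    for outbox in lp.outputs.values():              # lower bound on all future sends
        outbox.append((safe_time + lookahead, None))
    return safe_time

if __name__ == "__main__":
    lp = LP(in_channels=["a", "b"], out_channels=["c"])
    lp.inputs["a"].append((4.0, "job-1"))
    lp.inputs["b"].append((6.0, None))              # null message from an idle neighbor
    chandy_misra_step(lp, lookahead=1.5)
    print(lp.outputs["c"])                          # [(5.5, None)]
```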

  8. The cost of conservative synchronization in parallel discrete event simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    The performance of a synchronous conservative parallel discrete-event simulation protocol is analyzed. The class of simulation models considered is oriented around a physical domain and possesses a limited ability to predict future behavior. A stochastic model is used to show that as the volume of simulation activity in the model increases relative to a fixed architecture, the complexity of the average per-event overhead due to synchronization, event list manipulation, lookahead calculations, and processor idle time approaches the complexity of the average per-event overhead of a serial simulation. The method is therefore within a constant factor of optimal. The analysis demonstrates that on large problems--those for which parallel processing is ideally suited--there is often enough parallel workload so that processors are not usually idle. The viability of the method is also demonstrated empirically, showing how good performance is achieved on large problems using a thirty-two-node Intel iPSC/2 distributed-memory multiprocessor.

  9. Synchronous parallel system for emulation and discrete event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (inventor)

    1992-01-01

    A synchronous parallel system for emulation and discrete event simulation having parallel nodes responds to received messages at each node by generating event objects having individual time stamps, stores only the changes to state variables of the simulation object attributable to the event object, and produces corresponding messages. The system refrains from transmitting the messages and changing the state variables while it determines whether the changes are superseded, and then stores the unchanged state variables in the event object for later restoral to the simulation object if called for. This determination preferably includes sensing the time stamp of each new event object and determining which new event object has the earliest time stamp as the local event horizon, determining the earliest local event horizon of the nodes as the global event horizon, and ignoring the events whose time stamps are less than the global event horizon. Host processing between the system and external terminals enables such a terminal to query, monitor, command or participate with a simulation object during the simulation process.
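
    The bookkeeping described in the claim, an event object that stores only the prior values of the state variables it changes and withholds its messages until the event is known not to be superseded, can be pictured with a small sketch (hypothetical classes, not the patented system):

```python
class EventObject:
    """Records old values of only the state variables it modifies (toy sketch)."""

    def __init__(self, timestamp):
        self.timestamp = timestamp
        self.saved = {}             # variable name -> value before this event
        self.pending_messages = []  # held back until the event is known to be safe

    def set_state(self, sim_object, name, value):
        if name not in self.saved:                        # keep only the oldest value
            self.saved[name] = getattr(sim_object, name)
        setattr(sim_object, name, value)

    def send(self, destination, message):
        self.pending_messages.append((destination, message))   # not transmitted yet

    def undo(self, sim_object):
        """Restore the simulation object if the event turns out to be superseded."""
        for name, value in self.saved.items():
            setattr(sim_object, name, value)
        self.pending_messages.clear()
```

    Events whose time stamps fall at or below the global event horizon release their pending messages and discard the saved values; the rest call undo() to restore the simulation object.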

  10. Model for the evolution of the time profile in optimistic parallel discrete event simulations

    NASA Astrophysics Data System (ADS)

    Ziganurova, L.; Novotny, M. A.; Shchur, L. N.

    2016-02-01

    We investigate synchronisation aspects of an optimistic algorithm for parallel discrete event simulations (PDES). We present a model for the time evolution in optimistic PDES. This model evaluates the local virtual time profile of the processing elements. We argue that the evolution of the time profile is reminiscent of the surface profile in the directed percolation problem and in unrestricted surface growth. We present results of the simulation of the model and emphasise predictive features of our approach.

  11. SPEEDES - A multiple-synchronization environment for parallel discrete-event simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeff S.

    1992-01-01

    Synchronous Parallel Environment for Emulation and Discrete-Event Simulation (SPEEDES) is a unified parallel simulation environment. It supports multiple-synchronization protocols without requiring users to recompile their code. When a SPEEDES simulation runs on one node, all the extra parallel overhead is removed automatically at run time. When the same executable runs in parallel, the user preselects the synchronization algorithm from a list of options. SPEEDES currently runs on UNIX networks and on the California Institute of Technology/Jet Propulsion Laboratory Mark III Hypercube. SPEEDES also supports interactive simulations. Featured in the SPEEDES environment is a new parallel synchronization approach called Breathing Time Buckets. This algorithm uses some of the conservative techniques found in Time Bucket synchronization, along with the optimism that characterizes the Time Warp approach. A mathematical model derived from first principles predicts the performance of Breathing Time Buckets. Along with the Breathing Time Buckets algorithm, this paper discusses the rules for processing events in SPEEDES, describes the implementation of various other synchronization protocols supported by SPEEDES, describes some new ones for the future, discusses interactive simulations, and then gives some performance results.

  12. Explicit spatial scattering for load balancing in conservatively synchronized parallel discrete-event simulations

    SciTech Connect

    Thulasidasan, Sunil; Kasiviswanathan, Shiva; Eidenbenz, Stephan; Romero, Philip

    2010-01-01

    We re-examine the problem of load balancing in conservatively synchronized parallel, discrete-event simulations executed on high-performance computing clusters, focusing on simulations where computational and messaging load tend to be spatially clustered. Such domains are frequently characterized by the presence of geographic 'hot-spots' - regions that generate significantly more simulation events than others. Examples of such domains include simulation of urban regions, transportation networks and networks where interaction between entities is often constrained by physical proximity. Noting that in conservatively synchronized parallel simulations the speed of execution is determined by the slowest (i.e., most heavily loaded) simulation process, we study different partitioning strategies for achieving equitable processor-load distribution in domains with spatially clustered load. In particular, we study the effectiveness of partitioning via spatial scattering to achieve optimal load balance. In this partitioning technique, nearby entities are explicitly assigned to different processors, thereby scattering the load across the cluster. This is motivated by two observations, namely, (i) since load is spatially clustered, spatial scattering should, intuitively, spread the load across the compute cluster, and (ii) in parallel simulations, equitable distribution of CPU load is a greater determinant of execution speed than message passing overhead. Through large-scale simulation experiments - both of abstracted and real simulation models - we observe that scatter partitioning, even with its greatly increased messaging overhead, significantly outperforms more conventional spatial partitioning techniques that seek to reduce messaging overhead. Further, even if hot-spots change over the course of the simulation, load continues to be balanced with spatial scattering as long as the underlying feature of spatial clustering is retained, leading us to the observation that spatial scattering can often obviate the need for dynamic load balancing.
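
    The contrast between conventional contiguous (block) partitioning and the scatter partitioning studied here is easy to state in code; a toy sketch over hypothetical entities carrying x/y coordinates:

```python
def block_partition(entities, n_procs):
    """Contiguous spatial blocks: low messaging, but a hot-spot piles onto one rank."""
    entities = sorted(entities, key=lambda e: (e["x"], e["y"]))
    size = -(-len(entities) // n_procs)                     # ceiling division
    return {r: entities[r * size:(r + 1) * size] for r in range(n_procs)}

def scatter_partition(entities, n_procs):
    """Round-robin over spatially sorted entities: neighbors land on different
    ranks, so a geographic hot-spot is spread across the whole cluster."""
    entities = sorted(entities, key=lambda e: (e["x"], e["y"]))
    parts = {r: [] for r in range(n_procs)}
    for i, e in enumerate(entities):
        parts[i % n_procs].append(e)
    return parts

entities = [{"x": i % 10, "y": i // 10} for i in range(100)]            # hypothetical 10x10 grid
print({r: len(v) for r, v in scatter_partition(entities, 4).items()})   # {0: 25, 1: 25, 2: 25, 3: 25}
```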

  13. Optimized Hypervisor Scheduler for Parallel Discrete Event Simulations on Virtual Machine Platforms

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    With the advent of virtual machine (VM)-based platforms for parallel computing, it is now possible to execute parallel discrete event simulations (PDES) over multiple virtual machines, in contrast to executing in native mode directly over hardware as has traditionally been done over the past decades. While mature VM-based parallel systems now offer new, compelling benefits such as serviceability, dynamic reconfigurability and overall cost effectiveness, the runtime performance of parallel applications can be significantly affected. In particular, most VM-based platforms are optimized for general workloads, but PDES execution exhibits unique dynamics significantly different from other workloads. Here we first present results from experiments that highlight the gross deterioration of the runtime performance of VM-based PDES simulations when executed using traditional VM schedulers, quantitatively showing the poor scaling properties of the scheduler as the number of VMs is increased. The mismatch is fundamental in nature in the sense that any fairness-based VM scheduler implementation would exhibit this mismatch with PDES runs. We also present a new scheduler optimized specifically for PDES applications, and describe its design and implementation. Experimental results obtained from running PDES benchmarks (PHOLD and vehicular traffic simulations) over VMs show over an order of magnitude improvement in the run time with the PDES-optimized scheduler relative to the regular VM scheduler, with over a 20-fold reduction in the run time of simulations using up to 64 VMs. The observations and results are timely in the context of emerging systems such as cloud platforms and VM-based high performance computing installations, highlighting to the community the need for PDES-specific support, and the feasibility of significantly reducing the runtime overhead for scalable PDES on VM platforms.
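
    The remedy described here amounts to replacing fairness with knowledge of simulation progress: the hypervisor should preferentially run the VM whose logical process has the lowest local virtual time, since that VM gates global progress. A toy priority rule in that spirit (not the actual hypervisor scheduler):

```python
def pick_next_vm(vms):
    """Choose the VM hosting the least-advanced logical process.

    `vms` is a list of dicts with hypothetical fields:
      'lvt'      -- local virtual time of the PDES logical process in that VM
      'runnable' -- whether the VM currently has work to do
    A fair-share scheduler would instead pick by accumulated CPU time.
    """
    runnable = [v for v in vms if v["runnable"]]
    return min(runnable, key=lambda v: v["lvt"]) if runnable else None

vms = [{"lvt": 120.0, "runnable": True},
       {"lvt": 95.0, "runnable": True},
       {"lvt": 80.0, "runnable": False}]
print(pick_next_vm(vms))   # the VM at lvt=95.0, the furthest-behind runnable simulation
```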

  14. Synchronous Parallel Emulation and Discrete Event Simulation System with Self-Contained Simulation Objects and Active Event Objects

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor)

    1998-01-01

    The present invention is embodied in a method of performing object-oriented simulation and a system having inter-connected processor nodes operating in parallel to simulate mutual interactions of a set of discrete simulation objects distributed among the nodes as a sequence of discrete events changing state variables of respective simulation objects so as to generate new event-defining messages addressed to respective ones of the nodes. The object-oriented simulation is performed at each one of the nodes by assigning passive self-contained simulation objects to each one of the nodes, responding to messages received at one node by generating corresponding active event objects having user-defined inherent capabilities and individual time stamps and corresponding to respective events affecting one of the passive self-contained simulation objects of the one node, restricting the respective passive self-contained simulation objects to only providing and receiving information from the respective active event objects, requesting information and changing variables within a passive self-contained simulation object by the active event object, and producing corresponding messages specifying events resulting therefrom by the active event objects.

  15. A discrete event method for wave simulation

    SciTech Connect

    Nutaro, James J

    2006-01-01

    This article describes a discrete event interpretation of the finite difference time domain (FDTD) and digital wave guide network (DWN) wave simulation schemes. The discrete event method is formalized using the discrete event system specification (DEVS). The scheme is shown to have errors that are proportional to the resolution of the spatial grid. A numerical example demonstrates the relative efficiency of the scheme with respect to FDTD and DWN schemes. The potential for the discrete event scheme to reduce numerical dispersion and attenuation errors is discussed.

  16. Distributed discrete event simulation. Final report

    SciTech Connect

    De Vries, R.C.

    1988-02-01

    The presentation given here is restricted to discrete event simulation. The complexity of and time required for many present and potential discrete simulations exceeds the reasonable capacity of most present serial computers. The desire, then, is to implement the simulations on a parallel machine. However, certain problems arise in an effort to program the simulation on a parallel machine. In one category of methods, deadlock can arise, and some mechanism is required either to detect deadlock and recover from it or to avoid deadlock through information passing. In the second category of methods, potentially incorrect simulations are allowed to proceed. If the situation is later determined to be incorrect, recovery from the error must be initiated. In either case, computation and information passing are required which would not be required in a serial implementation. The net effect is that the parallel simulation may not be much better than a serial simulation. In an effort to determine alternate approaches, important papers in the area were reviewed. As a part of that review process, each of the papers was summarized. The summary of each paper is presented in this report in the hope that those doing future work in the area will be able to gain insight that might not otherwise be available, and to aid in deciding which papers would be most beneficial to pursue in more detail. The papers are broken down into categories and then by author. Conclusions reached after examining the papers and other material, such as direct talks with an author, are presented in the last section. Also presented there are some ideas that surfaced late in the research effort. These promise to be of some benefit in limiting information which must be passed between processes and in better understanding the structure of a distributed simulation. Pursuit of these ideas seems appropriate.

  17. Optimization of Operations Resources via Discrete Event Simulation Modeling

    NASA Technical Reports Server (NTRS)

    Joshi, B.; Morris, D.; White, N.; Unal, R.

    1996-01-01

    The resource levels required for operation and support of reusable launch vehicles are typically defined through discrete event simulation modeling. Minimizing these resources constitutes an optimization problem involving discrete variables and simulation. Conventional approaches to solve such optimization problems involving integer valued decision variables are the pattern search and statistical methods. However, in a simulation environment that is characterized by search spaces of unknown topology and stochastic measures, these optimization approaches often prove inadequate. In this paper, we have explored the applicability of genetic algorithms to the simulation domain. Genetic algorithms provide a robust search strategy that does not require continuity and differentiability of the problem domain. The genetic algorithm successfully minimized the operation and support activities for a space vehicle, through a discrete event simulation model. The practical issues associated with simulation optimization, such as stochastic variables and constraints, were also taken into consideration.
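
    A compact genetic-algorithm loop of the kind described can be sketched with a noisy stand-in objective in place of the discrete event simulation of vehicle operations (the workload of 50 units, the penalty weights, and all GA settings below are made up for illustration):

```python
import random

def simulate_cost(resources, runs=5):
    """Stand-in for a discrete event simulation: noisy cost that penalizes both
    paying for resources and failing to cover a hypothetical workload of 50 units."""
    workload = 50
    capacity = sum(resources)
    shortfall = max(0, workload - capacity)
    return sum(capacity + 10 * shortfall + random.gauss(0, 2) for _ in range(runs)) / runs

def genetic_minimize(n_types=4, pop_size=20, generations=40, bounds=(0, 30)):
    pop = [[random.randint(*bounds) for _ in range(n_types)] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=simulate_cost)
        parents = scored[: pop_size // 2]                  # selection
        children = []
        while len(children) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            cut = random.randrange(1, n_types)             # one-point crossover
            child = a[:cut] + b[cut:]
            if random.random() < 0.2:                      # mutation on integer genes
                i = random.randrange(n_types)
                child[i] = min(bounds[1], max(bounds[0], child[i] + random.choice((-1, 1))))
            children.append(child)
        pop = parents + children
    return min(pop, key=simulate_cost)

print(genetic_minimize())   # resource levels whose total hovers near the workload
```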

  18. On constructing optimistic simulation algorithms for the discrete event system specification

    SciTech Connect

    Nutaro, James J

    2008-01-01

    This article describes a Time Warp simulation algorithm for discrete event models that are described in terms of the Discrete Event System Specification (DEVS). The article shows how the total state transition and total output function of a DEVS atomic model can be transformed into an event processing procedure for a logical process. A specific Time Warp algorithm is constructed around this logical process, and it is shown that the algorithm correctly simulates a DEVS coupled model that consists entirely of interacting atomic models. The simulation algorithm is presented abstractly; it is intended to provide a basis for implementing efficient and scalable parallel algorithms that correctly simulate DEVS models.
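
    The starting point of the construction, a DEVS atomic model with a time-advance function, an output function, and internal and external transitions, maps onto the event procedure of a logical process roughly as follows (a generic textbook-style DEVS skeleton, not the article's Time Warp algorithm; confluent transitions are omitted):

```python
class AtomicModel:
    """Minimal DEVS atomic model: a processor that holds a job for 3 time units."""

    def __init__(self):
        self.state = {"job": None}

    def ta(self):                                  # time advance
        return 3.0 if self.state["job"] is not None else float("inf")

    def output(self):                              # lambda: emitted at internal events
        return self.state["job"]

    def delta_int(self):                           # internal transition
        self.state["job"] = None

    def delta_ext(self, elapsed, message):         # external transition
        if self.state["job"] is None:
            self.state["job"] = message


def process_event(model, last_time, event_time, message=None):
    """Event procedure for a logical process wrapping a DEVS atomic model.
    Returns (output, time of the next scheduled internal event)."""
    out = None
    if message is None:                            # scheduled internal event
        out = model.output()
        model.delta_int()
    else:                                          # arriving input message
        model.delta_ext(event_time - last_time, message)
    return out, event_time + model.ta()


if __name__ == "__main__":
    m = AtomicModel()
    _, t_next = process_event(m, 0.0, 1.0, message="job-A")   # arrival at t=1
    print(process_event(m, 1.0, t_next))                      # ('job-A', inf): departure at t=4
```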

  19. Synchronization of autonomous objects in discrete event simulation

    NASA Technical Reports Server (NTRS)

    Rogers, Ralph V.

    1990-01-01

    Autonomous objects in event-driven discrete event simulation offer the potential to combine the freedom of unrestricted movement and positional accuracy through Euclidean space of time-driven models with the computational efficiency of event-driven simulation. The principal challenge to autonomous object implementation is object synchronization. The concept of a spatial blackboard is offered as a potential methodology for synchronization. The issues facing implementation of a spatial blackboard are outlined and discussed.

  20. Reversible Discrete Event Formulation and Optimistic Parallel Execution of Vehicular Traffic Models

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2009-01-01

    Vehicular traffic simulations are useful in applications such as emergency planning and traffic management. High speed of traffic simulations translates to speed of response and level of resilience in those applications. Discrete event formulation of traffic flow at the level of individual vehicles affords both the flexibility of simulating complex scenarios of vehicular flow behavior as well as rapid simulation time advances. However, efficient parallel/distributed execution of the models becomes challenging due to synchronization overheads. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time for simulating emergency vehicular traffic scenarios. Our approach resolves the challenges that arise in parallel execution of microscopic, vehicular-level models of traffic. We apply a reverse computation-based optimistic execution approach to address the parallel synchronization problem. This is achieved by formulating a reversible version of a discrete event model of vehicular traffic, and by utilizing this reversible model in an optimistic execution setting. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation, (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation, and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed close to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance. The benefits of optimistic execution are demonstrated, including a speedup of nearly 20 on 32 processors observed on a vehicular network of over 65,000 intersections and over 13 million vehicles.
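
    Reverse computation, the rollback mechanism used here, pairs each event handler with an inverse that undoes it exactly; only quantities that cannot be recomputed (below, the identity of the departing vehicle) are saved incrementally. A toy sketch with hypothetical link state, not the paper's traffic model:

```python
def vehicle_arrive(link, vehicle_id):
    """Forward event: a vehicle joins the tail of a link's queue."""
    link["queue"].append(vehicle_id)
    link["count"] += 1
    return ()                                  # nothing extra needed to reverse

def vehicle_arrive_reverse(link, vehicle_id, saved):
    link["queue"].pop()                        # exact inverse of the forward event
    link["count"] -= 1

def vehicle_depart(link, vehicle_id):
    """Forward event: the head-of-line vehicle leaves; save only what reversal needs."""
    departed = link["queue"].pop(0)
    link["count"] -= 1
    return (departed,)                         # minimal incremental state

def vehicle_depart_reverse(link, vehicle_id, saved):
    link["queue"].insert(0, saved[0])
    link["count"] += 1

if __name__ == "__main__":
    link = {"queue": [], "count": 0}
    s1 = vehicle_arrive(link, "car-7")
    s2 = vehicle_depart(link, "car-7")
    vehicle_depart_reverse(link, "car-7", s2)   # optimistic rollback, most recent event first
    vehicle_arrive_reverse(link, "car-7", s1)
    print(link)                                 # {'queue': [], 'count': 0}: back to the initial state
```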

  1. Reversible Parallel Discrete-Event Execution of Large-scale Epidemic Outbreak Models

    SciTech Connect

    Perumalla, Kalyan S; Seal, Sudip K

    2010-01-01

    The spatial scale, runtime speed and behavioral detail of epidemic outbreak simulations together require the use of large-scale parallel processing. In this paper, an optimistic parallel discrete event execution of a reaction-diffusion simulation model of epidemic outbreaks is presented, with an implementation over the µsik simulator. Rollback support is achieved with the development of a novel reversible model that combines reverse computation with a small amount of incremental state saving. Parallel speedup and other runtime performance metrics of the simulation are tested on a small (8,192-core) Blue Gene/P system, while scalability is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes (up to several hundred million individuals in the largest case) are exercised.

  2. Reversible Parallel Discrete Event Formulation of a TLM-based Radio Signal Propagation Model

    SciTech Connect

    Seal, Sudip K; Perumalla, Kalyan S

    2011-01-01

    Radio signal strength estimation is essential in many applications, including the design of military radio communications and industrial wireless installations. For scenarios with large or richly-featured geographical volumes, parallel processing is required to meet the memory and computation time demands. Here, we present a scalable and efficient parallel execution of the sequential model for radio signal propagation recently developed by Nutaro et al. Starting with that model, we (a) provide a vector-based reformulation that has significantly lower computational overhead for event handling, (b) develop a parallel decomposition approach that is amenable to reversibility with minimal computational overheads, (c) present a framework for transparently mapping the conservative time-stepped model into an optimistic parallel discrete event execution, (d) present a new reversible method, along with its analysis and implementation, for inverting the vector-based event model to be executed in an optimistic parallel style of execution, and (e) present performance results from implementation on Cray XT platforms. We demonstrate scalability, with the largest runs tested on up to 127,500 cores of a Cray XT5, enabling simulation of larger scenarios and with faster execution than reported before on the radio propagation model. This also represents the first successful demonstration of the ability to efficiently map a conservative time-stepped model to an optimistic discrete-event execution.

  3. Discrete event simulation in an artificial intelligence environment: Some examples

    SciTech Connect

    Roberts, D.J.; Farish, T.

    1991-01-01

    Several Los Alamos National Laboratory (LANL) object-oriented discrete-event simulation efforts have been completed during the past three years. One of these systems has been put into production and has a growing customer base. Another (started two years earlier than the first project) was completed but has not yet been used. This paper will describe these simulation projects. Factors which were pertinent to the success of the one project, and to the failure of the second project will be discussed (success will be measured as the extent to which the simulation model was used as originally intended). 5 figs.

  4. Advances in Discrete-Event Simulation for MSL Command Validation

    NASA Technical Reports Server (NTRS)

    Patrikalakis, Alexander; O'Reilly, Taifun

    2013-01-01

    In the last five years, the discrete event simulator, SEQuence GENerator (SEQGEN), developed at the Jet Propulsion Laboratory to plan deep-space missions, has greatly increased uplink operations capacity to deal with increasingly complicated missions. In this paper, we describe how the Mars Science Laboratory (MSL) project makes full use of an interpreted environment to simulate changes in more than fifty thousand flight software parameters and conditional command sequences, to predict the result of executing a conditional branch in a command sequence, and to warn users whenever one or more simulated spacecraft states change in an unexpected manner. Using these new SEQGEN features, operators plan more activities in one sol than ever before.

  5. Discrete Event Modeling and Massively Parallel Execution of Epidemic Outbreak Phenomena

    SciTech Connect

    Perumalla, Kalyan S; Seal, Sudip K

    2011-01-01

    In complex phenomena such as epidemiological outbreaks, the intensity of inherent feedback effects and the significant role of transients in the dynamics make simulation the only effective method for proactive, reactive or post-facto analysis. The spatial scale, runtime speed, and behavioral detail needed in detailed simulations of epidemic outbreaks make it necessary to use large-scale parallel processing. Here, an optimistic parallel execution of a new discrete event formulation of a reaction-diffusion simulation model of epidemic propagation is presented to dramatically increase the fidelity and speed with which epidemiological simulations can be performed. Rollback support needed during optimistic parallel execution is achieved by combining reverse computation with a small amount of incremental state saving. Parallel speedup of over 5,500 and other runtime performance metrics of the system are observed with weak-scaling execution on a small (8,192-core) Blue Gene/P system, while scalability with a weak-scaling speedup of over 10,000 is demonstrated on 65,536 cores of a large Cray XT5 system. Scenarios representing large population sizes, exceeding several hundred million individuals in the largest cases, are successfully exercised to verify model scalability.

  6. Predicting Liver Transplant Capacity Using Discrete Event Simulation.

    PubMed

    Toro-Diaz, Hector; Mayorga, Maria E; Barritt, A Sidney; Orman, Eric S; Wheeler, Stephanie B

    2014-11-12

    The number of liver transplants (LTs) performed in the US increased until 2006 but has since declined despite an ongoing increase in demand. This decline may be due in part to decreased donor liver quality and increasing discard of poor-quality livers. We constructed a discrete event simulation (DES) model informed by current donor characteristics to predict future LT trends through the year 2030. The data source for our model is the United Network for Organ Sharing database, which contains patient-level information on all organ transplants performed in the US. Previous analysis showed that liver discard is increasing and that discarded organs are more often from donors who are older, are obese, have diabetes, and donated after cardiac death. Given that the prevalence of these factors is increasing, the DES model quantifies the reduction in the number of LTs performed through 2030. In addition, the model estimates the total number of future donors needed to maintain the current volume of LTs and the effect of a hypothetical scenario of improved reperfusion technology. We also forecast the number of patients on the waiting list and compare this with the estimated number of LTs to illustrate the impact that decreased LTs will have on patients needing transplants. By altering assumptions about the future donor pool, this model can be used to develop policy interventions to prevent a further decline in this lifesaving therapy. To our knowledge, there are no similar predictive models of future LT use based on epidemiological trends. PMID:25391681

  7. Enhancing Complex System Performance Using Discrete-Event Simulation

    SciTech Connect

    Allgood, Glenn O; Olama, Mohammed M; Lake, Joe E

    2010-01-01

    In this paper, we utilize discrete-event simulation (DES) merged with human factors analysis to provide the venue within which the separation and deconfliction of the system/human operating principles can occur. A concrete example is presented to illustrate the performance enhancement gains for an aviation cargo flow and security inspection system achieved through the development and use of a process DES. The overall performance of the system is computed, analyzed, and optimized for the different system dynamics. Various performance measures are considered such as system capacity, residual capacity, and total number of pallets waiting for inspection in the queue. These metrics are performance indicators of the system's ability to service current needs and respond to additional requests. We studied and analyzed different scenarios by changing various model parameters such as the number of pieces per pallet ratio, number of inspectors and cargo handling personnel, number of forklifts, number and types of detection systems, inspection modality distribution, alarm rate, and cargo closeout time. The increased physical understanding resulting from execution of the queuing model utilizing these vetted performance measures identified effective ways to meet inspection requirements while maintaining or reducing overall operational cost and eliminating any shipping delays associated with any proposed changes in inspection requirements. With this understanding effective operational strategies can be developed to optimally use personnel while still maintaining plant efficiency, reducing process interruptions, and holding or reducing costs.

  8. DISCRETE EVENT SIMULATION OF OPTICAL SWITCH MATRIX PERFORMANCE IN COMPUTER NETWORKS

    SciTech Connect

    Imam, Neena; Poole, Stephen W

    2013-01-01

    In this paper, we present application of a Discrete Event Simulator (DES) for performance modeling of optical switching devices in computer networks. Network simulators are valuable tools in situations where one cannot investigate the system directly. This situation may arise if the system under study does not exist yet or the cost of studying the system directly is prohibitive. Most available network simulators are based on the paradigm of discrete-event-based simulation. As computer networks become increasingly larger and more complex, sophisticated DES tool chains have become available for both commercial and academic research. Some well-known simulators are NS2, NS3, OPNET, and OMNEST. For this research, we have applied OMNEST for the purpose of simulating multi-wavelength performance of optical switch matrices in computer interconnection networks. Our results suggest that the application of DES to computer interconnection networks provides valuable insight in device performance and aids in topology and system optimization.

  9. Using Discrete Event Simulation to predict KPI's at a Projected Emergency Room.

    PubMed

    Concha, Pablo; Neriz, Liliana; Parada, Danilo; Ramis, Francisco

    2015-01-01

    Discrete Event Simulation (DES) is a powerful tool in the design of clinical facilities. DES enables facilities to be built or adapted to achieve the expected Key Performance Indicators (KPI's) such as average waiting times according to acuity, average stay times and others. Our computational model was built and validated using expert judgment and supporting statistical data. One scenario studied resulted in a 50% decrease in the average cycle time of patients compared to the original model, mainly by modifying the patient care model. PMID:26262262

  10. Architecture for explicit representation of cause and function in discrete event simulation modeling

    NASA Astrophysics Data System (ADS)

    Larson, Raymond A.; Slagle, James R.

    1993-03-01

    For simulations that are to enable human understanding of the simulated system, inspectable models are highly desirable. A model is inspectable to the extent that its observer can access and interpret each of its reasoning steps. Development of an inspectable model requires an architecture that represents explicitly some of the key types of relationships among domain objects. Although there are formalisms that address some of these relationships, none attempts a representation for a dynamic simulation that is both comprehensive and general. This paper defines an architecture that meets these requirements for well-defined, engineered systems. The architecture combines features from two formalisms, namely, discrete event simulation modeling, and functional representation. Besides representing generalization-specialization and multi-level part-whole composition in a dynamic world, we provide an explicit representation for causality and functionality. We set up a framework for developing well-integrated systems, and an architecture for the core components of a 'shell'. We use a complex but sparse class structure that allows for composition of elementary units into structures of arbitrary complexity. Our architecture is event-driven, but uses a distributed timing mechanism. The scope of our research includes definition of the problem and the solution method. Implementation of a prototype to validate key aspects of the architecture is in progress. We have named the architecture 'IDEA' for inspectable discrete event architecture.

  11. Tutorial in medical decision modeling incorporating waiting lines and queues using discrete event simulation.

    PubMed

    Jahn, Beate; Theurl, Engelbert; Siebert, Uwe; Pfeiffer, Karl-Peter

    2010-01-01

    In most decision-analytic models in health care, it is assumed that there is treatment without delay and availability of all required resources. Therefore, waiting times caused by limited resources and their impact on treatment effects and costs often remain unconsidered. Queuing theory enables mathematical analysis and the derivation of several performance measures of queuing systems. Nevertheless, an analytical approach with closed formulas is not always possible. Therefore, simulation techniques are used to evaluate systems that include queuing or waiting, for example, discrete event simulation. To include queuing in decision-analytic models requires a basic knowledge of queuing theory and of the underlying interrelationships. This tutorial introduces queuing theory. Analysts and decision-makers get an understanding of queue characteristics, modeling features, and their strengths. Conceptual issues are covered, but the emphasis is on practical issues like modeling the arrival of patients. The treatment of coronary artery disease with percutaneous coronary intervention including stent placement serves as an illustrative queuing example. Discrete event simulation is applied to explicitly model resource capacities, to incorporate waiting lines and queues in the decision-analytic modeling example. PMID:20345550
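
    The mechanics such a tutorial builds on can be compressed into a minimal discrete event simulation of one server with a waiting line (hypothetical arrival and service rates; exponential distributions are chosen so the result can be checked against the analytic M/M/1 waiting time):

```python
import heapq, random

def mm1_waiting_time(arrival_rate=1.0, service_rate=1.2, horizon=10_000.0, seed=1):
    """Minimal discrete event simulation of a single FCFS server (M/M/1-style).
    Returns the mean time spent waiting in the queue before service begins."""
    random.seed(seed)
    events = [(random.expovariate(arrival_rate), "arrival", 0)]   # (time, kind, tie-breaker)
    queue = []                 # arrival times of waiting patients
    busy = False
    waits = []
    counter = 1

    while events:
        now, kind, _ = heapq.heappop(events)
        if now > horizon:
            break
        if kind == "arrival":
            heapq.heappush(events, (now + random.expovariate(arrival_rate), "arrival", counter))
            counter += 1
            if busy:
                queue.append(now)                                 # join the waiting line
            else:
                busy = True
                waits.append(0.0)                                 # served immediately
                heapq.heappush(events, (now + random.expovariate(service_rate), "departure", counter))
                counter += 1
        else:                                                     # departure
            if queue:
                waits.append(now - queue.pop(0))                  # waited until the server freed up
                heapq.heappush(events, (now + random.expovariate(service_rate), "departure", counter))
                counter += 1
            else:
                busy = False
    return sum(waits) / len(waits)

print(round(mm1_waiting_time(), 2))   # compare with the analytic value lambda / (mu * (mu - lambda)) ~ 4.17
```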

  12. A Framework for the Optimization of Discrete-Event Simulation Models

    NASA Technical Reports Server (NTRS)

    Joshi, B. D.; Unal, R.; White, N. H.; Morris, W. D.

    1996-01-01

    With the growing use of computer modeling and simulation, in all aspects of engineering, the scope of traditional optimization has to be extended to include simulation models. Some unique aspects have to be addressed while optimizing via stochastic simulation models. The optimization procedure has to explicitly account for the randomness inherent in the stochastic measures predicted by the model. This paper outlines a general purpose framework for optimization of terminating discrete-event simulation models. The methodology combines a chance constraint approach for problem formulation, together with standard statistical estimation and analysis techniques. The applicability of the optimization framework is illustrated by minimizing the operation and support resources of a launch vehicle through a simulation model.

  13. Discrete-event simulation for the design and evaluation of physical protection systems

    SciTech Connect

    Jordan, S.E.; Snell, M.K.; Madsen, M.M.; Smith, J.S.; Peters, B.A.

    1998-08-01

    This paper explores the use of discrete-event simulation for the design and control of physical protection systems for fixed-site facilities housing items of significant value. It begins by discussing several modeling and simulation activities currently performed in designing and analyzing these protection systems and then discusses capabilities that design/analysis tools should have. The remainder of the article then discusses in detail how some of these new capabilities have been implemented in software to achieve a prototype design and analysis tool. The simulation software technology provides a communications mechanism between a running simulation and one or more external programs. In the prototype security analysis tool, these capabilities are used to facilitate human-in-the-loop interaction and to support a real-time connection to a virtual reality (VR) model of the facility being analyzed. This simulation tool can be used for both training (in real-time mode) and facility analysis and design (in fast mode).

  14. DeMO: An Ontology for Discrete-event Modeling and Simulation.

    PubMed

    Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

    2011-09-01

    Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114

  15. DeMO: An Ontology for Discrete-event Modeling and Simulation

    PubMed Central

    Silver, Gregory A; Miller, John A; Hybinette, Maria; Baramidze, Gregory; York, William S

    2011-01-01

    Several fields have created ontologies for their subdomains. For example, the biological sciences have developed extensive ontologies such as the Gene Ontology, which is considered a great success. Ontologies could provide similar advantages to the Modeling and Simulation community. They provide a way to establish common vocabularies and capture knowledge about a particular domain with community-wide agreement. Ontologies can support significantly improved (semantic) search and browsing, integration of heterogeneous information sources, and improved knowledge discovery capabilities. This paper discusses the design and development of an ontology for Modeling and Simulation called the Discrete-event Modeling Ontology (DeMO), and it presents prototype applications that demonstrate various uses and benefits that such an ontology may provide to the Modeling and Simulation community. PMID:22919114

  16. A conceptual modeling framework for discrete event simulation using hierarchical control structures

    PubMed Central

    Furian, N.; O'Sullivan, M.; Walker, C.; Vössner, S.; Neubacher, D.

    2015-01-01

    Conceptual Modeling (CM) is a fundamental step in a simulation project. Nevertheless, it is only recently that structured approaches towards the definition and formulation of conceptual models have gained importance in the Discrete Event Simulation (DES) community. As a consequence, frameworks and guidelines for applying CM to DES have emerged and discussion of CM for DES is increasing. However, both the organization of model-components and the identification of behavior and system control from standard CM approaches have shortcomings that limit CM's applicability to DES. Therefore, we discuss the different aspects of previous CM frameworks and identify their limitations. Further, we present the Hierarchical Control Conceptual Modeling framework that pays more attention to the identification of a model's system behavior, control policies and dispatching routines and their structured representation within a conceptual model. The framework guides the user step-by-step through the modeling process and is illustrated by a worked example. PMID:26778940

  17. CONFIG - Adapting qualitative modeling and discrete event simulation for design of fault management systems

    NASA Technical Reports Server (NTRS)

    Malin, Jane T.; Basham, Bryan D.

    1989-01-01

    CONFIG is a modeling and simulation tool prototype for analyzing the normal and faulty qualitative behaviors of engineered systems. Qualitative modeling and discrete-event simulation have been adapted and integrated, to support early development, during system design, of software and procedures for management of failures, especially in diagnostic expert systems. Qualitative component models are defined in terms of normal and faulty modes and processes, which are defined by invocation statements and effect statements with time delays. System models are constructed graphically by using instances of components and relations from object-oriented hierarchical model libraries. Extension and reuse of CONFIG models and analysis capabilities in hybrid rule- and model-based expert fault-management support systems are discussed.

  18. Developing Flexible Discrete Event Simulation Models in an Uncertain Policy Environment

    NASA Technical Reports Server (NTRS)

    Miranda, David J.; Fayez, Sam; Steele, Martin J.

    2011-01-01

    On February 1st, 2010 U.S. President Barack Obama submitted to Congress his proposed budget request for Fiscal Year 2011. This budget included significant changes to the National Aeronautics and Space Administration (NASA), including the proposed cancellation of the Constellation Program. This change proved to be controversial and Congressional approval of the program's official cancellation would take many months to complete. During this same period an end-to-end discrete event simulation (DES) model of Constellation operations was being built through the joint efforts of Productivity Apex Inc. (PAI) and Science Applications International Corporation (SAIC) teams under the guidance of NASA. The uncertainty regarding the Constellation program presented a major challenge to the DES team: to continue the development of this program-of-record simulation while at the same time remaining prepared for possible changes to the program. This required the team to rethink how it would develop its model and make it flexible enough to support possible future vehicles while remaining specific enough to support the program-of-record. This challenge was compounded by the fact that this model was being developed through the traditional DES process-orientation, which lacked the flexibility of object-oriented approaches. The team met this challenge through significant pre-planning that led to the "modularization" of the model's structure by identifying what was generic, finding natural logic break points, and standardizing the inter-logic numbering system. The outcome of this work was a model that not only was ready to be easily modified to support future rocket programs, but was also extremely structured and organized in a way that facilitated rapid verification. This paper discusses in detail the process the team followed to build this model and the many advantages this method provides builders of traditional process-oriented discrete event simulations.

  19. Statistical and Probabilistic Extensions to Ground Operations' Discrete Event Simulation Modeling

    NASA Technical Reports Server (NTRS)

    Trocine, Linda; Cummings, Nicholas H.; Bazzana, Ashley M.; Rychlik, Nathan; LeCroy, Kenneth L.; Cates, Grant R.

    2010-01-01

    NASA's human exploration initiatives will invest in technologies, public/private partnerships, and infrastructure, paving the way for the expansion of human civilization into the solar system and beyond. As it has been for the past half century, the Kennedy Space Center will be the embarkation point for humankind's journey into the cosmos. Functioning as a next generation space launch complex, Kennedy's launch pads, integration facilities, processing areas, launch and recovery ranges will bustle with the activities of the world's space transportation providers. In developing this complex, KSC teams work through the potential operational scenarios: conducting trade studies, planning and budgeting for expensive and limited resources, and simulating alternative operational schemes. Numerous tools, among them discrete event simulation (DES), were matured during the Constellation Program to conduct such analyses with the purpose of optimizing the launch complex for maximum efficiency, safety, and flexibility while minimizing life cycle costs. Discrete event simulation is a computer-based modeling technique for complex and dynamic systems where the state of the system changes at discrete points in time and whose inputs may include random variables. DES is used to assess timelines and throughput, and to support operability studies and contingency analyses. It is applicable to any space launch campaign and informs decision-makers of the effects of varying numbers of expensive resources and the impact of off-nominal scenarios on measures of performance. In order to develop representative DES models, methods were adopted, exploited, or created to extend traditional uses of DES. The Delphi method was adopted and utilized for task duration estimation. DES software was exploited for probabilistic event variation. A roll-up process was developed to reuse models and model elements in other, less-detailed models. The DES team continues to innovate and expand DES capabilities to address KSC's planning needs.

  20. The effects of indoor environmental exposures on pediatric asthma: a discrete event simulation model

    PubMed Central

    2012-01-01

    Background: In the United States, asthma is the most common chronic disease of childhood across all socioeconomic classes and is the most frequent cause of hospitalization among children. Asthma exacerbations have been associated with exposure to residential indoor environmental stressors such as allergens and air pollutants as well as numerous additional factors. Simulation modeling is a valuable tool that can be used to evaluate interventions for complex multifactorial diseases such as asthma, but in spite of its flexibility and applicability, modeling applications in either environmental exposures or asthma have been limited to date.
    Methods: We designed a discrete event simulation model to study the effect of environmental factors on asthma exacerbations in school-age children living in low-income multi-family housing. Model outcomes include asthma symptoms, medication use, hospitalizations, and emergency room visits. Environmental factors were linked to percent predicted forced expiratory volume in 1 second (FEV1%), which in turn was linked to risk equations for each outcome. Exposures affecting FEV1% included indoor and outdoor sources of NO2 and PM2.5, cockroach allergen, and dampness as a proxy for mold.
    Results: Model design parameters and equations are described in detail. We evaluated the model by simulating 50,000 children over 10 years and showed that pollutant concentrations and health outcome rates are comparable to values reported in the literature. In an application example, we simulated what would happen if the kitchen and bathroom exhaust fans were improved for the entire cohort, and showed reductions in pollutant concentrations and healthcare utilization rates.
    Conclusions: We describe the design and evaluation of a discrete event simulation model of pediatric asthma for children living in low-income multi-family housing. Our model simulates the effect of environmental factors (combustion pollutants and allergens), medication compliance, seasonality, and medical history on asthma outcomes (symptom-days, medication use, hospitalizations, and emergency room visits). The model can be used to evaluate building interventions and green building construction practices on pollutant concentrations, energy savings, and asthma healthcare utilization costs, and demonstrates the value of a simulation approach for studying complex diseases such as asthma. PMID:22989068

  1. Discrete event simulation tool for analysis of qualitative models of continuous processing systems

    NASA Technical Reports Server (NTRS)

    Malin, Jane T. (Inventor); Basham, Bryan D. (Inventor); Harris, Richard A. (Inventor)

    1990-01-01

    An artificial intelligence design and qualitative modeling tool is disclosed for creating computer models and simulating continuous activities, functions, and/or behavior using developed discrete event techniques. Conveniently, the tool is organized in four modules: a library design module, a model construction module, a simulation module, and an experimentation and analysis module. The library design module supports the building of library knowledge including component classes and elements pertinent to a particular domain of continuous activities, functions, and behavior being modeled. The continuous behavior is defined discretely with respect to invocation statements, effect statements, and time delays. The functionality of the components is defined in terms of variable cluster instances, independent processes, and modes, further defined in terms of mode transition processes and mode dependent processes. Model construction utilizes the hierarchy of libraries and connects them with appropriate relations. The simulation executes a specialized initialization routine and executes events in a manner that includes selective inherency of characteristics through a time and event schema until the event queue in the simulator is emptied. The experimentation and analysis module supports analysis through the generation of appropriate log files and graphics and includes the ability to compare log files.

  2. Towards High Performance Discrete-Event Simulations of Smart Electric Grids

    SciTech Connect

    Perumalla, Kalyan S; Nutaro, James J; Yoginath, Srikanth B

    2011-01-01

    Future electric grid technology is envisioned around the notion of a smart grid in which responsive end-user devices play an integral part in the transmission and distribution control systems. Detailed simulation is often the primary choice in analyzing small network designs, and the only choice in analyzing large-scale electric network designs. Here, we identify and articulate the high-performance computing needs underlying high-resolution discrete event simulation of smart electric grid operation in large network scenarios, such as the entire Eastern Interconnect. We focus on the simulator's most computationally intensive operation, namely, the dynamic numerical solution for the electric grid state, for both time integration and event detection. We explore solution approaches using general-purpose dense and sparse solvers, and propose a scalable solver specialized for the sparse structures of actual electric networks. Based on experiments with an implementation in the THYME simulator, we identify performance issues and possible solution approaches for smart grid experimentation in the large.

  3. StratBAM: A Discrete-Event Simulation Model to Support Strategic Hospital Bed Capacity Decisions.

    PubMed

    Devapriya, Priyantha; Strömblad, Christopher T B; Bailey, Matthew D; Frazier, Seth; Bulger, John; Kemberling, Sharon T; Wood, Kenneth E

    2015-10-01

    The ability to accurately measure and assess current and potential health care system capacities is an issue of local and national significance. Recent joint statements by the Institute of Medicine and the Agency for Healthcare Research and Quality have emphasized the need to apply industrial and systems engineering principles to improving health care quality and patient safety outcomes. To address this need, a decision support tool was developed for planning and budgeting of current and future bed capacity, and evaluating potential process improvement efforts. The Strategic Bed Analysis Model (StratBAM) is a discrete-event simulation model created after a thorough analysis of patient flow and data from Geisinger Health System's (GHS) electronic health records. Key inputs include: timing, quantity and category of patient arrivals and discharges; unit-level length of care; patient paths; and projected patient volume and length of stay. Key outputs include: admission wait time by arrival source and receiving unit, and occupancy rates. Electronic health records were used to estimate parameters for probability distributions and to build empirical distributions for unit-level length of care and for patient paths. Validation of the simulation model against GHS operational data confirmed its ability to model real-world data consistently and accurately. StratBAM was successfully used to evaluate the system impact of forecasted patient volumes and length of stay in terms of patient wait times, occupancy rates, and cost. The model is generalizable and can be appropriately scaled for larger and smaller health care settings. PMID:26310949

  4. Implementation of Tree and Butterfly Barriers with Optimistic Time Management Algorithms for Discrete Event Simulation

    NASA Astrophysics Data System (ADS)

    Rizvi, Syed S.; Shah, Dipali; Riasat, Aasia

    The Time Warp algorithm [3] offers a run-time recovery mechanism that deals with causality errors. These run-time recovery mechanisms consist of rollback, anti-message, and Global Virtual Time (GVT) techniques. For rollback, there is a need to compute GVT, which is used in discrete-event simulation to reclaim memory, commit output, detect termination, and handle errors. However, the computation of GVT requires dealing with the transient message problem and the simultaneous reporting problem. These problems can be handled efficiently by Samadi's algorithm [8], which works well in the presence of causality errors. However, the performance of both the Time Warp and Samadi's algorithms depends on the latency involved in GVT computation. Both algorithms give poor latency for large simulation systems, especially in the presence of causality errors. To improve the latency and reduce processor idle time, we implement tree and butterfly barriers with the optimistic algorithm. Our analysis shows that the use of synchronous barriers such as tree and butterfly barriers with the optimistic algorithm not only minimizes the GVT latency but also minimizes the processor idle time.
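    As an illustration of the mechanism (not the authors' implementation), the C/MPI sketch below runs a butterfly barrier over a power-of-two number of ranks and folds a minimum reduction of each rank's local virtual time into the same pairwise exchanges, which is the core of a barrier-based GVT estimate; function and variable names are ours, and transient messages are ignored for brevity.

      /* Butterfly barrier that simultaneously computes the global minimum
       * of each rank's local virtual time -- a simplified GVT step.
       * Assumes a power-of-two number of ranks and no simulation
       * messages in transit while the barrier runs. */
      #include <mpi.h>
      #include <stdio.h>

      static double butterfly_min(double local_lvt, MPI_Comm comm)
      {
          int rank, size;
          MPI_Comm_rank(comm, &rank);
          MPI_Comm_size(comm, &size);

          double gvt = local_lvt;
          for (int stride = 1; stride < size; stride <<= 1) {
              int partner = rank ^ stride;           /* butterfly pairing */
              double recv;
              MPI_Sendrecv(&gvt,  1, MPI_DOUBLE, partner, 0,
                           &recv, 1, MPI_DOUBLE, partner, 0,
                           comm, MPI_STATUS_IGNORE);
              if (recv < gvt)                        /* fold in partner's minimum */
                  gvt = recv;
          }
          return gvt;   /* every rank now holds the same global minimum */
      }

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          double lvt = 100.0 + 10.0 * rank;          /* stand-in local virtual time */
          double gvt = butterfly_min(lvt, MPI_COMM_WORLD);
          if (rank == 0)
              printf("GVT estimate = %.1f\n", gvt);
          MPI_Finalize();
          return 0;
      }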

  5. Discrete event simulation and the resultant data storage system response in the operational mission environment of Jupiter-Saturn /Voyager/ spacecraft

    NASA Technical Reports Server (NTRS)

    Mukhopadhyay, A. K.

    1978-01-01

    The Data Storage Subsystem Simulator (DSSSIM) simulating (by ground software) occurrence of discrete events in the Voyager mission is described. Functional requirements for Data Storage Subsystems (DSS) simulation are discussed, and discrete event simulation/DSSSIM processing is covered. Four types of outputs associated with a typical DSSSIM run are presented, and DSSSIM limitations and constraints are outlined.

  6. Discrete-event simulation of a wide-area health care network.

    PubMed Central

    McDaniel, J G

    1995-01-01

    OBJECTIVE: Predict the behavior and estimate the telecommunication cost of a wide-area message store-and-forward network for health care providers that uses the telephone system. DESIGN: A tool with which to perform large-scale discrete-event simulations was developed. Network models for star and mesh topologies were constructed to analyze the differences in performances and telecommunication costs. The distribution of nodes in the network models approximates the distribution of physicians, hospitals, medical labs, and insurers in the Province of Saskatchewan, Canada. Modeling parameters were based on measurements taken from a prototype telephone network and a survey conducted at two medical clinics. Simulation studies were conducted for both topologies. RESULTS: For either topology, the telecommunication cost of a network in Saskatchewan is projected to be less than $100 (Canadian) per month per node. The estimated telecommunication cost of the star topology is approximately half that of the mesh. Simulations predict that a mean end-to-end message delivery time of two hours or less is achievable at this cost. A doubling of the data volume results in an increase of less than 50% in the mean end-to-end message transfer time. CONCLUSION: The simulation models provided an estimate of network performance and telecommunication cost in a specific Canadian province. At the expected operating point, network performance appeared to be relatively insensitive to increases in data volume. Similar results might be anticipated in other rural states and provinces in North America where a telephone-based network is desired. PMID:7583646

  7. Teleradiology system analysis using a discrete event-driven block-oriented network simulator

    NASA Astrophysics Data System (ADS)

    Stewart, Brent K.; Dwyer, Samuel J., III

    1992-07-01

    Performance evaluation and trade-off analysis are the central issues in the design of communication networks. Simulation plays an important role in computer-aided design and analysis of communication networks and related systems, allowing testing of numerous architectural configurations and fault scenarios. We are using the Block Oriented Network Simulator (BONeS, Comdisco, Foster City, CA) software package to perform discrete, event-driven Monte Carlo simulations in capacity planning, trade-off analysis and evaluation of alternate architectures for a high-speed, high-resolution teleradiology project. A queuing network model of the teleradiology system has been devised, simulations executed and results analyzed. The wide area network link uses a switched, dial-up N x 56 kbps inverse multiplexer where the number of digital voice-grade lines (N) can vary from one (DS-0) through 24 (DS-1). The proposed goal of such a system is 200 films (2048 x 2048 x 12-bit) transferred between a remote and local site in an eight-hour period with a mean delay time of less than five minutes. It is found that: (1) the DS-1 service limit is around 100 films per eight-hour period with a mean delay time of 412 +/- 39 seconds, short of the goal stipulated above; (2) compressed video teleconferencing can be run simultaneously with image data transfer over the DS-1 wide area network link without impacting the performance of the described teleradiology system; (3) there is little sense in upgrading to a higher-bandwidth WAN link like DS-2 or DS-3 for the current system; and (4) the goal of transmitting 200 films in an eight-hour period with a mean delay time of less than five minutes can be achieved simply if the laser printer interface is upgraded from the current DR-11W interface to a much faster SCSI interface.

  8. Using Discrete Event Simulation for Programming Model Exploration at Extreme-Scale: Macroscale Components for the Structural Simulation Toolkit (SST).

    SciTech Connect

    Wilke, Jeremiah J; Kenny, Joseph P.

    2015-02-01

    Discrete event simulation provides a powerful mechanism for designing and testing new extreme-scale programming models for high-performance computing. Rather than debug, run, and wait for results on an actual system, design can first iterate through a simulator. This is particularly useful when test beds cannot be used, i.e. to explore hardware or scales that do not yet exist or are inaccessible. Here we detail the macroscale components of the structural simulation toolkit (SST). Instead of depending on trace replay or state machines, the simulator is architected to execute real code on real software stacks. Our particular user-space threading framework allows massive scales to be simulated even on small clusters. The link between the discrete event core and the threading framework allows interesting performance metrics like call graphs to be collected from a simulated run. Performance analysis via simulation can thus become an important phase in extreme-scale programming model and runtime system design via the SST macroscale components.

  9. Tutorial: Parallel Simulation on Supercomputers

    SciTech Connect

    Perumalla, Kalyan S

    2012-01-01

    This tutorial introduces typical hardware and software characteristics of extant and emerging supercomputing platforms, and presents issues and solutions in executing large-scale parallel discrete event simulation scenarios on such high performance computing systems. Covered topics include synchronization, model organization, example applications, and observed performance from illustrative large-scale runs.

  10. Modeling Temporal Processes in Early Spacecraft Design: Application of Discrete-Event Simulations for Darpa's F6 Program

    NASA Technical Reports Server (NTRS)

    Dubos, Gregory F.; Cornford, Steven

    2012-01-01

    While the ability to model the state of a space system over time is essential during spacecraft operations, the use of time-based simulations remains rare in preliminary design. The absence of the time dimension in most traditional early design tools can, however, become a hurdle when designing complex systems whose development and operations can be disrupted by various events, such as delays or failures. As the value delivered by a space system is highly affected by such events, exploring the trade space for designs that yield the maximum value calls for the explicit modeling of time. This paper discusses the use of discrete-event models to simulate spacecraft development schedules as well as operational scenarios and on-orbit resources in the presence of uncertainty. It illustrates how such simulations can be utilized to support trade studies, through the example of a tool developed for DARPA's F6 program to assist the design of "fractionated spacecraft".

  11. Discrete-event simulation of nuclear-waste transport in geologic sites subject to disruptive events. Final report

    SciTech Connect

    Aggarwal, S.; Ryland, S.; Peck, R.

    1980-06-19

    This report outlines a methodology to study the effects of disruptive events on nuclear waste material in stable geologic sites. The methodology is based upon developing a discrete events model that can be simulated on the computer. This methodology allows a natural development of simulation models that use computer resources in an efficient manner. Accurate modeling in this area depends in large part upon accurate modeling of ion transport behavior in the storage media. Unfortunately, developments in this area are not at a stage where there is any consensus on proper models for such transport. Consequently, our work is directed primarily towards showing how disruptive events can be properly incorporated in such a model, rather than as a predictive tool at this stage. When and if proper geologic parameters can be determined, then it would be possible to use this as a predictive model. Assumptions and their bases are discussed, and the mathematical and computer model are described.

  12. The Activity-tracking paradigm in discrete-event modeling and simulation: The case of spatially continuous distributed systems

    SciTech Connect

    Muzy; Jammalamadaka, Rajanikanth; Zeigler, Bernard P; Nutaro, James J

    2011-01-01

    From a modeling and simulation perspective, studying dynamic systems consists of focusing on changes in states. According to the precision of state changes, generic algorithms can be developed to track the activity of sub-systems. This paper aims at describing and applying this more natural and intuitive way to describe and implement dynamic systems. Activity is defined mathematically. A generic application case of diffusion is experimented with to compare the efficiency of quantized state methods using this new approach with traditional methods which do not focus computations on active areas. Our goal is to demonstrate that the concept of activity can estimate the computational effort required by a quantized state method. Specifically, when properly designed, a discrete-event simulator for such a method achieves a reduction in the number of state transitions that more than compensates for the overhead it imposes.
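    One hedged reading of the claim that activity predicts the cost of a quantized state method: take the activity of a trajectory to be its total absolute variation, so that dividing by the quantum estimates the number of quantized transitions an integrator would perform. The short C sketch below computes that estimate for a sampled trajectory; the names and the example signal are ours, not the paper's code.

      /* Estimate the "activity" (total absolute variation) of a sampled
       * trajectory and the number of transitions a quantized-state
       * integrator with quantum q would roughly perform. */
      #include <math.h>
      #include <stdio.h>

      static double activity(const double *x, int n)
      {
          double a = 0.0;
          for (int i = 1; i < n; ++i)
              a += fabs(x[i] - x[i - 1]);   /* accumulate |dx| */
          return a;
      }

      int main(void)
      {
          double x[101];                    /* coarsely sampled decaying exponential */
          for (int i = 0; i <= 100; ++i)
              x[i] = exp(-0.05 * i);

          double q = 0.01;                  /* quantum */
          double a = activity(x, 101);
          printf("activity = %.3f, predicted transitions ~ %.0f\n", a, a / q);
          return 0;
      }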

  13. Discrete event simulation of a proton therapy facility: a case study.

    PubMed

    Corazza, Uliana; Filippini, Roberto; Setola, Roberto

    2011-06-01

    Proton therapy is a type of particle therapy which utilizes a beam of protons to irradiate diseased tissue. The main difference with respect to conventional radiotherapy (X-rays, γ-rays) is the capability to target tumors with extreme precision, which makes it possible to treat deep-seated tumors and tumors affecting sensitive tissues such as the brain and eyes. However, proton therapy needs high-energy cyclotrons, and this requires sophisticated control and supervision schemes to guarantee, beyond the prescribed performance, the safety of the patients and of the operators. In this paper we present the modeling and simulation of the irradiation process of the PROSCAN facility at the Paul Scherrer Institut. This is a challenging task because of the complexity of the operation scenario, which consists of deterministic and stochastic processes resulting from the coordination and interaction among diverse entities such as distributed automatic control systems, safety protection systems, and human operators. PMID:20675013

  14. Forest biomass supply logistics for a power plant using the discrete-event simulation approach

    SciTech Connect

    Mobini, Mahdi; Sowlati, T.; Sokhansanj, Shahabaddine

    2011-04-01

    This study investigates the logistics of supplying forest biomass to a potential power plant. Due to the complexities of such a supply logistics system, a simulation model based on the framework of the Integrated Biomass Supply Analysis and Logistics (IBSAL) model is developed in this study to evaluate the cost of delivered forest biomass, the equilibrium moisture content, and carbon emissions from the logistics operations. The model is applied to a proposed 300 MW power plant in Quesnel, BC, Canada. The results show that the biomass demand of the power plant would not be met every year. The weighted average cost of delivered biomass to the gate of the power plant is about C$90 per dry tonne. Estimates of the equilibrium moisture content of delivered biomass and the CO2 emissions resulting from the processes are also provided.

  15. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which the performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code but uses a discrete-event simulator to model details of the presumed parallel machine, such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  16. Evaluating the cost effectiveness of donepezil in the treatment of Alzheimer's disease in Germany using discrete event simulation

    PubMed Central

    2012-01-01

    Background Previous cost-effectiveness studies of cholinesterase inhibitors have modeled Alzheimer's disease (AD) progression and treatment effects through single or global severity measures, or progression to "Full Time Care". This analysis evaluates the cost-effectiveness of donepezil versus memantine or no treatment in Germany by considering correlated changes in cognition, behavior and function. Methods Rates of change were modeled using trial and registry-based patient level data. A discrete event simulation projected outcomes for three identical patient groups: donepezil 10 mg, memantine 20 mg and no therapy. Patient mix, mortality and costs were developed using Germany-specific sources. Results Treatment of patients with mild to moderately severe AD with donepezil compared to no treatment was associated with 0.13 QALYs gained per patient, and 0.01 QALYs gained per caregiver and resulted in average savings of €7,007 and €9,893 per patient from the healthcare system and societal perspectives, respectively. In patients with moderate to moderately-severe AD, donepezil compared to memantine resulted in QALY gains averaging 0.01 per patient, and savings averaging €1,960 and €2,825 from the healthcare system and societal perspective, respectively. In probabilistic sensitivity analyses, donepezil dominated no treatment in most replications and memantine in over 70% of the replications. Donepezil leads to savings in 95% of replications versus memantine. Conclusions Donepezil is highly cost-effective in patients with AD in Germany, leading to improvements in health outcomes and substantial savings compared to no treatment. This holds across a variety of sensitivity analyses. PMID:22316501

  17. Simulating Billion-Task Parallel Programs

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J

    2014-01-01

    In simulating large parallel systems, bottom-up approaches exercise detailed hardware models with effects from simplified software models or traces, whereas top-down approaches evaluate the timing and functionality of detailed software models over coarse hardware models. Here, we focus on the top-down approach and significantly advance the scale of the simulated parallel programs. Via the direct execution technique combined with parallel discrete event simulation, we stretch the limits of the top-down approach by simulating message passing interface (MPI) programs with millions of tasks. Using a timing-validated benchmark application, a proof-of-concept scaling level is achieved to over 0.22 billion virtual MPI processes on 216,000 cores of a Cray XT5 supercomputer, representing one of the largest direct execution simulations to date, combined with a multiplexing ratio of 1024 simulated tasks per real task.
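    The reported scale is consistent with the stated multiplexing ratio: assuming one real MPI task per core, 216,000 cores x 1,024 simulated tasks per real task = 221,184,000, i.e. roughly the 0.22 billion virtual MPI processes quoted above.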

  18. Concurrency and discrete event control

    NASA Technical Reports Server (NTRS)

    Heymann, Michael

    1990-01-01

    Much of discrete event control theory has been developed within the framework of automata and formal languages. An alternative approach inspired by the theories of process-algebra as developed in the computer science literature is presented. The framework, which rests on a new formalism of concurrency, can adequately handle nondeterminism and can be used for analysis of a wide range of discrete event phenomena.

  19. Analysis hierarchical model for discrete event systems

    NASA Astrophysics Data System (ADS)

    Ciortea, E. M.

    2015-11-01

    This paper presents a hierarchical, discrete-event-network model for robotic systems. In the hierarchical approach, the Petri net is analysed as a network spanning the highest conceptual level down to the lowest level of local control, and extended Petri nets are used for modelling and control of complex robotic systems. The system is structured, controlled and analysed using the Visual Object Net++ package, which is relatively simple and easy to use and yields representations that are easy to interpret. The hierarchical structure of the robotic system is implemented on computers and analysed with specialized programs. Implementation of the hierarchical discrete-event model as a real-time system on a computer network connected via a serial bus is possible, with each computer dedicated to the local Petri model of one subsystem of the global robotic system. Because Petri models can be applied on general-purpose computers, the analysis, modelling and control of complex manufacturing systems can be carried out with Petri nets; discrete event systems are a pragmatic tool for modelling industrial systems. To capture timing, the timed Petri model of the transport stream is divided into hierarchical levels and sections that are analysed successively. The proposed simulation of the robotic system using timed Petri nets makes it possible to observe its timing behaviour; transport and transmission times obtained by on-the-spot measurement yield graphs showing the average time for each transport activity for individual sets of finished products.

  20. The split system approach to managing time in simulations of hybrid systems having continuous and discrete event components

    SciTech Connect

    Nutaro, James J.; Kuruganti, Phani Teja; Protopopescu, Vladimir A.; Shankar, Mallikarjun

    2012-02-08

    The efficient and accurate management of time in simulations of hybrid models is an outstanding engineering problem. General a priori knowledge about the dynamic behavior of the hybrid system (i.e. essentially continuous, essentially discrete, or 'truly hybrid') facilitates this task. Indeed, for essentially discrete and essentially continuous systems, existing software packages can be conveniently used to perform quite sophisticated and satisfactory simulations. The situation is different for 'truly hybrid' systems, for which direct application of existing software packages results in a lengthy design process, cumbersome software assemblies, inaccurate results, or some combination of these, independent of the designer's a priori knowledge about the system's structure and behavior. The main goal of this paper is to provide a methodology whereby simulation designers can use a priori knowledge about the hybrid model's structure to build a straightforward, efficient, and accurate simulator with existing software packages. The proposed methodology is based on a formal decomposition and re-articulation of the hybrid system; this is the main theoretical result of the paper. To set the result in the right perspective, we briefly review the essentially continuous and essentially discrete approaches, which are illustrated with typical examples. Then we present our new, split system approach, first in a general formal context, then in three more specific guises that reflect the viewpoints of three main communities of hybrid system researchers and practitioners. For each of these variants we indicate an implementation path. Our approach is illustrated with an archetypal problem of power grid control.
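    As a hedged, minimal example of the kind of hybrid time management discussed above, the C sketch below integrates a single continuous state with a fixed-step scheme and detects a threshold-crossing discrete event by bisection before handling it; the dynamics, guard, and reset are ours and far simpler than the paper's power-grid application.

      /* Minimal hybrid loop: forward-Euler integration of dx/dt = f(x)
       * with detection of a guard crossing, located by bisection along
       * the tentative step before the discrete handler fires. */
      #include <stdio.h>

      static double f(double x)     { return -0.5 * x; }   /* continuous dynamics    */
      static double guard(double x) { return x - 0.5; }    /* event when it reaches 0 */

      int main(void)
      {
          double t = 0.0, x = 2.0, h = 0.1;
          while (t < 10.0) {
              double x_next = x + h * f(x);                 /* tentative step */
              if (guard(x) > 0.0 && guard(x_next) <= 0.0) {
                  double lo = 0.0, hi = h;                  /* bisect within the step */
                  for (int i = 0; i < 30; ++i) {
                      double mid = 0.5 * (lo + hi);
                      if (guard(x + mid * f(x)) > 0.0) lo = mid; else hi = mid;
                  }
                  t += hi;
                  x += hi * f(x);
                  printf("discrete event at t = %.4f, x = %.4f\n", t, x);
                  x = 2.0;                                  /* discrete reset */
                  continue;
              }
              x = x_next;
              t += h;
          }
          return 0;
      }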

  1. A discrete event simulation model for evaluating the performances of an m/g/c/c state dependent queuing system.

    PubMed

    Khalid, Ruzelan; Nawawi, Mohd Kamal M; Kawsar, Luthful A; Ghani, Noraida A; Kamil, Anton A; Mustafa, Adli

    2013-01-01

    M/G/C/C state dependent queuing networks consider service rates as a function of the number of residing entities (e.g., pedestrians, vehicles, and products). However, modeling such dynamic rates is not supported in modern discrete event simulation (DES) software. We designed an approach to address this limitation and used it to construct an M/G/C/C state-dependent queuing model in Arena software. Using the model, we evaluated and analyzed the impacts of various arrival rates on the throughput, the blocking probability, the expected service time and the expected number of entities in a complex network topology. Results indicated that for each network there is a range of arrival rates where the simulation results fluctuate drastically across replications, causing the simulation and analytical results to exhibit discrepancies. Detailed results showing how closely the simulation results tally with the analytical results, in both tabular and graphical form, together with scientific justifications for these discrepancies, are documented and discussed. PMID:23560037
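    As a hedged simplification of the state-dependent mechanism described above (and not the paper's Arena model), the C sketch below simulates an exponential M/M/C/C loss system in which the per-entity service rate mu(n) falls as the occupancy n rises, and estimates throughput and blocking probability; the rate function, parameters, and names are ours.

      /* Continuous-time simulation of an M/M/C/C loss system whose
       * per-entity service rate mu(n) depends on the current occupancy n
       * -- an exponential stand-in for the M/G/C/C state-dependent model. */
      #include <stdio.h>
      #include <stdlib.h>
      #include <math.h>

      #define C 50                                 /* capacity (and server count) */

      static double expo(double rate)              /* exponential variate */
      {
          double u = (rand() + 1.0) / (RAND_MAX + 2.0);
          return -log(u) / rate;
      }

      /* hypothetical congestion effect: service slows as the system fills */
      static double mu(int n) { return 1.0 - 0.8 * n / (double)C; }

      int main(void)
      {
          double lambda = 30.0, t = 0.0, t_end = 10000.0;
          int n = 0;
          long arrivals = 0, blocked = 0, served = 0;

          while (t < t_end) {
              double dep_rate = (n > 0) ? n * mu(n) : 0.0;   /* total departure rate */
              double tot_rate = lambda + dep_rate;
              t += expo(tot_rate);                           /* time to next event   */
              int is_arrival = (n == 0) ||
                  ((rand() / (RAND_MAX + 1.0)) * tot_rate < lambda);
              if (is_arrival) {
                  ++arrivals;
                  if (n < C) ++n; else ++blocked;            /* lost if system is full */
              } else {
                  --n; ++served;
              }
          }
          printf("throughput = %.3f per unit time, blocking prob = %.4f\n",
                 served / t_end, blocked / (double)arrivals);
          return 0;
      }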

  3. Estimating the Unknown Parameters of the Natural History of Metachronous Colorectal Cancer Using Discrete-Event Simulation

    PubMed Central

    Erenay, Fatih Safa; Alagoz, Oguzhan; Banerjee, Ritesh; Cima, Robert R.

    2011-01-01

    Objectives Some aspects of the natural history of metachronous colorectal cancer (MCRC), such as the rate of progression from adenomatous polyp to MCRC, are unknown. The objective of this study is to estimate a set of parameters revealing some of these unknown characteristics of MCRC. Methods The authors developed a computer simulation model that mimics the progression of MCRC for a 5-year period following the treatment of primary colorectal cancer (CRC). They obtained the inputs of the simulation model using longitudinal data for 284 CRC patients from the Mayo Clinic, Rochester. Results Five-year MCRC incidence and all-cause mortality were 7.4% and 12.7% in the patient cohort, respectively. Statistical analysis showed that 5-year MCRC incidence was associated with gender (P = 0.05), whereas both all-cause and CRC-related mortalities were associated with age (P < 0.001 and P = 0.01). Estimated annual probabilities of progression from adenomatous polyp to MCRC and from MCRC to metastatic MCRC were 0.14 and 0.28, respectively. Annual probabilities of mortality after MCRC and metastatic MCRC treatments were estimated to be 0.06 and 0.26, respectively. The estimated annual probability of mortality due to undetected MCRC was 0.16. Conclusions The results imply that MCRC, especially in women, may be more common than suggested by previous studies. In addition, statistics derived from the clinical data and results of the simulation model indicate that gender and age affect the progression of MCRC. PMID:21212440

  4. Using discrete event simulation to compare the performance of family health unit and primary health care centre organizational models in Portugal

    PubMed Central

    2011-01-01

    Background Recent reforms in Portugal aimed at strengthening the role of the primary care system, in order to improve the quality of the health care system. Since 2006 new policies aiming to change the organization, incentive structures and funding of the primary health care sector were designed, promoting the evolution of traditional primary health care centres (PHCCs) into a new type of organizational unit - family health units (FHUs). This study aimed to compare performances of PHCC and FHU organizational models and to assess the potential gains from converting PHCCs into FHUs. Methods Stochastic discrete event simulation models for the two types of organizational models were designed and implemented using Simul8 software. These models were applied to data from nineteen primary care units in three municipalities of the Greater Lisbon area. Results The conversion of PHCCs into FHUs seems to have the potential to generate substantial improvements in productivity and accessibility, while not having a significant impact on costs. This conversion might entail a 45% reduction in the average number of days required to obtain a medical appointment and a 7% and 9% increase in the average number of medical and nursing consultations, respectively. Conclusions Reorganization of PHCC into FHUs might increase accessibility of patients to services and efficiency in the provision of primary care services. PMID:21999336

  5. Budget Impact Analysis of Switching to Digital Mammography in a Population-Based Breast Cancer Screening Program: A Discrete Event Simulation Model

    PubMed Central

    Comas, Mercè; Arrospide, Arantzazu; Mar, Javier; Sala, Maria; Vilaprinyó, Ester; Hernández, Cristina; Cots, Francesc; Martínez, Juan; Castells, Xavier

    2014-01-01

    Objective To assess the budgetary impact of switching from screen-film mammography to full-field digital mammography in a population-based breast cancer screening program. Methods A discrete-event simulation model was built to reproduce the breast cancer screening process (biennial mammographic screening of women aged 50 to 69 years) combined with the natural history of breast cancer. The simulation started with 100,000 women and, during a 20-year simulation horizon, new women were dynamically entered according to the aging of the Spanish population. Data on screening were obtained from Spanish breast cancer screening programs. Data on the natural history of breast cancer were based on US data adapted to our population. A budget impact analysis comparing digital with screen-film screening mammography was performed in a sample of 2,000 simulation runs. A sensitivity analysis was performed for crucial screening-related parameters. Distinct scenarios for recall and detection rates were compared. Results Statistically significant savings were found for overall costs, treatment costs and the costs of additional tests in the long term. The overall cost saving was €1,115,857 (95%CI from €932,147 to €1,299,567) in the 10th year and €2,866,124 (95%CI from €2,492,610 to €3,239,638) in the 20th year, representing 4.5% and 8.1% of the overall cost associated with screen-film mammography. The sensitivity analysis showed net savings in the long term. Conclusions Switching to digital mammography in a population-based breast cancer screening program saves long-term budget expense, in addition to providing technical advantages. Our results were consistent across distinct scenarios representing the different results obtained in European breast cancer screening programs. PMID:24832200

  6. Inflated speedups in parallel simulations via malloc()

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    Discrete-event simulation programs make heavy use of dynamic memory allocation in order to support simulation's very dynamic space requirements. When programming in C one is likely to use the malloc() routine. However, a parallel simulation which uses the standard Unix System V malloc() implementation may achieve an overly optimistic speedup, possibly superlinear. An alternate implementation provided on some (but not all) systems can avoid the speedup anomaly, but at the price of significantly reduced available free space. This is especially severe on most parallel architectures, which tend not to support virtual memory. It is shown how a simply implemented, user-constructed interface to malloc() can both avoid artificially inflated speedups and make efficient use of the dynamic memory space. The interface simply caches blocks on the basis of their size. The problem is demonstrated empirically, and the effectiveness of the solution is shown both empirically and analytically.
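    A hedged sketch of the kind of user-constructed interface the abstract describes: freed blocks are kept on per-size free lists and reused before the allocator is called again, which keeps the number of malloc()/free() calls down and the space usage predictable. The function names, the bucket granularity, and the requirement that callers pass the size to the free routine are our assumptions, not details from the paper.

      /* Minimal user-level allocation cache: freed blocks are pushed onto
       * per-size free lists and reused on the next request of a similar
       * size, so common event sizes rarely reach malloc()/free(). */
      #include <stdio.h>
      #include <stdlib.h>

      #define NBUCKETS 16
      #define GRAIN    32                 /* bucket i serves sizes <= (i + 1) * GRAIN */

      typedef struct block { struct block *next; } block_t;
      static block_t *bucket[NBUCKETS];

      static size_t bucket_index(size_t size)
      {
          size_t i = (size + GRAIN - 1) / GRAIN;
          return (i == 0) ? 0 : i - 1;
      }

      void *sim_alloc(size_t size)
      {
          size_t i = bucket_index(size);
          if (i >= NBUCKETS)                    /* very large blocks: pass through */
              return malloc(size);
          if (bucket[i]) {                      /* reuse a cached block */
              block_t *b = bucket[i];
              bucket[i] = b->next;
              return b;
          }
          return malloc((i + 1) * GRAIN);       /* round up to the bucket size */
      }

      void sim_free(void *p, size_t size)       /* caller supplies the size */
      {
          size_t i = bucket_index(size);
          if (i >= NBUCKETS) { free(p); return; }
          block_t *b = (block_t *)p;
          b->next = bucket[i];                  /* push onto the per-size list */
          bucket[i] = b;
      }

      int main(void)
      {
          void *e1 = sim_alloc(24);             /* served from bucket 0 (32 bytes) */
          sim_free(e1, 24);
          void *e2 = sim_alloc(16);             /* reuses the cached 32-byte block */
          printf("reused: %s\n", e2 == e1 ? "yes" : "no");
          return 0;
      }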

  7. Expected lifetime numbers and costs of fractures in postmenopausal women with and without osteoporosis in Germany: a discrete event simulation model

    PubMed Central

    2014-01-01

    Background Osteoporotic fractures cause a large health burden and substantial costs. This study estimated the expected fracture numbers and costs for the remaining lifetime of postmenopausal women in Germany. Methods A discrete event simulation (DES) model which tracks changes in fracture risk due to osteoporosis, a previous fracture or institutionalization in a nursing home was developed. Expected lifetime fracture numbers and costs per capita were estimated for postmenopausal women (aged 50 and older) at average osteoporosis risk (AOR) and for those never suffering from osteoporosis. Direct and indirect costs were modeled. Deterministic univariate and probabilistic sensitivity analyses were conducted. Results The expected fracture numbers over the remaining lifetime of a 50 year old woman with AOR for each fracture type (% attributable to osteoporosis) were: hip 0.282 (57.9%), wrist 0.229 (18.2%), clinical vertebral 0.206 (39.2%), humerus 0.147 (43.5%), pelvis 0.105 (47.5%), and other femur 0.033 (52.1%). Expected discounted fracture lifetime costs (excess cost attributable to osteoporosis) per 50 year old woman with AOR amounted to €4,479 (€1,995). Most costs were accrued in the hospital €1,743 (€751) and long-term care sectors €1,210 (€620). Univariate sensitivity analysis resulted in percentage changes between -48.4% (if fracture rates decreased by 2% per year) and +83.5% (if fracture rates increased by 2% per year) compared to base case excess costs. Costs for women with osteoporosis were about 3.3 times those for women never developing osteoporosis (€7,463 vs. €2,247), and were markedly increased for women with a previous fracture. Conclusion The results of this study indicate that osteoporosis causes a substantial share of fracture costs in postmenopausal women, which strongly increase with age and previous fractures. PMID:24981316

  8. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  9. Parallel Lisp simulator

    SciTech Connect

    Weening, J.S.

    1988-05-01

    CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.

  10. Xyce parallel electronic simulator.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Rankin, Eric Lamont; Schiek, Richard Louis; Thornquist, Heidi K.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide. The focus of this document is to list, as exhaustively as possible, the device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide.

  11. Parallel Dislocation Simulator

    Energy Science and Technology Software Center (ESTSC)

    2006-10-30

    ParaDiS is software capable of simulating the motion, evolution, and interaction of dislocation networks in single crystals using massively parallel computer architectures. The software is capable of outputting the stress-strain response of a single crystal whose plastic deformation is controlled by the dislocation processes.

  12. Scaling Time Warp-based Discrete Event Execution to 10^4 Processors on a Blue Gene Supercomputer

    SciTech Connect

    Perumalla, Kalyan S

    2007-01-01

    Lately, important large-scale simulation applications, such as emergency/event planning and response, are emerging that are based on discrete event models. The applications are characterized by their scale (several millions of simulated entities), their fine-grained nature of computation (microseconds per event), and their highly dynamic inter-entity event interactions. The desired scale and speed together call for highly scalable parallel discrete event simulation (PDES) engines. However, few such parallel engines have been designed or tested on platforms with thousands of processors. Here an overview is given of a unique PDES engine that has been designed to support Time Warp-style optimistic parallel execution as well as a more generalized mixed, optimistic-conservative synchronization. The engine is designed to run on massively parallel architectures with minimal overheads. A performance study of the engine is presented, including the first results to date of PDES benchmarks demonstrating scalability to as many as 16,384 processors, on an IBM Blue Gene supercomputer. The results show, for the first time, the promise of effectively sustaining very large scale discrete event execution on up to 10^4 processors.

  13. An algebra of discrete event processes

    NASA Technical Reports Server (NTRS)

    Heymann, Michael; Meyer, George

    1991-01-01

    This report deals with an algebraic framework for modeling and control of discrete event processes. The report consists of two parts. The first part is introductory, and consists of a tutorial survey of the theory of concurrency in the spirit of Hoare's CSP, and an examination of the suitability of such an algebraic framework for dealing with various aspects of discrete event control. To this end a new concurrency operator is introduced and it is shown how the resulting framework can be applied. It is further shown that a suitable theory that deals with the new concurrency operator must be developed. In the second part of the report the formal algebra of discrete event control is developed. At the present time the second part of the report is still an incomplete and occasionally tentative working paper.

  14. Optimal Discrete Event Supervisory Control of Aircraft Gas Turbine Engines

    NASA Technical Reports Server (NTRS)

    Litt, Jonathan (Technical Monitor); Ray, Asok

    2004-01-01

    This report presents an application of the recently developed theory of optimal Discrete Event Supervisory (DES) control that is based on a signed real measure of regular languages. The DES control techniques are validated on an aircraft gas turbine engine simulation test bed. The test bed is implemented on a networked computer system in which two computers operate in the client-server mode. Several DES controllers have been tested for engine performance and reliability.

  15. Discrete Events as Units of Perceived Time

    ERIC Educational Resources Information Center

    Liverence, Brandon M.; Scholl, Brian J.

    2012-01-01

    In visual images, we perceive both space (as a continuous visual medium) and objects (that inhabit space). Similarly, in dynamic visual experience, we perceive both continuous time and discrete events. What is the relationship between these units of experience? The most intuitive answer may be similar to the spatial case: time is perceived as an

  16. Nonlinear Control and Discrete Event Systems

    NASA Technical Reports Server (NTRS)

    Meyer, George; Null, Cynthia H. (Technical Monitor)

    1995-01-01

    As the operation of large systems becomes ever more dependent on extensive automation, the need for an effective solution to the problem of design and validation of the underlying software becomes more critical. Large systems possess much detailed structure, typically hierarchical, and they are hybrid. Information processing at the top of the hierarchy is by means of formal logic and sentences; on the bottom it is by means of simple scalar differential equations and functions of time; and in the middle it is by an interacting mix of nonlinear multi-axis differential equations and automata, and functions of time and discrete events. The lecture will address the overall problem as it relates to flight vehicle management, describe the middle level, and offer a design approach that is based on Differential Geometry and Discrete Event Dynamic Systems Theory.

  17. Multiple Autonomous Discrete Event Controllers for Constellations

    NASA Technical Reports Server (NTRS)

    Esposito, Timothy C.

    2003-01-01

    The Multiple Autonomous Discrete Event Controllers for Constellations (MADECC) project is an effort within the National Aeronautics and Space Administration Goddard Space Flight Center's (NASA/GSFC) Information Systems Division to develop autonomous positioning and attitude control for constellation satellites. It will be accomplished using traditional control theory and advanced coordination algorithms developed by the Johns Hopkins University Applied Physics Laboratory (JHU/APL). This capability will be demonstrated in the discrete event control test-bed located at JHU/APL. This project will be modeled for the Leonardo constellation mission, but is intended to be adaptable to any constellation mission. To develop a common software architecture, the controllers will only model very high-level responses. For instance, after determining that a maneuver must be made, the MADECC system will output a (Delta)V (velocity change) value. Lower level systems must then decide which thrusters to fire and for how long to achieve that (Delta)V.

  18. Parallel implementation of VHDL simulations on the Intel iPSC/2 hypercube. Master's thesis

    SciTech Connect

    Comeau, R.C.

    1991-12-01

    VHDL models are executed sequentially in current commercial simulators. As chip designs grow larger and more complex, simulations must run faster. One approach to increasing simulation speed is through parallel processors. This research transforms the behavioral and structural models created by Intermetrics' sequential VHDL simulator into models for parallel execution. The models are simulated on an Intel iPSC/2 hypercube, with synchronization of the nodes achieved by utilizing the Chandy-Misra paradigm for discrete-event simulations. Three eight-bit adders, the ripple carry, the carry save, and the carry-lookahead, are each run through the parallel simulator. Simulation time is cut at least in half for all three test cases compared with the sequential Intermetrics model. Results with regard to speedup are given to show the effects of different mappings, varying workloads per node, and overhead due to output messages.
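    The Chandy-Misra paradigm mentioned above blocks each logical process until it knows no earlier message can arrive on any input. A hedged sketch of that core rule in C follows; the tiny event queue, channel values, and names are illustrative stand-ins, not taken from the thesis.

      /* Conservative (Chandy-Misra style) rule: a logical process may
       * safely execute every pending event whose timestamp does not
       * exceed the minimum, over all input channels, of (last timestamp
       * seen on the channel + that channel's lookahead). */
      #include <stdio.h>
      #include <float.h>

      typedef struct {
          double last_timestamp;   /* last message (or null message) time */
          double lookahead;        /* minimum delay the sender guarantees */
      } channel_t;

      static double pending[] = { 1.0, 2.5, 4.0, 7.5 };  /* sorted event times */
      static int    next_idx  = 0;

      static double safe_time(const channel_t *in, int nin)
      {
          double s = DBL_MAX;
          for (int i = 0; i < nin; ++i) {
              double t = in[i].last_timestamp + in[i].lookahead;
              if (t < s)
                  s = t;
          }
          return s;
      }

      int main(void)
      {
          channel_t in[2] = { { 3.0, 1.0 }, { 2.0, 3.5 } };
          double s = safe_time(in, 2);                /* = min(4.0, 5.5) = 4.0 */

          /* execute every event at or before the safe time */
          while (next_idx < 4 && pending[next_idx] <= s)
              printf("executing event at t = %.1f\n", pending[next_idx++]);

          if (next_idx < 4)                           /* must block until s rises */
              printf("blocked: next event at t = %.1f exceeds safe time %.1f\n",
                     pending[next_idx], s);
          return 0;
      }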

  19. GVT Algorithms and Discrete Event Dynamics on 128K+ Processor Cores

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J; Tipparaju, Vinod

    2011-01-01

    Parallel discrete event simulation (PDES) represents a class of codes that are challenging to scale to large numbers of processors due to tight global timestamp-ordering and fine-grained event execution. One of the critical factors in scaling PDES is the efficiency of the underlying global virtual time (GVT) algorithm needed for correctness of parallel execution and speed of progress. Although many GVT algorithms have been proposed previously, few have been proposed for scalable asynchronous execution and none customized to exploit one-sided communication. Moreover, the detailed performance effects of actual GVT algorithm implementations on large platforms are unknown. Here, three major GVT algorithms intended for scalable execution on high-performance systems are studied: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm, proposed and studied for the first time here, to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on over 64,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of tens of billions of events executed per second is registered, exceeding the speeds of any known PDES engine, and showing asynchronous GVT algorithms to outperform state-of-the-art synchronous GVT algorithms. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event execution dynamics on massively parallel platforms.
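    In its simplest form, the synchronous GVT algorithm studied above can be read as a global reduction taken once no simulation message is in transit. The hedged C/MPI sketch below reduces the transient-message check to a sent/received counter comparison and uses toy stand-ins for the engine state; it omits the refinements (asynchrony, one-sided communication) that the paper evaluates.

      /* Simplified synchronous GVT: once global sent == global received
       * (no message in transit anywhere), GVT is the global minimum of
       * each rank's minimum unprocessed event timestamp. */
      #include <mpi.h>
      #include <stdio.h>

      /* toy stand-ins for engine state; a real simulator supplies these */
      static long   msgs_sent = 0, msgs_received = 0;
      static double min_unprocessed_timestamp(void) { return 42.0; }
      static void   receive_pending_messages(void)  { /* drain incoming messages */ }

      static double synchronous_gvt(MPI_Comm comm)
      {
          long counts[2], totals[2];
          for (;;) {                        /* transient-message check */
              receive_pending_messages();
              counts[0] = msgs_sent;
              counts[1] = msgs_received;
              MPI_Allreduce(counts, totals, 2, MPI_LONG, MPI_SUM, comm);
              if (totals[0] == totals[1])   /* nothing left in flight anywhere */
                  break;
          }
          double lvt = min_unprocessed_timestamp();
          double gvt;
          MPI_Allreduce(&lvt, &gvt, 1, MPI_DOUBLE, MPI_MIN, comm);
          return gvt;
      }

      int main(int argc, char **argv)
      {
          MPI_Init(&argc, &argv);
          double gvt = synchronous_gvt(MPI_COMM_WORLD);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          if (rank == 0)
              printf("GVT = %.1f\n", gvt);
          MPI_Finalize();
          return 0;
      }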

  20. Discrete Event Execution with One-Sided and Two-Sided GVT Algorithms on 216,000 Processor Cores

    SciTech Connect

    Perumalla, Kalyan S; Park, Alfred J; Tipparaju, Vinod

    2014-01-01

    Global virtual time (GVT) computation is a key determinant of the efficiency and runtime dynamics of parallel discrete event simulations (PDES), especially on large-scale parallel platforms. Here, three execution modes of a generalized GVT computation algorithm are studied on high-performance parallel computing systems: (1) a synchronous GVT algorithm that affords ease of implementation, (2) an asynchronous GVT algorithm that is more complex to implement but can relieve blocking latencies, and (3) a variant of the asynchronous GVT algorithm to exploit one-sided communication in extant supercomputing platforms. Performance results are presented of implementations of these algorithms on up to 216,000 cores of a Cray XT5 system, exercised on a range of parameters: optimistic and conservative synchronization, fine- to medium-grained event computation, synthetic and non-synthetic applications, and different lookahead values. Performance of up to 54 billion events executed per second is registered. Detailed PDES-specific runtime metrics are presented to further the understanding of tightly-coupled discrete event dynamics on massively parallel platforms.

  1. Planning and supervision of reactor defueling using discrete event techniques

    SciTech Connect

    Garcia, H.E.; Imel, G.R.; Houshyar, A.

    1995-12-31

    New fuel handling and conditioning activities for the defueling of the Experimental Breeder Reactor II are being performed at Argonne National Laboratory. Research is being conducted to investigate the use of discrete event simulation, analysis, and optimization techniques to plan, supervise, and perform these activities in such a way that productivity can be improved. The central idea is to characterize this defueling operation as a collection of interconnected serving cells, and then apply operational research techniques to identify appropriate planning schedules for given scenarios. In addition, a supervisory system is being developed to provide personnel with on-line information on the progress of fueling tasks and to suggest courses of action to accommodate changing operational conditions. This paper provides an introduction to the research in progress at ANL. In particular, it briefly describes the fuel handling configuration for reactor defueling at ANL, presenting the flow of material from the reactor grid to the interim storage location, and the expected contributions of this work. As an example of the studies being conducted for planning and supervision of fuel handling activities at ANL, an application of discrete event simulation techniques to evaluate different fuel cask transfer strategies is given at the end of the paper.

  2. Parallel Power Grid Simulation Toolkit

    Energy Science and Technology Software Center (ESTSC)

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled power grid simulation toolkit, consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object-oriented C++ methods, utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  3. Parallelizing Timed Petri Net simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1993-01-01

    The possibility of using parallel processing to accelerate the simulation of Timed Petri Nets (TPN's) was studied. It was recognized that complex system development tools often transform system descriptions into TPN's or TPN-like models, which are then simulated to obtain information about system behavior. Viewed this way, it was important that the parallelization of TPN's be as automatic as possible, to admit the possibility of the parallelization being embedded in the system design tool. Later years of the grant were devoted to examining the problem of joint performance and reliability analysis, to explore whether both types of analysis could be accomplished within a single framework. In this final report, the results of our studies are summarized. We believe that the problem of parallelizing TPN's automatically for MIMD architectures has been almost completely solved for a large and important class of problems. Our initial investigations into joint performance/reliability analysis are two-fold; it was shown that Monte Carlo simulation, with importance sampling, offers promise of joint analysis in the context of a single tool, and methods for the parallel simulation of general Continuous Time Markov Chains, a model framework within which joint performance/reliability models can be cast, were developed. However, very much more work is needed to determine the scope and generality of these approaches. The results obtained in our two studies, future directions for this type of work, and a list of publications are included.

  4. Analytic Perturbation Analysis of Discrete Event Dynamic Systems

    SciTech Connect

    Uryasev, S.

    1994-09-01

    This paper considers a new Analytic Perturbation Analysis (APA) approach for Discrete Event Dynamic Systems (DEDS) with discontinuous sample-path functions with respect to control parameters. The performance functions for DEDS usually are formulated as mathematical expectations, which can be calculated only numerically. APA is based on new analytic formulas for the gradients of expectations of indicator functions; therefore, it is called an analytic perturbation analysis. The gradient of performance function may not coincide with the expectation of a gradient of sample-path function (i.e., the interchange formula for the gradient and expectation sign may not be valid). Estimates of gradients can be obtained with one simulation run of the models.
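    The central difficulty the abstract refers to can be written in one line (our notation, as a hedged illustration): for a sample-path performance given by an indicator, $L(\theta,\omega) = \mathbf{1}\{g(\theta,\omega) \le 0\}$, one has $\nabla_\theta L(\theta,\omega) = 0$ almost everywhere, so $\mathbb{E}[\nabla_\theta L] = 0$, whereas $\nabla_\theta\, \mathbb{E}[L] = \nabla_\theta \Pr\{g(\theta,\omega) \le 0\}$ is in general nonzero. The interchange of gradient and expectation therefore fails, which is why APA works with analytic formulas for gradients of expectations of indicator functions rather than with sample-path gradients.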

  5. LAN attack detection using Discrete Event Systems.

    PubMed

    Hubballi, Neminath; Biswas, Santosh; Roopa, S; Ratti, Ritesh; Nandi, Sukumar

    2011-01-01

    Address Resolution Protocol (ARP) is used for determining the link layer or Medium Access Control (MAC) address of a network host, given its Internet Layer (IP) or Network Layer address. ARP is a stateless protocol and any IP-MAC pairing sent by a host is accepted without verification. This weakness in ARP may be exploited by malicious hosts in a Local Area Network (LAN) by spoofing IP-MAC pairs. Several schemes have been proposed in the literature to circumvent these attacks; however, these techniques either make the IP-MAC pairing static, modify the existing ARP, or patch the operating systems of all the hosts. In this paper we propose a Discrete Event System (DES) approach to an Intrusion Detection System (IDS) for LAN-specific attacks which does not require any extra constraint such as static IP-MAC pairing or changes to the ARP. A DES model is built for the LAN under both normal and compromised (i.e., spoofed request/response) situations based on the sequences of ARP-related packets. Sequences of ARP events in normal and spoofed scenarios are similar, thereby rendering the same DES models for both cases. To create different ARP events under normal and spoofed conditions, the proposed technique uses active ARP probing. However, this probing adds extra ARP traffic in the LAN. Following that, a DES detector is built to determine, from observed ARP-related events, whether the LAN is operating under a normal or compromised situation. The scheme also minimizes extra ARP traffic by probing the source IP-MAC pair of only those ARP packets which are yet to be determined as genuine/spoofed by the detector. Also, spoofed IP-MAC pairs determined by the detector are stored in tables to detect other LAN attacks triggered by spoofing, namely man-in-the-middle (MiTM), denial of service, etc. The scheme is successfully validated in a test bed. PMID:20804980

  6. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed-memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long-term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document in one place a number of the software quality practices followed by the Xyce team. Also, it is hoped that this document will be a good source of information for new developers.

  7. Modelling machine ensembles with discrete event dynamical system theory

    NASA Technical Reports Server (NTRS)

    Hunter, Dan

    1990-01-01

    Discrete Event Dynamical System (DEDS) theory can be utilized as a control strategy for future complex machine ensembles that will be required for in-space construction. The control strategy involves orchestrating a set of interactive submachines to perform a set of tasks for a given set of constraints such as minimum time, minimum energy, or maximum machine utilization. Machine ensembles can be hierarchically modeled as a global model that combines the operations of the individual submachines. These submachines are represented in the global model as local models. Local models, from the perspective of DEDS theory, are described by the following: a set of system and transition states, an event alphabet that portrays actions that take a submachine from one state to another, an initial system state, a partial function that maps the current state and event alphabet to the next state, and the time required for the event to occur. Each submachine in the machine ensemble is represented by a unique local model. The global model combines the local models such that the local models can operate in parallel under the additional logistic and physical constraints due to submachine interactions. The global model is constructed from the states, events, event functions, and timing requirements of the local models. Supervisory control can be implemented in the global model by various methods such as task scheduling (open-loop control) or implementing a feedback DEDS controller (closed-loop control).
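
    As a rough illustration of the local-model ingredients listed above (a state set, an event alphabet, an initial state, a partial transition function, and per-event durations), the following Python sketch encodes a hypothetical two-state submachine; the state and event names are invented.

        # Illustrative DEDS local model for a single submachine (all names hypothetical).
        local_model = {
            "states": {"idle", "welding"},
            "events": {"start_weld", "finish_weld"},
            "initial": "idle",
            # partial function: (state, event) -> next state
            "delta": {("idle", "start_weld"): "welding",
                      ("welding", "finish_weld"): "idle"},
            # time required for each event to occur
            "duration": {"start_weld": 2.0, "finish_weld": 5.0},
        }

        def run(model, event_sequence):
            """Replay a sequence of events, returning the final state and elapsed time."""
            state, clock = model["initial"], 0.0
            for ev in event_sequence:
                nxt = model["delta"].get((state, ev))
                if nxt is None:                    # event not defined in this state
                    raise ValueError(f"event {ev!r} not enabled in state {state!r}")
                clock += model["duration"][ev]
                state = nxt
            return state, clock

        print(run(local_model, ["start_weld", "finish_weld"]))   # ('idle', 7.0)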

  8. Discrete Event Supervisory Control Applied to Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Litt, Jonathan S.; Shah, Neerav

    2005-01-01

    The theory of discrete event supervisory (DES) control was applied to the optimal control of a twin-engine aircraft propulsion system and demonstrated in a simulation. The supervisory control, which is implemented as a finite-state automaton, oversees the behavior of a system and manages it in such a way that it maximizes a performance criterion, similar to a traditional optimal control problem. DES controllers can be nested such that a high-level controller supervises multiple lower level controllers. This structure can be expanded to control huge, complex systems, providing optimal performance and increasing autonomy with each additional level. The DES control strategy for propulsion systems was validated using a distributed testbed consisting of multiple computers--each representing a module of the overall propulsion system--to simulate real-time hardware-in-the-loop testing. In the first experiment, DES control was applied to the operation of a nonlinear simulation of a turbofan engine (running in closed loop using its own feedback controller) to minimize engine structural damage caused by a combination of thermal and structural loads. This enables increased on-wing time for the engine through better management of the engine-component life usage. Thus, the engine-level DES acts as a life-extending controller through its interaction with and manipulation of the engine's operation.

  9. Empirical Evaluation of Conservative and Optimistic Discrete Event Execution on Cloud and VM Platforms

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2013-01-01

    Virtual machine (VM) technologies, especially those offered via Cloud platforms, present new dimensions with respect to performance and cost in executing parallel discrete event simulation (PDES) applications. Due to the introduction of overall cost as a metric, the choice of the highest-end computing configuration is no longer the most economical one. Moreover, runtime dynamics unique to VM platforms introduce new performance characteristics, and the variety of possible VM configurations gives rise to a range of choices for hosting a PDES run. Here, an empirical study of these issues is undertaken to guide an understanding of the dynamics, trends and trade-offs in executing PDES on VM/Cloud platforms. Performance results and cost measures are obtained from actual execution of a range of scenarios in two PDES benchmark applications on the Amazon Cloud offerings and on a high-end VM host machine. The data reveals interesting insights into the new VM-PDES dynamics that come into play and also leads to counter-intuitive guidelines with respect to choosing the best and second-best configurations when overall cost of execution is considered. In particular, it is found that choosing the highest-end VM configuration guarantees neither the best runtime nor the least cost. Interestingly, choosing a (suitably scaled) low-end VM configuration provides the least overall cost without adversely affecting the total runtime.

  10. A Summary of Some Discrete-Event System Control Problems

    NASA Astrophysics Data System (ADS)

    Rudie, Karen

    A summary of the area of control of discrete-event systems is given. In this research area, automata and formal language theory is used as a tool to model physical problems that arise in technological and industrial systems. The key ingredients to discrete-event control problems are a process that can be modeled by an automaton, events in that process that cannot be disabled or prevented from occurring, and a controlling agent that manipulates the events that can be disabled to guarantee that the process under control either generates all the strings in some prescribed language or as many strings as possible in some prescribed language. When multiple controlling agents act on a process, decentralized control problems arise. In decentralized discrete-event systems, it is presumed that the agents effecting control cannot each see all event occurrences. Partial observation leads to some problems that cannot be solved in polynomial time and some others that are not even decidable.
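
    To make those ingredients concrete, here is a small invented example in Python: a buffer plant in which only the 'load' event is controllable, and a supervisor disables it whenever it would drive the plant into a forbidden 'overflow' state. This is an illustrative toy, not an algorithm from the summary.

        # Sketch of supervisory control on an invented buffer plant.
        # Controllable events may be disabled by the supervisor; 'unload' is
        # uncontrollable and can never be blocked.
        plant = {
            ("empty", "load"): "full",       # controllable: put a part into the buffer
            ("full", "unload"): "empty",     # uncontrollable: downstream machine takes the part
            ("full", "load"): "overflow",    # controllable: would violate the specification
        }
        controllable = {"load"}

        def supervisor(state):
            """Controllable events enabled in `state` so the closed loop never reaches 'overflow'."""
            return {e for (s, e), nxt in plant.items()
                    if s == state and e in controllable and nxt != "overflow"}

        def closed_loop_step(state, event):
            """Apply an event if it is physically possible and not disabled."""
            if event in controllable and event not in supervisor(state):
                return state                  # supervisor disables the event
            return plant.get((state, event), state)

        state = "empty"
        for ev in ["load", "load", "unload", "load"]:
            state = closed_loop_step(state, ev)
        print(state)                          # never 'overflow'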

  11. Hierarchical Discrete Event Supervisory Control of Aircraft Propulsion Systems

    NASA Technical Reports Server (NTRS)

    Yasar, Murat; Tolani, Devendra; Ray, Asok; Shah, Neerav; Litt, Jonathan S.

    2004-01-01

    This paper presents a hierarchical application of Discrete Event Supervisory (DES) control theory for intelligent decision and control of a twin-engine aircraft propulsion system. A dual layer hierarchical DES controller is designed to supervise and coordinate the operation of two engines of the propulsion system. The two engines are individually controlled to achieve enhanced performance and reliability, necessary for fulfilling the mission objectives. Each engine is operated under a continuously varying control system that maintains the specified performance and a local discrete-event supervisor for condition monitoring and life extending control. A global upper level DES controller is designed for load balancing and overall health management of the propulsion system.

  12. CAISSON: Interconnect Network Simulator

    NASA Technical Reports Server (NTRS)

    Springer, Paul L.

    2006-01-01

    Cray response to HPCS initiative. Model future petaflop computer interconnect. Parallel discrete event simulation techniques for large scale network simulation. Built on WarpIV engine. Run on laptop and Altix 3000. Can be sized up to 1000 simulated nodes per host node. Good parallel scaling characteristics. Flexible: multiple injectors, arbitration strategies, queue iterators, network topologies.

  13. Parallel methods for the flight simulation model

    SciTech Connect

    Xiong, Wei Zhong; Swietlik, C.

    1994-06-01

    The Advanced Computer Applications Center (ACAC) has been involved in evaluating advanced parallel architecture computers and the applicability of these machines to computer simulation models. The advanced systems investigated include parallel machines with shared-memory and distributed architectures consisting of an eight processor Alliant FX/8, a twenty-four processor Sequent Symmetry, Cray XMP, IBM RISC 6000 model 550, and the Intel Touchstone eight processor Gamma and 512 processor Delta machines. Since parallelizing a truly efficient application program for the parallel machine is a difficult task, the implementation for these machines in a realistic setting has been largely overlooked. The ACAC has developed considerable expertise in optimizing and parallelizing application models on a collection of advanced multiprocessor systems. One example of such an application model is the Flight Simulation Model, which uses a set of differential equations to describe the flight characteristics of a launched missile by means of a trajectory. The Flight Simulation Model was written in the FORTRAN language with approximately 29,000 lines of source code. Depending on the number of trajectories, the computation can require several hours to a full day of CPU time on a DEC/VAX 8650 system. There is an impetus to reduce the execution time and utilize the advanced parallel architecture computing environment available. ACAC researchers developed a parallel method that allows the Flight Simulation Model to run in parallel on the multiprocessor system. For the benchmark data tested, the parallel Flight Simulation Model implemented on the Alliant FX/8 has achieved nearly linear speedup. In this paper, we describe a parallel method for the Flight Simulation Model. We believe the method presented in this paper provides a general concept for the design of parallel applications. This concept, in most cases, can be adapted to many other sequential application programs.

  14. Parallel numerical reservoir simulation: A feasibility study

    SciTech Connect

    Michielse, P.H.

    1994-12-31

    This paper discusses a feasibility study to implement a parallel reservoir simulator on parallel computers. The basis of this study is a reservoir simulator that models an injection-production mechanism. The simulator implements a multigrid solver for the elliptic part of the equations, and uses adaptive local grid refinement to track moving fronts in the reservoir. The parallelization method is based on a domain decomposition method, which assigns the subdomains to the processors. In order to obtain a correct solution, communication across the internal boundaries between the subdomains is required. The implementation of the multigrid method imposes restrictions on the domain decomposition. Furthermore, the adaptive local grid refinement may cause the work load distribution over the processors to be out of balance. Hence, some load balancing technique is required to ensure parallel efficiency. This parallel efficiency is illustrated by experiments on a Convex MetaSeries system.

  15. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  16. Parallel Discrete Molecular Dynamics Simulation With Speculation and In-Order Commitment*

    PubMed Central

    Khan, Md. Ashfaquzzaman; Herbordt, Martin C.

    2011-01-01

    Discrete molecular dynamics simulation (DMD) uses simplified and discretized models enabling simulations to advance by event rather than by timestep. DMD is an instance of discrete event simulation and so is difficult to scale: even in this multi-core era, all reported DMD codes are serial. In this paper we discuss the inherent difficulties of scaling DMD and present our method of parallelizing DMD through event-based decomposition. Our method is microarchitecture inspired: speculative processing of events exposes parallelism, while in-order commitment ensures correctness. We analyze the potential of this parallelization method for shared-memory multiprocessors. Achieving scalability required extensive experimentation with scheduling and synchronization methods to mitigate serialization. The speed-up achieved for a variety of system sizes and complexities is nearly 6 on an 8-core and over 9 on a 12-core processor. We present and verify analytical models that account for the achieved performance as a function of available concurrency and architectural limitations. PMID:21822327
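
    The speculation-plus-in-order-commitment idea can be sketched roughly as follows; the event format, the fixed speculation window, and the redo-on-conflict rule are illustrative assumptions, not the authors' implementation.

        # Rough sketch: events are evaluated optimistically in parallel against a snapshot,
        # then committed strictly in timestamp order; a speculative result is recomputed if
        # an earlier commit touched the same particles.
        import heapq
        from concurrent.futures import ThreadPoolExecutor

        def process(event, state):
            """Stand-in for the physics of one event: bump the state of its particles."""
            _time, particles = event
            return {p: state[p] + 1 for p in particles}

        def run(events, state, window=4):
            pending = list(events)
            heapq.heapify(pending)
            with ThreadPoolExecutor() as pool:
                while pending:
                    batch = [heapq.heappop(pending) for _ in range(min(window, len(pending)))]
                    # Speculative phase: evaluate the whole window in parallel against a snapshot.
                    results = list(pool.map(lambda ev: process(ev, dict(state)), batch))
                    touched = set()
                    for ev, res in zip(batch, results):      # in-order commit phase
                        if touched & set(ev[1]):
                            res = process(ev, state)         # conflict: redo against committed state
                        state.update(res)
                        touched |= set(ev[1])
            return state

        events = [(0.1, ("a", "b")), (0.2, ("b", "c")), (0.3, ("d",)), (0.4, ("a", "d"))]
        print(run(events, {"a": 0, "b": 0, "c": 0, "d": 0}))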

  17. Hierarchical, modular discrete-event modelling in an object-oriented environment

    SciTech Connect

    Zeigler, B.P.

    1987-11-01

    Hierarchical, modular specification of discrete-event models offers a basis for reusable model bases and hence for enhanced simulation of truly varied design alternatives. The authors describe an environment which realizes the DEVS formalism developed for hierarchical, modular models. It is implemented in PC-Scheme, a powerful Lisp dialect for microcomputers containing an object-oriented programming subsystem. Since both the implementation and the underlying language are accessible to the user, the result is a capable medium for combining simulation modelling and artificial intelligence techniques.
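
    As a rough sketch of the atomic-model structure the DEVS formalism prescribes (time advance, external and internal transitions, and an output function), the toy model below is written in Python rather than PC-Scheme; the processor model and driver loop are invented for illustration.

        # Toy DEVS-style atomic model: a processor that holds a job for a fixed
        # service time and then emits it (illustrative only).
        INFINITY = float("inf")

        class Processor:
            def __init__(self, service_time=3.0):
                self.service_time = service_time
                self.phase, self.sigma = "idle", INFINITY   # state = (phase, time to next event)
                self.job = None

            def ta(self):                        # time-advance function
                return self.sigma

            def delta_ext(self, elapsed, job):   # external transition: a job arrives
                if self.phase == "idle":
                    self.phase, self.sigma, self.job = "busy", self.service_time, job
                else:
                    self.sigma -= elapsed        # ignore arrivals while busy (toy policy)

            def out(self):                       # output function, called just before delta_int
                return self.job

            def delta_int(self):                 # internal transition: service completes
                self.phase, self.sigma, self.job = "idle", INFINITY, None

        def simulate(model, arrivals):
            """Minimal driver: one model, a list of (arrival_time, job) inputs."""
            last = 0.0
            arrivals = sorted(arrivals)
            while arrivals or model.ta() < INFINITY:
                t_int = last + model.ta()
                t_ext = arrivals[0][0] if arrivals else INFINITY
                if t_int <= t_ext:
                    print(f"t={t_int:.1f}: output {model.out()}")
                    model.delta_int()
                    last = t_int
                else:
                    _, job = arrivals.pop(0)
                    model.delta_ext(t_ext - last, job)
                    last = t_ext

        simulate(Processor(), [(1.0, "job-A"), (2.0, "job-B")])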

  18. Stochastic Parallel PARticle Kinetic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2008-07-01

    SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in parallel so long as the simulation domain can be partitioned spatially so that multiple events can be invoked simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and parallel infrastructure provided by the code.
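
    For readers unfamiliar with the solvers mentioned above, a generic rejection-free kinetic Monte Carlo loop looks roughly like the following; this is a textbook-style sketch with an invented rate table, not SPPARKS code.

        # Generic rejection-free kinetic Monte Carlo loop (textbook sketch).
        # Events are picked with probability proportional to their rates and time
        # advances by an exponentially distributed increment.
        import math
        import random

        def kmc(rates, steps, seed=1):
            """rates: dict mapping event name -> rate (1/time). Returns (time, event counts)."""
            rng = random.Random(seed)
            t, counts = 0.0, {e: 0 for e in rates}
            events, r = list(rates), list(rates.values())
            total = sum(r)
            for _ in range(steps):
                # select an event with probability rate_i / total
                x, acc, chosen = rng.random() * total, 0.0, events[-1]
                for e, ri in zip(events, r):
                    acc += ri
                    if x < acc:
                        chosen = e
                        break
                counts[chosen] += 1
                # advance time by an exponential waiting time
                t += -math.log(1.0 - rng.random()) / total
            return t, counts

        print(kmc({"diffusive_hop": 5.0, "reaction": 0.5}, steps=1000))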

  19. Stochastic Parallel PARticle Kinetic Simulator

    SciTech Connect

    2008-07-01

    SPPARKS is a kinetic Monte Carlo simulator which implements kinetic and Metropolis Monte Carlo solvers in a general way so that they can be hooked to applications of various kinds. Specific applications are implemented in SPPARKS as physical models which generate events (e.g. a diffusive hop or chemical reaction) and execute them one-by-one. Applications can run in parallel so long as the simulation domain can be partitioned spatially so that multiple events can be invoked simultaneously. SPPARKS is used to model various kinds of mesoscale materials science scenarios such as grain growth, surface deposition and growth, and reaction kinetics. It can also be used to develop new Monte Carlo models that hook to the existing solver and parallel infrastructure provided by the code.

  20. Simulating the scheduling of parallel supercomputer applications

    SciTech Connect

    Seager, M.K.; Stichnoth, J.M.

    1989-09-19

    An Event Driven Simulator for Evaluating Multiprocessing Scheduling (EDSEMS) disciplines is presented. The simulator is made up of three components: machine model; parallel workload characterization; and scheduling disciplines for mapping parallel applications (many processes cooperating on the same computation) onto processors. A detailed description of how the simulator is constructed, how to use it and how to interpret the output is also given. Initial results are presented from the simulation of parallel supercomputer workloads using "Dog-Eat-Dog," "Family" and "Gang" scheduling disciplines. These results indicate that Gang scheduling is far better at giving the number of processors that a job requests than Dog-Eat-Dog or Family scheduling. In addition, the system throughput and turnaround time are not adversely affected by this strategy. 10 refs., 8 figs., 1 tab.

  1. Visualization and Tracking of Parallel CFD Simulations

    NASA Technical Reports Server (NTRS)

    Vaziri, Arsi; Kremenetsky, Mark

    1995-01-01

    We describe a system for interactive visualization and tracking of a 3-D unsteady computational fluid dynamics (CFD) simulation on a parallel computer. CM/AVS, a distributed, parallel implementation of a visualization environment (AVS), runs on the CM-5 parallel supercomputer. A CFD solver is run as a CM/AVS module on the CM-5. Data communication between the solver, other parallel visualization modules, and a graphics workstation, which is running AVS, is handled by CM/AVS. Partitioning of the visualization task, between CM-5 and the workstation, can be done interactively in the visual programming environment provided by AVS. Flow solver parameters can also be altered by programmable interactive widgets. This system partially removes the requirement of storing large solution files at frequent time steps, a characteristic of the traditional 'simulate → store → visualize' post-processing approach.

  2. Parallel processing of a rotating shaft simulation

    NASA Technical Reports Server (NTRS)

    Arpasi, Dale J.

    1989-01-01

    A FORTRAN program describing the vibration modes of a rotor-bearing system is analyzed for parallelism in this simulation using a Pascal-like structured language. Potential vector operations are also identified. A critical path through the simulation is identified and used in conjunction with somewhat fictitious processor characteristics to determine the time to calculate the problem on a parallel processing system having those characteristics. A parallel processing overhead time is included as a parameter for proper evaluation of the gain over serial calculation. The serial calculation time is determined for the same fictitious system. An improvement of up to 640 percent is possible depending on the value of the overhead time. Based on the analysis, certain conclusions are drawn pertaining to the development needs of parallel processing technology, and to the specification of parallel processing systems to meet computational needs.

  3. The Xyce Parallel Electronic Simulator - An Overview

    SciTech Connect

    HUTCHINSON,SCOTT A.; KEITER,ERIC R.; HOEKSTRA,ROBERT J.; WATTS,HERMAN A.; WATERS,ARLON J.; SCHELLS,REGINA L.; WIX,STEVEN D.

    2000-12-08

    The Xyce(TM) Parallel Electronic Simulator has been written to support the simulation needs of the Sandia National Laboratories electrical designers. As such, the development has focused on providing the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). In addition, they are providing improved performance for numerical kernels using state-of-the-art algorithms, support for modeling circuit phenomena at a variety of abstraction levels and using object-oriented and modern coding practices that ensure the code will be maintainable and extensible far into the future. The code is a parallel code in the most general sense of the phrase--a message passing parallel implementation--which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Furthermore, careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved even as the number of processors grows.

  4. Parallel and Distributed System Simulation

    NASA Technical Reports Server (NTRS)

    Dongarra, Jack

    1998-01-01

    This exploratory study initiated our research into the software infrastructure necessary to support the modeling and simulation techniques that are most appropriate for the Information Power Grid. Such computational power grids will use high-performance networking to connect hardware, software, instruments, databases, and people into a seamless web that supports a new generation of computation-rich problem solving environments for scientists and engineers. In this context we looked at evaluating the NetSolve software environment for network computing that leverages the potential of such systems while addressing their complexities. NetSolve's main purpose is to enable the creation of complex applications that harness the immense power of the grid, yet are simple to use and easy to deploy. NetSolve uses a modular, client-agent-server architecture to create a system that is very easy to use. Moreover, it is designed to be highly composable in that it readily permits new resources to be added by anyone willing to do so. In these respects NetSolve is to the Grid what the World Wide Web is to the Internet. But like the Web, the design that makes these wonderful features possible can also impose significant limitations on the performance and robustness of a NetSolve system. This project explored the design innovations that push the performance and robustness of the NetSolve paradigm as far as possible without sacrificing the Web-like ease of use and composability that make it so powerful.

  5. Xyce parallel electronic simulator release notes.

    SciTech Connect

    Keiter, Eric Richard; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Santarelli, Keith R.

    2010-05-01

    The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

  6. NWChem: Exploiting Parallelism in Molecular Simulations

    SciTech Connect

    Straatsma, Tp; Philippopoulos, M.; Mccammon, J. A.

    2000-06-01

    NWChem is the software package for computational chemistry on massively parallel computing systems developed by the High Performance Computational Chemistry group for the Environmental Molecular Sciences Laboratory. The software provides a variety of modules for quantum mechanical and classical mechanical simulation. This article describes the design of the molecular dynamics simulation module, which is based on a domain decomposition, and provides implementation details on the data and communication structure and how the code deals with the complexity of atom redistribution and load balancing.

  7. Parallel Performance of a Combustion Chemistry Simulation

    DOE PAGES Beta

    Skinner, Gregg; Eigenmann, Rudolf

    1995-01-01

    We used a description of a combustion simulation's mathematical and computational methods to develop a version for parallel execution. The result was a reasonable performance improvement on small numbers of processors. We applied several important programming techniques, which we describe, in optimizing the application. This work has implications for programming languages, compiler design, and software engineering.

  8. Performance limitations in parallel processor simulations

    NASA Technical Reports Server (NTRS)

    O'Grady, E. Pearse; Wang, Chung-Hsien

    1987-01-01

    A jet-engine model is partitioned and simulated on a parallel processor system consisting of five 8086/8087 floating-point computers. The simulation uses Heun's integration method. A near-optimal parallel simulation (in the sense of minimum execution time) achieves speedup of only 2.13 and efficiency of 42.6 percent, in effect wasting 57.4 percent of the available processing power. A detailed analysis identifies and graphically demonstrates why the system fails to achieve ideal performance (viz., speedup of 5 and efficiency of 100 percent). Inherent characteristics of the problem equations and solution algorithm account for the loss of nearly half of the available processing power. Overheads associated with interprocessor communication and processor synchronization account for only a small fraction of the lost processing power. The effects of these and other factors which limit parallel processor performance are illustrated through real-time timing-analyzer traces describing the run/idle status of the parallel processors during the simulation.
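
    For clarity, the quoted figures follow directly from the standard definitions of speedup and efficiency on p = 5 processors:

        \[
          S = \frac{T_{\mathrm{serial}}}{T_{\mathrm{parallel}}} = 2.13,
          \qquad
          E = \frac{S}{p} = \frac{2.13}{5} \approx 0.426 = 42.6\%,
        \]

    so roughly 57.4 percent of the nominal five-processor capacity is idle or otherwise wasted.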

  9. Parallel software simulation using PS-nets

    SciTech Connect

    Markov, N.G.; Miroshnichenko, E.A.; Saraikin, A.V.

    1995-09-01

    The requirements of techniques for parallel software simulation are discussed. According to these requirements, techniques on the basis of PS-nets are suggested. The fundamentals of program system modeling by PS-nets are given. The tools developed for modeling are described.

  10. Discrete-Event Execution Alternatives on General Purpose Graphical Processing Units

    SciTech Connect

    Perumalla, Kalyan S

    2006-01-01

    Graphics cards, traditionally designed as accelerators for computer graphics, have evolved to support more general-purpose computation. General Purpose Graphical Processing Units (GPGPUs) are now being used as highly efficient, cost-effective platforms for executing certain simulation applications. While most of these applications belong to the category of time-stepped simulations, little is known about the applicability of GPGPUs to discrete event simulation (DES). Here, we identify some of the issues & challenges that the GPGPU stream-based interface raises for DES, and present some possible approaches to moving DES to GPGPUs. Initial performance results on simulation of a diffusion process show that DES-style execution on GPGPU runs faster than DES on CPU and also significantly faster than time-stepped simulations on either CPU or GPGPU.

  11. Parallel algorithm strategies for circuit simulation.

    SciTech Connect

    Thornquist, Heidi K.; Schiek, Richard Louis; Keiter, Eric Richard

    2010-01-01

    Circuit simulation tools (e.g., SPICE) have become invaluable in the development and design of electronic circuits. However, they have been pushed to their performance limits in addressing circuit design challenges that come from the technology drivers of smaller feature scales and higher integration. Improving the performance of circuit simulation tools through exploiting new opportunities in widely-available multi-processor architectures is a logical next step. Unfortunately, not all traditional simulation applications are inherently parallel, and quickly adapting mature application codes (even codes designed as parallel applications) to new parallel paradigms can be prohibitively difficult. In general, performance is influenced by many choices: hardware platform, runtime environment, languages and compilers used, algorithm choice and implementation, and more. In this complicated environment, the use of mini-applications, small self-contained proxies for real applications, is an excellent approach for rapidly exploring the parameter space of all these choices. In this report we present a multi-core performance study of Xyce, a transistor-level circuit simulation tool, and describe the future development of a mini-application for circuit simulation.

  12. Safety Discrete Event Models for Holonic Cyclic Manufacturing Systems

    NASA Astrophysics Data System (ADS)

    Ciufudean, Calin; Filote, Constantin

    In this paper the expression “holonic cyclic manufacturing systems” refers to complex assembly/disassembly systems or fork/join systems, kanban systems, and in general, to any discrete event system that transforms raw material and/or components into products. Such a system is said to be cyclic if it provides the same sequence of products indefinitely. This paper considers the scheduling of holonic cyclic manufacturing systems and describes a new approach using the Petri net formalism. We propose an approach to frame the optimum schedule of holonic cyclic manufacturing systems in order to maximize the throughput while minimizing the work in process. We also propose an algorithm to verify the optimum schedule.
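
    As a rough illustration of the Petri-net formalism invoked above, the Python sketch below fires the transitions of a tiny invented cyclic net; it is not the authors' model or scheduling algorithm.

        # Tiny Petri-net firing sketch for a cyclic two-stage line (invented example).
        # A transition is enabled when every input place holds at least one token;
        # firing moves tokens from input places to output places.
        marking = {"raw": 3, "machine_free": 1, "in_process": 0, "done": 0}

        transitions = {
            "start_job":  {"in": ["raw", "machine_free"], "out": ["in_process"]},
            "finish_job": {"in": ["in_process"],          "out": ["done", "machine_free"]},
        }

        def enabled(name):
            return all(marking[p] > 0 for p in transitions[name]["in"])

        def fire(name):
            assert enabled(name), f"{name} is not enabled"
            for p in transitions[name]["in"]:
                marking[p] -= 1
            for p in transitions[name]["out"]:
                marking[p] += 1

        # Repeat the cycle until the raw material is exhausted.
        while enabled("start_job"):
            fire("start_job")
            fire("finish_job")
        print(marking)   # {'raw': 0, 'machine_free': 1, 'in_process': 0, 'done': 3}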

  13. Parallel Implicit Kinetic Simulation with PARSEK

    NASA Astrophysics Data System (ADS)

    Markidis, Stefano; Lapenta, Giovanni

    2004-11-01

    Kinetic plasma simulation is the ultimate tool for plasma analysis. One of the prime tools for kinetic simulation is the particle in cell (PIC) method. The explicit or semi-implicit (i.e. implicit only on the fields) PIC method requires exceedingly small time steps and grid spacing, limited by the necessity to resolve the electron plasma frequency, the Debye length and the speed of light (for fully explicit schemes). A different approach is to consider fully implicit PIC methods where both particles and fields are discretized implicitly. This approach allows radically larger time steps and grid spacing, reducing the cost of a simulation by orders of magnitude while keeping the full kinetic treatment. In our previous work, simulations impossible for the explicit PIC method even on massively parallel computers have been made possible on a single processor machine using the implicit PIC code CELESTE3D [1]. We propose here another quantum leap: PARSEK, a parallel cousin of CELESTE3D, based on the same approach but sporting a radically redesigned software architecture (object oriented C++, where CELESTE3D was structured and written in FORTRAN77/90) and fully parallelized using MPI for both particle and grid communication. [1] G. Lapenta, J.U. Brackbill, W.S. Daughton, Phys. Plasmas, 10, 1577 (2003).

  14. Parallel node placement method by bubble simulation

    NASA Astrophysics Data System (ADS)

    Nie, Yufeng; Zhang, Weiwei; Qi, Nan; Li, Yiqiang

    2014-03-01

    An efficient Parallel Node Placement method by Bubble Simulation (PNPBS), employing METIS-based domain decomposition (DD) for an arbitrary number of processors, is introduced. In accordance with the desired nodal density and Newton's Second Law of Motion, automatic generation of node sets by bubble simulation has been demonstrated in previous work. Since the interaction force between nodes is short-range, for two distant nodes, their positions and velocities can be updated simultaneously and independently during dynamic simulation; this inherent parallelism makes the method well suited to parallel computing. In this PNPBS method, the METIS-based DD scheme has been investigated for uniform and non-uniform node sets, and dynamic load balancing is obtained by evenly distributing work among the processors. For the nodes near the common interface of two neighboring subdomains, there is no need for special treatment after dynamic simulation. These nodes have good geometrical properties and a smooth density distribution which is desirable in the numerical solution of partial differential equations (PDEs). The results of numerical examples show that quasi-linear speedup in the number of processors and high efficiency are achieved.

  15. Xyce parallel electronic simulator : reference guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to exhaustively list (to the extent possible) device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide. The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. It is targeted specifically to run on large-scale parallel computing platforms but also runs well on a variety of architectures including single processor workstations. It also aims to support a variety of devices and models specific to Sandia needs. This document is intended to complement the Xyce Users Guide. It contains comprehensive, detailed information about a number of topics pertinent to the usage of Xyce. Included in this document is a netlist reference for the input-file commands and elements supported within Xyce; a command line reference, which describes the available command line arguments for Xyce; and quick-references for users of other circuit codes, such as Orcad's PSpice and Sandia's ChileSPICE.

  16. Parallel simulated annealing for emission tomography.

    PubMed

    Girodias, K A; Barrett, H H; Shoemaker, R L

    1991-07-01

    A method for implementing simulated annealing in parallel to speed up the execution of emission tomography (ET) image reconstruction is presented. A high degree of parallelism can be attained by using a parallel-acceptance partitioning strategy, in which perturbations to subsets of the estimate are evaluated in parallel. However, because the point spread function in ET imaging systems is globally dependent, processors cannot update the current estimate independently. Consequently, processors must be synchronized each time a perturbation is accepted to avoid introducing error. This can produce excessive communications overhead, especially when the acceptance rate is high. In this paper an energy function is constructed to reduce the synchronization requirements by using a reformulation of the log-likelihood function from the expectation maximization (EM) algorithm. The approach is to change the global dependence in the energy function from the current estimate to the estimate generated during the last iteration. The synchronization requirements for guaranteed convergence are then significantly reduced from once per acceptance to once per iteration. This parallel implementation on 54 Inmos T800 transputers connected in a ring topology resulted in execution times that were almost 50 times faster than on a VAX 8600. PMID:1886927
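
    The synchronization idea described above (evaluate perturbations against the estimate frozen at the previous iteration, and merge the partitions only once per iteration) might be sketched as follows; the quadratic toy energy, the data, and the two-way partition are invented stand-ins, not the authors' reconstruction code.

        # Schematic of per-iteration synchronization for partitioned annealing-style
        # reconstruction: each "processor" perturbs only its own pixels and accepts
        # against the estimate frozen at the previous iteration.
        import math
        import random

        rng = random.Random(0)
        target = [1.0, 4.0, 2.0, 8.0, 5.0, 7.0, 3.0, 6.0]   # stand-in for measured data
        estimate = [0.0] * len(target)

        def energy(x):
            return sum((xi - ti) ** 2 for xi, ti in zip(x, target))

        def anneal_partition(frozen, indices, temperature, trials=50):
            """Perturb only the pixels in `indices`, accepting against the frozen estimate."""
            local = list(frozen)
            for _ in range(trials):
                i = rng.choice(indices)
                old = local[i]
                new = old + rng.uniform(-1.0, 1.0)
                delta = (new - target[i]) ** 2 - (old - target[i]) ** 2
                if delta < 0 or rng.random() < math.exp(-delta / temperature):
                    local[i] = new
            return {i: local[i] for i in indices}

        partitions = [[0, 1, 2, 3], [4, 5, 6, 7]]           # two "processors"
        temperature = 1.0
        for _ in range(20):                                 # one synchronization per iteration
            frozen = list(estimate)                         # global state from the last iteration
            for part in partitions:                         # conceptually runs in parallel
                for i, v in anneal_partition(frozen, part, temperature).items():
                    estimate[i] = v
            temperature *= 0.9

        print([round(v, 1) for v in estimate], round(energy(estimate), 2))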

  17. Efficient massively parallel simulation of dynamic channel assignment schemes for wireless cellular communications

    NASA Technical Reports Server (NTRS)

    Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

    1994-01-01

    Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.

  18. Xyce(™) Parallel Electronic Simulator

    Energy Science and Technology Software Center (ESTSC)

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.

  19. Xyce(™) Parallel Electronic Simulator

    SciTech Connect

    2013-10-03

    The Xyce Parallel Electronic Simulator simulates electronic circuit behavior in DC, AC, HB, MPDE and transient mode using standard analog (DAE) and/or device (PDE) device models, including several age- and radiation-aware devices. It supports a variety of computing platforms, both serial and parallel. Lastly, it uses a variety of modern solution algorithms, dynamic parallel load-balancing and iterative solvers. Xyce is primarily used to simulate the voltage and current behavior of a circuit network (a network of electronic devices connected via a conductive network). As a tool, it is mainly used for the design and analysis of electronic circuits. Kirchhoff's conservation laws are enforced over a network using modified nodal analysis. This results in a set of differential algebraic equations (DAEs). The resulting nonlinear problem is solved iteratively using a fully coupled Newton method, which in turn results in a linear system that is solved by either a standard sparse-direct solver or iteratively using Trilinos linear solver packages, also developed at Sandia National Laboratories.

  20. Parallelism extraction and program restructuring for parallel simulation of digital systems

    SciTech Connect

    Vellandi, B.L.

    1990-01-01

    Two topics currently of interest to the computer aided design (CAD) for the very-large-scale integrated circuit (VLSI) community are using the VHSIC Hardware Description Language (VHDL) effectively and decreasing simulation times of VLSI designs through parallel execution of the simulator. The goal of this research is to increase the degree of parallelism obtainable in VHDL simulation, and consequently to decrease simulation times. The research targets simulation on massively parallel architectures. Experimentation and instrumentation were done on the SIMD Connection Machine. The author discusses her method used to extract parallelism and restructure a VHDL program, experimental results using this method, and requirements for a parallel architecture for fast simulation.

  1. Parallelization of the Simulated Annealing Algorithm

    NASA Astrophysics Data System (ADS)

    Seacat, Russell Holland, III

    Nuclear medicine imaging involves the introduction of a radiopharmaceutical into the body and the subsequent detection of the radiation emanating from the organ at which the procedure was directed. The data set resulting from such a procedure is generally very underdetermined, due to the dimensions of the imaging apparatus, and underconstrained due to the noise in the imaging process. A means by which more information can be obtained is through a form of imaging utilizing coded apertures. Although increasing the amount of information collected, coded-aperture imaging results in a multiplexing of the data. Demultiplexing the data requires a reconstruction process not required in conventional nuclear medicine imaging. The reconstruction process requires the optimization of an estimate of the object to be reconstructed. This optimization is done through the minimization of an energy functional. The minimization of such energy functionals requires the optimization of several parameters. Solution of this type of problem is difficult because there are far too many degrees of freedom to permit an exhaustive search for an optimum, and in many cases no algorithms are known which will determine the exact optimum with significantly less work than exhaustive search. Instead, heuristic algorithms, such as the simulated annealing algorithm, have been employed and have proven effective in minimizing such energy functionals. Unfortunately, the simulated annealing algorithm, as is characteristic of Monte Carlo algorithms, is very computer intensive; in fact, it is so intensive that insufficient computational power is often the chief hindrance to investigation of the algorithm. The simulated annealing algorithm, however, is amenable to parallel processing. The goal of the research in this dissertation is to investigate the parameters involved in implementing the simulated annealing algorithm in parallel; however, the form of the simulated annealing algorithm implemented here requires no annealing because the energy functionals investigated are quadratic in form. The parameters related to the parallelization of the simulated annealing algorithm include the decomposition of the reconstruction space among the processors, the formulation of the problem at the estimate level with the smallest task being a single perturbation trial evaluated on a local basis, and the communications required to keep all the processors as current as possible with changes made simultaneously to the estimate. Three objects, varying in size, shape and detail, are reconstructed utilizing the TRIMM parallel processor.

  2. Parallel Strategies for Crash and Impact Simulations

    SciTech Connect

    Attaway, S.; Brown, K.; Hendrickson, B.; Plimpton, S.

    1998-12-07

    We describe a general strategy we have found effective for parallelizing solid mechanics simulations. Such simulations often have several computationally intensive parts, including finite element integration, detection of material contacts, and particle interaction if smoothed particle hydrodynamics is used to model highly deforming materials. The need to balance all of these computations simultaneously is a difficult challenge that has kept many commercial and government codes from being used effectively on parallel supercomputers with hundreds or thousands of processors. Our strategy is to load-balance each of the significant computations independently with whatever balancing technique is most appropriate. The chief benefit is that each computation can be scalably parallelized. The drawback is the data exchange between processors and extra coding that must be written to maintain multiple decompositions in a single code. We discuss these trade-offs and give performance results showing this strategy has led to a parallel implementation of a widely-used solid mechanics code that can now be run efficiently on thousands of processors of the Pentium-based Sandia/Intel TFLOPS machine. We illustrate with several examples the kinds of high-resolution, million-element models that can now be simulated routinely. We also look to the future and discuss what possibilities this new capability promises, as well as the new set of challenges it poses in material models, computational techniques, and computing infrastructure.

  3. Massively Parallel Direct Simulation of Multiphase Flow

    SciTech Connect

    COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.

    2000-08-10

    The authors' understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

  4. A parallel algorithm for implicit depletant simulations

    NASA Astrophysics Data System (ADS)

    Glaser, Jens; Karas, Andrew S.; Glotzer, Sharon C.

    2015-11-01

    We present an algorithm to simulate the many-body depletion interaction between anisotropic colloids in an implicit way, integrating out the degrees of freedom of the depletants, which we treat as an ideal gas. Because the depletant particles are statistically independent and the depletion interaction is short-ranged, depletants are randomly inserted in parallel into the excluded volume surrounding a single translated and/or rotated colloid. A configurational bias scheme is used to enhance the acceptance rate. The method is validated and benchmarked both on multi-core processors and graphics processing units for the case of hard spheres, hemispheres, and discoids. With depletants, we report novel cluster phases in which hemispheres first assemble into spheres, which then form ordered hcp/fcc lattices. The method is significantly faster than any method without cluster moves and that tracks depletants explicitly, for systems of colloid packing fraction φc < 0.50, and additionally enables simulation of the fluid-solid transition.

  5. Distributed simulation, 1989

    SciTech Connect

    Unger, B.; Fujimoto, R.

    1989-01-01

    Computer simulation of large, complex systems remains a major stumbling block in many research and development efforts. Computational requirements continue to grow and far exceed the capabilities of uniprocessor hardware. Simulation of many important applications in engineering and economics requires excessive amounts of time on existing machines, and many large-scale simulations cannot be performed because computation costs are prohibitive. Obtaining truly significant speedups for these problems requires the widespread exploitation of parallelism. Research in parallel simulation, and parallel discrete event simulation in particular, has expanded dramatically over the last few years. The increased availability of parallel computers, coupled with the intellectual challenges associated with exploring new uncharted territory in this difficult problem domain, has led to a renaissance of activity. The authors report recent developments regarding the application of parallel computation to discrete event simulation problems. Many of the papers address advances in parallel discrete event simulation, where synchronization of simulated time clocks across the parallel simulator represents a major stumbling block to achieving significant speedups.

  6. Optimal Parametric Discrete Event Control: Problem and Solution

    SciTech Connect

    Griffin, Christopher H

    2008-01-01

    We present a novel optimization problem for discrete event control, similar in spirit to the optimal parametric control problem common in statistical process control. In our problem, we assume a known finite state machine plant model G defined over an event alphabet Σ so that the plant model language L = L(G) is prefix closed. We further assume the existence of a base control structure M_K, which may be either a finite state machine or a deterministic pushdown machine. If K = L(M_K), we assume K is prefix closed and that K ⊆ L. We associate each controllable transition of M_K with a binary variable X_1,...,X_n indicating whether the transition is enabled or not. This leads to a function M_K(X_1,...,X_n) that returns a new control specification depending upon the values of X_1,...,X_n. We exhibit a branch-and-bound algorithm to solve the optimization problem min over X_1,...,X_n of max over w ∈ K of C(w), such that M_K(X_1,...,X_n) satisfies Π and L(M_K(X_1,...,X_n)) ∈ Con(L). Here Π is a set of logical assertions on the structure of M_K(X_1,...,X_n), and Con(L) is the set of controllable sublanguages of L.
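
    As a toy illustration of the shape of this optimization (not the paper's branch-and-bound algorithm), the Python sketch below exhaustively enumerates the binary enable variables, filters by a stand-in structural predicate, and minimizes the worst-case string cost; the admitted-language generator, predicate, and cost function are all invented.

        # Toy min-max search over binary transition-enable variables (exhaustive search,
        # not branch-and-bound). Every ingredient is an invented stand-in.
        from itertools import product

        EVENTS = ["a", "b", "c"]

        def language(x):
            """All strings of length <= 2 over the enabled events (stand-in for K)."""
            enabled = [e for e, xi in zip(EVENTS, x) if xi]
            return [""] + enabled + [u + v for u in enabled for v in enabled]

        def cost(w):
            # Stand-in for C(w): 'c' is an expensive event.
            return sum(3 if ch == "c" else 1 for ch in w)

        def satisfies_pi(x):
            return sum(x) >= 2          # stand-in structural assertion: keep >= 2 events enabled

        best, best_cost = None, float("inf")
        for x in product([0, 1], repeat=3):             # enumerate X_1..X_3
            if not satisfies_pi(x):
                continue
            worst = max(cost(w) for w in language(x))   # worst-case cost over admitted strings
            if worst < best_cost:
                best, best_cost = x, worst
        print(best, best_cost)          # (1, 1, 0): enable 'a' and 'b', worst-case cost 2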

  7. Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype

    ERIC Educational Resources Information Center

    Sanchez, A.; Bucio, J.

    2012-01-01

    This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a

  8. Improving the Teaching of Discrete-Event Control Systems Using a LEGO Manufacturing Prototype

    ERIC Educational Resources Information Center

    Sanchez, A.; Bucio, J.

    2012-01-01

    This paper discusses the usefulness of employing LEGO as a teaching-learning aid in a post-graduate-level first course on the control of discrete-event systems (DESs). The final assignment of the course is presented, which asks students to design and implement a modular hierarchical discrete-event supervisor for the coordination layer of a…

  9. Empirical study of parallel LRU simulation algorithms

    NASA Technical Reports Server (NTRS)

    Carr, Eric; Nicol, David M.

    1994-01-01

    This paper reports on the performance of five parallel algorithms for simulating a fully associative cache operating under the LRU (Least-Recently-Used) replacement policy. Three of the algorithms are SIMD, and are implemented on the MasPar MP-2 architecture. Two other algorithms are parallelizations of an efficient serial algorithm on the Intel Paragon. One SIMD algorithm is quite simple, but its cost is linear in the cache size. The two other SIMD algorithms are more complex, but have costs that are independent of the cache size. Both the second and third SIMD algorithms compute all stack distances; the second SIMD algorithm is completely general, whereas the third SIMD algorithm presumes and takes advantage of bounds on the range of reference tags. Both MIMD algorithms implemented on the Paragon are general and compute all stack distances; they differ in one step that may affect their respective scalability. We assess the strengths and weaknesses of these algorithms as a function of problem size and characteristics, and compare their performance on traces derived from execution of three SPEC benchmark programs.
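
    For context, the stack distances these algorithms compute can be obtained serially as follows (a reference hits in a fully associative LRU cache of size C exactly when its stack distance is at most C); the trace below is invented.

        # Serial LRU stack-distance computation (the quantity the parallel algorithms above compute).
        def stack_distances(trace):
            stack, dists = [], []
            for ref in trace:
                if ref in stack:
                    d = len(stack) - stack.index(ref)   # depth from the top of the LRU stack
                    stack.remove(ref)
                else:
                    d = float("inf")                    # cold miss
                stack.append(ref)                       # most recently used goes on top
                dists.append(d)
            return dists

        trace = ["a", "b", "c", "a", "b", "a", "d", "c"]
        dists = stack_distances(trace)
        print(dists)                                    # [inf, inf, inf, 3, 3, 2, inf, 4]
        cache_size = 2
        hits = sum(1 for d in dists if d <= cache_size)
        print(f"hits with LRU cache of size {cache_size}: {hits}")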

  10. Parallel Proximity Detection for Computer Simulation

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1997-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.
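
    A rough sketch of the grid check-in idea (sensor coverages registering with grid cells, and movers discovering covering sensors when they check into a cell) is given below; the cell size, the square coverage approximation, and the data are invented simplifications of the patented mechanism.

        # Rough sketch of grid-based proximity detection: sensors register their coverage
        # with grid cells; a mover checking into a cell is reported to every sensor
        # covering that cell. All parameters and data are illustrative.
        import math
        from collections import defaultdict

        CELL = 10.0                                      # grid cell size (arbitrary)

        def cell_of(x, y):
            return (math.floor(x / CELL), math.floor(y / CELL))

        def cells_covered(cx, cy, radius):
            """All cells overlapping the square bounding box of the sensor's coverage."""
            lo_x, lo_y = cell_of(cx - radius, cy - radius)
            hi_x, hi_y = cell_of(cx + radius, cy + radius)
            return {(i, j) for i in range(lo_x, hi_x + 1) for j in range(lo_y, hi_y + 1)}

        coverage = defaultdict(set)                      # cell -> sensors covering it

        def sensor_check_in(sensor, cx, cy, radius):
            for cell in cells_covered(cx, cy, radius):
                coverage[cell].add(sensor)

        def mover_check_in(mover, x, y):
            """Return the sensors that may detect the mover at (x, y)."""
            return coverage[cell_of(x, y)]

        sensor_check_in("radar-1", 15.0, 15.0, radius=12.0)
        print(mover_check_in("tank-7", 4.0, 22.0))       # {'radar-1'}
        print(mover_check_in("tank-8", 55.0, 55.0))      # set()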

  11. Parallel Proximity Detection for Computer Simulations

    NASA Technical Reports Server (NTRS)

    Steinman, Jeffrey S. (Inventor); Wieland, Frederick P. (Inventor)

    1998-01-01

    The present invention discloses a system for performing proximity detection in computer simulations on parallel processing architectures utilizing a distribution list which includes movers and sensor coverages which check in and out of grids. Each mover maintains a list of sensors that detect the mover's motion as the mover and sensor coverages check in and out of the grids. Fuzzy grids are included by fuzzy resolution parameters to allow movers and sensor coverages to check in and out of grids without computing exact grid crossings. The movers check in and out of grids while moving sensors periodically inform the grids of their coverage. In addition, a lookahead function is also included for providing a generalized capability without making any limiting assumptions about the particular application to which it is applied. The lookahead function is initiated so that risk-free synchronization strategies never roll back grid events. The lookahead function adds fixed delays as events are scheduled for objects on other nodes.

  12. A polymorphic reconfigurable emulator for parallel simulation

    NASA Technical Reports Server (NTRS)

    Parrish, E. A., Jr.; Mcvey, E. S.; Cook, G.

    1980-01-01

    Microprocessor and arithmetic support chip technology was applied to the design of a reconfigurable emulator for real time flight simulation. The system developed consists of a master control system to perform all man-machine interactions and to configure the hardware to emulate a given aircraft, and numerous slave compute modules (SCM) which comprise the parallel computational units. It is shown that all parts of the state equations can be worked on simultaneously but that the algebraic equations cannot (unless they are slowly varying). Attempts to obtain algorithms that will allow parallel updates are reported. The word length and step size to be used in the SCM's are determined and the architecture of the hardware and software is described.

  13. Parallel multiscale simulations of a brain aneurysm

    NASA Astrophysics Data System (ADS)

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

  14. Parallel multiscale simulations of a brain aneurysm.

    PubMed

    Grinberg, Leopold; Fedosov, Dmitry A; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multi-scale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work. PMID:23734066

  15. Parallel multiscale simulations of a brain aneurysm

    SciTech Connect

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future work.

  16. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1997-01-01

    Parallelization of Rocket Engine System Software (PRESS) project is part of a collaborative effort with Southern University at Baton Rouge (SUBR), University of West Florida (UWF), and Jackson State University (JSU). The second-year funding, which supports two graduate students enrolled in our new Master's program in Computer Science at Hampton University and the principal investigator, has been obtained for the period from October 19, 1996 through October 18, 1997. The key part of the interim report was new directions for the second year funding. This came about from discussions during the Rocket Engine Numeric Simulator (RENS) project meeting in Pensacola on January 17-18, 1997. At that time, a software agreement between Hampton University and NASA Lewis Research Center had already been concluded. That agreement concerns off-NASA-site experimentation with PUMPDES/TURBDES software. Before this agreement, during the first year of the project, another large-scale FORTRAN-based software package, Two-Dimensional Kinetics (TDK), was being used for translation to an object-oriented language and parallelization experiments. However, that package proved to be too complex and lacking sufficient documentation for an effective translation effort to object-oriented C++ source code. The focus, this time with the better documented and more manageable PUMPDES/TURBDES package, was still on translation to C++ with design improvements. At the RENS meeting, however, the new impetus for the RENS projects in general, and PRESS in particular, shifted in two important ways. One was closer alignment with the work on the Numerical Propulsion System Simulator (NPSS) through cooperation and collaboration with the LERC ACLU organization. The other was to see whether and how NASA's various rocket design software can be run over local networks and intranets without any radical efforts at redesign and translation into object-oriented source code. There were also suggestions that the Fortran-based code be encapsulated in C++ code, thereby facilitating reuse without undue development effort. The details are covered in the aforementioned section of the interim report filed on April 28, 1997.

  17. Partitioning strategies for parallel KIVA-4 engine simulations

    SciTech Connect

    Torres, D J; Kong, S C

    2008-01-01

    Parallel KIVA-4 is described and used to simulate four different engine geometries. The Message Passing Interface (MPI) was used to parallelize KIVA-4. Partitioning strategies are assessed in light of the fact that cells can become deactivated and activated during the course of an engine simulation, which affects the load balance between processors.

  18. Parallel methods for dynamic simulation of multiple manipulator systems

    NASA Technical Reports Server (NTRS)

    Mcmillan, Scott; Sadayappan, P.; Orin, David E.

    1993-01-01

    In this paper, efficient dynamic simulation algorithms for a system of m manipulators, cooperating to manipulate a large load, are developed; their performance, using two possible forms of parallelism on a general-purpose parallel computer, is investigated. One form, temporal parallelism, is obtained with the use of parallel numerical integration methods. A speedup of 3.78 on four processors of CRAY Y-MP8 was achieved with a parallel four-point block predictor-corrector method for the simulation of a four manipulator system. These multi-point methods suffer from reduced accuracy, and when comparing these runs with a serial integration method, the speedup can be as low as 1.83 for simulations with the same accuracy. To regain the performance lost due to accuracy problems, a second form of parallelism is employed. Spatial parallelism allows most of the dynamics of each manipulator chain to be computed simultaneously. Used exclusively in the four processor case, this form of parallelism in conjunction with a serial integration method results in a speedup of 3.1 on four processors over the best serial method. In cases where there are either more processors available or fewer chains in the system, the multi-point parallel integration methods are still advantageous despite the reduced accuracy because both forms of parallelism can then combine to generate more parallel tasks and achieve greater effective speedups. This paper also includes results for these cases.

  19. Aerodynamic simulation on massively parallel systems

    NASA Astrophysics Data System (ADS)

    Haeuser, Jochem; Simon, Horst D.

    This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and able to provide the TeraFlops needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but message passing MIMD systems seem to be best suited for large multiblock applications.

  20. Discrete event command and control for networked teams with multiple missions

    NASA Astrophysics Data System (ADS)

    Lewis, Frank L.; Hudas, Greg R.; Pang, Chee Khiang; Middleton, Matthew B.; McMurrough, Christopher

    2009-05-01

    During mission execution in military applications, the TRADOC Pamphlet 525-66 Battle Command and Battle Space Awareness capabilities prescribe expectations that networked teams will perform in a reliable manner under changing mission requirements, varying resource availability and reliability, and resource faults. In this paper, a Command and Control (C2) structure is presented that allows for computer-aided execution of the networked team decision-making process, control of force resources, shared resource dispatching, and adaptability to change based on battlefield conditions. A mathematically justified networked computing environment is provided called the Discrete Event Control (DEC) Framework. DEC has the ability to provide the logical connectivity among all team participants including mission planners, field commanders, war-fighters, and robotic platforms. The proposed data management tools are developed and demonstrated on a simulation study and an implementation on a distributed wireless sensor network. The results show that the tasks of multiple missions are correctly sequenced in real-time, and that shared resources are suitably assigned to competing tasks under dynamically changing conditions without conflicts and bottlenecks.
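
    A minimal sketch of conflict-free shared-resource dispatching in this spirit is shown below (illustrative Python; it is not the paper's matrix-based DEC formulation, and the mission and resource names are hypothetical). A task is started only when every resource it needs is free, so competing tasks from different missions are sequenced without conflicts.

      import heapq

      def dispatch(tasks):
          """tasks: list of (release_time, duration, name, required_resources)."""
          busy_until = {}                           # resource -> time it becomes free
          events = [(t, d, name, tuple(res)) for t, d, name, res in tasks]
          heapq.heapify(events)
          schedule = []
          while events:
              t, d, name, res = heapq.heappop(events)
              free_at = max([t] + [busy_until.get(r, 0.0) for r in res])
              if free_at > t:                       # a resource is still held: retry later
                  heapq.heappush(events, (free_at, d, name, res))
                  continue
              for r in res:
                  busy_until[r] = t + d             # acquire resources for the task duration
              schedule.append((t, t + d, name))
          return schedule

      missions = [
          (0.0, 3.0, "missionA/scan",   ["uav1"]),
          (1.0, 2.0, "missionB/relay",  ["uav1", "radio"]),
          (1.5, 1.0, "missionA/report", ["radio"]),
      ]
      for start, end, name in dispatch(missions):
          print(f"{name:16s} {start:4.1f} -> {end:4.1f}")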

  1. Parallel-Processing Test Bed For Simulation Software

    NASA Technical Reports Server (NTRS)

    Blech, Richard; Cole, Gary; Townsend, Scott

    1996-01-01

    Second-generation Hypercluster computing system is multiprocessor test bed for research on parallel algorithms for simulation in fluid dynamics, electromagnetics, chemistry, and other fields with large computational requirements but relatively low input/output requirements. Built from standard, off-the-shelf hardware readily upgraded as improved technology becomes available. System used for experiments with such parallel-processing concepts as message-passing algorithms, debugging software tools, and computational steering. First-generation Hypercluster system described in "Hypercluster Parallel Processor" (LEW-15283).

  2. Parallel architecture for real-time simulation. Master's thesis

    SciTech Connect

    Cockrell, C.D.

    1989-01-01

    This thesis is concerned with the development of a very fast and highly efficient parallel computer architecture for real-time simulation of continuous systems. Currently, several parallel processing systems exist that may be capable of executing a complex simulation in real-time. These systems are examined and the pros and cons of each system discussed. The thesis then introduces a custom-designed parallel architecture based upon The University of Alabama's OPERA architecture. Each component of this system is discussed and rationale presented for its selection. The problem selected, real-time simulation of the Space Shuttle Main Engine for the test and evaluation of the proposed architecture, is explored, identifying the areas where parallelism can be exploited and parallel processing applied. Results from the test and evaluation phase are presented and compared with the results of the same problem that has been processed on a uniprocessor system.

  3. Xyce Parallel Electronic Simulator : users' guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  4. Xyce parallel electronic simulator : users' guide. Version 5.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-11-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory.

  5. Broadband monitoring simulation with massively parallel processors

    NASA Astrophysics Data System (ADS)

    Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

    2011-09-01

    Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Moreover, these techniques allow one to obtain multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when an optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result, reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over a wavelength grid. These terms can be computed simultaneously in different threads of computation, which opens great opportunities for parallelization of computations. Multi-core and multi-processor systems can provide speedups of several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPU). A modern GPU consists of hundreds of massively parallel processors and is capable of performing floating-point operations efficiently.
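
    The parallelization opportunity described above can be sketched as follows (illustrative Python; the squared-difference form of the discrepancy and all names are assumptions, not the authors' code). Each wavelength contributes an independent term to the sum, so the terms can be evaluated concurrently, whether across CPU processes as here or across GPU threads.

      import numpy as np
      from multiprocessing import Pool

      def term(args):
          wavelength, computed, target = args
          return (computed - target) ** 2          # one independent summand per wavelength

      def discrepancy(wavelengths, computed, target, processes=4):
          with Pool(processes) as pool:
              terms = pool.map(term, zip(wavelengths, computed, target))
          return sum(terms)

      if __name__ == "__main__":
          wl = np.linspace(400e-9, 800e-9, 1000)          # wavelength grid [m]
          target = 0.95 * np.ones_like(wl)                # desired transmittance
          computed = target + 0.01 * np.sin(wl * 1e7)     # hypothetical design response
          print(discrepancy(wl, computed, target))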

  6. Large nonadiabatic quantum molecular dynamics simulations on parallel computers

    NASA Astrophysics Data System (ADS)

    Shimojo, Fuyuki; Ohmura, Satoshi; Mou, Weiwei; Kalia, Rajiv K.; Nakano, Aiichiro; Vashishta, Priya

    2013-01-01

    We have implemented a quantum molecular dynamics simulation incorporating nonadiabatic electronic transitions on massively parallel computers to study photoexcitation dynamics of electrons and ions. The nonadiabatic quantum molecular dynamics (NAQMD) simulation is based on Casida's linear response time-dependent density functional theory to describe electronic excited states and Tully's fewest-switches surface hopping approach to describe nonadiabatic electron-ion dynamics. To enable large NAQMD simulations, a series of techniques are employed for efficiently calculating long-range exact exchange correction and excited-state forces. The simulation program is parallelized using hybrid spatial and band decomposition, and is tested for various materials.

  7. A conservative approach to parallelizing the Sharks World simulation

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Riffe, Scott E.

    1990-01-01

    The parallelization of a benchmark problem for parallel simulation, the Sharks World, is described. The described solution is conservative, in the sense that no state information is saved, and no 'rollbacks' occur. The approach used illustrates both the principal advantage and principal disadvantage of conservative parallel simulation. The advantage is that by exploiting lookahead an approach was found that dramatically improves the serial execution time, and also achieves excellent speedups. The disadvantage is that if the model rules are changed in such a way that the lookahead is destroyed, it is difficult to modify the solution to accommodate the changes.
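
    A minimal sketch of the conservative, lookahead-based synchronization idea is given below (illustrative Python; it is not the Sharks World code itself). Each logical process promises never to send a message earlier than its current clock plus its lookahead, so a neighbor may safely process any pending event older than the smallest such promise, and no rollbacks are ever needed.

      import heapq

      class LP:
          def __init__(self, name, lookahead):
              self.name, self.lookahead = name, lookahead
              self.clock = 0.0
              self.pending = []                      # (timestamp, payload) from neighbors

          def promise(self):
              return self.clock + self.lookahead     # earliest future message it could send

          def process_safe(self, horizon, send):
              while self.pending and self.pending[0][0] <= horizon:
                  t, payload = heapq.heappop(self.pending)
                  self.clock = t
                  # handle the event; here it simply schedules a follow-up after +lookahead
                  send(self.name, t + self.lookahead, f"echo({payload})")

      def run(lps, steps=5):
          def send(src, t, payload):
              for other in lps:
                  if other.name != src:
                      heapq.heappush(other.pending, (t, payload))
          heapq.heappush(lps[0].pending, (0.0, "start"))
          for _ in range(steps):
              for lp in lps:
                  horizon = min(o.promise() for o in lps if o is not lp)
                  lp.process_safe(horizon, send)     # never processes beyond the horizon
          for lp in lps:
              print(lp.name, "clock =", lp.clock)

      run([LP("shark-region-0", lookahead=1.0), LP("shark-region-1", lookahead=1.0)])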

  8. Traffic simulations on parallel computers using domain decomposition techniques

    SciTech Connect

    Hanebutte, U.R.; Tentner, A.M.

    1995-12-31

    Large scale simulations of Intelligent Transportation Systems (ITS) can only be achieved by using the computing resources offered by parallel computing architectures. Domain decomposition techniques are proposed which allow the performance of traffic simulations with the standard simulation package TRAF-NETSIM on a 128-node IBM SPx parallel supercomputer as well as on a cluster of SUN workstations. Whilst this particular parallel implementation is based on NETSIM, a microscopic traffic simulation model, the presented strategy is applicable to a broad class of traffic simulations. An outer iteration loop must be introduced in order to converge to a global solution. A performance study that utilizes a scalable test network that consists of square grids is presented, which addresses the performance penalty introduced by the additional iteration loop.

  9. Parallel Signal Processing and System Simulation using aCe

    NASA Technical Reports Server (NTRS)

    Dorband, John E.; Aburdene, Maurice F.

    2003-01-01

    Recently, networked and cluster computation have become very popular for both signal processing and system simulation. A new language is ideally suited for parallel signal processing applications and system simulation since it allows the programmer to explicitly express the computations that can be performed concurrently. In addition, the new C-based parallel language aCe for architecture-adaptive programming allows programmers to implement algorithms and system simulation applications on parallel architectures by providing them with the assurance that future parallel architectures will be able to run their applications with a minimum of modification. In this paper, we will focus on some fundamental features of aCe and present a signal processing application (FFT).

  10. Fully Implicit Parallel Simulation of Single Neurons

    PubMed Central

    Hines, Michael L.; Markram, Henry; Schürmann, Felix

    2009-01-01

    When a multi-compartment neuron is divided into subtrees such that no subtree has more than two connection points to other subtrees, the subtrees can be on different processors and the entire system remains amenable to direct Gaussian elimination with only a modest increase in complexity. Accuracy is the same as with standard Gaussian elimination on a single processor. It is often feasible to divide a 3-d reconstructed neuron model onto a dozen or so processors and experience almost linear speedup. We have also used the method for purposes of load balance in network simulations when some cells are so large that their individual computation time is much longer than the average processor computation time or when there are many more processors than cells. The method is available in the standard distribution of the NEURON simulation program. PMID:18379867

  11. High-performance retargetable simulator for parallel architectures. Technical report

    SciTech Connect

    Dellarocas, C.N.

    1991-06-01

    In this thesis, the authors describe Proteus, a high-performance simulation-based system for the evaluation of parallel algorithms and system software. Proteus is built around a retargetable parallel architecture simulator and a flexible data collection and display component. The simulator uses a combination of simulation and direct execution to achieve high performance, while retaining simulation accuracy. Proteus can be configured to simulate a wide range of shared memory and message passing MIMD architectures and the level of simulation detail can be chosen by the user. Detailed memory, cache and network simulation is supported. Parallel programs can be written using a programming model based on C and a set of runtime system calls for thread and memory management. The system allows nonintrusive monitoring of arbitrary information about an execution, and provides flexible graphical utilities for displaying recorded data. To validate the accuracy of the system, a number of published experiments were reproduced on Proteus. In all cases the results obtained by simulation are very close to those published, a fact that provides support for the reliability of the system. Performance measurements demonstrate that the simulator is one to two orders of magnitude faster than other similar multiprocessor simulators.

  12. An approach to real-time simulation using parallel processing

    NASA Technical Reports Server (NTRS)

    Blech, R. A.; Arpasi, D. J.

    1981-01-01

    A preliminary simulator design that uses a parallel computer organization to provide accuracy, portability, and low cost is presented. The hardware and software for this prototype simulator are discussed. A detailed discussion of the inter-computer data transfer mechanism is also presented.

  13. 3-D massively parallel impact simulations using PCTH

    SciTech Connect

    Fang, H.E.; Robinson, A.C.

    1992-12-31

    Simulations of hypervelocity impact problems are performed frequently by government laboratories and contractors for armor/anti-armor applications. These simulations need to deal with shock wave physics phenomena, large material deformation, motion of debris particles and complex geometries. As a result, memory and processing time requirements are large for detailed, three-dimensional calculations. The large massively parallel supercomputing systems of the future will provide the power necessary to greatly reduce simulation times currently required by shared-memory, vector supercomputers. This paper gives an introduction to PCTH, a next-generation shock wave physics code which is being built at Sandia National Laboratories for massively parallel supercomputers, and demonstrates that massively parallel hydrocodes, such as PCTH, can provide highly-detailed, three-dimensional simulations of armor/anti-armor systems.

  14. 3-D massively parallel impact simulations using PCTH

    SciTech Connect

    Fang, H.E.; Robinson, A.C.

    1992-01-01

    Simulations of hypervelocity impact problems are performed frequently by government laboratories and contractors for armor/anti-armor applications. These simulations need to deal with shock wave physics phenomena, large material deformation, motion of debris particles and complex geometries. As a result, memory and processing time requirements are large for detailed, three-dimensional calculations. The large massively parallel supercomputing systems of the future will provide the power necessary to greatly reduce simulation times currently required by shared-memory, vector supercomputers. This paper gives an introduction to PCTH, a next-generation shock wave physics code which is being built at Sandia National Laboratories for massively parallel supercomputers, and demonstrates that massively parallel hydrocodes, such as PCTH, can provide highly-detailed, three-dimensional simulations of armor/anti-armor systems.

  15. Parallelization of Rocket Engine Simulator Software (PRESS)

    NASA Technical Reports Server (NTRS)

    Cezzar, Ruknet

    1998-01-01

    We have outlined our work in the last half of the funding period. We have shown how a demo package for RESSAP using MPI can be done. However, we also mentioned the difficulties with the UNIX platform. We have reiterated some of the suggestions made during the presentation of the project's progress at the Fourth Annual HBCU Conference. Although we have discussed, in some detail, how TURBDES/PUMPDES software can be run in parallel using MPI, at present, we are unable to experiment any further with either MPI or PVM. Due to X windows not being implemented, we are also not able to experiment further with XPVM, which, it will be recalled, has a nice GUI interface. There are also some concerns, on our part, about MPI being an appropriate tool. The best thing about MPI is that it is public domain. Although plenty of documentation exists for the intricacies of using MPI, little information is available on its actual implementations. Other than very typical, somewhat contrived examples, such as the Jacobi algorithm for solving Laplace's equation, there are few examples which can readily be applied to real situations, such as in our case. In effect, the review of literature on both MPI and PVM, and there is a lot, indicates something similar to the enormous effort which was spent on LISP and LISP-like languages as tools for artificial intelligence research. During the development of a book on programming languages [12], when we searched the literature for very simple examples like taking averages, reading and writing records, multiplying matrices, etc., we could hardly find any! Yet, so much was said and done on that topic in academic circles. It appears that we faced the same problem with MPI, where despite significant documentation, we could not find even a simple example which supports coarse-grain parallelism involving only a few processes. From the foregoing, it appears that a new direction may be required for more productive research during the extension period (10/19/98 - 10/18/99). At the least, the research would need to be done on Windows 95/Windows NT based platforms. Moreover, with the acquisition of the Lahey Fortran package for the PC platform, and the existing Borland C++ 5.0, we can do work on C++ wrapper issues. We have carefully studied the blueprint for the Space Transportation Propulsion Integrated Design Environment for the next 25 years [13] and found the inclusion of HBCUs in that effort encouraging. Especially in the long period for which a map is provided, there is no doubt that HBCUs will grow and become better equipped to do meaningful research. In the shorter period, as was suggested in our presentation at the HBCU conference, some key decisions regarding the aging Fortran based software for rocket propellants will need to be made. One important issue is whether or not object-oriented languages such as C++ or Java should be used for distributed computing. Whether or not "distributed computing" is necessary for the existing software is yet another, larger question to be tackled.

  16. Improved task scheduling for parallel simulations. Master's thesis

    SciTech Connect

    McNear, A.E.

    1991-12-01

    The objective of this investigation is to design, analyze, and validate the generation of optimal schedules for simulation systems. Improved performance in simulation execution times can greatly improve the return rate of information provided by such simulations, resulting in reduced development costs of future computer/electronic systems. Optimal schedule generation of precedence-constrained task systems, including iterative feedback systems such as VHDL or war gaming simulations, for execution on a parallel computer is known to be NP-hard. Efficiently parallelizing such problems takes full advantage of present computer technology to achieve a significant reduction in the search times required. Unfortunately, the extreme combinatoric 'explosion' of possible task assignments to processors creates an exponential search space prohibitive on any computer for search algorithms which maintain more than one branch of the search graph at any one time. This work develops various parallel modified backtracking (MBT) search algorithms for execution on an iPSC/2 hypercube that bound the space requirements and produce an optimal (minimum-length) schedule with linear speedup. The parallel MBT search algorithm is validated using various feedback task simulation systems which are scheduled for execution on an iPSC/2 hypercube. The search time, size of the enumerated search space, and communications overhead required to ensure efficient utilization during the parallel search process are analyzed. The various applications indicated appreciable improvement in performance using this method.
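
    The space-bounded, depth-first flavor of such a search can be sketched as follows (illustrative Python; this is a generic branch-and-bound scheduler, not the thesis's MBT algorithm). Tasks are assigned to processors in a precedence-respecting order, a branch is abandoned as soon as its partial makespan cannot beat the best complete schedule found so far, and only the current branch is held in memory.

      def schedule(durations, preds, num_procs):
          order = list(durations)                   # assume keys are listed in a valid
                                                    # topological order for this sketch
          best = {"makespan": float("inf"), "assign": None}

          def dfs(i, proc_free, finish, assign):
              if max(proc_free) >= best["makespan"]:
                  return                            # prune: cannot beat the incumbent
              if i == len(order):
                  best["makespan"], best["assign"] = max(proc_free), dict(assign)
                  return
              task = order[i]
              ready = max((finish[p] for p in preds.get(task, ())), default=0.0)
              for p in range(num_procs):
                  start = max(proc_free[p], ready)
                  end = start + durations[task]
                  old = proc_free[p]
                  proc_free[p], finish[task], assign[task] = end, end, p
                  dfs(i + 1, proc_free, finish, assign)
                  proc_free[p] = old                # undo and try the next processor
                  del finish[task], assign[task]

          dfs(0, [0.0] * num_procs, {}, {})
          return best

      tasks = {"a": 2.0, "b": 3.0, "c": 2.0, "d": 1.0}
      precedence = {"c": ["a"], "d": ["b", "c"]}    # c after a, d after b and c
      print(schedule(tasks, precedence, num_procs=2))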

  17. Applying Parallel Processing Techniques to Tether Dynamics Simulation

    NASA Technical Reports Server (NTRS)

    Wells, B. Earl

    1996-01-01

    The focus of this research has been to determine the effectiveness of applying parallel processing techniques to a sizable real-world problem, the simulation of the dynamics associated with a tether which connects two objects in low earth orbit, and to explore the degree to which the parallelization process can be automated through the creation of new software tools. The goal has been to utilize this specific application problem as a base to develop more generally applicable techniques.

  18. Efficient parallel simulation of CO2 geologic sequestration insaline aquifers

    SciTech Connect

    Zhang, Keni; Doughty, Christine; Wu, Yu-Shu; Pruess, Karsten

    2007-01-01

    An efficient parallel simulator for large-scale, long-term CO2 geologic sequestration in saline aquifers has been developed. The parallel simulator is a three-dimensional, fully implicit model that solves large, sparse linear systems arising from discretization of the partial differential equations for mass and energy balance in porous and fractured media. The simulator is based on the ECO2N module of the TOUGH2 code and inherits all the process capabilities of the single-CPU TOUGH2 code, including a comprehensive description of the thermodynamics and thermophysical properties of H2O-NaCl-CO2 mixtures, modeling single and/or two-phase isothermal or non-isothermal flow processes, two-phase mixtures, fluid phases appearing or disappearing, as well as salt precipitation or dissolution. The new parallel simulator uses MPI for parallel implementation, the METIS software package for simulation domain partitioning, and the iterative parallel linear solver package Aztec for solving linear equations by multiple processors. In addition, the parallel simulator has been implemented with an efficient communication scheme. Test examples show that a linear or super-linear speedup can be obtained on Linux clusters as well as on supercomputers. Because of the significant improvement in both simulation time and memory requirement, the new simulator provides a powerful tool for tackling larger scale and more complex problems than can be solved by single-CPU codes. A high-resolution simulation example is presented that models buoyant convection, induced by a small increase in brine density caused by dissolution of CO2.

  19. Xyce Parallel Electronic Simulator : users' guide, version 2.0.

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Rankin, Eric Lamont; Fixel, Deborah A.; Russo, Thomas V.; Keiter, Eric Richard; Hutchinson, Scott Alan; Pawlowski, Roger Patrick; Wix, Steven D.

    2004-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator capable of simulating electrical circuits at a variety of abstraction levels. Primarily, Xyce has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. (4) A client-server or multi-tiered operating model wherein the numerical kernel can operate independently of the graphical user interface (GUI). (5) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. One feature required by designers is the ability to add device models, many specific to the needs of Sandia, to the code. To this end, the device package in the Xyce Parallel Electronic Simulator is designed to support a variety of device model inputs. These input formats include standard analytical models, behavioral models, look-up tables, and mesh-level PDE device models. Combined with this flexible interface is an architectural design that greatly simplifies the addition of circuit models. One of the most important features of Xyce is in providing a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia now has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods) research and development can be performed. Ultimately, these capabilities are migrated to end users.

  20. A hybrid parallel framework for the cellular Potts model simulations

    SciTech Connect

    Jiang, Yi; He, Kejing; Dong, Shoubin

    2009-01-01

    The Cellular Potts Model (CPM) has been widely used for biological simulations. However, most current implementations are either sequential or approximated, which cannot be used for large-scale, complex 3D simulations. In this paper we present a hybrid parallel framework for CPM simulations. The time-consuming PDE solving, cell division, and cell reaction operations are distributed to clusters using the Message Passing Interface (MPI). The Monte Carlo lattice update is parallelized on shared-memory SMP systems using OpenMP. Because the Monte Carlo lattice update is much faster than the PDE solving and SMP systems are more and more common, this hybrid approach achieves good performance and high accuracy at the same time. Based on the parallel Cellular Potts Model, we studied avascular tumor growth using a multiscale model. The application and performance analysis show that the hybrid parallel framework is quite efficient. The hybrid parallel CPM can be used for the large-scale simulation (~10^8 sites) of complex collective behavior of numerous cells (~10^6).

  1. On the hierarchical parallelization of ab initio simulations

    NASA Astrophysics Data System (ADS)

    Ruiz-Barragan, Sergi; Ishimura, Kazuya; Shiga, Motoyuki

    2016-02-01

    A hierarchical parallelization has been implemented in a new unified code PIMD-SMASH for ab initio simulation where the replicas and the Born-Oppenheimer forces are parallelized. It is demonstrated that ab initio path integral molecular dynamics simulations can be carried out very efficiently for systems up to a few tens of water molecules. The code was then used to study a Diels-Alder reaction of cyclopentadiene and butenone by ab initio string method. A reduction in the reaction energy barrier is found in the presence of hydrogen-bonded water, in accordance with experiment.

  2. Parallel Monte Carlo Simulation for control system design

    NASA Technical Reports Server (NTRS)

    Schubert, Wolfgang M.

    1995-01-01

    The research during the 1993/94 academic year addressed the design of parallel algorithms for stochastic robustness synthesis (SRS). SRS uses Monte Carlo simulation to compute probabilities of system instability and other design-metric violations. The probabilities form a cost function which is used by a genetic algorithm (GA). The GA searches for the stochastic optimal controller. The existing sequential algorithm was analyzed and modified to execute in a distributed environment. For this, parallel approaches to Monte Carlo simulation and genetic algorithms were investigated. Initial empirical results are available for the KSR1.
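
    The Monte Carlo kernel of such a cost function can be sketched as follows (illustrative Python; the second-order plant, the controller gains, and the parameter ranges are hypothetical). Uncertain parameters are sampled, the closed loop is checked for instability, and the violation probability is the fraction of failing samples; because the samples are independent, the loop distributes trivially across processors.

      import numpy as np

      def unstable(k, wn, zeta, kp=2.0, kd=0.1):
          # plant k*wn^2 / (s^2 + 2*zeta*wn*s + wn^2) with PD control kp + kd*s;
          # closed-loop characteristic polynomial in s
          poly = [1.0, 2.0 * zeta * wn + k * wn**2 * kd, wn**2 + k * wn**2 * kp]
          return np.any(np.roots(poly).real > 0.0)

      def probability_of_instability(n_samples=20000, seed=0):
          rng = np.random.default_rng(seed)
          k = rng.uniform(0.5, 1.5, n_samples)       # uncertain gain
          wn = rng.uniform(1.0, 5.0, n_samples)      # uncertain natural frequency
          zeta = rng.uniform(-0.1, 0.4, n_samples)   # damping may be slightly negative
          hits = sum(unstable(*p) for p in zip(k, wn, zeta))
          return hits / n_samples

      print(probability_of_instability())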

  3. Parallel runway requirement analysis study. Volume 2: Simulation manual

    NASA Technical Reports Server (NTRS)

    Ebrahimi, Yaghoob S.; Chun, Ken S.

    1993-01-01

    This document is a user manual for operating the PLAND_BLUNDER (PLB) simulation program. This simulation is based on two aircraft approaching parallel runways independently and using parallel Instrument Landing System (ILS) equipment during Instrument Meteorological Conditions (IMC). If an aircraft should deviate from its assigned localizer course toward the opposite runway, this constitutes a blunder which could endanger the aircraft on the adjacent path. The worst case scenario would be if the blundering aircraft were unable to recover and continue toward the adjacent runway. PLAND_BLUNDER is a Monte Carlo-type simulation which employs the events and aircraft positioning during such a blunder situation. The model simulates two aircraft performing parallel ILS approaches using Instrument Flight Rules (IFR) or visual procedures. PLB uses a simple movement model and control law in three dimensions (X, Y, Z). The parameters of the simulation inputs and outputs are defined in this document along with a sample of the statistical analysis. This document is the second volume of a two volume set. Volume 1 is a description of the application of the PLB to the analysis of close parallel runway operations.

  4. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language to improve code portability. Several pseudo-random number generators have also been integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for the study of higher particle energies with the use of more accurate physical models, and improve statistics as more particle tracks can be simulated in a low response time.
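
    The parallel pattern described above can be sketched as follows (illustrative Python with a toy exponential path-length model, not MC4's physics; numpy's SeedSequence stands in for dedicated parallel RNG libraries such as SPRNG or DCMT). Particle histories are independent, so they are split across worker processes, each drawing from its own random stream, and the partial tallies are combined at the end.

      import numpy as np
      from multiprocessing import Pool

      def track_batch(args):
          seed, n_particles = args
          rng = np.random.default_rng(seed)          # independent stream per worker
          # toy model: exponential free paths; a particle "transmits" beyond depth 1
          depths = rng.exponential(scale=0.3, size=n_particles)
          return np.count_nonzero(depths > 1.0)

      def transmission(n_particles=1_000_000, workers=4):
          seeds = np.random.SeedSequence(12345).spawn(workers)
          per_worker = n_particles // workers
          with Pool(workers) as pool:
              counts = pool.map(track_batch, [(s, per_worker) for s in seeds])
          return sum(counts) / (per_worker * workers)

      if __name__ == "__main__":
          print(transmission())                      # close to exp(-1/0.3), about 0.036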

  5. Efficient parallel CFD-DEM simulations using OpenMP

    NASA Astrophysics Data System (ADS)

    Amritkar, Amit; Deb, Surya; Tafti, Danesh

    2014-01-01

    The paper describes parallelization strategies for the Discrete Element Method (DEM) used for simulating dense particulate systems coupled to Computational Fluid Dynamics (CFD). While the field equations of CFD are best parallelized by spatial domain decomposition techniques, the N-body particulate phase is best parallelized over the number of particles. When the two are coupled together, both modes are needed for efficient parallelization. It is shown that under these requirements, OpenMP thread based parallelization has advantages over MPI processes. Two representative examples, fairly typical of dense fluid-particulate systems, are investigated, including the validation of the DEM-CFD and thermal-DEM implementation with experiments. Fluidized bed calculations are performed on beds with uniform particle loading, parallelized with MPI and OpenMP. It is shown that as the number of processing cores and the number of particles increase, the communication overhead of building ghost particle lists at processor boundaries dominates time to solution, and OpenMP which does not require this step is about twice as fast as MPI. In rotary kiln heat transfer calculations, which are characterized by spatially non-uniform particle distributions, the low overhead of switching the parallelization mode in OpenMP eliminates the load imbalances, but introduces increased overheads in fetching non-local data. In spite of this, it is shown that OpenMP is between 50% and 90% faster than MPI.

  6. Simulating the Immune Response on a Distributed Parallel Computer

    NASA Astrophysics Data System (ADS)

    Castiglione, F.; Bernaschi, M.; Succi, S.

    The application of ideas and methods of statistical mechanics to problems of biological relevance is one of the most promising frontiers of theoretical and computational mathematical physics.1,2 Among others, the computer simulation of the immune system dynamics stands out as one of the prominent candidates for this type of investigations. In the recent years immunological research has been drawing increasing benefits from the resort to advanced mathematical modeling on modern computers.3,4 Among others, Cellular Automata (CA), i.e., fully discrete dynamical systems evolving according to boolean laws, appear to be extremely well suited to computer simulation of biological systems.5 A prominent example of immunological CA is represented by the Celada-Seiden automaton, that has proven capable of providing several new insights into the dynamics of the immune system response. To date, the Celada-Seiden automaton was not in a position to exploit the impressive advances of computer technology, and notably parallel processing, simply because no parallel version of this automaton had been developed yet. In this paper we fill this gap and describe a parallel version of the Celada-Seiden cellular automaton aimed at simulating the dynamic response of the immune system. Details on the parallel implementation as well as performance data on the IBM SP2 parallel platform are presented and commented on.

  7. Xyce parallel electronic simulator reference guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide [1].

  8. Xyce parallel electronic simulator reference guide, version 6.0.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.

    2013-08-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

  9. Xyce Parallel Electronic Simulator : reference guide, version 4.1.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2009-02-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users Guide. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users Guide.

  10. Max-plus Algebraic Tools for Discrete Event Systems, Static Analysis, and Zero-Sum Games

    NASA Astrophysics Data System (ADS)

    Gaubert, Stéphane

    The max-plus algebraic approach of timed discrete event systems emerged in the eighties, after the discovery that synchronization phenomena can be modeled in a linear way in the max-plus setting. This led to a number of results, like the determination of long term characteristics (throughput, stationary regime) by spectral theory methods or the representation of the input-output behavior by rational series.
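
    A small worked example of this linearity is given below (illustrative Python). With addition replaced by max and multiplication by +, the start times of successive cycles of two synchronized machines satisfy x(k+1) = A ⊗ x(k), where (A ⊗ x)_i = max_j (A_ij + x_j) and A_ij is the delay from event j to event i (-inf marks the absence of a dependency).

      import numpy as np

      NEG_INF = -np.inf

      def maxplus_matvec(A, x):
          return np.max(A + x[np.newaxis, :], axis=1)    # (A ⊗ x)_i = max_j (A_ij + x_j)

      # Two machines that synchronize: machine 1 needs 3 time units and its own previous
      # part; machine 2 needs 2 units and must also wait for machine 1's output.
      A = np.array([[3.0, NEG_INF],
                    [3.0 + 2.0, 2.0]])
      x = np.array([0.0, 0.0])                            # start times of cycle 0
      for k in range(4):
          x = maxplus_matvec(A, x)
          print(f"cycle {k + 1}: start times {x}")
      # Both components eventually grow by 3 per cycle: the max-plus eigenvalue,
      # i.e. the throughput, is set by the critical circuit (machine 1's self-loop).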

  11. A network of discrete events for the representation and analysis of diffusion dynamics.

    PubMed

    Pintus, Alberto M; Pazzona, Federico G; Demontis, Pierfranco; Suffritti, Giuseppe B

    2015-11-14

    We developed a coarse-grained description of the phenomenology of diffusive processes, in terms of a space of discrete events and its representation as a network. Once a proper classification of the discrete events underlying the diffusive process is carried out, their transition matrix is calculated on the basis of molecular dynamics data. This matrix can be represented as a directed, weighted network where nodes represent discrete events, and the weight of edges is given by the probability that one follows the other. The structure of this network reflects dynamical properties of the process of interest in such features as its modularity and the entropy rate of nodes. As an example of the applicability of this conceptual framework, we discuss here the physics of diffusion of small non-polar molecules in a microporous material, in terms of the structure of the corresponding network of events, and explain on this basis the diffusivity trends observed. A quantitative account of these trends is obtained by considering the contribution of the various events to the displacement autocorrelation function. PMID:26567654
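
    The construction can be sketched as follows (illustrative Python; the event labels are hypothetical, not the ones used for the microporous system studied). An observed sequence of discrete events is turned into a row-stochastic transition matrix, which is the weighted directed network described above, and the entropy of each node's outgoing edges measures how unpredictable the next event is from that node.

      import numpy as np

      events = ["cage_stay", "window_cross", "cage_stay", "cage_stay",
                "window_cross", "back_jump", "cage_stay", "window_cross"]

      labels = sorted(set(events))
      index = {e: i for i, e in enumerate(labels)}
      counts = np.zeros((len(labels), len(labels)))
      for a, b in zip(events, events[1:]):               # count observed transitions
          counts[index[a], index[b]] += 1

      row_sums = counts.sum(axis=1, keepdims=True)
      P = np.divide(counts, row_sums, out=np.zeros_like(counts), where=row_sums > 0)

      def node_entropy(p):                               # entropy of outgoing edges [bits]
          p = p[p > 0]
          return float(-(p * np.log2(p)).sum())

      for e in labels:
          print(f"{e:13s} -> {dict(zip(labels, np.round(P[index[e]], 2)))}"
                f"  H = {node_entropy(P[index[e]]):.2f} bits")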

  12. Fault Diagnosis in Discrete-Event Systems with Incomplete Models: Learnability and Diagnosability.

    PubMed

    Kwong, Raymond H; Yonge-Mallo, David L

    2015-07-01

    Most model-based approaches to fault diagnosis of discrete-event systems require a complete and accurate model of the system to be diagnosed. However, the discrete-event model may have arisen from abstraction and simplification of a continuous time system, or through model building from input-output data. As such, it may not capture the dynamic behavior of the system completely. In a previous paper, we addressed the problem of diagnosing faults given an incomplete model of the discrete-event system. We presented the learning diagnoser which not only diagnoses faults, but also attempts to learn missing model information through parsimonious hypothesis generation. In this paper, we study the properties of learnability and diagnosability. Learnability deals with the issue of whether the missing model information can be learned, while diagnosability corresponds to the ability to detect and isolate a fault after it has occurred. We provide conditions under which the learning diagnoser can learn missing model information. We define the notions of weak and strong diagnosability and also give conditions under which they hold. PMID:25204002

  13. The parallel subdomain-levelset deflation method in reservoir simulation

    NASA Astrophysics Data System (ADS)

    van der Linden, J. H.; Jönsthövel, T. B.; Lukyanov, A. A.; Vuik, C.

    2016-01-01

    Extreme and isolated eigenvalues are known to be harmful to the convergence of an iterative solver. These eigenvalues can be produced by strong heterogeneity in the underlying physics. We can improve the quality of the spectrum by 'deflating' the harmful eigenvalues. In this work, deflation is applied to linear systems in reservoir simulation. In particular, large, sudden differences in the permeability produce extreme eigenvalues. The number and magnitude of these eigenvalues is linked to the number and magnitude of the permeability jumps. Two deflation methods are discussed. Firstly, we state that harmonic Ritz eigenvector deflation, which computes the deflation vectors from the information produced by the linear solver, is infeasible in modern reservoir simulation due to high costs and lack of parallelism. Secondly, we test a physics-based subdomain-levelset deflation algorithm that constructs the deflation vectors a priori. Numerical experiments show that both methods can improve the performance of the linear solver. We highlight the fact that subdomain-levelset deflation is particularly suitable for a parallel implementation. For cases with well-defined permeability jumps of a factor of 10^4 or higher, parallel physics-based deflation has potential in commercial applications. In particular, the good scalability of parallel subdomain-levelset deflation combined with the robust parallel preconditioner for the deflated system suggests the use of this method as an alternative for AMG.
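
    As a hedged sketch of the physics-based construction (the 1-D matrix below is a toy stand-in, not the paper's reservoir discretization), subdomain-levelset deflation amounts to choosing piecewise-constant deflation vectors aligned with the permeability regions and projecting their coarse component out of the residual:

      import numpy as np

      def subdomain_deflation_vectors(labels, n_domains):
          # Piecewise-constant deflation vectors: Z[i, d] = 1 if cell i lies in subdomain d.
          Z = np.zeros((len(labels), n_domains))
          Z[np.arange(len(labels)), labels] = 1.0
          return Z

      def deflation_projector(A, Z):
          # P r = r - A Z E^{-1} Z^T r, with E = Z^T A Z the Galerkin coarse matrix.
          E = Z.T @ A @ Z
          AZ = A @ Z
          Einv = np.linalg.inv(E)
          return lambda r: r - AZ @ (Einv @ (Z.T @ r))

      # Toy 1-D "reservoir": permeability jumps by a factor 1e4 halfway along the domain.
      n = 8
      perm = np.where(np.arange(n) < n // 2, 1.0, 1.0e4)
      kface = 2.0 * perm[:-1] * perm[1:] / (perm[:-1] + perm[1:])   # harmonic interface values
      main = np.zeros(n)
      main[:-1] += kface
      main[1:] += kface
      main[0] += perm[0]                        # fixed-pressure boundaries keep A SPD
      main[-1] += perm[-1]
      A = np.diag(main) - np.diag(kface, 1) - np.diag(kface, -1)

      labels = (np.arange(n) >= n // 2).astype(int)   # one subdomain per permeability region
      Z = subdomain_deflation_vectors(labels, 2)
      P = deflation_projector(A, Z)
      r = np.random.default_rng(0).standard_normal(n)
      print(np.allclose(Z.T @ P(r), 0.0))       # deflated residual has no coarse component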

  14. Random number generators for massively parallel simulations on GPU

    NASA Astrophysics Data System (ADS)

    Manssen, M.; Weigel, M.; Hartmann, A. K.

    2012-08-01

    High-performance streams of (pseudo) random numbers are crucial for the efficient implementation of countless stochastic algorithms, most importantly, Monte Carlo simulations and molecular dynamics simulations with stochastic thermostats. A number of implementations of random number generators have been discussed for GPU platforms before, and some generators are even included in the CUDA supporting libraries. Nevertheless, not all of these generators are well suited for highly parallel applications where each thread requires its own generator instance. For this specific situation, encountered for instance in simulations of lattice models, most of the high-quality generators with large states, such as the Mersenne twister, cannot be used efficiently without substantial changes. We provide a broad review of existing CUDA variants of random-number generators and present the CUDA implementation of a new massively parallel high-quality, high-performance generator with a small memory load overhead.
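
    Counter-based generators such as Philox are one family that fits the one-instance-per-thread requirement, since each stream carries only a few words of state. A sketch of that design point in NumPy on the CPU (this is only an illustration of independent small-state streams, not the paper's CUDA implementation):

      import numpy as np

      def per_thread_generators(master_seed, n_threads):
          # One independent counter-based (Philox) stream per simulated thread; each
          # stream needs only a tiny state, unlike a per-thread Mersenne twister.
          children = np.random.SeedSequence(master_seed).spawn(n_threads)
          return [np.random.Generator(np.random.Philox(s)) for s in children]

      gens = per_thread_generators(master_seed=12345, n_threads=4)
      print(np.stack([g.random(3) for g in gens]))   # each row: draws from one "thread"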

  15. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed-memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
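
    The gradient calculation is naturally parallel: each design variable's finite-difference perturbation can be evaluated independently. A hedged sketch of that pattern (the objective function below is a hypothetical stand-in, not POST3D's trajectory evaluation, and multiprocessing stands in for the SPMD message-passing layer):

      from multiprocessing import Pool
      import numpy as np

      def objective(x):
          # Hypothetical stand-in for a trajectory-performance evaluation.
          return float(np.sum(x ** 2) + np.sin(x[0]))

      def _component(args):
          x, i, h = args
          xp, xm = x.copy(), x.copy()
          xp[i] += h
          xm[i] -= h
          return (objective(xp) - objective(xm)) / (2.0 * h)

      def parallel_gradient(x, h=1e-6, workers=4):
          # Each design variable's central difference is an independent task (SPMD-style).
          with Pool(workers) as pool:
              return np.array(pool.map(_component, [(x, i, h) for i in range(len(x))]))

      if __name__ == "__main__":
          print(parallel_gradient(np.array([1.0, 2.0, 3.0])))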

  16. Reusable component model development approach for parallel and distributed simulation.

    PubMed

    Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

    2014-01-01

    Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind closely with simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated; it consists of two phases: (1) domain experts create simulation computational modules, observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that a model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751

  17. Reusable Component Model Development Approach for Parallel and Distributed Simulation

    PubMed Central

    Zhu, Feng; Yao, Yiping; Chen, Huilong; Yao, Feng

    2014-01-01

    Model reuse is a key issue to be resolved in parallel and distributed simulation at present. However, component models built by different domain experts usually have diversiform interfaces, couple tightly, and bind closely with simulation platforms. As a result, they are difficult to reuse across different simulation platforms and applications. To address the problem, this paper first proposes a reusable component model framework. Based on this framework, our reusable model development approach is then elaborated; it consists of two phases: (1) domain experts create simulation computational modules, observing three principles to achieve their independence; (2) the model developer encapsulates these simulation computational modules with six standard service interfaces to improve their reusability. The case study of a radar model indicates that a model developed using our approach has good reusability and is easy to use in different simulation platforms and applications. PMID:24729751

  18. Potts-model grain growth simulations: Parallel algorithms and applications

    SciTech Connect

    Wright, S.A.; Plimpton, S.J.; Swiler, T.P.

    1997-08-01

    Microstructural morphology and grain boundary properties often control the service properties of engineered materials. This report uses the Potts-model to simulate the development of microstructures in realistic materials. Three areas of microstructural morphology simulations were studied. They include the development of massively parallel algorithms for Potts-model grain growth simulations, modeling of mass transport via diffusion in these simulated microstructures, and the development of a gradient-dependent Hamiltonian to simulate columnar grain growth. Potts grain growth models for massively parallel supercomputers were developed for the conventional Potts-model in both two and three dimensions. Simulations using these parallel codes showed self similar grain growth and no finite size effects for previously unapproachable large scale problems. In addition, new enhancements to the conventional Metropolis algorithm used in the Potts-model were developed to accelerate the calculations. These techniques enable both the sequential and parallel algorithms to run faster and use essentially an infinite number of grain orientation values to avoid non-physical grain coalescence events. Mass transport phenomena in polycrystalline materials were studied in two dimensions using numerical diffusion techniques on microstructures generated using the Potts-model. The results of the mass transport modeling showed excellent quantitative agreement with one dimensional diffusion problems, however the results also suggest that transient multi-dimensional diffusion effects cannot be parameterized as the product of the grain boundary diffusion coefficient and the grain boundary width. Instead, both properties are required. Gradient-dependent grain growth mechanisms were included in the Potts-model by adding an extra term to the Hamiltonian. Under normal grain growth, the primary driving term is the curvature of the grain boundary, which is included in the standard Potts-model Hamiltonian.
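
    For illustration, a minimal serial two-dimensional sketch of the zero-temperature Metropolis dynamics underlying such Potts grain-growth simulations; the lattice size and number of orientations below are arbitrary, and none of the report's parallelization or acceleration techniques are shown:

      import numpy as np

      def potts_sweep(spins, q, rng, T=0.0):
          # One Metropolis sweep over a 2D Potts lattice with periodic boundaries.
          # The energy counts unlike nearest-neighbour pairs, so dE is a small integer.
          L = spins.shape[0]
          for _ in range(L * L):
              i, j = rng.integers(0, L, size=2)
              old, new = spins[i, j], rng.integers(0, q)
              nbrs = (spins[(i + 1) % L, j], spins[(i - 1) % L, j],
                      spins[i, (j + 1) % L], spins[i, (j - 1) % L])
              dE = sum(n != new for n in nbrs) - sum(n != old for n in nbrs)
              if dE <= 0 or (T > 0 and rng.random() < np.exp(-dE / T)):
                  spins[i, j] = new

      rng = np.random.default_rng(1)
      spins = rng.integers(0, 32, size=(64, 64))   # 32 grain orientations, 64x64 lattice (arbitrary)
      for _ in range(20):
          potts_sweep(spins, q=32, rng=rng)        # T = 0: curvature-driven coarsening
      print(len(np.unique(spins)), "orientations remain after coarsening")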

  19. Xyce Parallel Electronic Simulator Users Guide Version 6.2.

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2014-09-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks: The information herein is subject to change without notice. Copyright (c) 2002-2014 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)

  20. Flow simulations by parallel computer MiPAX

    SciTech Connect

    Hara, H.; Kodera, Y.; Kanehiro

    1988-01-01

    The authors have developed several parallel computer programs to show that the parallel computer MiPAX is well suited to fluid flow simulations. MiPAX is the first commercially available parallel computer for scientific applications in Japan. They describe two typical methods for incompressible viscous flow problems: the MAC method and the third-order upwind scheme in the general curvilinear coordinate system. The techniques of mapping a physical space onto a processing unit array and the procedure of data transfer are also presented for MiPAX-32JFV, which consists of 32 processing units. They conclude that a program for MiPAX is basically identical to one for conventional machines.

  1. Numerical simulation of supersonic wake flow with parallel computers

    SciTech Connect

    Wong, C.C.; Soetrisno, M.

    1995-07-01

    Simulating a supersonic wake flow field behind a conical body is a computationally intensive task. It requires a large number of computational cells to capture the dominant flow physics and a robust numerical algorithm to obtain a reliable solution. High-performance parallel computers, with their distributed processing and data storage capability, can meet this need. They have larger computational memory and faster computing time than conventional vector computers. We apply the PINCA Navier-Stokes code to simulate a wind-tunnel supersonic wake experiment on Intel Gamma, Intel Paragon, and IBM SP2 parallel computers. These simulations are performed to study the mean flow in the near wake region of a sharp, 7-degree half-angle, adiabatic cone at Mach number 4.3 and freestream Reynolds number of 40,600. Overall the numerical solutions capture the general features of the hypersonic laminar wake flow and compare favorably with the wind tunnel data. With a refined and clustering grid distribution in the recirculation zone, the calculated location of the rear stagnation point is consistent with the 2D axisymmetric and 3D experiments. In this study, we also demonstrate the importance of having a large local memory capacity within a computer node and the effective utilization of the number of computer nodes to achieve good parallel performance when simulating a complex, large-scale wake flow problem.

  2. Casting pearls ballistically: Efficient massively parallel simulation of particle deposition

    SciTech Connect

    Lubachevsky, B.D.; Privman, V.; Roy, S.C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps material scientists study adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process, and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at increasing the efficiency of producing the particle configuration and collecting statistics. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.
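
    The underlying sequential model is easy to state; below is a minimal sketch of a simpler on-lattice analogue (columns on a periodic 1-D substrate rather than the paper's spheres over a plane, and without the parallel continuous-time reformulation):

      import numpy as np

      def ballistic_deposition(n_particles, width, rng):
          # A particle falls straight down above a random column and sticks at first
          # contact with the substrate, the column below it, or a neighbouring column top.
          height = np.zeros(width, dtype=int)
          for _ in range(n_particles):
              x = rng.integers(0, width)
              height[x] = max(height[x] + 1,
                              height[(x - 1) % width],
                              height[(x + 1) % width])
          return height

      rng = np.random.default_rng(0)
      h = ballistic_deposition(5000, width=200, rng=rng)
      print("mean height:", h.mean(), "interface roughness:", h.std())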

  3. Modularized Parallel Neutron Instrument Simulation on the TeraGrid

    SciTech Connect

    Chen, Meili; Cobb, John W; Hagen, Mark E; Miller, Stephen D; Lynch, Vickie E

    2007-01-01

    In order to build a bridge between the TeraGrid (TG), a national-scale cyberinfrastructure resource, and neutron science, the Neutron Science TeraGrid Gateway (NSTG) is focused on introducing productive HPC usage to the neutron science community, primarily the Spallation Neutron Source (SNS) at Oak Ridge National Laboratory (ORNL). Monte Carlo simulations are used as a powerful tool for instrument design and optimization at SNS. One of the successful efforts of a collaboration team composed of NSTG HPC experts and SNS instrument scientists is the development of a software facility named PSoNI, Parallelizing Simulations of Neutron Instruments. By parallelizing the traditional serial instrument simulation on TeraGrid resources, PSoNI quickly computes full instrument simulations at sufficient statistical levels for instrument design. Following successful commissioning of the SNS, three of the five commissioned instruments in the SNS target station will be available for initial users by the end of 2007. Advanced instrument study, proposal feasibility evaluation, and experiment planning are on the immediate schedule of SNS, which poses further requirements, such as flexibility and high runtime efficiency, on fast instrument simulation. PSoNI has been redesigned to meet these new challenges and a preliminary version has been developed on TeraGrid. This paper explores the motivation and goals of the new design, and the improved software structure. Further, it describes the new features realized, as seen from MPI-parallelized McStas running high-resolution design simulations of the SEQUOIA and BSS instruments at SNS. A discussion regarding future work, which targets fast simulation for automated experiment adjustment and for comparing models to data in analysis, is also presented.

  4. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

  5. Adaptive domain decomposition for Monte Carlo simulations on parallel processors

    NASA Technical Reports Server (NTRS)

    Wilmoth, Richard G.

    1990-01-01

    A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.

  6. Adaptive domain decomposition for Monte Carlo simulations on parallel processors

    NASA Technical Reports Server (NTRS)

    Wilmoth, Richard G.

    1991-01-01

    A method is described for performing direct simulation Monte Carlo (DSMC) calculations on parallel processors using adaptive domain decomposition to distribute the computational work load. The method has been implemented on a commercially available hypercube and benchmark results are presented which show the performance of the method relative to current supercomputers. The problems studied were simulations of equilibrium conditions in a closed, stationary box, a two-dimensional vortex flow, and the hypersonic, rarefied flow in a two-dimensional channel. For these problems, the parallel DSMC method ran 5 to 13 times faster than on a single processor of a Cray-2. The adaptive decomposition method worked well in uniformly distributing the computational work over an arbitrary number of processors and reduced the average computational time by over a factor of two in certain cases.

  7. Parallel algorithms for simulating continuous time Markov chains

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Heidelberger, Philip

    1992-01-01

    We have previously shown that the mathematical technique of uniformization can serve as the basis of synchronization for the parallel simulation of continuous-time Markov chains. This paper reviews the basic method and compares five different methods based on uniformization, evaluating their strengths and weaknesses as a function of problem characteristics. The methods vary in their use of optimism, logical aggregation, communication management, and adaptivity. Performance evaluation is conducted on the Intel Touchstone Delta multiprocessor, using up to 256 processors.
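
    Uniformization itself is simple to state: a CTMC with generator Q is embedded in a Poisson process whose rate dominates every exit rate, and state changes are drawn from the DTMC P = I + Q/Lambda, with self-loops acting as pseudo-events. A serial sketch of that construction (the two-state generator below is illustrative; the paper's parallel synchronization schemes are not shown):

      import numpy as np

      def ctmc_uniformization(Q, x0, t_end, rng):
          # Embed the CTMC in a Poisson process of rate lam >= max_i |Q_ii|; at each
          # (possibly fictitious) event the state jumps according to P = I + Q/lam.
          lam = np.max(-np.diag(Q))
          P = np.eye(Q.shape[0]) + Q / lam
          t, x, path = 0.0, x0, [(0.0, x0)]
          while True:
              t += rng.exponential(1.0 / lam)
              if t > t_end:
                  return path
              x = rng.choice(Q.shape[0], p=P[x])   # self-loop = pseudo-event (no real change)
              path.append((t, x))

      Q = np.array([[-2.0, 2.0],
                    [1.0, -1.0]])                   # illustrative two-state generator
      rng = np.random.default_rng(0)
      print(ctmc_uniformization(Q, 0, 5.0, rng)[:5])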

  8. Time parallelization of advanced operation scenario simulations of ITER plasma

    SciTech Connect

    Samaddar, D.; Casper, T. A.; Kim, S. H.; Berry, Lee A; Elwasif, Wael R; Batchelor, Donald B; Houlberg, Wayne A

    2013-01-01

    This work demonstrates that simulations of advanced burning plasma operation scenarios can be successfully parallelized in time using the parareal algorithm. CORSICA, an advanced operation scenario code for tokamak plasmas, is used as a test case. This is a unique application, since the parareal algorithm has so far been applied to considerably simpler systems, with the exception of turbulence. In the present application, a computational gain of an order of magnitude has been achieved, which is extremely promising. A successful implementation of the parareal algorithm in codes like CORSICA ushers in the possibility of time-efficient simulations of ITER plasmas.
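
    As a reminder of the algorithm itself (not CORSICA-specific; forward Euler stands in for both propagators), parareal alternates a cheap serial coarse sweep with fine propagations over the time slices, which can run in parallel, and applies the correction U_{n+1}^{k+1} = G(U_n^{k+1}) + F(U_n^k) - G(U_n^k):

      import numpy as np

      def euler(f, y, ta, tb, n):
          # Forward Euler with n sub-steps on [ta, tb].
          h = (tb - ta) / n
          for k in range(n):
              y = y + h * f(ta + k * h, y)
          return y

      def parareal(f, y0, t0, t1, n_slices, coarse_steps, fine_steps, n_iters):
          T = np.linspace(t0, t1, n_slices + 1)
          U = [y0]
          for n in range(n_slices):                 # serial coarse sweep
              U.append(euler(f, U[-1], T[n], T[n + 1], coarse_steps))
          for _ in range(n_iters):
              F = [euler(f, U[n], T[n], T[n + 1], fine_steps) for n in range(n_slices)]    # parallelizable
              G_old = [euler(f, U[n], T[n], T[n + 1], coarse_steps) for n in range(n_slices)]
              U_new = [y0]
              for n in range(n_slices):             # serial correction sweep
                  G_new = euler(f, U_new[n], T[n], T[n + 1], coarse_steps)
                  U_new.append(G_new + F[n] - G_old[n])
              U = U_new
          return T, np.array(U)

      T, U = parareal(lambda t, y: -y, np.array([1.0]), 0.0, 4.0,
                      n_slices=8, coarse_steps=1, fine_steps=50, n_iters=3)
      print(U[-1], np.exp(-4.0))                    # parareal iterate vs. exact solution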

  9. Supervisor Localization: A Top-Down Approach to Distributed Control of Discrete-Event Systems

    SciTech Connect

    Cai, K.; Wonham, W. M.

    2009-03-05

    A purely distributed control paradigm is proposed for discrete-event systems (DES). In contrast to control by one or more external supervisors, distributed control aims to design built-in strategies for individual agents. First a distributed optimal nonblocking control problem is formulated. To solve it, a top-down localization procedure is developed which systematically decomposes an external supervisor into local controllers while preserving optimality and nonblockingness. An efficient localization algorithm is provided to carry out the computation, and an automated guided vehicle (AGV) example is presented for illustration. Finally, the 'easiest' and 'hardest' boundary cases of localization are discussed.

  10. Sequential Window Diagnoser for Discrete-Event Systems Under Unreliable Observations

    SciTech Connect

    Wen-Chiao Lin; Humberto E. Garcia; David Thorsley; Tae-Sic Yoo

    2009-09-01

    This paper addresses the issue of counting the occurrence of special events in the framework of partially observed discrete-event dynamical systems (DEDS). The developed diagnosers, referred to as sequential window diagnosers (SWDs), utilize the stochastic diagnoser probability transition matrices developed in [9] along with a resetting mechanism that allows on-line monitoring of special event occurrences. To illustrate their performance, the SWDs are applied to detect and count the occurrence of special events in a particular DEDS. Results show that SWDs are able to accurately track the number of times special events occur.

  11. Determining the significance of associations between two series of discrete events: bootstrap methods

    SciTech Connect

    Niehof, Jonathan T.; Morley, Steven K.

    2012-01-01

    We review and develop techniques to determine associations between series of discrete events. The bootstrap, a nonparametric statistical method, allows the determination of the significance of associations with minimal assumptions about the underlying processes. We find the key requirement for this method: one of the series must be widely spaced in time to guarantee the theoretical applicability of the bootstrap. If this condition is met, the calculated significance passes a reasonableness test. We conclude with some potential future extensions and caveats on the applicability of these methods. The techniques presented have been implemented in a Python-based software toolkit.
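
    A hedged sketch of this kind of test (the exact association measure and resampling scheme of the paper are not reproduced here; instead, series B's inter-event gaps are resampled with replacement to build surrogate series, and the fraction of surrogates matching or exceeding the observed association count serves as the significance level):

      import numpy as np

      def n_associated(times_a, times_b, window):
          # Count events of series A with at least one event of series B within +/- window.
          tb = np.sort(times_b)
          idx = np.searchsorted(tb, times_a)
          prev_ok = (idx > 0) & (times_a - tb[np.clip(idx - 1, 0, None)] <= window)
          next_ok = (idx < len(tb)) & (tb[np.clip(idx, None, len(tb) - 1)] - times_a <= window)
          return int(np.sum(prev_ok | next_ok))

      def bootstrap_significance(times_a, times_b, window, n_boot=1000, seed=0):
          # Null distribution from surrogate B-series built by resampling B's inter-event
          # gaps with replacement; returns the observed count and a one-sided p-value.
          rng = np.random.default_rng(seed)
          observed = n_associated(times_a, times_b, window)
          gaps = np.diff(np.sort(times_b))
          null = []
          for _ in range(n_boot):
              surrogate = times_b.min() + np.concatenate(
                  ([0.0], np.cumsum(rng.choice(gaps, size=len(gaps)))))
              null.append(n_associated(times_a, surrogate, window))
          return observed, float(np.mean(np.asarray(null) >= observed))

      rng = np.random.default_rng(1)
      a = np.sort(rng.uniform(0.0, 100.0, 40))
      b = np.sort(a[:25] + rng.normal(0.0, 0.5, 25))   # B is clearly associated with A
      print(bootstrap_significance(a, b, window=1.0))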

  12. Parallel hyperbolic PDE simulation on clusters: Cell versus GPU

    NASA Astrophysics Data System (ADS)

    Rostrup, Scott; De Sterck, Hans

    2010-12-01

    Increasingly, high-performance computing is looking towards data-parallel computational devices to enhance computational performance. Two technologies that have received significant attention are IBM's Cell Processor and NVIDIA's CUDA programming model for graphics processing unit (GPU) computing. In this paper we investigate the acceleration of parallel hyperbolic partial differential equation simulation on structured grids with explicit time integration on clusters with Cell and GPU backends. The message passing interface (MPI) is used for communication between nodes at the coarsest level of parallelism. Optimizations of the simulation code at the several finer levels of parallelism that the data-parallel devices provide are described in terms of data layout, data flow and data-parallel instructions. Optimized Cell and GPU performance are compared with reference code performance on a single x86 central processing unit (CPU) core in single and double precision. We further compare the CPU, Cell and GPU platforms on a chip-to-chip basis, and compare performance on single cluster nodes with two CPUs, two Cell processors or two GPUs in a shared memory configuration (without MPI). We finally compare performance on clusters with 32 CPUs, 32 Cell processors, and 32 GPUs using MPI. Our GPU cluster results use NVIDIA Tesla GPUs with GT200 architecture, but some preliminary results on recently introduced NVIDIA GPUs with the next-generation Fermi architecture are also included. This paper provides computational scientists and engineers who are considering porting their codes to accelerator environments with insight into how structured grid based explicit algorithms can be optimized for clusters with Cell and GPU accelerators. It also provides insight into the speed-up that may be gained on current and future accelerator architectures for this class of applications. Program summary: Program title: SWsolver. Catalogue identifier: AEGY_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEGY_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: GPL v3. No. of lines in distributed program, including test data, etc.: 59 168. No. of bytes in distributed program, including test data, etc.: 453 409. Distribution format: tar.gz. Programming language: C, CUDA. Computer: parallel computing clusters; individual compute nodes may consist of x86 CPU, Cell processor, or x86 CPU with attached NVIDIA GPU accelerator. Operating system: Linux. Has the code been vectorised or parallelized?: Yes; tested on 1-128 x86 CPU cores, 1-32 Cell processors, and 1-32 NVIDIA GPUs. RAM: tested on problems requiring up to 4 GB per compute node. Classification: 12. External routines: MPI, CUDA, IBM Cell SDK. Nature of problem: MPI-parallel simulation of the shallow water equations using a high-resolution 2D hyperbolic equation solver on regular Cartesian grids for x86 CPU, Cell processor, and NVIDIA GPU using CUDA. Solution method: SWsolver provides 3 implementations of a high-resolution 2D shallow water equation solver on regular Cartesian grids, for CPU, Cell processor, and NVIDIA GPU; each implementation uses MPI to divide work across a parallel computing cluster. Additional comments: the sub-program numdiff is used for the test run.

  13. Xyce Parallel Electronic Simulator - Users' Guide Version 2.1.

    SciTech Connect

    Hutchinson, Scott A; Hoekstra, Robert J.; Russo, Thomas V.; Rankin, Eric; Pawlowski, Roger P.; Fixel, Deborah A; Schiek, Richard; Bogdan, Carolyn W.; Shirley, David N.; Campbell, Phillip M.; Keiter, Eric R.

    2005-06-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers. Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. Device models which are specifically tailored to meet Sandia's needs, including many radiation-aware devices. Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an "in-house" capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique electrical simulation capability, designed to meet the unique needs of the laboratory. Acknowledgements: The authors would like to acknowledge the entire Sandia National Laboratories HPEMS (High Performance Electrical Modeling and Simulation) team, including Steve Wix, Carolyn Bogdan, Regina Schells, Ken Marx, Steve Brandon and Bill Ballard, for their support on this project. We also appreciate very much the work of Jim Emery, Becky Arnold and Mike Williamson for the help in reviewing this document. Lastly, a very special thanks to Hue Lai for typesetting this document with LaTeX. Trademarks: The information herein is subject to change without notice. Copyright (c) 2002-2003 Sandia Corporation. All rights reserved. Xyce Electronic Simulator and Xyce are trademarks of Sandia Corporation. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Silicon Graphics, the Silicon Graphics logo and IRIX are registered trademarks of Silicon Graphics, Inc. Microsoft, Windows and Windows 2000 are registered trademarks of Microsoft Corporation. Solaris and UltraSPARC are registered trademarks of Sun Microsystems Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. HP and Alpha are registered trademarks of Hewlett-Packard Company. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. All other trademarks are property of their respective owners. Contacts: Bug Reports http://tvrusso.sandia.gov/bugzilla Email xyce-support@sandia.gov World Wide Web http://www.cs.sandia.gov/xyce

  14. Parallel conjugate gradient algorithms for manipulator dynamic simulation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Scheld, Robert E.

    1989-01-01

    Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithm is guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n^2) on a serial processor. Conjugate gradient algorithms are presented that provide greater efficiency by using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves a computational time of O(log2 n) for each iteration. Simulation results for a seven-degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
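
    A short serial illustration of the diagonal-preconditioner variant (the matrix below is a random SPD stand-in for a 7-DOF manipulator mass matrix; neither the O(log2 n) parallel preconditioner construction nor the RMP-specific variants are shown):

      import numpy as np

      def jacobi_pcg(A, b, tol=1e-10, max_iter=200):
          # Conjugate gradient with the diagonal (Jacobi) preconditioner M = diag(A).
          x = np.zeros_like(b)
          r = b - A @ x
          minv = 1.0 / np.diag(A)
          z = minv * r
          p = z.copy()
          rz = r @ z
          for k in range(1, max_iter + 1):
              Ap = A @ p
              alpha = rz / (p @ Ap)
              x += alpha * p
              r -= alpha * Ap
              if np.linalg.norm(r) < tol * np.linalg.norm(b):
                  return x, k
              z = minv * r
              rz_new = r @ z
              p = z + (rz_new / rz) * p
              rz = rz_new
          return x, max_iter

      rng = np.random.default_rng(0)
      B = rng.standard_normal((7, 7))
      A = B @ B.T + 7.0 * np.eye(7)               # SPD stand-in for a 7-DOF mass matrix
      b = rng.standard_normal(7)
      x, iters = jacobi_pcg(A, b)
      print(iters, np.allclose(A @ x, b))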

  15. A massively parallel cellular automaton for the simulation of recrystallization

    NASA Astrophysics Data System (ADS)

    Kühbach, M.; Barrales-Mora, L. A.; Gottstein, G.

    2014-10-01

    A new implementation of a cellular automaton for the simulation of primary recrystallization in 3D space is presented. In this new approach, a parallel computer architecture is utilized to partition the simulation domain into multiple computational subdomains that can be treated as coupled, gradually coupled or decoupled entities. This enabled us to identify the characteristic growth length associated with the space repartitioning during nucleus growth. In doing so, several communication strategies between the simulation domains were implemented and tested for accuracy and parallel performance. Specifically, the model was applied to investigate the effect of a gradual spatial decoupling on microstructure evolution during oriented growth of random texture components into a deformed Al single crystal. For a domain discretized into one billion cells, it was found that a particular decoupling strategy resulted in executions about two orders of magnitude faster and in highly accurate simulations. Further partition of the domain into isolated entities systematically and negatively impacts microstructure evolution. We investigated this effect quantitatively by geometrical considerations.

  16. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

  17. Long-range interactions and parallel scalability in molecular simulations

    NASA Astrophysics Data System (ADS)

    Patra, Michael; Hyvönen, Marja T.; Falck, Emma; Sabouri-Ghomi, Mohsen; Vattulainen, Ilpo; Karttunen, Mikko

    2007-01-01

    Typical biomolecular systems such as cellular membranes, DNA, and protein complexes are highly charged. Thus, efficient and accurate treatment of electrostatic interactions is of great importance in computational modeling of such systems. We have employed the GROMACS simulation package to perform extensive benchmarking of different commonly used electrostatic schemes on a range of computer architectures (Pentium-4, IBM Power 4, and Apple/IBM G5) for single processor and parallel performance up to 8 nodes—we have also tested the scalability on four different networks, namely Infiniband, GigaBit Ethernet, Fast Ethernet, and nearly uniform memory architecture, i.e. communication between CPUs is possible by directly reading from or writing to other CPUs' local memory. It turns out that the particle-mesh Ewald method (PME) performs surprisingly well and offers competitive performance unless parallel runs on PC hardware with older network infrastructure are needed. Lipid bilayers of sizes 128, 512 and 2048 lipid molecules were used as the test systems representing typical cases encountered in biomolecular simulations. Our results enable an accurate prediction of computational speed on most current computing systems, both for serial and parallel runs. These results should be helpful in, for example, choosing the most suitable configuration for a small departmental computer cluster.

  18. Mapping a battlefield simulation onto message-passing parallel architectures

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1987-01-01

    Perhaps the most critical problem in distributed simulation is that of mapping: without an effective mapping of workload to processors the speedup potential of parallel processing cannot be realized. Mapping a simulation onto a message-passing architecture is especially difficult when the computational workload dynamically changes as a function of time and space; this is exactly the situation faced by battlefield simulations. This paper studies an approach where the simulated battlefield domain is first partitioned into many regions of equal size; typically there are more regions than processors. The regions are then assigned to processors; a processor is responsible for performing all simulation activity associated with the regions. The assignment algorithm is quite simple and attempts to balance load by exploiting locality of workload intensity. The performance of this technique is studied on a simple battlefield simulation implemented on the Flex/32 multiprocessor. Measurements show that the proposed method achieves reasonable processor efficiencies. Furthermore, the method shows promise for use in dynamic remapping of the simulation.

  19. A fast ultrasonic simulation tool based on massively parallel implementations

    NASA Astrophysics Data System (ADS)

    Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain

    2014-02-01

    This paper presents an optimized ultrasonic inspection simulation tool for CIVA, which takes advantage of the power of massively parallel architectures: graphics processing units (GPU) and multi-core general-purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchhoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are: multi- and mono-element probes, planar specimens made of simple isotropic materials, and planar rectangular defects or side-drilled holes of small diameter. Validations of the model accuracy and performance measurements are presented.

  20. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David M.

    1990-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. This is because a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.

  1. Conservative parallel simulation of priority class queueing networks

    NASA Technical Reports Server (NTRS)

    Nicol, David

    1992-01-01

    A conservative synchronization protocol is described for the parallel simulation of queueing networks having C job priority classes, where a job's class is fixed. This problem has long vexed designers of conservative synchronization protocols because of its seemingly poor ability to compute lookahead: the time of the next departure. This is because a job in service having low priority can be preempted at any time by an arrival having higher priority and an arbitrarily small service time. The solution is to skew the event generation activity so that the events for higher priority jobs are generated farther ahead in simulated time than lower priority jobs. Thus, when a lower priority job enters service for the first time, all the higher priority jobs that may preempt it are already known and the job's departure time can be exactly predicted. Finally, the protocol was analyzed and it was demonstrated that good performance can be expected on the simulation of large queueing networks.

  2. Xyce Parallel Electronic Simulator Users Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase -- a message passing parallel implementation -- which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. Trademarks: The information herein is subject to change without notice. Copyright (c) 2002-2015 Sandia Corporation. All rights reserved. Xyce TM Electronic Simulator and Xyce TM are trademarks of Sandia Corporation. Portions of the Xyce TM code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only) http://joseki.sandia.gov/bugzilla http://charleston.sandia.gov/bugzilla World Wide Web http://xyce.sandia.gov http://charleston.sandia.gov/xyce (Sandia only) Email xyce@sandia.gov (outside Sandia) xyce-sandia@sandia.gov (Sandia only)

  3. MRISIMUL: a GPU-based parallel approach to MRI simulations.

    PubMed

    Xanthis, Christos G; Venetis, Ioannis E; Chalkias, A V; Aletras, Anthony H

    2014-03-01

    A new step-by-step comprehensive MR physics simulator (MRISIMUL) of the Bloch equations is presented. The aim was to develop a magnetic resonance imaging (MRI) simulator that makes no assumptions with respect to the underlying pulse sequence and also allows for complex large-scale analysis on a single computer without requiring simplifications of the MRI model. We hypothesized that such a simulation platform could be developed with parallel acceleration of the executable core within the graphic processing unit (GPU) environment. MRISIMUL integrates realistic aspects of the MRI experiment from signal generation to image formation and solves the entire complex problem for densely spaced isochromats and for a densely spaced time axis. The simulation platform was developed in MATLAB whereas the computationally demanding core services were developed in CUDA-C. The MRISIMUL simulator imaged three different computer models: a user-defined phantom, a human brain model and a human heart model. The high computational power of GPU-based simulations was compared against other computer configurations. A speedup of about 228 times was achieved when compared to serially executed C-code on the CPU, whereas a speedup of between 31 and 115 times was achieved when compared to the OpenMP parallel executed C-code on the CPU, depending on the number of threads used in multithreading (2-8 threads). The high performance of MRISIMUL allows its application in large-scale analysis and can bring the computational power of a supercomputer or a large computer cluster to a single GPU personal computer. PMID:24595337
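
    The computational core of any Bloch-equation simulator is the same: advance a large array of isochromats through precession and relaxation at every time step, which is exactly the data-parallel work that maps well onto a GPU. A vectorized NumPy sketch of that kernel under simplifying assumptions (free precession only, an instantaneous ideal excitation, illustrative T1/T2 and off-resonance values; none of MRISIMUL's pulse-sequence or image-formation machinery):

      import numpy as np

      def bloch_step(M, dt, off_hz, T1, T2, M0=1.0):
          # Free precession about z at each isochromat's off-resonance frequency,
          # followed by T1/T2 relaxation (simple operator splitting).
          phi = 2.0 * np.pi * off_hz * dt
          mx = M[:, 0] * np.cos(phi) + M[:, 1] * np.sin(phi)
          my = -M[:, 0] * np.sin(phi) + M[:, 1] * np.cos(phi)
          e1, e2 = np.exp(-dt / T1), np.exp(-dt / T2)
          return np.column_stack((mx * e2, my * e2, M0 + (M[:, 2] - M0) * e1))

      n = 10_000                                               # densely spaced isochromats
      M = np.tile([1.0, 0.0, 0.0], (n, 1))                     # after an ideal 90-degree pulse
      off_hz = np.random.default_rng(0).normal(0.0, 20.0, n)   # off-resonance spread (Hz), illustrative
      for _ in range(500):                                     # densely spaced time axis, dt = 1 ms
          M = bloch_step(M, dt=1e-3, off_hz=off_hz, T1=1.0, T2=0.1)
      signal = np.mean(M[:, 0] + 1j * M[:, 1])                 # complex transverse signal
      print(abs(signal))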

  4. Development of magnetron sputtering simulator with GPU parallel computing

    NASA Astrophysics Data System (ADS)

    Sohn, Ilyoup; Kim, Jihun; Bae, Junkyeong; Lee, Jinpil

    2014-12-01

    Sputtering devices are widely used in the semiconductor and display panel manufacturing process. Currently, a number of surface treatment applications using magnetron sputtering techniques are being used to improve the efficiency of the sputtering process, through the installation of magnets outside the vacuum chamber. Within the internal space of the low pressure chamber, plasma generated from the combination of a rarefied gas and an electric field is influenced interactively. Since the quality of the sputtering and deposition rate on the substrate is strongly dependent on the multi-physical phenomena of the plasma regime, numerical simulations using PIC-MCC (Particle In Cell, Monte Carlo Collision) should be employed to develop an efficient sputtering device. In this paper, the development of a magnetron sputtering simulator based on the PIC-MCC method and the associated numerical techniques are discussed. To solve the electric field equations in the 2-D Cartesian domain, a Poisson equation solver based on the FDM (Finite Differencing Method) is developed and coupled with the Monte Carlo Collision method to simulate the motion of gas particles influenced by an electric field. The magnetic field created from the permanent magnet installed outside the vacuum chamber is also numerically calculated using Biot-Savart's Law. All numerical methods employed in the present PIC code are validated by comparison with analytical and well-known commercial engineering software results, with all of the results showing good agreement. Finally, the developed PIC-MCC code is parallelized to be suitable for general purpose computing on graphics processing unit (GPGPU) acceleration, so as to reduce the large computation time which is generally required for particle simulations. The efficiency and accuracy of the GPGPU parallelized magnetron sputtering simulator are examined by comparison with the calculated results and computation times from the original serial code. It is found that initially both simulations are in good agreement; however, differences develop over time due to statistical noise in the PIC-MCC GPGPU model.
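
    As a toy illustration of the field-solve step mentioned above, here is a Jacobi-iterated finite-difference Poisson solve on a small grid with grounded walls (the charge density, grid, and iteration count are arbitrary, and the Monte Carlo collision, magnetic-field, and GPU portions are not shown):

      import numpy as np

      def solve_poisson_2d(rho, dx, n_iter=2000, eps0=8.854e-12):
          # Jacobi iteration for -laplace(phi) = rho/eps0 on a Cartesian grid with
          # grounded (phi = 0) boundaries; returns the potential and E = -grad(phi).
          phi = np.zeros_like(rho)
          for _ in range(n_iter):
              phi[1:-1, 1:-1] = 0.25 * (phi[2:, 1:-1] + phi[:-2, 1:-1] +
                                        phi[1:-1, 2:] + phi[1:-1, :-2] +
                                        dx * dx * rho[1:-1, 1:-1] / eps0)
          e_field = np.gradient(-phi, dx)        # field components along the two grid axes
          return phi, e_field

      rho = np.zeros((65, 65))
      rho[32, 32] = 1e-8                         # a single charged cell (illustrative)
      phi, (e0, e1) = solve_poisson_2d(rho, dx=1e-3)
      print(phi.max())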

  5. CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation

    NASA Astrophysics Data System (ADS)

    Schneider, Evan E.; Robertson, Brant E.

    2015-04-01

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (256^3 and larger) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  6. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, Ilja; von Alfthan, Sebastian; Sandroos, Arto; Janhunen, Pekka; Palmroth, Minna

    2013-04-01

    As the single CPU core performance is saturating while the number of cores in the fastest supercomputers increases exponentially, the parallel performance of simulations on distributed memory machines is crucial. At the same time, utilizing efficiently the large number of available cores presents a challenge, especially in simulations with run-time adaptive mesh refinement which can be the key to high performance. We have developed a generic grid library (dccrg) that is easy to use and scales well up to tens of thousands of cores. The grid has several attractive features: It 1) allows an arbitrary C++ class or structure to be used as cell data; 2) is easy to use and provides a simple interface for run-time adaptive mesh refinement; 3) transfers the data of neighboring cells between processes transparently and asynchronously; and 4) provides a simple interface to run-time load balancing, e.g., domain decomposition, through the Zoltan library. Dccrg is freely available from https://gitorious.org/dccrg for anyone to use, study and modify under the GNU Lesser General Public License version 3. We present an overview of the implementation of dccrg, its parallel scalability and several source code examples of its usage in different types of simulations.

  7. Numerical Simulation of Flow Field Within Parallel Plate Plastometer

    NASA Technical Reports Server (NTRS)

    Antar, Basil N.

    2002-01-01

    The Parallel Plate Plastometer (PPP) is a device commonly used for measuring the viscosity of high polymers at low rates of shear in the range 10(exp 4) to 10(exp 9) poises. This device is being validated for use in measuring the viscosity of liquid glasses at high temperatures, which have similar ranges of viscosity values. The PPP instrument consists of two similar parallel plates, both in the range of 1 inch in diameter, with the upper plate being movable while the lower one is kept stationary. Load is applied to the upper plate by means of a beam connected to a shaft attached to the upper plate. The viscosity of the fluid is deduced from measuring the variation of the plate separation, h, as a function of time when a specified fixed load is applied on the beam. Operating plate speeds measured with the PPP are usually in the range of 10.3 cm/s or lower. The flow field within the PPP can be simulated using the equations of motion of fluid flow for this configuration. With flow speeds in the range quoted above, the flow field between the two plates is certainly incompressible and laminar. Such flows can be easily simulated using numerical modeling with computational fluid dynamics (CFD) codes. We present below the mathematical model used to simulate this flow field and also the solutions obtained for the flow using a commercially available finite element CFD code.

  8. High Performance Parallel Methods for Space Weather Simulations

    NASA Technical Reports Server (NTRS)

    Hunter, Paul (Technical Monitor); Gombosi, Tamas I.

    2003-01-01

    This is the final report of our NASA AISRP grant entitled 'High Performance Parallel Methods for Space Weather Simulations'. The main thrust of the proposal was to achieve significant progress towards new high-performance methods which would greatly accelerate global MHD simulations and eventually make it possible to develop first-principles based space weather simulations which run much faster than real time. We are pleased to report that with the help of this award we made major progress in this direction and developed the first parallel implicit global MHD code with adaptive mesh refinement. The main limitation of all earlier global space physics MHD codes was the explicit time stepping algorithm. Explicit time steps are limited by the Courant-Friedrichs-Lewy (CFL) condition, which essentially ensures that no information travels more than a cell size during a time step. This condition represents a non-linear penalty for highly resolved calculations, since finer grid resolution (and consequently smaller computational cells) not only results in more computational cells, but also in smaller time steps.
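
    The CFL restriction described above is a one-line computation; a small sketch with made-up signal speeds showing why refining the mesh both multiplies the cell count and shrinks the allowable explicit step:

      import numpy as np

      def cfl_dt(dx, speed, cfl=0.8):
          # Largest stable explicit step: dt <= cfl * dx / (max signal speed).
          return cfl * dx / np.max(np.abs(speed))

      speeds = np.array([4.0e5, 6.0e5, 9.0e5])   # m/s, made-up MHD signal speeds
      for dx in (1.0e6, 5.0e5, 2.5e5):           # refining the mesh ...
          print(dx, cfl_dt(dx, speeds))          # ... shrinks dt proportionally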

  9. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If at the t-th instant x_t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regardless of the set size C runs in time O(log N) using N processors on the exclusive-read, exclusive-write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
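
    For reference, the sequential semantics that the parallel algorithms reproduce can be written in a few lines (this is only the baseline serial simulation, not the O(log N) EREW algorithm of the paper):

      from collections import OrderedDict

      def lru_misses(trace, set_size):
          # Sequential trace-driven simulation of one C-line LRU set: returns, for each
          # reference, whether it missed (the statistic the parallel algorithms compute).
          cache = OrderedDict()
          misses = []
          for line in trace:
              if line in cache:
                  cache.move_to_end(line)            # hit: refresh recency
                  misses.append(False)
              else:
                  misses.append(True)                # miss: load, evict LRU line if full
                  cache[line] = None
                  if len(cache) > set_size:
                      cache.popitem(last=False)
          return misses

      print(lru_misses([1, 2, 3, 1, 4, 2, 5, 1], set_size=3))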

  10. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

    LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html. The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  11. Parallel programming in MIMD type parallel systems using transputer and i860 in physical simulations

    NASA Astrophysics Data System (ADS)

    Ido, S.; Hikosaka, S.

    1992-05-01

    Parallel programming and calculation performance were examined by using two types of MIMD parallel systems, that is, a transputer (T800) network and iPSC/860. Some interface subroutines were developed to apply the programs parallelized by using a transputer network to iPSC/860. Compatibility and performance of parallelized programs are discussed.

  12. Parallel Multiscale Algorithms for Astrophysical Fluid Dynamics Simulations

    NASA Technical Reports Server (NTRS)

    Norman, Michael L.

    1997-01-01

    Our goal is to develop software libraries and applications for astrophysical fluid dynamics simulations in multidimensions that will enable us to resolve the large spatial and temporal variations that inevitably arise due to gravity, fronts and microphysical phenomena. The software must run efficiently on parallel computers and be general enough to allow the incorporation of a wide variety of physics. Cosmological structure formation with realistic gas physics is the primary application driver in this work. Accurate simulations of, e.g., galaxy formation require a spatial dynamic range (i.e., ratio of system scale to smallest resolved feature) of 10(exp 4) or more in three dimensions in arbitrary topologies. We take this as our technical requirement. We have achieved, and in fact surpassed, these goals.

  13. Parallel Unsteady Turbopump Simulations for Liquid Rocket Engines

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin C.; Kwak, Dochan; Chan, William

    2000-01-01

    This paper reports the progress being made towards complete turbo-pump simulation capability for liquid rocket engines. The Space Shuttle Main Engine (SSME) turbo-pump impeller is used as a test case for the performance evaluation of the MPI and hybrid MPI/OpenMP versions of the INS3D code. A computational model of a turbo-pump has then been developed for the shuttle upgrade program. Relative motion of the grid system for rotor-stator interaction was obtained by employing overset grid techniques. Time-accuracy of the scheme has been evaluated by using simple test cases. Unsteady computations for the SSME turbo-pump, which contains 136 zones with 35 million grid points, are currently underway on Origin 2000 systems at NASA Ames Research Center. Results from time-accurate simulations with moving boundary capability, and the performance of the parallel versions of the code, will be presented in the final paper.

  14. Financial simulations on a massively parallel Connection Machine

    SciTech Connect

    Hutchinson, J.M.; Zenios, S.A.

    1991-01-01

    This paper reports on the valuation of complex financial instruments that appear in the banking and insurance industries, which requires simulations of their cashflow behavior in a volatile interest rate environment. These simulations are complex and computationally intensive. Their use, thus far, has been limited to intra-day analysis and planning. Researchers at the Wharton School and Thinking Machines Corporation have developed model formulations for massively parallel architectures, like the Connection Machine CM-2. A library of financial modeling primitives has been designed and used to implement a model for the valuation of mortgage-backed securities. Analyzing a portfolio of these securities, which would require 2 days on a large mainframe, is carried out in 1 hour on a CM-2a.
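
    To make the computational pattern concrete, here is a minimal sketch of valuing a rate-dependent instrument by averaging discounted cashflows over many simulated interest-rate paths; the short-rate model, the prepayment rule and all parameters are hypothetical stand-ins, and the path dimension is the one that would map onto the data-parallel processors of a machine such as the CM-2.

```python
import numpy as np

def mc_value(cashflow_fn, r0=0.06, sigma=0.01, periods=360, paths=20000, seed=0):
    """Value an interest-rate-dependent instrument by Monte Carlo.

    Simulates `paths` short-rate paths (a simple lognormal random walk,
    standing in for a production term-structure model), asks
    `cashflow_fn` for the path-dependent monthly cashflows, and averages
    the discounted path values.
    """
    rng = np.random.default_rng(seed)
    shocks = rng.normal(0.0, sigma, size=(paths, periods))
    rates = r0 * np.exp(np.cumsum(shocks, axis=1))        # (paths, periods)
    cashflows = cashflow_fn(rates)                        # (paths, periods)
    discounts = np.exp(-np.cumsum(rates / 12.0, axis=1))  # monthly discounting
    return float(np.mean(np.sum(cashflows * discounts, axis=1)))

# Hypothetical instrument: $1 per month, with payments stopping once rates
# fall 20% below their starting level (a crude prepayment trigger).
def level_pay_with_prepayment(rates):
    alive = np.cumprod(rates > 0.8 * rates[:, [0]], axis=1)
    return alive * 1.0

print(mc_value(level_pay_with_prepayment))
```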

  15. Generating scenario trees: A parallel integrated simulation-optimization approach

    NASA Astrophysics Data System (ADS)

    Beraldi, Patrizia; de Simone, Francesco; Violi, Antonio

    2010-03-01

    A crucial issue for addressing decision-making problems under uncertainty is the approximate representation of multivariate stochastic processes in the form of a scenario tree. This paper proposes a scenario generation approach based on the idea of integrating simulation and optimization techniques. In particular, simulation is used to generate outcomes associated with the nodes of the scenario tree which, in turn, provide the input parameters for an optimization model aimed at determining the scenario probabilities that match some prescribed targets. The approach relies on the moment-matching technique originally proposed in [K. Høyland, S.W. Wallace, Generating scenario trees for multistage decision problems, Manag. Sci. 47 (2001) 295-307] and further refined in [K. Høyland, M. Kaut, S.W. Wallace, A heuristic for moment-matching scenario generation, Comput. Optim. Appl. 24 (2003) 169-185]. By taking advantage of the iterative nature of our approach, a parallel implementation has been designed and extensively tested on financial data. Numerical results show the efficiency of the parallel algorithm and the improvement in accuracy and effectiveness.
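
    A minimal sketch of the moment-matching step is given below: given simulated outcomes at the children of a node, it chooses scenario probabilities so that the resulting mean and variance match prescribed targets. It matches only two moments of a univariate distribution and uses a generic SLSQP solver, so it is a simplified illustration of the idea rather than the authors' algorithm.

```python
import numpy as np
from scipy.optimize import minimize

def match_probabilities(outcomes, target_moments):
    """Choose scenario probabilities so that the discrete distribution
    over `outcomes` matches the prescribed (mean, variance) targets.

    outcomes:        (n_scenarios,) simulated node values
    target_moments:  (mean, variance) to be matched
    """
    n = len(outcomes)
    t_mean, t_var = target_moments

    def objective(p):
        mean = np.dot(p, outcomes)
        var = np.dot(p, (outcomes - mean) ** 2)
        return (mean - t_mean) ** 2 + (var - t_var) ** 2

    constraints = [{"type": "eq", "fun": lambda p: np.sum(p) - 1.0}]
    bounds = [(0.0, 1.0)] * n
    result = minimize(objective, np.full(n, 1.0 / n),
                      bounds=bounds, constraints=constraints, method="SLSQP")
    return result.x

outcomes = np.array([-0.05, 0.0, 0.02, 0.08])
probs = match_probabilities(outcomes, target_moments=(0.01, 0.001))
print(probs, probs @ outcomes)   # probabilities and the resulting mean
```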

  16. A Generic Scheduling Simulator for High Performance Parallel Computers

    SciTech Connect

    Yoo, B S; Choi, G S; Jette, M A

    2001-08-01

    It is well known that efficient job scheduling plays a crucial role in achieving high system utilization in large-scale high performance computing environments. A good scheduling algorithm should schedule jobs to achieve high system utilization while satisfying various user demands in an equitable fashion. Designing such a scheduling algorithm is a non-trivial task even in a static environment. In practice, the computing environment and workload are constantly changing. There are several reasons for this. First, computing platforms constantly evolve as the technology advances. For example, the availability of relatively powerful commodity off-the-shelf (COTS) components at steadily diminishing prices has made it feasible to construct ever larger massively parallel computers in recent years [1, 4]. Second, the workload imposed on the system also changes constantly. Rapidly increasing compute resources have given many application developers the opportunity to radically alter program characteristics and take advantage of these additional resources. New developments in software technology may also trigger changes in user applications. Finally, changes in the political climate may alter user priorities or the mission of the organization. System designers in such dynamic environments must be able to accurately forecast the effect of changes in the hardware, software, and/or policies under consideration. If the environmental changes are significant, one must also reassess scheduling algorithms. Simulation has frequently been relied upon for this analysis, because other methods such as analytical modeling or actual measurements are usually too difficult or costly. A drawback of the simulation approach, however, is that developing a simulator is a time-consuming process, and an existing simulator cannot be easily adapted to a new environment. In this research, we attempt to develop a generic job-scheduling simulator, which facilitates the evaluation of different scheduling algorithms in various computing environments. Our design objectives for this generic simulator are to: (1) accept descriptions of varied workloads for a wide range of computing environments; (2) provide an easy-to-use interface for describing the scheduling policies being evaluated; (3) accurately calculate the overhead induced by various scheduling algorithms; and (4) accurately model a variety of machine architectures. In summary, we have developed a generic scheduling simulator for high performance parallel computers. This generic simulator supports standard and user-defined job attributes, generates the job attribute values from different input sources, allowing users to model a wide range of workloads, and produces performance parameters with reliability measures. All overheads caused by scheduling algorithms are considered in measuring the performance parameters. The simulator simulates a queuing network to which users can bind a specific scheduling algorithm written as a C function; a set of APIs is provided to facilitate describing the scheduling algorithms. With these features, the simulator can accurately simulate any scheduling algorithm under various workloads and computing platforms. The simulator does not currently model dynamic events such as message passing between tasks in detail, but we plan to include this crucial functionality in the future.
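
    The pluggable-policy idea can be illustrated with a toy event-driven scheduler in which the scheduling algorithm is supplied as an ordinary function (the real simulator binds a C function through its API); the job data, the FCFS policy and all numbers below are hypothetical.

```python
import heapq

def simulate(jobs, total_nodes, pick_next):
    """Tiny event-driven job scheduler.

    jobs:       list of (arrival_time, nodes, runtime) tuples
    pick_next:  user-supplied policy; given the queued jobs and the number
                of free nodes, returns the index of the job to start or None
    """
    events = [(arr, "arrive", j) for j, (arr, _, _) in enumerate(jobs)]
    heapq.heapify(events)
    queue, free, finish_times = [], total_nodes, {}
    while events:
        time, kind, j = heapq.heappop(events)
        if kind == "arrive":
            queue.append(j)
        else:                              # job j finished, release its nodes
            free += jobs[j][1]
        while queue:
            k = pick_next([jobs[q] for q in queue], free)
            if k is None:
                break
            started = queue.pop(k)
            free -= jobs[started][1]
            finish_times[started] = time + jobs[started][2]
            heapq.heappush(events, (finish_times[started], "finish", started))
    return finish_times

# Simple policy: start the oldest queued job that fits in the free nodes.
def fcfs(queued, free):
    for i, (_, nodes, _) in enumerate(queued):
        if nodes <= free:
            return i
    return None

print(simulate([(0, 4, 10), (1, 8, 5), (2, 2, 3)], total_nodes=8, pick_next=fcfs))
```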

  17. A Decision Tool that Combines Discrete Event Software Process Models with System Dynamics Pieces for Software Development Cost Estimation and Analysis

    NASA Technical Reports Server (NTRS)

    Mizell, Carolyn Barrett; Malone, Linda

    2007-01-01

    The development process for a large software development project is very complex and dependent on many variables that are dynamic and interrelated. Factors such as size, productivity and defect injection rates will have substantial impact on the project in terms of cost and schedule. These factors can be affected by the intricacies of the process itself as well as human behavior because the process is very labor intensive. The complex nature of the development process can be investigated with software development process models that utilize discrete event simulation to analyze the effects of process changes. The organizational environment and its effects on the workforce can be analyzed with system dynamics that utilizes continuous simulation. Each has unique strengths and the benefits of both types can be exploited by combining a system dynamics model and a discrete event process model. This paper will demonstrate how the two types of models can be combined to investigate the impacts of human resource interactions on productivity and ultimately on cost and schedule.

  18. Exception handling controllers: An application of pushdown systems to discrete event control

    SciTech Connect

    Griffin, Christopher H

    2008-01-01

    Recent work by the author has extended the Supervisory Control Theory to include the class of control languages defined by pushdown machines. A pushdown machine is a finite state machine extended by an infinite stack memory. In this paper, we define a specific type of deterministic pushdown machine that is particularly useful as a discrete event controller. Checking controllability of pushdown machines requires computing the complement of the controller machine. We show that Exception Handling Controllers have the property that algorithms for taking their complements and determining their prefix closures are nearly identical to the algorithms available for finite state machines. Further, they exhibit an important property that makes checking for controllability extremely simple. Hence, they maintain the simplicity of the finite state machine, while providing the extra power associated with a pushdown stack memory. We provide an example of a useful control specification that cannot be implemented using a finite state machine, but can be implemented using an Exception Handling Controller.
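
    As a concrete illustration of why a stack helps, the sketch below implements a deterministic pushdown acceptor for a specification that no finite state machine can enforce: arbitrarily deep matching of begin/end (or call/handler) pairs. The event names are hypothetical and the code is not the paper's Exception Handling Controller construction.

```python
def pushdown_controller(events):
    """Decide whether a sequence of events respects a nesting
    specification: every 'end' must match a pending 'begin', with
    unbounded nesting depth.

    A finite state controller cannot track unbounded depth; the stack
    gives the pushdown controller exactly that extra power while the
    control logic itself stays as simple as a finite state machine.
    """
    stack = []
    for event in events:
        if event == "begin":
            stack.append(event)          # push: one more open scope
        elif event == "end":
            if not stack:
                return False             # an 'end' with nothing to close
            stack.pop()
        # other events leave the stack untouched
    return True                          # the prefix is acceptable so far

print(pushdown_controller(["begin", "work", "begin", "end", "end"]))  # True
print(pushdown_controller(["begin", "end", "end"]))                   # False
```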

  19. Automated generation of discrete event controllers for dynamic reconfiguration of autonomous sensor networks

    NASA Astrophysics Data System (ADS)

    Damiani, Sarah; Griffin, Christopher; Phoha, Shashi

    2003-12-01

    Autonomous Sensor Networks have the potential for broad applicability to national security, intelligent transportation, industrial production and environmental and hazardous process control. Distributed sensors may be used for detecting bio-terrorist attacks, for contraband interdiction, border patrol, monitoring building safety and security, battlefield surveillance, or may be embedded in complex dynamic systems for enabling fault tolerant operations. In this paper we present algorithms and automation tools for constructing discrete event controllers for complex networked systems that restrict the dynamic behavior of the system according to given specifications. In our previous work we have modeled the dynamic system as a discrete event automaton whose open loop behavior is represented as a language L of strings over the alphabet Σ of all possible atomic events that cause state transitions in the network. The controlled behavior is represented by a sublanguage K, contained in L, that restricts the behavior of the system according to the specifications of the controller. We have developed the algebraic structure of controllable sublanguages as perfect right partial ideals that satisfy a precontrollability condition. In this paper we develop an iterative algorithm that takes an ad hoc specification described in a natural language and formulates a complete specification that results in a controllable sublanguage. A supervisory controller, modeled as an automaton that runs synchronously with the open loop system in the sense of Ramadge and Wonham, is automatically generated to restrict the behavior of the open loop system to the controllable sublanguage. A battlefield surveillance scenario illustrates the iterative evolution of ad hoc specifications for controlling an autonomous sensor network and the generation of a controller that reconfigures the sensor network to dynamically adapt to environmental perturbations.

  20. Roadmap for efficient parallelization of breast anatomy simulation

    NASA Astrophysics Data System (ADS)

    Chui, Joseph H.; Pokrajac, David D.; Maidment, Andrew D. A.; Bakic, Predrag R.

    2012-03-01

    A roadmap has been proposed to optimize the simulation of breast anatomy by parallel implementation, in order to reduce the time needed to generate software breast phantoms. The rapid generation of high resolution phantoms is needed to support virtual clinical trials of breast imaging systems. We have recently developed an octree-based recursive partitioning algorithm for breast anatomy simulation. The algorithm has good asymptotic complexity; however, its current MATLAB implementation cannot provide optimal execution times. The proposed roadmap for efficient parallelization includes the following steps: (i) migrate the current code to a C/C++ platform and optimize it for single-threaded implementation; (ii) modify the code to allow for multi-threaded CPU implementation; (iii) identify and migrate the code to a platform designed for multi-threaded GPU implementation. In this paper, we describe our results in optimizing the C/C++ code for single-threaded and multi-threaded CPU implementations. As the first step of the proposed roadmap, we identified a bottleneck component in the MATLAB implementation using MATLAB's profiling tool, and created a single-threaded CPU implementation of the algorithm using C/C++'s overloaded operators and standard template library. The C/C++ implementation has been compared to the MATLAB version in terms of accuracy and simulation time. A 520-fold reduction of the execution time was observed in a test of phantoms with 50-400 μm voxels. In addition, we have identified several places in the code which will be modified to allow for the next roadmap milestone of the multi-threaded CPU implementation.

  1. Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

    SciTech Connect

    Faulon, Jean-Loup; Wilcox, R.T.; Hobbs, J.D.; Ford, D.M.

    1997-11-01

    An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to the current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of magnitude. The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the construction of butyl rubber and ethylene-propylene-diene-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10,000-atom models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.

  2. Parallel grid library for rapid and flexible simulation development

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2013-04-01

    We present an easy to use and flexible grid library for developing highly scalable parallel simulations. The distributed cartesian cell-refinable grid (dccrg) supports adaptive mesh refinement and allows an arbitrary C++ class to be used as cell data. The amount of data in grid cells can vary both in space and time allowing dccrg to be used in very different types of simulations, for example in fluid and particle codes. Dccrg transfers the data between neighboring cells on different processes transparently and asynchronously allowing one to overlap computation and communication. This enables excellent scalability at least up to 32 k cores in magnetohydrodynamic tests depending on the problem and hardware. In the version of dccrg presented here part of the mesh metadata is replicated between MPI processes reducing the scalability of adaptive mesh refinement (AMR) to between 200 and 600 processes. Dccrg is free software that anyone can use, study and modify and is available at https://gitorious.org/dccrg. Users are also kindly requested to cite this work when publishing results obtained with dccrg. Catalogue identifier: AEOM_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEOM_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: GNU Lesser General Public License version 3 No. of lines in distributed program, including test data, etc.: 54975 No. of bytes in distributed program, including test data, etc.: 974015 Distribution format: tar.gz Programming language: C++. Computer: PC, cluster, supercomputer. Operating system: POSIX. The code has been parallelized using MPI and tested with 1-32768 processes RAM: 10 MB-10 GB per process Classification: 4.12, 4.14, 6.5, 19.3, 19.10, 20. External routines: MPI-2 [1], boost [2], Zoltan [3], sfc++ [4] Nature of problem: Grid library supporting arbitrary data in grid cells, parallel adaptive mesh refinement, transparent remote neighbor data updates and load balancing. Solution method: The simulation grid is represented by an adjacency list (graph) with vertices stored into a hash table and edges into contiguous arrays. Message Passing Interface standard is used for parallelization. Cell data is given as a template parameter when instantiating the grid. Restrictions: Logically cartesian grid. Running time: Running time depends on the hardware, problem and the solution method. Small problems can be solved in under a minute and very large problems can take weeks. The examples and tests provided with the package take less than about one minute using default options. In the version of dccrg presented here the speed of adaptive mesh refinement is at most of the order of 10^6 total created cells per second. http://www.mpi-forum.org/. http://www.boost.org/. K. Devine, E. Boman, R. Heaphy, B. Hendrickson, C. Vaughan, Zoltan data management services for parallel dynamic applications, Comput. Sci. Eng. 4 (2002) 90-97. http://dx.doi.org/10.1109/5992.988653. https://gitorious.org/sfc++.

  3. HipGISAXS: A Massively Parallel Code for GISAXS Simulation

    NASA Astrophysics Data System (ADS)

    Chourou, Slim; Sarje, Abhinav; Li, Xiaoye; Chan, Elaine; Hexemer, Alexander; Hipgisaxs Team

    2013-03-01

    Grazing Incidence Small-Angle Scattering (GISAXS) is a valuable experimental technique in probing nanostructures of relevance to polymer science. New high-performance computing algorithms, codes, and software tools have been implemented to analyze GISAXS images generated at synchrotron light sources. We have developed flexible massively parallel GISAXS simulation software ``HipGISAXS'' based on the Distorted Wave Born Approximation (DWBA). The software computes the diffraction pattern for any given superposition of custom shapes or morphologies in a user-defined region of the reciprocal space for all possible grazing incidence angles and sample rotations. This flexibility allows a straightforward study of a wide variety of possible polymer topologies and assemblies whether embedded in a thin film or a multilayered structure. Hence, this code enables guided investigations of the morphological and dynamical properties of relevance in various applications. The current parallel code is capable of computing GISAXS images for highly complex structures and with high resolutions and attaining speedups of 200x on a single-node GPU compared to the sequential code. Moreover, the multi-GPU (CPU) code achieved additional 900x (4000x) speedup on 930 GPU (6000 CPU) nodes. This work was supported by the Director, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

  4. Parallel algorithm for multiscale atomistic/continuum simulations using LAMMPS

    NASA Astrophysics Data System (ADS)

    Pavia, F.; Curtin, W. A.

    2015-07-01

    Deformation and fracture processes in engineering materials often require simultaneous descriptions over a range of length and time scales, with each scale using a different computational technique. Here we present a high-performance parallel 3D computing framework for executing large multiscale studies that couple an atomic domain, modeled using molecular dynamics, and a continuum domain, modeled using explicit finite elements. We use the robust Coupled Atomistic/Discrete-Dislocation (CADD) displacement-coupling method, but without the transfer of dislocations between atoms and continuum. The main purpose of the work is to provide a multiscale implementation within an existing large-scale parallel molecular dynamics code (LAMMPS) that enables use of all the tools associated with this popular open-source code, while extending CADD-type coupling to 3D. Validation of the implementation includes the demonstration of (i) stability in finite-temperature dynamics using Langevin dynamics, (ii) elimination of wave reflections due to large dynamic events occurring in the MD region and (iii) the absence of spurious forces acting on dislocations due to the MD/FE coupling, for dislocations farther than 10 Å from the coupling boundary. A first non-trivial example application of dislocation glide and bowing around obstacles is shown, for dislocation lengths of about 50 nm, using fewer than 1,000,000 atoms but reproducing results of extremely large atomistic simulations at much lower computational cost.

  5. Ion dynamics at supercritical quasi-parallel shocks: Hybrid simulations

    SciTech Connect

    Su Yanqing; Lu Quanming; Gao Xinliang; Huang Can; Wang Shui

    2012-09-15

    By separating the incident ions into directly transmitted, downstream thermalized, and diffuse ions, we perform one-dimensional (1D) hybrid simulations to investigate ion dynamics at a supercritical quasi-parallel shock. In the simulations, the angle between the upstream magnetic field and the shock nominal direction is θ_Bn = 30°, and the Alfven Mach number is M_A ≈ 5.5. The shock exhibits a periodic reformation process. Ion reflection occurs at the beginning of the reformation cycle. Some of the reflected ions are trapped between the old and new shock fronts for an extended time period; these particles eventually form superthermal diffuse ions after they escape upstream of the new shock front at the end of the reformation cycle. The other reflected ions may return to the shock immediately or be trapped between the old and new shock fronts for a short time period. When the amplitude of the new shock front exceeds that of the old shock front and the reformation cycle is finished, these ions become thermalized ions in the downstream. No noticeable heating can be found in the directly transmitted ions. The relevance of our simulations to satellite observations is also discussed in the paper.

  6. A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Rao, Hariprasad Nannapaneni

    1989-01-01

    The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.

  7. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    PubMed Central

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-01-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent. PMID:25084887
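
    For contrast with the PCST scheme described in this record, whose exchange acceptance is designed to be independent of the total potential energy, the sketch below shows the conventional parallel-tempering swap test that it replaces; the temperatures and energies are illustrative numbers only.

```python
import math
import random

def attempt_swap(beta_i, beta_j, energy_i, energy_j, rng=random):
    """Metropolis acceptance test for swapping the configurations of two
    neighbouring replicas in conventional parallel tempering.

    Because the acceptance probability depends on the total potential
    energies E_i and E_j, it collapses for large systems unless many
    closely spaced temperatures are used; avoiding that dependence is
    the point of the PCST method described above.
    """
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return delta >= 0.0 or rng.random() < math.exp(delta)

# Two replicas at 300 K and 330 K (beta = 1/kT, with kT in kcal/mol).
kB = 0.0019872  # kcal/(mol K)
print(attempt_swap(1.0 / (kB * 300.0), 1.0 / (kB * 330.0),
                   energy_i=-1052.0, energy_j=-1047.5))
```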

  8. Parallel continuous simulated tempering and its applications in large-scale molecular simulations.

    PubMed

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-28

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent. PMID:25084887

  9. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    NASA Astrophysics Data System (ADS)

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-01

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2-3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent.

  10. Parallel continuous simulated tempering and its applications in large-scale molecular simulations

    SciTech Connect

    Zang, Tianwu; Yu, Linglin; Zhang, Chong; Ma, Jianpeng

    2014-07-28

    In this paper, we introduce a parallel continuous simulated tempering (PCST) method for enhanced sampling in studying large complex systems. It mainly inherits the continuous simulated tempering (CST) method of our previous studies [C. Zhang and J. Ma, J. Chem. Phys. 130, 194112 (2009); C. Zhang and J. Ma, J. Chem. Phys. 132, 244101 (2010)], while adopting the spirit of parallel tempering (PT), or the replica exchange method, by employing multiple copies with different temperature distributions. Differing from conventional PT methods, despite the large stride of the total temperature range, the PCST method requires very few copies of simulations, typically 2–3 copies, yet it is still capable of maintaining a high rate of exchange between neighboring copies. Furthermore, in the PCST method, the size of the system does not dramatically affect the number of copies needed, because the exchange rate is independent of the total potential energy, thus providing an enormous advantage over conventional PT methods in studying very large systems. The sampling efficiency of PCST was tested on the two-dimensional Ising model, a Lennard-Jones liquid and an all-atom folding simulation of the small globular protein trp-cage in explicit solvent. The results demonstrate that the PCST method significantly improves sampling efficiency compared with other methods and is particularly effective in simulating systems with long relaxation or correlation times. We expect the PCST method to be a good alternative to parallel tempering methods in simulating large systems such as phase transitions and the dynamics of macromolecules in explicit solvent.

  11. Xyce Parallel Electronic Simulator Reference Guide Version 6.4

    SciTech Connect

    Keiter, Eric R.; Mei, Ting; Russo, Thomas V.; Schiek, Richard; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason; Baur, David Gregory

    2015-12-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to (to the extent possible) exhaustively list device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1]. Trademarks: The information herein is subject to change without notice. Copyright (c) 2002-2015 Sandia Corporation. All rights reserved. Xyce(TM) Electronic Simulator and Xyce(TM) are trademarks of Sandia Corporation. Portions of the Xyce(TM) code are: Copyright (c) 2002, The Regents of the University of California. Produced at the Lawrence Livermore National Laboratory. Written by Alan Hindmarsh, Allan Taylor, Radu Serban. UCRL-CODE-2002-59. All rights reserved. Orcad, Orcad Capture, PSpice and Probe are registered trademarks of Cadence Design Systems, Inc. Microsoft, Windows and Windows 7 are registered trademarks of Microsoft Corporation. Medici, DaVinci and Taurus are registered trademarks of Synopsys Corporation. Amtec and TecPlot are trademarks of Amtec Engineering, Inc. Xyce's expression library is based on that inside Spice 3F5 developed by the EECS Department at the University of California. The EKV3 MOSFET model was developed by the EKV Team of the Electronics Laboratory-TUC of the Technical University of Crete. All other trademarks are property of their respective owners. Contacts: Bug Reports (Sandia only): http://joseki.sandia.gov/bugzilla, http://charleston.sandia.gov/bugzilla. World Wide Web: http://xyce.sandia.gov, http://charleston.sandia.gov/xyce (Sandia only). Email: xyce@sandia.gov (outside Sandia), xyce-sandia@sandia.gov (Sandia only).

  12. Sensor Configuration Selection for Discrete-Event Systems under Unreliable Observations

    SciTech Connect

    Wen-Chiao Lin; Tae-Sic Yoo; Humberto E. Garcia

    2010-08-01

    Algorithms for counting the occurrences of special events in the framework of partially-observed discrete event dynamical systems (DEDS) were developed in previous work. Their performance typically improves as the sensors providing the observations become more costly or increase in number. This paper addresses the problem of finding a sensor configuration that achieves an optimal balance between cost and the performance of the special event counting algorithm, while satisfying given observability requirements and constraints. Since this problem is generally computationally hard in the framework considered, a sensor optimization algorithm is developed using two greedy heuristics, one myopic and the other based on projected performances of candidate sensors. The two heuristics are executed sequentially in order to find the best sensor configurations. The developed algorithm is then applied to a sensor optimization problem for a multi-unit operation system. Results show that improved sensor configurations can be found that may significantly reduce the sensor configuration cost but still yield acceptable performance for counting the occurrences of special events.
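
    The following is a minimal sketch of a myopic greedy heuristic of the kind described here: it repeatedly adds the candidate sensor with the largest estimated performance gain per unit cost until the budget is exhausted. The cost table and the diminishing-returns performance model are hypothetical, and the paper's second heuristic (based on projected performance) is not shown.

```python
def greedy_sensor_selection(candidates, evaluate, budget):
    """Myopic greedy heuristic for choosing a sensor configuration.

    candidates: dict mapping sensor name -> cost
    evaluate:   function(set_of_sensors) -> estimated counting performance
    budget:     maximum total sensor cost
    """
    chosen, spent = set(), 0.0
    while True:
        best, best_gain = None, 0.0
        base = evaluate(chosen)
        for sensor, cost in candidates.items():
            if sensor in chosen or spent + cost > budget:
                continue
            gain = (evaluate(chosen | {sensor}) - base) / cost
            if gain > best_gain:
                best, best_gain = sensor, gain
        if best is None:
            return chosen                     # no affordable improvement left
        chosen.add(best)
        spent += candidates[best]

# Hypothetical performance model: diminishing returns in the number of sensors.
costs = {"s1": 1.0, "s2": 2.0, "s3": 5.0}
perf = lambda sensors: 1.0 - 0.5 ** len(sensors)
print(greedy_sensor_selection(costs, perf, budget=3.0))  # {'s1', 's2'}
```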

  13. Particle/Continuum Hybrid Simulation in a Parallel Computing Environment

    NASA Technical Reports Server (NTRS)

    Baganoff, Donald

    1996-01-01

    The objective of this study was to modify an existing parallel particle code based on the direct simulation Monte Carlo (DSMC) method to include a Navier-Stokes (NS) calculation so that a hybrid solution could be developed. In carrying out this work, it was determined that the following five issues had to be addressed before extensive program development of a three dimensional capability was pursued: (1) find a set of one-sided kinetic fluxes that are fully compatible with the DSMC method, (2) develop a finite volume scheme to make use of these one-sided kinetic fluxes, (3) make use of the one-sided kinetic fluxes together with DSMC type boundary conditions at a material surface so that velocity slip and temperature slip arise naturally for near-continuum conditions, (4) find a suitable sampling scheme so that the values of the one-sided fluxes predicted by the NS solution at an interface between the two domains can be converted into the correct distribution of particles to be introduced into the DSMC domain, (5) carry out a suitable number of tests to confirm that the developed concepts are valid, individually and in concert for a hybrid scheme.

  14. Rasterizing geological models for parallel finite difference simulation using seismic simulation as an example

    NASA Astrophysics Data System (ADS)

    Zehner, Björn; Hellwig, Olaf; Linke, Maik; Görz, Ines; Buske, Stefan

    2016-01-01

    3D geological underground models are often represented by vector data, such as triangulated networks representing the boundaries of geological bodies and geological structures. If the models are to be used for numerical simulations based on the finite difference method, they have to be converted into a representation that discretizes the full volume of the model into hexahedral cells. The simulations often require a high grid resolution and are run using parallel computing. Storing such a high-resolution raster model would require a large amount of space, and it is difficult to create such a model using standard geomodelling packages. Since the raster representation is only required for the calculation, and not for the geometry description, we present an algorithm and concept for rasterizing geological models on the fly for use in finite difference codes that are parallelized by domain decomposition. As a proof of concept we implemented a rasterizer library and integrated it into seismic simulation software that runs as parallel code on a UNIX cluster using the Message Passing Interface. We can thus run the simulation with realistic and complicated surface-based geological models that are created using 3D geomodelling software, instead of using a simplified representation of the geological subsurface based on mathematical functions or geometric primitives. We tested this set-up using an example model that we provide along with the implemented library.
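
    A heavily simplified sketch of the on-the-fly idea is given below: each process converts only the cells of its own subdomain, and the geological model is reduced to a stack of horizon surfaces queried as z = f(x, y). The horizon callables, the spacing and the layer numbering are hypothetical stand-ins for the library's queries against triangulated surfaces.

```python
import numpy as np

def rasterize_subdomain(ix_range, iy_range, iz_range, spacing, horizons):
    """Assign a material index to every cell of one process's subdomain.

    horizons: list of callables z = f(x, y), ordered from shallow to deep,
              standing in for depth queries against the model's
              triangulated horizon surfaces.

    Each rank evaluates only the cells of its own block, so the
    full-volume raster is never stored or communicated.
    """
    nx = ix_range[1] - ix_range[0]
    ny = iy_range[1] - iy_range[0]
    nz = iz_range[1] - iz_range[0]
    material = np.zeros((nx, ny, nz), dtype=np.int32)
    for i in range(nx):
        x = (ix_range[0] + i + 0.5) * spacing
        for j in range(ny):
            y = (iy_range[0] + j + 0.5) * spacing
            depths = [f(x, y) for f in horizons]
            for k in range(nz):
                z = (iz_range[0] + k + 0.5) * spacing
                # material index = number of horizons lying above the cell centre
                material[i, j, k] = sum(z > d for d in depths)
    return material

# Two flat horizons at 100 m and 250 m depth -> three layers (0, 1, 2).
block = rasterize_subdomain((0, 4), (0, 4), (0, 30), 10.0,
                            [lambda x, y: 100.0, lambda x, y: 250.0])
print(np.unique(block))  # [0 1 2]
```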

  15. Parallel processing for nonlinear dynamics simulations of structures including rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien

    1993-01-01

    The principal objective of this research is to develop, test, and implement coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of practical structural problems. There are contributions to four main areas: finite element modeling and analysis of rotational dynamics, numerical algorithms for parallel nonlinear solutions, automatic partitioning techniques to effect load-balancing among processors, and an integrated parallel analysis system.

  16. Parallel climate model (PCM) control and transient simulations

    NASA Astrophysics Data System (ADS)

    Washington, W. M.; Weatherly, J. W.; Meehl, G. A.; Semtner, A. J., Jr.; Bettge, T. W.; Craig, A. P.; Strand, W. G., Jr.; Arblaster, J.; Wayland, V. B.; James, R.; Zhang, Y.

    The Department of Energy (DOE)-supported Parallel Climate Model (PCM) makes use of the NCAR Community Climate Model (CCM3) and Land Surface Model (LSM) for the atmospheric and land surface components, respectively, the DOE Los Alamos National Laboratory Parallel Ocean Program (POP) for the ocean component, and the Naval Postgraduate School sea-ice model. The PCM executes on several distributed and shared memory computer systems. The coupling method is similar to that used in the NCAR Climate System Model (CSM) in that a flux coupler ties the components together, with interpolations between the different grids of the component models. Flux adjustments are not used in the PCM. The ocean component has 2/3° average horizontal grid spacing with 32 vertical levels and a free surface that allows calculation of sea level changes. Near the equator, the grid spacing is approximately 1/2° in latitude to better capture the ocean equatorial dynamics. The North Pole is rotated over northern North America, thus producing resolution smaller than 2/3° in the North Atlantic where the sinking part of the world conveyor circulation largely takes place. Because this ocean model component does not have a computational point at the North Pole, the Arctic Ocean circulation systems are more realistic and similar to those observed. The elastic viscous plastic sea ice model has a grid spacing of 27 km to represent small-scale features such as ice transport through the Canadian Archipelago and the East Greenland current region. Results from a 300-year present-day coupled climate control simulation are presented, as well as for a transient 1% per year compound CO2 increase experiment which shows a global warming of 1.27°C for a 10-year average at the doubling point of CO2 and 2.89°C at the quadrupling point. There is a gradual warming beyond the doubling and quadrupling points with CO2 held constant. Globally averaged sea level rise at the time of CO2 doubling is approximately 7 cm and at the time of quadrupling it is 23 cm. Some of the regional sea level changes are larger and reflect the adjustments in the temperature, salinity, internal ocean dynamics, surface heat flux, and wind stress on the ocean. A 0.5% per year CO2 increase experiment also was performed, showing a global warming of 1.5°C around the time of CO2 doubling and a similar warming pattern to the 1% per year CO2 increase experiment. El Niño and La Niña events in the tropical Pacific show approximately the observed frequency distribution and amplitude, which leads to near observed levels of variability on interannual time scales.

  17. A fuzzy discrete event system approach to determining optimal HIV/AIDS treatment regimens.

    PubMed

    Ying, Hao; Lin, Feng; MacArthur, Rodger D; Cohn, Jonathan A; Barth-Jones, Daniel C; Ye, Hong; Crane, Lawrence R

    2006-10-01

    Treatment decision-making is complex and involves many factors. A systematic decision-making and optimization technology capable of handling variations and uncertainties of patient characteristics and physicians' subjectivity is currently unavailable. We recently developed a novel general-purpose fuzzy discrete event systems theory for optimal decision-making. We now apply it to develop an innovative system for medical treatment, specifically for the first round of highly active antiretroviral therapy of human immunodeficiency virus/acquired immunodeficiency syndrome (HIV/AIDS) patients involving three historically widely used regimens. The objective is to develop a system whose regimen choice for any given patient exactly matches an expert AIDS physician's selection to produce the (anticipated) optimal treatment outcome. Our regimen selection system consists of a treatment objectives classifier, fuzzy finite state machine models for treatment regimens, and a genetic-algorithm-based optimizer. The optimizer enables the system to either emulate an individual doctor's decision-making or generate a regimen that simultaneously satisfies diverse treatment preferences of multiple physicians to the maximum extent. We used the optimizer to automatically learn the values of 26 parameters of the models. The learning was based on the consensus of AIDS specialists A and B on this project, whose exact agreement was only 35%. The performance of the resulting models was first assessed. We then carried out a retrospective study of the entire system using all the qualifying patients treated in our institution's AIDS Clinical Center in 2001. A total of 35 patients were treated by 13 specialists using the regimens (four and eight patients were treated by specialists A and B, respectively). We compared the actually prescribed regimens with those selected by the system using the same available information. The overall exact agreement was 82.9% (29 out of 35), with the exact agreement with specialists A and B both at 100%. The exact agreement for the remaining 11 physicians not involved in the system training was 73.9% (17 out of 23), an impressive result given the fact that expert opinion can be quite divergent for treatment decisions of such complexity. Our specialists also carefully examined the six mismatched cases and deemed that the system actually chose a more appropriate regimen for four of them; in the other two cases, either would have been a reasonable choice. Our approach has the capabilities of generalizing, learning, and representing knowledge even in the face of weak consensus, and of being readily upgradeable to new medical knowledge. These are practically important features for medical applications in general, and HIV/AIDS treatment in particular, as national HIV/AIDS treatment guidelines are modified several times per year. PMID:17044400
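
    To illustrate the kind of building block such a system uses, the sketch below performs one max-min transition of a fuzzy finite state machine, in which a patient can belong to several treatment-outcome states with different degrees of membership. The state names, the event matrix and all numbers are invented for illustration and are not the paper's learned model.

```python
import numpy as np

def fuzzy_transition(state, event_matrix):
    """One step of a fuzzy finite state machine.

    state:        membership degrees of the current fuzzy state (length n)
    event_matrix: n x n degrees to which the event moves state i to state j

    The next state is the max-min composition s'_j = max_i min(s_i, A_ij),
    so the system can be partly in several states at once.
    """
    s = np.asarray(state, dtype=float)
    A = np.asarray(event_matrix, dtype=float)
    return np.minimum(s[:, None], A).max(axis=0)

# Hypothetical states: (responding, partially responding, failing).
state = [0.7, 0.3, 0.0]
take_regimen = [[0.8, 0.2, 0.0],
                [0.3, 0.6, 0.1],
                [0.0, 0.2, 0.9]]
print(fuzzy_transition(state, take_regimen))  # [0.7 0.3 0.1]
```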

  18. Parallel Vehicular Traffic Simulation using Reverse Computation-based Optimistic Execution

    SciTech Connect

    Yoginath, Srikanth B; Perumalla, Kalyan S

    2008-01-01

    Vehicular traffic simulations are useful in applications such as emergency management and homeland security planning tools. The speed of traffic simulation translates directly into speed of response and level of resilience in those applications. Here, a parallel traffic simulation approach is presented that is aimed at reducing the time needed to simulate emergency vehicular traffic scenarios. Three unique aspects of this effort are: (1) exploration of optimistic simulation applied to vehicular traffic simulation; (2) addressing reverse computation challenges specific to optimistic vehicular traffic simulation; and (3) achieving absolute (as opposed to self-relative) speedup with a sequential speed equal to that of a fast, de facto standard sequential simulator for emergency traffic. The design and development of the parallel simulation system is presented, along with a performance study that demonstrates excellent sequential performance as well as parallel performance.
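
    The reverse computation idea can be illustrated with a toy example: each event handler is paired with a reverse handler that undoes its state change exactly, so an optimistic simulator can roll back a speculatively processed event without storing full state snapshots. The road-segment state, the congestion rule and the numbers below are hypothetical, not taken from the paper.

```python
class RoadSegment:
    """Toy state for one road segment in an optimistic traffic simulation."""
    def __init__(self, capacity=4):
        self.count = 0
        self.capacity = capacity
        self.congested = False

def arrival_forward(seg):
    """Forward handler: a vehicle enters the segment. Returns the single
    bit of history needed to undo the event exactly: whether this arrival
    is the one that tipped the segment into congestion."""
    seg.count += 1
    tipped = (not seg.congested) and seg.count >= seg.capacity
    if tipped:
        seg.congested = True
    return tipped

def arrival_reverse(seg, tipped):
    """Reverse handler: run when the optimistic simulator rolls back past
    this event; restores the state without any full snapshot."""
    if tipped:
        seg.congested = False
    seg.count -= 1

seg = RoadSegment(capacity=2)
history = [arrival_forward(seg) for _ in range(3)]   # processed optimistically
for bit in reversed(history):                        # a straggler forces rollback
    arrival_reverse(seg, bit)
print(seg.count, seg.congested)                      # 0 False: state restored
```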

  19. Efficient solid state NMR powder simulations using SMP and MPP parallel computation.

    PubMed

    Kristensen, Jørgen Holm; Farnan, Ian

    2003-04-01

    Methods for parallel simulation of solid state NMR powder spectra are presented for both shared and distributed memory parallel supercomputers. For shared memory architectures the performance of simulation programs implementing the OpenMP application programming interface is evaluated. It is demonstrated that the design of correct and efficient shared memory parallel programs is difficult as the performance depends on data locality and cache memory effects. The distributed memory parallel programming model is examined for simulation programs using the MPI message passing interface. The results reveal that both shared and distributed memory parallel computation are very efficient with an almost perfect application speedup and may be applied to the most advanced powder simulations. PMID:12713968
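
    The structure that makes these simulations parallelize so well is that a powder spectrum is an average over independent crystallite orientations. The sketch below computes a simple static chemical-shift-anisotropy powder pattern and farms the orientations out to worker processes with Python's multiprocessing, standing in for the OpenMP and MPI implementations evaluated in the paper; the tensor parameters and grid sizes are arbitrary.

```python
import numpy as np
from multiprocessing import Pool

def csa_frequency(args):
    """Anisotropic chemical-shift frequency of one crystallite orientation
    (theta, phi); a standard textbook expression, not the paper's full
    simulation."""
    theta, phi, delta_iso, delta_aniso, eta = args
    return delta_iso + 0.5 * delta_aniso * (
        3.0 * np.cos(theta) ** 2 - 1.0
        + eta * np.sin(theta) ** 2 * np.cos(2.0 * phi))

def powder_spectrum(n_theta=200, n_phi=200, bins=256):
    """Powder-average spectrum as a sin(theta)-weighted histogram of
    single-crystallite frequencies; every orientation is independent,
    which is what makes SMP/MPP parallelization effective."""
    thetas = np.linspace(0.0, np.pi, n_theta)
    phis = np.linspace(0.0, 2.0 * np.pi, n_phi)
    tasks = [(t, p, 0.0, 100.0, 0.3) for t in thetas for p in phis]
    with Pool() as pool:
        freqs = np.array(pool.map(csa_frequency, tasks, chunksize=1000))
    weights = np.repeat(np.sin(thetas), n_phi)
    hist, edges = np.histogram(freqs, bins=bins, weights=weights)
    return edges[:-1], hist / hist.sum()

if __name__ == "__main__":
    freq, intensity = powder_spectrum()
    print(freq[np.argmax(intensity)])   # most intense frequency of the pattern
```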

  20. A sweep algorithm for massively parallel simulation of circuit-switched networks

    NASA Technical Reports Server (NTRS)

    Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

    1992-01-01

    A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks controlled by a randomized-routing policy that includes trunk reservation. A single-instruction multiple-data (SIMD) implementation is described, and corresponding experiments on a 16384-processor MasPar parallel computer are reported. A multiple-instruction multiple-data (MIMD) implementation is also described, and corresponding experiments on an Intel iPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

  1. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  2. ANNarchy: a code generation approach to neural simulations on parallel hardware.

    PubMed

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows one to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  3. Use of networked workstations for parallel nonlinear structural dynamic simulations of rotating bladed-disk assemblies

    NASA Technical Reports Server (NTRS)

    Hsieh, Shang-Hsien; Abel, J. F.

    1993-01-01

    The principal objective of this research is to investigate, develop and demonstrate coarse-grained, parallel-processing strategies for nonlinear dynamic simulations of rotating bladed-disk assemblies. The parallel-processing strategies addressed include numerical algorithms for parallel nonlinear solutions and techniques to effect load balancing among processors. The parallel environment employed is a distributed-memory, coarse-grained one consisting of networked workstations. A parallel explicit time integration method has been implemented for transient nonlinear solutions of rotating bladed-disk assemblies. Automatic domain partitioning techniques have been investigated for load balancing among processors. Advanced computing environments, data structures and interactive computer graphics all contribute to an integrated parallel finite element analysis system that facilitates more efficient and powerful dynamic simulations.

  4. Comparison of serial and parallel simulations of a corridor fire using FDS

    NASA Astrophysics Data System (ADS)

    Valasek, L.

    2015-09-01

    Current fire simulators allow the course of a fire in large areas and its impact on structures and equipment to be modelled. This paper deals with a comparison of serial and parallel calculations of a corridor fire simulated with the FDS (Fire Dynamics Simulator) system. In the parallel case, the whole computational domain is divided into several computational meshes, the computation on each mesh is treated as a single MPI (Message Passing Interface) process realised on one computational core, and communication between MPI processes is provided by MPI. The aim of this paper is to determine the size of the error caused by the parallelization of the computation, which occurs at the interfaces between computational meshes.

  5. Three-dimensional shock wave physics simulations with MIMD PAGOSA on massively parallel computers

    SciTech Connect

    Gardner, D.R.; Vaughan, C.T.; Cline, D.D.

    1992-12-31

    The numerical modeling of penetrator-armor interactions for design studies requires rapid, detailed, three-dimensional simulation of complex interactions of exotic materials at high speeds and high rates of strain. To perform such simulations, we have developed a multiple-instruction, multiple-data (MIMD) version of the PAGOSA hydrocode. The code includes a variety of models for material strength, fracture, and the detonation of high explosives. We present a typical armor/antiarmor penetration simulation conducted with this code, and measurements of its performance. The scaled speedups for MIMD PAGOSA on the 1024-processor nCUBE 2 parallel computer, measured as the simulation size is increased with the number of processors, reveal that small grind times (computational time per cell per cycle) and parallel scaled efficiencies of 90% can be achieved for realistic problems. This simulation demonstrates that massively parallel hydrocodes can provide rapid, highly-detailed armor/antiarmor simulations.

  6. Three-dimensional shock wave physics simulations with MIMD PAGOSA on massively parallel computers

    SciTech Connect

    Gardner, D.R.; Vaughan, C.T.; Cline, D.D. (Center for High Performance Computing)

    1992-01-01

    The numerical modeling of penetrator-armor interactions for design studies requires rapid, detailed, three-dimensional simulation of complex interactions of exotic materials at high speeds and high rates of strain. To perform such simulations, we have developed a multiple-instruction, multiple-data (MIMD) version of the PAGOSA hydrocode. The code includes a variety of models for material strength, fracture, and the detonation of high explosives. We present a typical armor/antiarmor penetration simulation conducted with this code, and measurements of its performance. The scaled speedups for MIMD PAGOSA on the 1024-processor nCUBE 2 parallel computer, measured as the simulation size is increased with the number of processors, reveal that small grind times (computational time per cell per cycle) and parallel scaled efficiencies of 90% can be achieved for realistic problems. This simulation demonstrates that massively parallel hydrocodes can provide rapid, highly-detailed armor/antiarmor simulations.

  7. Three-dimensional shock wave physics simulations with MIMD PAGOSA on massively parallel computers

    NASA Astrophysics Data System (ADS)

    Gardner, D. R.; Vaughan, C. T.; Cline, D. D.

    The numerical modeling of penetrator-armor interactions for design studies requires rapid, detailed, three-dimensional simulation of complex interactions of exotic materials at high speeds and high rates of strain. To perform such simulations, we have developed a multiple-instruction, multiple-data (MIMD) version of the PAGOSA hydrocode. The code includes a variety of models for material strength, fracture, and the detonation of high explosives. We present a typical armor/antiarmor penetration simulation conducted with this code, and measurements of its performance. The scaled speedups for MIMD PAGOSA on the 1024-processor nCUBE 2 parallel computer, measured as the simulation size is increased with the number of processors, reveal that small grind times (computational time per cell per cycle) and parallel scaled efficiencies of 90% can be achieved for realistic problems. This simulation demonstrates that massively parallel hydrocodes can provide rapid, highly-detailed armor/antiarmor simulations.

  8. Parallel computing in enterprise modeling.

    SciTech Connect

    Goldsby, Michael E.; Armstrong, Robert C.; Shneider, Max S.; Vanderveen, Keith; Ray, Jaideep; Heath, Zach; Allan, Benjamin A.

    2008-08-01

    This report presents the results of our efforts to apply high-performance computing to entity-based simulations with a multi-use plugin for parallel computing. We use the term 'entity-based simulation' to describe a class of simulation which includes both discrete event simulation and agent-based simulation. What simulations of this class share, and what differs from more traditional models, is that the result sought is emergent from a large number of contributing entities. Logistic, economic and social simulations are members of this class where things or people are organized or self-organize to produce a solution. Entity-based problems never have an a priori ergodic principle that will greatly simplify calculations. Because the results of entity-based simulations can only be realized at scale, scalable computing is de rigueur for large problems. Having said that, the absence of a spatial organizing principle makes the decomposition of the problem onto processors problematic. In addition, practitioners in this domain commonly use the Java programming language, which presents its own problems in a high-performance setting. The plugin we have developed, called the Parallel Particle Data Model, overcomes both of these obstacles and is now being used by two Sandia frameworks: the Decision Analysis Center and the Seldon social simulation facility. While the ability to engage U.S.-sized problems is now available to the Decision Analysis Center, this plugin is central to the success of Seldon. Because Seldon relies on computationally intensive cognitive sub-models, this work is necessary to achieve the scale necessary for realistic results. With the recent upheavals in the financial markets, and the inscrutability of terrorist activity, this simulation domain will likely need a capability with ever greater fidelity. High-performance computing will play an important part in enabling that greater fidelity.

  9. Parallel Unsteady Turbopump Flow Simulations for Reusable Launch Vehicles

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin; Kwak, Dochan

    2000-01-01

    An efficient solution procedure for time-accurate solutions of the incompressible Navier-Stokes equations is obtained. The artificial compressibility method requires a fast convergence scheme. The pressure projection method is efficient when a small time step is required. The number of sub-iterations is reduced significantly when a Poisson solver is employed for the continuity equation. Both computing time and memory usage are reduced (at least 3 times). Other work includes Multi-Level Parallelism (MLP) in INS3D, overset connectivity for the validation case, experimental measurements, and a computational model for the boost pump.

  10. Parallelized modelling and solution scheme for hierarchically scaled simulations

    NASA Technical Reports Server (NTRS)

    Padovan, Joe

    1995-01-01

    This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The tree morphology is chosen to minimize memory, communications, and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers is capable of solving are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that, by combining several of these techniques, a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

  11. Molecular Dynamic Simulations of Nanostructured Ceramic Materials on Parallel Computers

    SciTech Connect

    Vashishta, Priya; Kalia, Rajiv

    2005-02-24

    Large-scale molecular-dynamics (MD) simulations have been performed to gain insight into: (1) sintering, structure, and mechanical behavior of nanophase SiC and SiO2; (2) effects of dynamic charge transfers on the sintering of nanophase TiO2; (3) high-pressure structural transformation in bulk SiC and GaAs nanocrystals; (4) nanoindentation in Si3N4; and (5) lattice mismatched InAs/GaAs nanomesas. In addition, we have designed a multiscale simulation approach that seamlessly embeds MD and quantum-mechanical (QM) simulations in a continuum simulation. The above research activities have involved strong interactions with researchers at various universities, government laboratories, and industries. 33 papers have been published and 22 talks have been given based on the work described in this report.

  12. A high resolution finite volume method for efficient parallel simulation of casting processes on unstructured meshes

    SciTech Connect

    Kothe, D.B.; Turner, J.A.; Mosso, S.J.; Ferrell, R.C.

    1997-03-01

    We discuss selected aspects of a new parallel three-dimensional (3-D) computational tool for the unstructured mesh simulation of Los Alamos National Laboratory (LANL) casting processes. This tool, known as Telluride, draws upon robust, high resolution finite volume solutions of metal alloy mass, momentum, and enthalpy conservation equations to model the filling, cooling, and solidification of LANL castings. We briefly describe the current Telluride physical models and solution methods, then detail our parallelization strategy as implemented with Fortran 90 (F90). This strategy has yielded straightforward and efficient parallelization on distributed and shared memory architectures, aided in large part by the new parallel libraries JTpack9O for Krylov-subspace iterative solution methods and PGSLib for efficient gather/scatter operations. We illustrate our methodology and current capabilities with source code examples and parallel efficiency results for a LANL casting simulation.
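
    The gather/scatter pattern mentioned above is central to unstructured-mesh finite volume codes. The following hedged Python/NumPy sketch shows the serial core of that pattern (gather nodal values into element-local arrays, then scatter-add element contributions back to shared nodes); the connectivity, values, and per-element computation are placeholder assumptions, and this is not the PGSLib interface.

    ```python
    # Sketch of the gather/scatter pattern central to unstructured-mesh codes
    # (illustrative only; not the PGSLib interface). The connectivity maps
    # each element to its node indices.
    import numpy as np

    connectivity = np.array([[0, 1, 2],      # element 0 uses nodes 0, 1, 2
                             [1, 2, 3],      # element 1 uses nodes 1, 2, 3
                             [2, 3, 4]])     # element 2 uses nodes 2, 3, 4
    node_values = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

    # Gather: pull nodal values into element-local arrays.
    elem_values = node_values[connectivity]                # shape (n_elements, 3)

    # ... compute per-element contributions (placeholder: element average) ...
    elem_contrib = elem_values.mean(axis=1, keepdims=True) * np.ones_like(elem_values)

    # Scatter-add: accumulate element contributions back onto shared nodes.
    node_accum = np.zeros_like(node_values)
    np.add.at(node_accum, connectivity, elem_contrib)
    print(node_accum)
    ```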

  13. Xyce parallel electronic simulator users' guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). This includes support for most popular parallel and serial computers. A differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms. This allows one to develop new types of analysis without requiring the implementation of analysis-specific device models. Device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only). Object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.
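
    To illustrate the separation between device models and solver algorithms that a DAE formulation enables, here is a hedged Python sketch for a trivial RC branch: the "device" only supplies residual and Jacobian contributions, and the integration layer applies backward Euler with Newton iteration. The class names, parameter values, and interfaces are illustrative assumptions and do not reflect Xyce's actual API.

    ```python
    # Hedged sketch of a DAE-style separation between a "device model" (supplies
    # residual and Jacobian contributions) and the time-integration/solver layer
    # (backward Euler + Newton). Names and structure are illustrative, not Xyce's API.
    import numpy as np

    class RCDevice:
        """Parallel RC branch driven by a current source: C dv/dt + v/R - I = 0."""
        def __init__(self, R=1e3, C=1e-6, I=1e-3):
            self.R, self.C, self.I = R, C, I
        def residual(self, v, dv_dt):
            return self.C * dv_dt + v / self.R - self.I
        def jacobian(self, alpha):
            # d(residual)/dv when dv/dt is discretised as alpha * (v - v_old)
            return self.C * alpha + 1.0 / self.R

    def backward_euler(device, v0=0.0, dt=1e-4, steps=50):
        v = v0
        for _ in range(steps):
            v_new = v
            for _ in range(20):                       # Newton iterations
                alpha = 1.0 / dt
                f = device.residual(v_new, alpha * (v_new - v))
                if abs(f) < 1e-12:
                    break
                v_new -= f / device.jacobian(alpha)
            v = v_new
        return v

    print(backward_euler(RCDevice()))   # approaches I*R = 1.0 V
    ```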

  14. Xyce parallel electronic simulator users guide, version 6.1

    SciTech Connect

    Keiter, Eric R; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Sholander, Peter E.; Thornquist, Heidi K.; Verley, Jason C.; Baur, David Gregory

    2014-03-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms. Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  15. Virtual reality visualization of parallel molecular dynamics simulation

    SciTech Connect

    Disz, T.; Papka, M.; Stevens, R.; Pellegrino, M.; Taylor, V.

    1995-12-31

    When performing communications mapping experiments for massively parallel processors, it is important to be able to visualize the mappings and resulting communications. In a molecular dynamics model, visualization of the atom to atom interaction and the processor mappings provides insight into the effectiveness of the communications algorithms. The basic quantities available for visualization in a model of this type are the number of molecules per unit volume, the mass, and velocity of each molecule. The computational information available for visualization is the atom to atom interaction within each time step, the atom to processor mapping, and the energy resealing events. We use the CAVE (CAVE Automatic Virtual Environment) to provide interactive, immersive visualization experiences.

  16. Toward parallel, adaptive mesh refinement for chemically reacting flow simulations

    SciTech Connect

    Devine, K.D.; Shadid, J.N.; Salinger, A.G.; Hutchinson, S.A.; Hennigan, G.L.

    1997-12-01

    Adaptive numerical methods offer greater efficiency than traditional numerical methods by concentrating computational effort in regions of the problem domain where the solution is difficult to obtain. In this paper, the authors describe progress toward adding mesh refinement to MPSalsa, a computer program developed at Sandia National laboratories to solve coupled three-dimensional fluid flow and detailed reaction chemistry systems for modeling chemically reacting flow on large-scale parallel computers. Data structures that support refinement and dynamic load-balancing are discussed. Results using uniform refinement with mesh sequencing to improve convergence to steady-state solutions are also presented. Three examples are presented: a lid driven cavity, a thermal convection flow, and a tilted chemical vapor deposition reactor.

  17. Parallel performance optimizations on unstructured mesh-based simulations

    SciTech Connect

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.
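
    The predictive ordering of data elements mentioned above is one instance of reordering unstructured data for locality. As a hedged Python sketch of the general idea (not the ordering actually used in MPAS-Ocean), the reverse Cuthill-McKee permutation from SciPy is applied to a small cell-adjacency graph so that neighbouring cells end up close together in memory; the graph and per-cell data are made-up examples.

    ```python
    # Hedged sketch of reordering mesh cells to improve locality of access.
    # Reverse Cuthill-McKee is used here purely as an example of a locality-
    # improving permutation; the ordering used by MPAS-Ocean may differ.
    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import reverse_cuthill_mckee

    # Small symmetric cell-adjacency graph (cell i touches cell j).
    rows = np.array([0, 0, 1, 2, 2, 3, 4, 4, 5, 5])
    cols = np.array([3, 5, 4, 3, 5, 0, 1, 5, 0, 2])
    adj = csr_matrix((np.ones_like(rows), (rows, cols)), shape=(6, 6))
    adj = adj + adj.T                        # symmetrise the connectivity

    perm = reverse_cuthill_mckee(adj, symmetric_mode=True)
    cell_data = np.arange(6) * 10.0          # placeholder per-cell field
    reordered = cell_data[perm]              # cells stored so neighbours are close
    print(perm, reordered)
    ```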

  18. Modular high-temperature gas-cooled reactor simulation using parallel processors

    SciTech Connect

    Ball, S.J.; Conklin, J.C.

    1989-01-01

    The MHPP (Modular HTGR Parallel Processor) code has been developed to simulate modular high-temperature gas-cooled reactor (MHTGR) transients and accidents. MHPP incorporates a very detailed model for predicting the dynamics of the reactor core, vessel, and cooling systems over a wide variety of scenarios ranging from expected transients to very-low-probability severe accidents. The simulation routines, which had originally been developed entirely as serial code, were readily adapted to parallel processing Fortran. The resulting parallelized simulation speed was enhanced significantly. Workstation interfaces are being developed to provide for user ("operator") interaction. The benefits realized by adapting previous MHTGR codes to run on a parallel processor are discussed, along with results of typical accident analyses. 3 refs., 3 figs.

  19. Object-oriented particle simulation on parallel computers

    SciTech Connect

    Reynders, J.V.W.; Forslund, D.W.; Hinker, P.J.; Tholburn, M.; Kilman, D.G.; Humphrey, W.F.

    1994-04-01

    A general purpose, object-oriented particle simulation (OOPS) library has been developed for use on a variety of system architectures with a uniform high-level interface. This includes the development of library implementations for the CM5, Intel Paragon, and CRI T3D. Codes written on any of these platforms can be ported to other platforms without modifications by utilizing the high-level library. The general character of the library allows application to such diverse areas as plasma physics, suspension flows, vortex simulations, porous media, and materials science.

  20. An Optimization Algorithm for Multipath Parallel Allocation for Service Resource in the Simulation Task Workflow

    PubMed Central

    Zhang, Hongjun; Zhang, Rui; Li, Yong; Zhang, Xuliang

    2014-01-01

    Service-oriented modeling and simulation are hot issues in the field, and service resources must be called when a simulation task workflow is running. How to optimize the service resource allocation to ensure that the task completes effectively is therefore an important problem in this area. In the military modeling and simulation field, it is important to improve the probability of success and the timeliness of the simulation task workflow. This paper proposes an optimization algorithm for multipath parallel allocation of service resources, in which a multipath parallel allocation model is built and a multiple-chain coding scheme quantum optimization algorithm is used for its solution. The multiple-chain coding scheme extends the parallel search space to improve search efficiency. Through simulation experiments, the paper investigates how different optimization algorithms, service allocation strategies, and path numbers affect the probability of success of the simulation task workflow, and the results show that the proposed multipath parallel allocation algorithm is an effective method to improve the probability of success and timeliness of the simulation task workflow. PMID:24963506

  1. Massively Parallel Reactive and Quantum Molecular Dynamics Simulations

    NASA Astrophysics Data System (ADS)

    Vashishta, Priya

    2015-03-01

    In this talk I will discuss two simulations: Cavitation bubbles readily occur in fluids subjected to rapid changes in pressure. We use billion-atom reactive molecular dynamics simulations on a 163,840-processor BlueGene/P supercomputer to investigate chemical and mechanical damage caused by shock-induced collapse of nanobubbles in water near a silica surface. Collapse of an empty nanobubble generates a high-speed nanojet, resulting in the formation of a pit on the surface. The gas-filled bubbles undergo partial collapse and consequently the damage on the silica surface is mitigated. Quantum molecular dynamics (QMD) simulations are performed on a 786,432-processor Blue Gene/Q to study on-demand production of hydrogen gas from water using Al nanoclusters. QMD simulations reveal rapid hydrogen production from water by an Al nanocluster. We find a low activation-barrier mechanism, in which a pair of Lewis acid and base sites on the Aln surface preferentially catalyzes hydrogen production. I will also discuss on-demand production of hydrogen gas from water using LiAl alloy particles. Research reported in this lecture was carried out in collaboration with Rajiv Kalia, Aiichiro Nakano and Ken-ichi Nomura from the University of Southern California, and Fuyuki Shimojo and Kohei Shimamura from Kumamoto University, Japan.

  2. Parallel kinetic Monte Carlo simulations of Ag(111) island coarsening using a large database.

    PubMed

    Nandipati, Giridhar; Shim, Yunsic; Amar, Jacques G; Karim, Altaf; Kara, Abdelkader; Rahman, Talat S; Trushin, Oleg

    2009-02-25

    The results of parallel kinetic Monte Carlo (KMC) simulations of the room-temperature coarsening of Ag(111) islands carried out using a very large database obtained via self-learning KMC simulations are presented. Our results indicate that, while cluster diffusion and coalescence play an important role for small clusters and at very early times, at late time the coarsening proceeds via Ostwald ripening, i.e. large clusters grow while small clusters evaporate. In addition, an asymptotic analysis of our results for the average island size S(t) as a function of time t leads to a coarsening exponent n = 1/3 (where S(t) ∝ t^(2n)), in good agreement with theoretical predictions. However, by comparing with simulations without concerted (multi-atom) moves, we also find that the inclusion of such moves significantly increases the average island size. Somewhat surprisingly we also find that, while the average island size increases during coarsening, the scaled island-size distribution does not change significantly. Our simulations were carried out both as a test of, and as an application of, a variety of different algorithms for parallel kinetic Monte Carlo including the recently developed optimistic synchronous relaxation (OSR) algorithm as well as the semi-rigorous synchronous sublattice (SL) algorithm. A variation of the OSR algorithm corresponding to optimistic synchronous relaxation with pseudo-rollback (OSRPR) is also proposed along with a method for improving the parallel efficiency and reducing the number of boundary events via dynamic boundary allocation (DBA). A variety of other methods for enhancing the efficiency of our simulations are also discussed. We note that, because of the relatively high temperature of our simulations, as well as the large range of energy barriers (ranging from 0.05 to 0.8 eV), developing an efficient algorithm for parallel KMC and/or SLKMC simulations is particularly challenging. However, by using DBA to minimize the number of boundary events, we have achieved significantly improved parallel efficiencies for the OSRPR and SL algorithms. Finally, we note that, among the three parallel algorithms which we have tested here, the semi-rigorous SL algorithm with DBA led to the highest parallel efficiencies. As a result, we have obtained reasonable parallel efficiencies in our simulations of room-temperature Ag(111) island coarsening for a small number of processors (e.g., N_p = 2 and 4). Since the SL algorithm scales with system size for fixed processor size, we expect that comparable and/or even larger parallel efficiencies should be possible for parallel KMC and/or SLKMC simulations of larger systems with larger numbers of processors. PMID:21817366
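
    As a hedged illustration of how a coarsening exponent like the n = 1/3 quoted above can be extracted from island-size data, the following sketch fits S(t) ∝ t^(2n) on synthetic, noisy data via a log-log least-squares fit; the data are fabricated for the example and are not the paper's results.

    ```python
    # Hedged illustration of extracting the coarsening exponent n from
    # S(t) ~ t^(2n) by a log-log least-squares fit (synthetic data only).
    import numpy as np

    rng = np.random.default_rng(0)
    t = np.logspace(1, 4, 30)                                     # late-time window
    S = 5.0 * t**(2.0 / 3.0) * rng.lognormal(0.0, 0.02, t.size)   # S ~ t^(2n), n = 1/3

    slope, _ = np.polyfit(np.log(t), np.log(S), 1)                # slope estimates 2n
    print(f"estimated n = {slope / 2:.3f}")                       # ~0.333
    ```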

  3. A parallel finite element simulator for ion transport through three-dimensional ion channel systems.

    PubMed

    Tu, Bin; Chen, Minxin; Xie, Yan; Zhang, Linbo; Eisenberg, Bob; Lu, Benzhuo

    2013-09-15

    A parallel finite element simulator, ichannel, is developed for ion transport through three-dimensional ion channel systems that consist of protein and membrane. The coordinates of heavy atoms of the protein are taken from the Protein Data Bank and the membrane is represented as a slab. The simulator contains two components: a parallel adaptive finite element solver for a set of Poisson-Nernst-Planck (PNP) equations that describe the electrodiffusion process of ion transport, and a mesh generation tool chain for ion channel systems, which is an essential component for the finite element computations. The finite element method has advantages in modeling irregular geometries and complex boundary conditions. We have built a tool chain to get the surface and volume mesh for ion channel systems, which consists of a set of mesh generation tools. The adaptive finite element solver in our simulator is implemented using the parallel adaptive finite element package Parallel Hierarchical Grid (PHG) developed by one of the authors, which provides the capability of doing large scale parallel computations with high parallel efficiency and the flexibility of choosing high order elements to achieve high order accuracy. The simulator is applied to a real transmembrane protein, the gramicidin A (gA) channel protein, to calculate the electrostatic potential, ion concentrations and I-V curve, with which both primitive and transformed PNP equations are studied and their numerical performances are compared. To further validate the method, we also apply the simulator to two other ion channel systems, the voltage dependent anion channel (VDAC) and α-Hemolysin (α-HL). The simulation results agree well with Brownian dynamics (BD) simulation results and experimental results. Moreover, because ionic finite size effects can be included in the PNP model now, we also perform simulations using a size-modified PNP (SMPNP) model on VDAC and α-HL. It is shown that the size effects in SMPNP can effectively lead to reduced current in the channel, and the results are closer to BD simulation results. PMID:23740647
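
    For reference, a standard steady-state form of the Poisson-Nernst-Planck system mentioned above is written below in generic notation (φ the electrostatic potential; c_i, q_i, D_i the concentration, charge, and diffusivity of ion species i; ε the permittivity; ρ_f the fixed charge density); the paper's primitive and transformed formulations may differ in detail.

    ```latex
    % Generic steady-state Poisson-Nernst-Planck (PNP) system
    % (notation assumed here; not copied from the paper).
    \begin{aligned}
      -\nabla \cdot \bigl(\epsilon(\mathbf{r})\,\nabla \phi\bigr)
          &= \sum_i q_i\, c_i(\mathbf{r}) + \rho_f(\mathbf{r}), \\
      \nabla \cdot \mathbf{J}_i &= 0, \qquad
      \mathbf{J}_i = -D_i \left( \nabla c_i + \frac{q_i}{k_B T}\, c_i\, \nabla \phi \right).
    \end{aligned}
    ```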

  4. Parallel performance optimizations on unstructured mesh-based simulations

    DOE PAGESBeta

    Sarje, Abhinav; Song, Sukhyun; Jacobsen, Douglas; Huck, Kevin; Hollingsworth, Jeffrey; Malony, Allen; Williams, Samuel; Oliker, Leonid

    2015-06-01

    This paper addresses two key parallelization challenges in the unstructured mesh-based ocean modeling code, MPAS-Ocean, which uses a mesh based on Voronoi tessellations: (1) load imbalance across processes, and (2) unstructured data access patterns that inhibit intra- and inter-node performance. Our work analyzes the load imbalance due to naive partitioning of the mesh, and develops methods to generate mesh partitionings with better load balance and reduced communication. Furthermore, we present methods that minimize both inter- and intra-node data movement and maximize data reuse. Our techniques include predictive ordering of data elements for higher cache efficiency, as well as communication reduction approaches. We present detailed performance data when running on thousands of cores using the Cray XC30 supercomputer and show that our optimization strategies can exceed the original performance by over 2×. Additionally, many of these solutions can be broadly applied to a wide variety of unstructured grid-based computations.

  5. Parallel finite element simulation of mooring forces on floating objects

    NASA Astrophysics Data System (ADS)

    Aliabadi, S.; Abedi, J.; Zellars, B.

    2003-03-01

    The coupling between the equations governing the free-surface flows, the six degrees of freedom non-linear rigid body dynamics, the linear elasticity equations for mesh-moving and the cables has resulted in a fluid-structure interaction technology capable of simulating mooring forces on floating objects. The finite element solution strategy is based on a combination approach derived from fixed-mesh and moving-mesh techniques. Here, the free-surface flow simulations are based on the Navier-Stokes equations written for two incompressible fluids where the impact of one fluid on the other one is extremely small. An interface function with two distinct values is used to locate the position of the free surface. The stabilized finite element formulations are written and integrated in an arbitrary Lagrangian-Eulerian domain. This allows us to handle the motion of the time-dependent geometries. Forces and moments exerted on the floating object by both water and hawsers are calculated and used to update the position of the floating object in time. In the mesh moving scheme, we assume that the computational domain is made of elastic materials. The linear elasticity equations are solved to obtain the displacements for each computational node. The non-linear rigid body dynamics equations are coupled with the governing equations of fluid flow and are solved simultaneously to update the position of the floating object. The numerical examples include a 3D simulation of water waves impacting a moored floating box and a model boat, and a simulation of a floating object under water constrained with a cable.

  6. Characterization of parallel-hole collimator using Monte Carlo Simulation

    PubMed Central

    Pandey, Anil Kumar; Sharma, Sanjay Kumar; Karunanithi, Sellam; Kumar, Praveen; Bal, Chandrasekhar; Kumar, Rakesh

    2015-01-01

    Objective: Accuracy of in vivo activity quantification improves after the correction of penetrated and scattered photons. However, accurate assessment is not possible with physical experiment. We have used Monte Carlo simulation to accurately assess the contribution of penetrated and scattered photons in the photopeak window. Materials and Methods: Simulations were performed with the Simulation of Imaging Nuclear Detectors Monte Carlo Code. The simulations were set up in such a way that they provide geometric, penetration, and scatter components after each simulation and write binary images to a data file. These components were analyzed graphically using Microsoft Excel (Microsoft Corporation, USA). Each binary image was imported into software (ImageJ) and logarithmic transformation was applied for visual assessment of image quality, plotting a profile across the center of the images and calculating the full width at half maximum (FWHM) in horizontal and vertical directions. Results: The geometric, penetration, and scatter components at 140 keV for the low-energy general-purpose collimator were 93.20%, 4.13%, and 2.67%, respectively. Similarly, the geometric, penetration, and scatter components at 140 keV for the low-energy high-resolution (LEHR), medium-energy general-purpose (MEGP), and high-energy general-purpose (HEGP) collimators were (94.06%, 3.39%, 2.55%), (96.42%, 1.52%, 2.06%), and (96.70%, 1.45%, 1.85%), respectively. For the MEGP collimator at 245 keV and the HEGP collimator at 364 keV, the corresponding values were 89.10%, 7.08%, 3.82% and 67.78%, 18.63%, 13.59%, respectively. Conclusion: The low-energy general-purpose and LEHR collimators are best for imaging 140 keV photons. The HEGP collimator can be used for 245 keV and 364 keV; however, correction for penetration and scatter must be applied if one is interested in quantifying the in vivo activity at 364 keV. Due to heavy penetration and scattering, 511 keV photons should not be imaged with the HEGP collimator. PMID:25829730

  7. Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

    SciTech Connect

    Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor

    2011-09-06

    We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.

  8. Parallel implementation of a power system dynamic simulation methodology using the conjugate gradient method

    SciTech Connect

    Decker, I.C.; Falcao, D.M.; Kaszkurewicz, E.

    1992-02-01

    This paper presents results of tests with a parallel implementation of a power system dynamic simulation methodology for transient stability analysis on a parallel computer. The test system is a planned configuration of the interconnected Brazilian South-Southeastern power system with 616 buses, 995 lines, and 88 generators. The parallel machine used in the computer simulation is a distributed memory multiprocessor arranged in a hypercube topology architecture. The nodes are based on the Inmos T800 processor with 4 Mbytes of local memory. The simulation methodology is based on the interlaced alternating implicit integration scheme, in which the network equations are re-ordered such that the network admittance matrix appears in block-bordered diagonal form and is then solved by a combined application of LU factorization and the conjugate gradient method. The results obtained show considerable reductions in the simulation time.
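
    For readers unfamiliar with the conjugate gradient method referred to above, here is a hedged, minimal unpreconditioned implementation applied to a tiny symmetric positive-definite system; it is illustrative only and not the paper's block-bordered network solver.

    ```python
    # Minimal (unpreconditioned) conjugate gradient solver for a symmetric
    # positive-definite system Ax = b; illustrative only.
    import numpy as np

    def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
        x = np.zeros_like(b)
        r = b - A @ x                       # initial residual
        p = r.copy()
        rs_old = r @ r
        for _ in range(max_iter):
            Ap = A @ p
            alpha = rs_old / (p @ Ap)
            x += alpha * p
            r -= alpha * Ap
            rs_new = r @ r
            if np.sqrt(rs_new) < tol:
                break
            p = r + (rs_new / rs_old) * p   # new conjugate search direction
            rs_old = rs_new
        return x

    A = np.array([[4.0, 1.0], [1.0, 3.0]])
    b = np.array([1.0, 2.0])
    print(conjugate_gradient(A, b))         # ~[0.0909, 0.6364]
    ```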

  9. Time-partitioning simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Milner, Edward J.; Blech, Richard A.; Chima, Rodrick V.

    1987-01-01

    A technique allowing time-staggered solution of partial differential equations is presented in this report. Using this technique, called time-partitioning, simulation execution speedup is proportional to the number of processors used, because all processors operate simultaneously, with each updating the solution grid at a different time point. The technique is limited neither by the number of processors available nor by the dimension of the solution grid. Time-partitioning was used to obtain the flow pattern through a cascade of airfoils, modeled by the Euler partial differential equations. An execution speedup factor of 1.77 was achieved using a two-processor Cray X-MP/24 computer.

  10. Parallel FEM Simulation of Electromechanics in the Heart

    NASA Astrophysics Data System (ADS)

    Xia, Henian; Wong, Kwai; Zhao, Xiaopeng

    2011-11-01

    Cardiovascular disease is the leading cause of death in America. Computer simulation of the complicated dynamics of the heart could provide valuable quantitative guidance for diagnosis and treatment of heart problems. In this paper, we present an integrated numerical model which encompasses the interaction of cardiac electrophysiology, electromechanics, and mechanoelectrical feedback. The model is solved by the finite element method on a Linux cluster and the Cray XT5 supercomputer Kraken. Dynamical influences between the effects of electromechanical coupling and mechanoelectric feedback are shown.

  11. Partitioning and packing mathematical simulation models for calculation on parallel computers

    NASA Technical Reports Server (NTRS)

    Arpasi, D. J.; Milner, E. J.

    1986-01-01

    The development of multiprocessor simulations from a serial set of ordinary differential equations describing a physical system is described. Degrees of parallelism (i.e., coupling between the equations) and their impact on parallel processing are discussed. The problem of identifying computational parallelism within sets of closely coupled equations that require the exchange of current values of variables is described. A technique is presented for identifying this parallelism and for partitioning the equations for parallel solution on a multiprocessor. An algorithm which packs the equations into a minimum number of processors is also described. The results of the packing algorithm when applied to a turbojet engine model are presented in terms of processor utilization.
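
    As a hedged sketch of the general flavour of such a packing step (not the report's actual algorithm, which also respects the coupling between equations), the following Python snippet greedily packs per-equation execution costs onto the fewest processors that fit within a frame-time budget; the costs and budget are made-up numbers.

    ```python
    # Hedged sketch: greedy first-fit-decreasing packing of per-equation costs
    # onto a minimum number of processors subject to a frame-time budget.
    # Simplified: inter-equation coupling and data exchange are ignored here.
    def pack_equations(costs, budget):
        processors = []                          # each entry: [remaining budget, [eq ids]]
        for eq, cost in sorted(enumerate(costs), key=lambda kv: -kv[1]):
            for proc in processors:
                if proc[0] >= cost:              # fits on an existing processor
                    proc[0] -= cost
                    proc[1].append(eq)
                    break
            else:                                # open a new processor
                processors.append([budget - cost, [eq]])
        return [p[1] for p in processors]

    costs = [0.4, 0.7, 0.2, 0.5, 0.3, 0.6]       # execution time per equation (ms)
    print(pack_equations(costs, budget=1.0))     # -> [[1, 4], [5, 0], [3, 2]]
    ```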

  12. Parallel Monte Carlo Electron and Photon Transport Simulation Code (PMCEPT code)

    NASA Astrophysics Data System (ADS)

    Kum, Oyeon

    2004-11-01

    Simulations for customized cancer radiation treatment planning for each patient are very useful for both patient and doctor. These simulations can be used to find the most effective treatment with the least possible dose to the patient. This typical system, so-called "Doctor by Information Technology", will be useful to provide high quality medical services everywhere. However, the large amount of computing time required by the well-known general purpose Monte Carlo (MC) codes has prevented their use for routine dose distribution calculations for customized radiation treatment planning. The optimal solution to provide "accurate" dose distributions within an "acceptable" time limit is to develop a parallel simulation algorithm on a Beowulf PC cluster, because it is the most accurate, efficient, and economical. I developed a parallel MC electron and photon transport simulation code based on the standard MPI message passing interface. This algorithm solved the main difficulty of parallel MC simulation (overlapping random number series in the different processors) using multiple random number seeds. The parallel results agreed well with the serial ones. The parallel efficiency approached 100%, as was expected.
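
    The multiple-seed idea described above is commonly realised today with explicitly spawned, statistically independent random-number streams. The following hedged sketch shows one such approach using NumPy's SeedSequence; it is a modern stand-in for illustration, not the author's original seeding scheme, and the seed value, worker count, and sampled distribution are arbitrary assumptions.

    ```python
    # Hedged sketch: independent random-number streams for each parallel worker,
    # avoiding overlapping series across processors. SeedSequence.spawn is a
    # modern NumPy mechanism; the paper used its own multiple-seed scheme.
    import numpy as np

    n_workers = 8
    root_seed = np.random.SeedSequence(20240101)
    child_seeds = root_seed.spawn(n_workers)                  # one child per rank
    streams = [np.random.default_rng(s) for s in child_seeds]

    # Each worker samples from its own stream, e.g. photon path lengths.
    samples = [rng.exponential(scale=1.0, size=5) for rng in streams]
    for rank, s in enumerate(samples):
        print(rank, s[:3])
    ```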

  13. LARGE-SCALE SIMULATION OF BEAM DYNAMICS IN HIGH INTENSITY ION LINACS USING PARALLEL SUPERCOMPUTERS

    SciTech Connect

    R. RYNE; J. QIANG

    2000-08-01

    In this paper we present results of using parallel supercomputers to simulate beam dynamics in next-generation high intensity ion linacs. Our approach uses a three-dimensional space charge calculation with six types of boundary conditions. The simulations use a hybrid approach involving transfer maps to treat externally applied fields (including rf cavities) and parallel particle-in-cell techniques to treat the space-charge fields. The large-scale simulation results presented here represent a three-order-of-magnitude improvement in simulation capability, in terms of problem size and speed of execution, compared with typical two-dimensional serial simulations. Specific examples will be presented, including simulation of the Spallation Neutron Source (SNS) linac and the Low Energy Demonstration Accelerator (LEDA) beam halo experiment.

  14. Simulation of optically encoded multiplexing for parallel multipoint sensing.

    PubMed

    Babu Rao, C; Chelliah, Pandian; Sahoo, Trilochan

    2015-06-20

    Spectral emission/absorption-based sensors are commonly used to monitor explosives, narcotics, and other restricted materials in high-security zones such as airports. Monitoring a broad range of spectral wavelengths with high spectral resolution would increase the repertoire of chemicals that can be monitored. However, a portable unit will have limitations in meeting these requirements. Optical fibers can be employed for collecting and transmitting spectral signals from portable sensor heads (PSHs) to a sensitive central spectral analyzer. However, simultaneous detection by sensors in multiple PSHs needs to be differentiated for identifying individual PSHs. An optical encoding method is presented in this paper for use of a portable unit for highly sensitive measurement. The methodology is demonstrated through a simulation using MATLAB Simulink. PMID:26193007

  15. Robust large-scale parallel nonlinear solvers for simulations.

    SciTech Connect

    Bader, Brett William; Pawlowski, Roger Patrick; Kolda, Tamara Gibson

    2005-11-01

    This report documents research to develop robust and efficient solution techniques for solving large-scale systems of nonlinear equations. The most widely used method for solving systems of nonlinear equations is Newton's method. While much research has been devoted to augmenting Newton-based solvers (usually with globalization techniques), little has been devoted to exploring the application of different models. Our research has been directed at evaluating techniques using different models than Newton's method: a lower order model, Broyden's method, and a higher order model, the tensor method. We have developed large-scale versions of each of these models and have demonstrated their use in important applications at Sandia. Broyden's method replaces the Jacobian with an approximation, allowing codes that cannot evaluate a Jacobian or have an inaccurate Jacobian to converge to a solution. Limited-memory methods, which have been successful in optimization, allow us to extend this approach to large-scale problems. We compare the robustness and efficiency of Newton's method, modified Newton's method, Jacobian-free Newton-Krylov method, and our limited-memory Broyden method. Comparisons are carried out for large-scale applications of fluid flow simulations and electronic circuit simulations. Results show that, in cases where the Jacobian was inaccurate or could not be computed, Broyden's method converged in some cases where Newton's method failed to converge. We identify conditions where Broyden's method can be more efficient than Newton's method. We also present modifications to a large-scale tensor method, originally proposed by Bouaricha, for greater efficiency, better robustness, and wider applicability. Tensor methods are an alternative to Newton-based methods and are based on computing a step based on a local quadratic model rather than a linear model. The advantage of Bouaricha's method is that it can use any existing linear solver, which makes it simple to write and easily portable. However, the method usually takes twice as long to solve as Newton-GMRES on general problems because it solves two linear systems at each iteration. In this paper, we discuss modifications to Bouaricha's method for a practical implementation, including a special globalization technique and other modifications for greater efficiency. We present numerical results showing computational advantages over Newton-GMRES on some realistic problems. We further discuss a new approach for dealing with singular (or ill-conditioned) matrices. In particular, we modify an algorithm for identifying a turning point so that an increasingly ill-conditioned Jacobian does not prevent convergence.
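
    To make the Jacobian-approximation idea concrete, here is a hedged, minimal dense implementation of "good" Broyden updating for a two-equation toy system; the report's limited-memory, globalized variants for large-scale problems are considerably more sophisticated, and the test function, starting point, and tolerances below are arbitrary choices.

    ```python
    # Minimal dense "good Broyden" iteration for F(x) = 0; illustrative only.
    # The report's limited-memory variant avoids storing the full Jacobian
    # approximation, which this small example does not attempt.
    import numpy as np

    def F(x):
        return np.array([x[0]**2 + x[1]**2 - 4.0,   # circle of radius 2
                         x[0] - x[1]])              # line x = y

    def fd_jacobian(F, x, eps=1e-7):
        """One-time finite-difference Jacobian used only to initialise B."""
        n, f0 = len(x), F(x)
        J = np.zeros((n, n))
        for j in range(n):
            xp = x.copy()
            xp[j] += eps
            J[:, j] = (F(xp) - f0) / eps
        return J

    def broyden(F, x0, tol=1e-10, max_iter=50):
        x = np.asarray(x0, dtype=float)
        f = F(x)
        B = fd_jacobian(F, x)                       # never recomputed afterwards
        for _ in range(max_iter):
            dx = np.linalg.solve(B, -f)
            x_new = x + dx
            f_new = F(x_new)
            if np.linalg.norm(f_new) < tol:
                return x_new
            # Rank-one secant update (note B @ dx = -f by construction).
            B += np.outer(f_new - f - B @ dx, dx) / (dx @ dx)
            x, f = x_new, f_new
        return x

    print(broyden(F, [1.0, 0.5]))   # converges to (sqrt(2), sqrt(2))
    ```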

  16. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

  17. Parallelization issues of a code for physically-based simulation of fabrics

    NASA Astrophysics Data System (ADS)

    Romero, Sergio; Gutiérrez, Eladio; Romero, Luis F.; Plata, Oscar; Zapata, Emilio L.

    2004-10-01

    The simulation of fabrics, clothes, and flexible materials is an essential topic in computer animation of realistic virtual humans and dynamic sceneries. New emerging technologies, such as interactive digital TV and multimedia products, make it necessary to develop powerful tools to perform real-time simulations. Parallelism is one such tool. When analyzing fabric simulations computationally, we found that these codes belong to the complex class of irregular applications. Frequently, this kind of code includes reduction operations in its core, so that an important fraction of the computational time is spent on such operations. In fabric simulators these operations appear when evaluating forces, giving rise to the equation system to be solved. For this reason, this paper discusses only this phase of the simulation. This paper analyzes and evaluates different irregular reduction parallelization techniques on ccNUMA shared memory machines, applied to a real, physically-based fabric simulator we have developed. Several issues are taken into account in order to achieve high code performance, such as exploitation of data access locality and parallelism, as well as careful use of memory resources (memory overhead). In this paper we use the concept of data affinity to develop various efficient algorithms for reduction parallelization exploiting data locality.
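
    One classical family of irregular reduction parallelization, and a likely relative of the techniques evaluated above, is buffer replication: each worker accumulates its share of force contributions into a private array, and the private arrays are combined afterwards. The hedged Python sketch below emulates that pattern serially with NumPy; the spring connectivity, forces, and worker count are fabricated, and real ccNUMA implementations add locality-aware partitioning.

    ```python
    # Hedged sketch of replication-based irregular reduction: each worker
    # accumulates force contributions from its share of springs into a private
    # array, and the private copies are summed at the end.
    import numpy as np

    n_particles, n_springs, n_workers = 8, 20, 4
    rng = np.random.default_rng(1)
    springs = rng.integers(0, n_particles, size=(n_springs, 2))   # (i, j) endpoints
    spring_force = rng.normal(size=n_springs)                     # scalar contribution

    private = np.zeros((n_workers, n_particles))
    for w, chunk in enumerate(np.array_split(np.arange(n_springs), n_workers)):
        for s in chunk:                              # each worker's share of springs
            i, j = springs[s]
            private[w, i] += spring_force[s]         # action
            private[w, j] -= spring_force[s]         # reaction

    forces = private.sum(axis=0)                     # combine private copies
    print(forces)
    ```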

  18. Hybrid asynchronous algorithm for parallel kinetic Monte Carlo simulations of thin film growth

    SciTech Connect

    Shim, Yunsic; Amar, Jacques G. (E-mail: jamar@physics.utoledo.edu)

    2006-02-10

    We have generalized and implemented the hybrid asynchronous algorithm, originally proposed for parallel simulations of the spin-flip Ising model, in order to carry out parallel kinetic Monte Carlo (KMC) simulations. The parallel performance has been tested using a simple model of thin-film growth in both 1D and 2D. We also briefly describe how the data collection must be modified as compared to the case of the spin-flip Ising model in order to carry out rigorous data collection. Due to the presence of a wide range of rates in the simulations, this algorithm turns out to be very inefficient. The poor parallel performance results from three factors: (1) the high probability of selecting a Metropolis Monte Carlo (MMC) move, (2) the low acceptance probability of boundary moves, and (3) the high cost of communications required before every MMC move. We also find that the parallel efficiency in two dimensions is lower than in one dimension due to the higher probability of selecting an MMC attempt, suggesting that this algorithm may not be suitable for KMC simulations of two-dimensional thin-film growth.

  19. Dependability analysis of parallel systems using a simulation-based approach. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Sawyer, Darren Charles

    1994-01-01

    The analysis of dependability in large, complex, parallel systems executing real applications or workloads is examined in this thesis. To effectively demonstrate the wide range of dependability problems that can be analyzed through simulation, the analysis of three case studies is presented. For each case, the organization of the simulation model used is outlined, and the results from simulated fault injection experiments are explained, showing the usefulness of this method in dependability modeling of large parallel systems. The simulation models are constructed using DEPEND and C++. Where possible, methods to increase dependability are derived from the experimental results. Another interesting facet of all three cases is the presence of some kind of workload or application executing in the simulation while faults are injected. This provides a completely new dimension to this type of study, not possible to model accurately with analytical approaches.

  20. A conflict-free, path-level parallelization approach for sequential simulation algorithms

    NASA Astrophysics Data System (ADS)

    Rasera, Luiz Gustavo; Machado, Péricles Lopes; Costa, João Felipe C. L.

    2015-07-01

    Pixel-based simulation algorithms are the most widely used geostatistical technique for characterizing the spatial distribution of natural resources. However, sequential simulation does not scale well for stochastic simulation on very large grids, which are now commonly found in many petroleum, mining, and environmental studies. With the availability of multiple-processor computers, there is an opportunity to develop parallelization schemes for these algorithms to increase their performance and efficiency. Here we present a conflict-free, path-level parallelization strategy for sequential simulation. The method consists of partitioning the simulation grid into a set of groups of nodes and delegating all available processors for simulation of multiple groups of nodes concurrently. An automated classification procedure determines which groups are simulated in parallel according to their spatial arrangement in the simulation grid. The major advantage of this approach is that it does not require conflict resolution operations, and thus allows exact reproduction of results. Besides offering a large performance gain when compared to the traditional serial implementation, the method provides efficient use of computational resources and is generic enough to be adapted to several sequential algorithms.

  1. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    NASA Technical Reports Server (NTRS)

    Krosel, S. M.; Milner, E. J.

    1982-01-01

    The application of predictor-corrector integration algorithms developed for the digital parallel processing environment is investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, interprocessor communication, and algorithm startup are also discussed.
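
    As a hedged refresher on the predictor-corrector family referenced above (not the specific algorithms of the report, and not their parallel partitioning), the following sketch applies a second-order Adams-Bashforth predictor with a trapezoidal corrector in PECE fashion to the scalar test problem dy/dt = -y.

    ```python
    # Hedged sketch of a PECE (predict-evaluate-correct-evaluate) step:
    # 2nd-order Adams-Bashforth predictor, trapezoidal (Adams-Moulton) corrector.
    # Scalar test problem dy/dt = -y, y(0) = 1.
    def f(t, y):
        return -y

    def pece(f, y0, t0, t_end, h):
        t, y = t0, y0
        f_prev = f(t, y)
        y_next = y + h * f_prev                   # bootstrap with forward Euler
        history = [(t, y), (t + h, y_next)]
        t, y = t + h, y_next
        while t < t_end - 1e-12:
            f_curr = f(t, y)
            y_pred = y + h * (1.5 * f_curr - 0.5 * f_prev)   # AB2 predictor
            f_pred = f(t + h, y_pred)                        # evaluate
            y = y + 0.5 * h * (f_curr + f_pred)              # trapezoidal corrector
            f_prev = f_curr
            t += h
            history.append((t, y))
        return history

    sol = pece(f, y0=1.0, t0=0.0, t_end=1.0, h=0.1)
    print(sol[-1])   # approximately (1.0, exp(-1) ~ 0.368)
    ```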

  2. Application of integration algorithms in a parallel processing environment for the simulation of jet engines

    SciTech Connect

    Krosel, S.M.; Milner, E.J.

    1982-01-01

    Illustrates the application of predictor-corrector integration algorithms developed for the digital parallel processing environment. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real-time performance, inter-processor communication and algorithm startup are also discussed. 10 references.

  3. Numerical simulation of the motion of a system of artificial satellites by parallel computing

    NASA Astrophysics Data System (ADS)

    Bordovitsyna, T. V.; Avdyushev, V. A.; Chuvashov, I. N.; Aleksandrova, A. G.; Tomilova, I. V.

    2009-11-01

    This paper discusses features of the numerical simulation of the motion of a large system of artificial satellites by parallel computing, using as an example the implementation of the program complex "Numerical model of the system artificial satellites motion" on the "Skiff Cyberia" cluster. It is shown that parallel computing makes it possible to carry out simultaneous high-precision numerical simulation of the motion of a large system of artificial satellites. This opens broad possibilities for solving direct and inverse problems of the dynamics of satellite systems such as GLONASS and of space debris objects.

  4. Aggressively Parallel Algorithms of Collision and Nearest Neighbor Detection for GPU Planetesimal Disk Simulation

    NASA Astrophysics Data System (ADS)

    Quillen, Alice C.; Moore, A.

    2008-09-01

    Planetesimal and dust dynamical simulations require collision and nearest neighbor detection. A brute force implementation for sorting interparticle distances requires O(N^2) computations for N particles, limiting the numbers of particles that have been simulated. Parallel algorithms recently developed for the GPU (graphics processing unit), such as the radix sort, can run as fast as O(N) and sort distances between a million particles in a few hundred milliseconds. We introduce improvements in collision and nearest neighbor detection algorithms and how we have incorporated them into our efficient parallel second-order democratic heliocentric method symplectic integrator written in NVIDIA's CUDA for the GPU.
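
    The sort-based neighbor detection idea above can be sketched on the CPU as follows: hash each particle into a uniform grid cell, sort particles by cell id (the step a GPU radix sort accelerates), and then restrict the neighbor search to the 3x3 block of surrounding cells. This hedged NumPy version is illustrative only; the particle counts, cell size, and data layout are assumptions, and it is not the authors' CUDA implementation.

    ```python
    # Hedged CPU sketch of sort-based neighbour detection on a uniform grid;
    # np.argsort stands in for the GPU radix sort.
    import numpy as np

    rng = np.random.default_rng(2)
    pos = rng.random((1000, 2))                 # particle positions in the unit square
    cell_size = 0.05
    grid_n = int(np.ceil(1.0 / cell_size))

    cells_2d = np.floor(pos / cell_size).astype(int)
    cell_id = cells_2d[:, 0] * grid_n + cells_2d[:, 1]

    order = np.argsort(cell_id)                 # GPU version: parallel radix sort
    sorted_ids = cell_id[order]
    # Start/end index of each cell's particles in the sorted list.
    starts = np.searchsorted(sorted_ids, np.arange(grid_n * grid_n))
    ends = np.searchsorted(sorted_ids, np.arange(grid_n * grid_n), side='right')

    def neighbours(p):
        """Indices of particles in the 3x3 block of cells around particle p."""
        cx, cy = cells_2d[p]
        out = []
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                nx, ny = cx + dx, cy + dy
                if 0 <= nx < grid_n and 0 <= ny < grid_n:
                    c = nx * grid_n + ny
                    out.extend(order[starts[c]:ends[c]])
        return out

    print(len(neighbours(0)))
    ```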

  5. Parallel object oriented implementation of a 2D bounded electrostatic plasma PIC simulation

    SciTech Connect

    Norton, C.D.; Szymanski, B.K.; Decyk, V.K.

    1995-12-01

    We discuss the software development issues involved in designing parallel programs using object oriented techniques. Simulations involving 1D and 2D Particle In Cell plasma codes illustrate how C++ programs can effectively describe complex simulations while performing with reasonable efficiency when compared to the equivalent Fortran programs. The scalable object oriented modeling techniques closely match the physical view of the problem, thus supporting modifiability and portability of the code. Selection of a parallel programming paradigm must consider the important factors of efficiency of the computation and the programming implementation effort. C++ and Fortran implementation paradigms are compared and discussed from this point of view.

  6. A parallel finite volume algorithm for large-eddy simulation of turbulent flows

    NASA Astrophysics Data System (ADS)

    Bui, Trong Tri

    1998-11-01

    A parallel unstructured finite volume algorithm is developed for large-eddy simulation of compressible turbulent flows. Major components of the algorithm include piecewise linear least-square reconstruction of the unknown variables, trilinear finite element interpolation for the spatial coordinates, Roe flux difference splitting, and second-order MacCormack explicit time marching. The computer code is designed from the start to take full advantage of the additional computational capability provided by the current parallel computer systems. Parallel implementation is done using the message passing programming model and message passing libraries such as the Parallel Virtual Machine (PVM) and Message Passing Interface (MPI). The development of the numerical algorithm is presented in detail. The parallel strategy and issues regarding the implementation of a flow simulation code on the current generation of parallel machines are discussed. The results from parallel performance studies show that the algorithm is well suited for parallel computer systems that use the message passing programming model. Nearly perfect parallel speedup is obtained on MPP systems such as the Cray T3D and IBM SP2. Performance comparisons with older supercomputer systems such as the Cray YMP show that the simulations done on the parallel systems are approximately 10 to 30 times faster. The results of the accuracy and performance studies for the current algorithm are reported. To validate the flow simulation code, a number of Euler and Navier-Stokes simulations are done for internal duct flows. Inviscid Euler simulation of a very small amplitude acoustic wave interacting with a shock wave in a quasi-1D convergent-divergent nozzle shows that the algorithm is capable of simultaneously tracking the very small disturbances of the acoustic wave and capturing the shock wave. Navier-Stokes simulations are made for fully developed laminar flow in a square duct, developing laminar flow in a rectangular duct, and developing laminar flow in a 90-degree square bend. The Navier-Stokes solutions show good agreement with available analytical solutions and experimental data. To validate the flow simulation code for turbulence simulation, LES of fully-developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. The accuracy of the above algorithm for turbulence simulations is evaluated by comparison with the DNS solution. The effects of grid resolution, upwind numerical dissipation, and subgrid scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux difference splitting dissipation adversely affects the accuracy of the turbulence simulation. This problem is unique to the turbulence simulation, since it does not occur in the Euler and laminar Navier-Stokes simulations using the same code. For accurate turbulence simulation, it is found that only three to five percent of the standard Roe flux difference splitting dissipation is needed.

  7. A new parallel method for molecular dynamics simulation of macromolecular systems

    SciTech Connect

    Plimpton, S.; Hendrickson, B.

    1994-08-01

    Short-range molecular dynamics simulations of molecular systems are commonly parallelized by replicated-data methods, where each processor stores a copy of all atom positions. This enables computation of bonded 2-, 3-, and 4-body forces within the molecular topology to be partitioned among processors straightforwardly. A drawback to such methods is that the inter-processor communication scales as N, the number of atoms, independent of P, the number of processors. Thus, their parallel efficiency falls off rapidly when large numbers of processors are used. In this paper a new parallel method called force-decomposition for simulating macromolecular or small-molecule systems is presented. Its memory and communication costs scale as N/√P, allowing larger problems to be run faster on greater numbers of processors. Like replicated-data techniques, and in contrast to spatial-decomposition approaches, the new method can be simply load-balanced and performs well even for irregular simulation geometries. The implementation of the algorithm in a prototypical macromolecular simulation code ParBond is also discussed. On a 1024-processor Intel Paragon, ParBond runs a standard benchmark simulation of solvated myoglobin with a parallel efficiency of 61% and at 40 times the speed of a vectorized version of CHARMM running on a single Cray Y-MP processor.
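
    A hedged back-of-envelope comparison of the two per-processor communication scalings quoted above (replicated data ~ O(N), force decomposition ~ O(N/√P)) is given below; the constants and atom counts are arbitrary, and only the trends are meaningful.

    ```python
    # Back-of-envelope comparison of the per-processor communication scaling
    # quoted above: replicated data ~ O(N), force decomposition ~ O(N/sqrt(P)).
    # Constants are arbitrary; only the trends matter.
    import math

    def replicated_data_comm(n_atoms, n_procs):
        return n_atoms                        # every processor exchanges all positions

    def force_decomposition_comm(n_atoms, n_procs):
        return n_atoms / math.sqrt(n_procs)   # each processor owns an N/sqrt(P) slice

    n_atoms = 1_000_000
    for p in (16, 64, 256, 1024):
        print(p, replicated_data_comm(n_atoms, p),
              round(force_decomposition_comm(n_atoms, p)))
    ```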

  8. Parallel simulation of tsunami inundation on a large-scale supercomputer

    NASA Astrophysics Data System (ADS)

    Oishi, Y.; Imamura, F.; Sugawara, D.

    2013-12-01

    An accurate prediction of tsunami inundation is important for disaster mitigation purposes. One approach is to approximate the tsunami wave source through an instant inversion analysis using real-time observation data (e.g., Tsushima et al., 2009) and then use the resulting wave source data in an instant tsunami inundation simulation. However, a bottleneck of this approach is the large computational cost of the non-linear inundation simulation; the computational power of recent massively parallel supercomputers is needed to enable faster-than-real-time execution of such a simulation. Parallel computers have become approximately 1000 times faster in 10 years (www.top500.org), and so it is expected that very fast parallel computers will be more and more prevalent in the near future. Therefore, it is important to investigate how to efficiently conduct a tsunami simulation on parallel computers. In this study, we are targeting very fast tsunami inundation simulations on the K computer, currently the fastest Japanese supercomputer, which has a theoretical peak performance of 11.2 PFLOPS. One computing node of the K computer consists of 1 CPU with 8 cores that share memory, and the nodes are connected through a high-performance torus-mesh network. The K computer is designed for distributed-memory parallel computation, so we have developed a parallel tsunami model. Our model is based on the TUNAMI-N2 model of Tohoku University, which is based on a leap-frog finite difference method. A grid nesting scheme is employed to apply high-resolution grids only at the coastal regions. To balance the computation load of each CPU in the parallelization, CPUs are first allocated to each nested layer in proportion to the number of grid points of the nested layer. Using the CPUs allocated to each layer, 1-D domain decomposition is performed on each layer. In the parallel computation, three types of communication are necessary: (1) communication to adjacent neighbours for the finite difference calculation, (2) communication between adjacent layers for the calculations to connect each layer, and (3) global communication to obtain the time step which satisfies the CFL condition in the whole domain. A preliminary test on the K computer showed the parallel efficiency on 1024 cores was 57% relative to 64 cores. We estimate that the parallel efficiency will be considerably improved by applying a 2-D domain decomposition instead of the present 1-D domain decomposition in future work. The present parallel tsunami model was applied to the 2011 Great Tohoku tsunami. The coarsest resolution layer covers a 758 km × 1155 km region with a 405 m grid spacing. A nesting of five layers was used with a resolution ratio of 1/3 between nested layers. The finest resolution region has 5 m resolution and covers most of the coastal region of Sendai city. To complete 2 hours of simulation time, the serial (non-parallel) computation took approximately 4 days on a workstation. To complete the same simulation on 1024 cores of the K computer, it took 45 minutes, which is more than two times faster than real time. This presentation discusses the updated parallel computational performance and the efficient use of the K computer when considering the characteristics of the tsunami inundation simulation model in relation to the characteristics and capabilities of the K computer.
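
    The communication pattern described above (nearest-neighbour exchange for the 1-D decomposition, plus a global reduction for the CFL time step) is the standard one for explicit finite-difference codes. The fragment below is a generic MPI sketch of those two steps, not code from the parallel TUNAMI-N2 model; array sizes and names are placeholders.

      // Illustrative 1-D domain decomposition with ghost-row exchange and a global
      // CFL reduction (communication types (1) and (3) in the abstract).  Generic
      // MPI sketch only; not the parallel TUNAMI-N2 implementation.
      #include <mpi.h>
      #include <vector>

      int main(int argc, char** argv) {
          MPI_Init(&argc, &argv);
          int rank, size;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);
          MPI_Comm_size(MPI_COMM_WORLD, &size);

          const int nLocalRows = 128, nCols = 256;               // interior rows owned by this rank
          std::vector<double> h((nLocalRows + 2) * nCols, 0.0);  // +2 ghost rows

          int up   = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
          int down = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

          // Exchange boundary rows with neighbours before each finite-difference update.
          MPI_Sendrecv(&h[1 * nCols],                nCols, MPI_DOUBLE, up,   0,
                       &h[(nLocalRows + 1) * nCols], nCols, MPI_DOUBLE, down, 0,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);
          MPI_Sendrecv(&h[nLocalRows * nCols],       nCols, MPI_DOUBLE, down, 1,
                       &h[0],                        nCols, MPI_DOUBLE, up,   1,
                       MPI_COMM_WORLD, MPI_STATUS_IGNORE);

          // Global reduction so every rank uses the time step allowed by the worst cell.
          double dtLocal = 0.05, dtGlobal;                       // placeholder local CFL limit
          MPI_Allreduce(&dtLocal, &dtGlobal, 1, MPI_DOUBLE, MPI_MIN, MPI_COMM_WORLD);

          MPI_Finalize();
          return 0;
      }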

  9. Special purpose parallel computer architecture for real-time control and simulation in robotic applications

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (inventor); Bejczy, Antal K. (inventor)

    1993-01-01

    This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS (the real-time robotic controller and simulator) by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.

  10. Transient dynamics simulations: Parallel algorithms for contact detection and smoothed particle hydrodynamics

    SciTech Connect

    Hendrickson, B.; Plimpton, S.; Attaway, S.; Swegle, J.

    1996-09-01

    Transient dynamics simulations are commonly used to model phenomena such as car crashes, underwater explosions, and the response of shipping containers to high-speed impacts. Physical objects in such a simulation are typically represented by Lagrangian meshes because the meshes can move and deform with the objects as they undergo stress. Fluids (gasoline, water) or fluid-like materials (earth) in the simulation can be modeled using the techniques of smoothed particle hydrodynamics. Implementing a hybrid mesh/particle model on a massively parallel computer poses several difficult challenges. One challenge is to simultaneously parallelize and load-balance both the mesh and particle portions of the computation. A second challenge is to efficiently detect the contacts that occur within the deforming mesh and between mesh elements and particles as the simulation proceeds. These contacts impart forces to the mesh elements and particles which must be computed at each timestep to accurately capture the physics of interest. In this paper we describe new parallel algorithms for smoothed particle hydrodynamics and contact detection which turn out to have several key features in common. Additionally, we describe how to join the new algorithms with traditional parallel finite element techniques to create an integrated particle/mesh transient dynamics simulation. Our approach to this problem differs from previous work in that we use three different parallel decompositions, a static one for the finite element analysis and dynamic ones for particles and for contact detection. We have implemented our ideas in a parallel version of the transient dynamics code PRONTO-3D and present results for the code running on a large Intel Paragon.

  11. A parallel simulated annealing algorithm for standard cell placement on a hypercube computer

    NASA Technical Reports Server (NTRS)

    Jones, Mark Howard

    1987-01-01

    A parallel version of a simulated annealing algorithm is presented which is targeted to run on a hypercube computer. A strategy for mapping the cells in a two dimensional area of a chip onto processors in an n-dimensional hypercube is proposed such that both small and large distance moves can be applied. Two types of moves are allowed: cell exchanges and cell displacements. The computation of the cost function in parallel among all the processors in the hypercube is described along with a distributed data structure that needs to be stored in the hypercube to support parallel cost evaluation. A novel tree broadcasting strategy is used extensively in the algorithm for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than the uniprocessor simulated annealing algorithms. An improved uniprocessor algorithm is proposed which is based on the improved results obtained from parallelization of the simulated annealing algorithm.
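
    The move-acceptance rule at the heart of any such annealer is the Metropolis criterion; the parallel cost evaluation and tree broadcast described above sit around it. A generic sketch follows (names are hypothetical and this is not the paper's hypercube code):

      // Generic Metropolis acceptance test for an annealing move (cell exchange or
      // displacement).  Illustrative only; the hypercube-specific parallel cost
      // evaluation and tree broadcast described in the abstract are not shown.
      #include <cmath>
      #include <random>

      bool acceptMove(double deltaCost, double temperature, std::mt19937& rng) {
          if (deltaCost <= 0.0) return true;                   // always take improvements
          std::uniform_real_distribution<double> u(0.0, 1.0);
          return u(rng) < std::exp(-deltaCost / temperature);  // uphill moves with Boltzmann probability
      }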

  12. Massively parallel Monte Carlo for many-particle simulations on GPUs

    NASA Astrophysics Data System (ADS)

    Glotzer, Sharon; Anderson, Joshua; Jankowski, Eric; Grubb, Thomas; Engel, Michael

    2013-03-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. We present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a GeForce GTX 680, our GPU implementation executes 95 times faster than on a single Intel Xeon E5540 CPU core, enabling 17 times better performance per dollar and cutting energy usage by a factor of 10.
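
    For orientation, a single serial hard-disk trial move looks like the sketch below: a displacement is accepted only if the moved disk overlaps no other disk. The contribution of the paper is performing many such moves concurrently on the GPU while still satisfying detailed balance; that parallel decomposition is not reproduced here, and all names are illustrative.

      // Serial hard-disk trial move: a displacement is accepted only if the moved
      // disk overlaps no other disk (hard-core potential).  The GPU-parallel,
      // detailed-balance-preserving scheme of the paper is not reproduced here.
      #include <vector>
      #include <random>

      struct Disk { double x, y; };

      bool overlaps(const std::vector<Disk>& disks, std::size_t i, double xNew, double yNew,
                    double diameter) {
          for (std::size_t j = 0; j < disks.size(); ++j) {
              if (j == i) continue;
              double dx = disks[j].x - xNew, dy = disks[j].y - yNew;
              if (dx * dx + dy * dy < diameter * diameter) return true;
          }
          return false;
      }

      void trialMove(std::vector<Disk>& disks, std::size_t i, double maxStep, double diameter,
                     std::mt19937& rng) {
          std::uniform_real_distribution<double> u(-maxStep, maxStep);
          double xNew = disks[i].x + u(rng), yNew = disks[i].y + u(rng);
          if (!overlaps(disks, i, xNew, yNew, diameter)) {   // reject any overlapping configuration
              disks[i].x = xNew;
              disks[i].y = yNew;
          }
      }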

  13. Research of control system stability in solar array simulator with continuous power amplifier of parallel type

    NASA Astrophysics Data System (ADS)

    Mizrah, E. A.; Tkachev, S. B.; Shtabel, N. V.

    2015-10-01

    Solar array simulators are nonlinear control systems designed to reproduce the static and dynamic characteristics of a solar array. Solar array characteristics depend on illumination, temperature, the space environment, and other factors. During ground testing of spacecraft power systems, it is difficult to achieve stable operation of the simulator with different impedance loads over a wide load-regulation range. In this article, the authors propose a method for studying absolute process stability in solar array simulators and present the results of an absolute-stability study for a solar array simulator with a continuous parallel-type power amplifier.

  14. Wake Encounter Analysis for a Closely Spaced Parallel Runway Paired Approach Simulation

    NASA Technical Reports Server (NTRS)

    Mckissick, Burnell T.; Rico-Cusi, Fernando J.; Murdoch, Jennifer; Oseguera-Lohr, Rosa M.; Stough, Harry P., III; O'Connor, Cornelius J.; Syed, Hazari I.

    2009-01-01

    A Monte Carlo simulation of simultaneous approaches performed by two transport category aircraft from the final approach fix to a pair of closely spaced parallel runways was conducted to explore the aft boundary of the safe zone in which separation assurance and wake avoidance are provided. The simulation included variations in runway centerline separation, initial longitudinal spacing of the aircraft, crosswind speed, and aircraft speed during the approach. The data from the simulation showed that the majority of the wake encounters occurred near or over the runway and the aft boundaries of the safe zones were identified for all simulation conditions.

  15. IB: a Monte Carlo Simulation Tool for Neutron Scattering Instrument Design under Parallel Virtual Machine

    SciTech Connect

    Zhao, Jinkui

    2011-01-01

    IB is a Monte Carlo simulation tool for aiding neutron scattering instrument designs. It is written in C++ and implemented under Parallel Virtual Machine. The program has a few basic components, or modules, that can be used to build a virtual neutron scattering instrument. More complex components, such as neutron guides and multichannel beam benders, can be constructed using the grouping technique unique to IB. Users can specify a collection of modules as a group. For example, a neutron guide can be constructed by grouping four neutron mirrors together that make up the four sides of the guide. IB's simulation engine ensures that neutrons entering a group will be properly operated upon by all members of the group. For simulations that require higher computer speed, the program can be run in parallel mode under the PVM architecture. Although the program was initially written for designing instruments at pulsed neutron sources, it has since been used to simulate reactor-based instruments as well.

  16. A parallel Monte Carlo simulation for gaseous electronics on a dynamically reconfigurable multiprocessor system

    NASA Astrophysics Data System (ADS)

    Singleton, Gregory L.; Wu, Chwan-Hwa; Tsai, Jyun-Hwei

    1991-09-01

    The Parallel Monte Carlo method implemented on a dynamically reconfigurable multiprocessor system is presented for simulating the evolution of an assembly of electrons interacting with a background gas under the influence of an electric field. The number of electrons keeps increasing due to the electron impact ionization process. Since each electron can be traced independently, the simulation is inherently parallel. However, the Monte Carlo simulation is prohibitively expensive when a significantly large number of particles are needed to achieve satisfactory statistics. Hence, a low-cost multiprocessor system is constructed by grouping a number of Inmos T800 Transputers with a dynamically reconfigurable interconnection network implemented with an Inmos C004 crossbar switch. Both the hardware and software are discussed in detail. The performance of the Monte Carlo simulation on the multiprocessor system shows that this special reconfigurable system is cost-effective for investigations of gaseous-electronics problems which require a considerable amount of computer time.

  17. Study of the parallel-plate EMP simulator and the simulator-obstacle interaction. Final technical report

    SciTech Connect

    Gedney, S.D.

    1990-12-01

    The Parallel-Plate Bounded-Wave EMP Simulator is typically used to test the vulnerability of electronic systems to the electromagnetic pulse (EMP) produced by a high altitude nuclear burst by subjecting the systems to a simulated EMP environment. However, when large test objects are placed within the simulator for investigation, the desired EMP environment may be affected by the interaction between the simulator and the test object. This simulator/obstacle interaction can be attributed to the following phenomena: (1) mutual coupling between the test object and the simulator, (2) fringing effects due to the finite width of the conducting plates of the simulator, and (3) multiple reflections between the object and the simulator's tapered end-sections. When the interaction is significant, the measurement of currents coupled into the system may not accurately represent those induced by an actual EMP. To better understand the problem of simulator/obstacle interaction, a dynamic analysis of the fields within the parallel-plate simulator is presented. The fields are computed using a moment method solution based on a wire mesh approximation of the conducting surfaces of the simulator. The fields within an empty simulator are found to be predominantly transverse electromagnetic (TEM) for frequencies within the simulator's bandwidth, properly simulating the properties of the EMP propagating in free space. However, when a large test object is placed within the simulator, it is found that the currents induced on the object can be quite different from those on an object situated in free space. A comprehensive study of the mechanisms contributing to this deviation is presented.

  18. Xyce parallel electronic simulator reference guide, Version 6.0.1.

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David Gregory.

    2014-01-01

    This document is a reference guide to the Xyce Parallel Electronic Simulator, and is a companion document to the Xyce Users' Guide [1]. The focus of this document is to list, as exhaustively as possible, device parameters, solver options, parser options, and other usage details of Xyce. This document is not intended to be a tutorial. Users who are new to circuit simulation are better served by the Xyce Users' Guide [1].

  19. A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL)

    NASA Technical Reports Server (NTRS)

    Carroll, Chester C.; Owen, Jeffrey E.

    1988-01-01

    A direct-execution parallel architecture for the Advanced Continuous Simulation Language (ACSL) is presented which overcomes the traditional disadvantages of simulations executed on a digital computer. The incorporation of parallel processing allows the mapping of simulations into a digital computer to be done in the same inherently parallel manner as they are currently mapped onto an analog computer. The direct-execution format maximizes the efficiency of the executed code since the need for a high level language compiler is eliminated. Resolution is greatly increased over that which is available with an analog computer without the sacrifice in execution speed normally expected with digital computer simulations. Although this report covers all aspects of the new architecture, key emphasis is placed on the processing element configuration and the microprogramming of the ACSL constructs. The execution times for all ACSL constructs are computed using a model of a processing element based on the AMD 29000 CPU and the AMD 29027 FPU. The increase in execution speed provided by parallel processing is exemplified by comparing the derived execution times of two ACSL programs with the execution times for the same programs executed on a similar sequential architecture.

  20. Massively parallel simulation of flow and transport in variably saturated porous and fractured media

    SciTech Connect

    Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten

    2002-01-15

    This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high-resolution modeling studies. The modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.

  1. Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

    NASA Technical Reports Server (NTRS)

    Joslin, Ronald D.; Zubair, Mohammad

    1993-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube are documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors, nearly ideal linear speedups are achieved with nonoptimized routines; slower-than-linear speedups are achieved with optimized (machine-dependent library) routines. This slower-than-linear speedup occurs because the Fast Fourier Transform (FFT) routine dominates the computational cost and itself scales less than ideally. However, with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise, wall-normal, and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single-processor time to complete a comparable simulation; however, it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.

  2. xSim: The Extreme-Scale Simulator

    SciTech Connect

    Boehm, Swen; Engelmann, Christian

    2011-01-01

    Investigating parallel application performance properties at scale is becoming an important part of high-performance computing (HPC) application development and deployment. The Extreme-scale Simulator (xSim) is a performance investigation toolkit that permits running an application in a controlled environment at extreme scale without the need for a respective extreme-scale HPC system. Using a lightweight parallel discrete event simulation, xSim executes a parallel application with a virtual wall clock time, such that performance data can be extracted based on a processor model and a network model. This paper presents significant enhancements to the xSim toolkit prototype that provide a more complete Message Passing Interface (MPI) support and improve its versatility. These enhancements include full virtual MPI group, communicator and collective communication support, and global variables support. The new capabilities are demonstrated by executing the entire NAS Parallel Benchmark suite in a simulated HPC environment.

  3. Parallel Simulation of Three-Dimensional Free Surface Fluid Flow Problems

    SciTech Connect

    BAER,THOMAS A.; SACKINGER,PHILIP A.; SUBIA,SAMUEL R.

    1999-10-14

    Simulation of viscous three-dimensional fluid flow typically involves a large number of unknowns. When free surfaces are included, the number of unknowns increases dramatically. Consequently, this class of problem is an obvious application of parallel high performance computing. We describe parallel computation of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of unknowns. Other issues discussed are the proper constraints appearing along the dynamic contact line in three dimensions. Issues affecting efficient parallel simulations include problem decomposition to distribute computational work equally across the processors of an SPMD computer and determination of robust, scalable preconditioners for the distributed matrix systems that must be solved. Solution continuation strategies important for serial simulations have an enhanced relevance in a parallel computing environment due to the difficulty of solving large scale systems. Parallel computations will be demonstrated on an example taken from the coating flow industry: flow in the vicinity of a slot coater edge. This is a three dimensional free surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another region. As such, a significant fraction of the computational time is devoted to processing boundary data. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.

  4. Petascale turbulence simulation using a highly parallel fast multipole method on GPUs

    NASA Astrophysics Data System (ADS)

    Yokota, Rio; Barba, L. A.; Narumi, Tetsu; Yasuoka, Kenji

    2013-03-01

    This paper reports large-scale direct numerical simulations of homogeneous-isotropic fluid turbulence, achieving sustained performance of 1.08 petaflop/s on GPU hardware using single precision. The simulations use a vortex particle method to solve the Navier-Stokes equations, with a highly parallel fast multipole method (FMM) as numerical engine, and match the current record in mesh size for this application, a cube of 4096³ computational points solved with a spectral method. The standard numerical approach used in this field is the pseudo-spectral method, relying on the FFT algorithm as the numerical engine. The particle-based simulations presented in this paper quantitatively match the kinetic energy spectrum obtained with a pseudo-spectral method, using a trusted code. In terms of parallel performance, weak scaling results show the FMM-based vortex method achieving 74% parallel efficiency on 4096 processes (one GPU per MPI process, 3 GPUs per node of the TSUBAME-2.0 system). The FFT-based spectral method is able to achieve just 14% parallel efficiency on the same number of MPI processes (using only CPU cores), due to the all-to-all communication pattern of the FFT algorithm. The calculation time for one time step was 108 s for the vortex method and 154 s for the spectral method, under these conditions. Computing with 69 billion particles, this work exceeds by an order of magnitude the largest vortex-method calculations to date.

  5. Midpoint cell method for hybrid (MPI+OpenMP) parallelization of molecular dynamics simulations.

    PubMed

    Jung, Jaewoon; Mori, Takaharu; Sugita, Yuji

    2014-05-30

    We have developed a new hybrid (MPI+OpenMP) parallelization scheme for molecular dynamics (MD) simulations by combining a cell-wise version of the midpoint method with pair-wise Verlet lists. In this scheme, which we call the midpoint cell method, simulation space is divided into subdomains, each of which is assigned to a MPI processor. Each subdomain is further divided into small cells. The interaction between two particles existing in different cells is computed in the subdomain containing the midpoint cell of the two cells where the particles reside. In each MPI processor, cell pairs are distributed over OpenMP threads for shared memory parallelization. The midpoint cell method keeps the advantages of the original midpoint method, while filtering out unnecessary calculations of midpoint checking for all the particle pairs by single midpoint cell determination prior to MD simulations. Distributing cell pairs over OpenMP threads allows for more efficient shared memory parallelization compared with distributing atom indices over threads. Furthermore, cell grouping of particle data makes better memory access, reducing the number of cache misses. The parallel performance of the midpoint cell method on the K computer showed scalability up to 512 and 32,768 cores for systems of 20,000 and 1 million atoms, respectively. One MD time step for long-range interactions could be calculated within 4.5 ms even for a 1 million atoms system with particle-mesh Ewald electrostatics. PMID:24659253
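
    A rough sketch of the midpoint-cell rule, under the assumption of a uniform integer cell grid (indexing details are illustrative and not taken from the paper's implementation): a cell pair is evaluated by whichever subdomain owns the cell halfway between the two cells, and the resulting cell pairs, rather than atom indices, are what get distributed over OpenMP threads.

      // Sketch of the midpoint-cell rule: a particle pair is evaluated by whichever
      // subdomain owns the cell lying at the midpoint of the two particles' cells.
      // Grid and indexing details are illustrative, not the paper's code.
      #include <array>

      using Cell = std::array<int, 3>;   // integer cell coordinates (ix, iy, iz)

      Cell midpointCell(const Cell& a, const Cell& b) {
          Cell m;
          for (int d = 0; d < 3; ++d)
              m[d] = (a[d] + b[d]) / 2;          // midpoint cell index, rounded down
          return m;
      }

      // A cell pair (a, b) is computed on this rank only if midpointCell(a, b) lies
      // inside the rank's subdomain [lo, hi); cell pairs, not atom indices, are then
      // distributed over OpenMP threads for the force loop.
      bool ownsPair(const Cell& a, const Cell& b, const Cell& lo, const Cell& hi) {
          Cell m = midpointCell(a, b);
          for (int d = 0; d < 3; ++d)
              if (m[d] < lo[d] || m[d] >= hi[d]) return false;
          return true;
      }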

  6. Large-scale numerical simulation of laser propulsion by parallel computing

    NASA Astrophysics Data System (ADS)

    Zeng, Yaoyuan; Zhao, Wentao; Wang, Zhenghua

    2013-05-01

    As one of the most significant methods for studying laser-propelled rockets, numerical simulation of laser propulsion has drawn ever-increasing attention. Nevertheless, the traditional serial simulation model cannot satisfy practical needs because of its excessive memory overhead and considerable computation time. To address this problem, we study a general algorithm for laser propulsion design and parallelize it using a two-level hybrid parallel programming model. The total computing domain is decomposed into distributed data spaces, and each partition is assigned to an MPI process. Within a single computation step, at the inner-loop level, a compiler directive is used to split each MPI process into several OpenMP threads. Finally, the parallel efficiency of the hybrid program for two typical configurations on a Chinese supercomputer with 4 to 256 cores is compared with that of a pure MPI program. On the whole, the hybrid program exhibits better performance than the pure MPI program, roughly as expected. The result indicates that our hybrid parallel approach is effective and practical for large-scale numerical simulation of laser propulsion.
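
    The two-level pattern described above, one MPI rank per subdomain with an OpenMP directive splitting its inner loop over threads, has the following generic shape. This is a structural sketch only, with a placeholder stencil update; it is not the laser-propulsion solver itself.

      // Generic two-level hybrid pattern: one MPI rank per subdomain, with an OpenMP
      // directive parallelizing the inner loop over that subdomain's cells.
      #include <mpi.h>
      #include <omp.h>
      #include <cstdio>
      #include <vector>

      int main(int argc, char** argv) {
          int provided;
          MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
          int rank;
          MPI_Comm_rank(MPI_COMM_WORLD, &rank);

          std::vector<double> u(100000, 1.0), uNew(u.size(), 0.0);

          // Inner-loop level: OpenMP threads share this rank's subdomain.
          #pragma omp parallel for
          for (long i = 1; i < static_cast<long>(u.size()) - 1; ++i)
              uNew[i] = 0.5 * (u[i - 1] + u[i + 1]);      // placeholder stencil update

          // Outer level: ranks would exchange subdomain boundaries here with MPI calls.
          if (rank == 0)
              std::printf("step done with %d OpenMP threads per rank\n", omp_get_max_threads());

          MPI_Finalize();
          return 0;
      }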

  7. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. Twenty processors are used for the 4-blade-row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scalable with the number of blade rows. Enough flips are run (between 50 and 200) so that the solution in the entire machine is no longer changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than in the other two directions. This code uses MPI for message passing. The parallel speedup of the solver portion (excluding I/O and the body force calculation) is reported for a grid which has 227 points axially.

  8. GROMACS: High performance molecular simulations through multi-level parallelism from laptops to supercomputers

    NASA Astrophysics Data System (ADS)

    Abraham, Mark James; Murtola, Teemu; Schulz, Roland; Páll, Szilárd; Smith, Jeremy C.; Hess, Berk; Lindahl, Erik

    2015-09-01

    GROMACS is one of the most widely used open-source and free software codes in chemistry, used primarily for dynamical simulations of biomolecules. It provides a rich set of calculation types, preparation and analysis tools. Several advanced techniques for free-energy calculations are supported. In version 5, it reaches new performance heights, through several new and enhanced parallelization algorithms. These work on every level: SIMD registers inside cores, multithreading, heterogeneous CPU-GPU acceleration, state-of-the-art 3D domain decomposition, and ensemble-level parallelization through built-in replica exchange and the separate Copernicus framework. The latest best-in-class compressed trajectory storage format is supported.

  9. Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite

    SciTech Connect

    Kononenko, Oleksiy

    2015-03-26

    ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.

  10. Adventures in Parallel Processing: Entry, Descent and Landing Simulation for the Genesis and Stardust Missions

    NASA Technical Reports Server (NTRS)

    Lyons, Daniel T.; Desai, Prasun N.

    2005-01-01

    This paper will describe the Entry, Descent and Landing simulation tradeoffs and techniques that were used to provide the Monte Carlo data required to approve entry during a critical period just before entry of the Genesis Sample Return Capsule. The same techniques will be used again when Stardust returns on January 15, 2006. Only one hour was available for the simulation which propagated 2000 dispersed entry states to the ground. Creative simulation tradeoffs combined with parallel processing were needed to provide the landing footprint statistics that were an essential part of the Go/NoGo decision that authorized release of the Sample Return Capsule a few hours before entry.

  11. A method for data handling numerical results in parallel OpenFOAM simulations

    NASA Astrophysics Data System (ADS)

    Anton, Alin; Muntean, Sebastian

    2015-12-01

    Parallel computational fluid dynamics simulations produce vast amounts of numerical result data. This paper introduces a method for reducing the size of the data by replaying the interprocessor traffic. The results are recovered only in certain regions of interest configured by the user. A known test case is used for several mesh partitioning scenarios using the OpenFOAM toolkit [1]. The space savings obtained with classic algorithms remain constant for more than 60 Gb of floating point data. Our method is most efficient on large simulation meshes and is much better suited for compressing large-scale simulation results than the regular algorithms.

  12. Design of a real-time wind turbine simulator using a custom parallel architecture

    NASA Technical Reports Server (NTRS)

    Hoffman, John A.; Gluck, R.; Sridhar, S.

    1995-01-01

    The design of a new parallel-processing digital simulator is described. The new simulator has been developed specifically for analysis of wind energy systems in real time. The new processor has been named the Wind Energy System Time-domain simulator, version 3 (WEST-3). Like previous WEST versions, WEST-3 performs many computations in parallel. The modules in WEST-3 are pure digital processors, however. These digital processors can be programmed individually and operated in concert to achieve real-time simulation of wind turbine systems. Because of this programmability, WEST-3 is much more flexible and general than its two predecessors. The design features of WEST-3 are described to show how the system produces high-speed solutions of nonlinear time-domain equations. WEST-3 has two very fast Computational Units (CU's) that use minicomputer technology plus special architectural features that make them many times faster than a microcomputer. These CU's are needed to perform the complex computations associated with the wind turbine rotor system in real time. The parallel architecture of the CU causes several tasks to be done in each cycle, including an I/O operation and the combination of a multiply, add, and store. The WEST-3 simulator can be expanded at any time for additional computational power. This is possible because the CU's are interfaced to each other and to other portions of the simulation using special serial buses. These buses can be 'patched' together in essentially any configuration (in a manner very similar to the programming methods used in analog computation) to balance the input/output requirements. CU's can be added in any number to share a given computational load. This flexible bus feature is very different from that of many other parallel processors, which usually have a throughput limit because of rigid bus architecture.

  13. Application of parallel computing techniques to a large-scale reservoir simulation

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Ding, Chris; Pruess, Karsten

    2001-02-01

    Even with the continual advances made in both computational algorithms and computer hardware used in reservoir modeling studies, large-scale simulation of fluid and heat flow in heterogeneous reservoirs remains a challenge. The problem commonly arises from intensive computational requirement for detailed modeling investigations of real-world reservoirs. This paper presents the application of a massive parallel-computing version of the TOUGH2 code developed for performing large-scale field simulations. As an application example, the parallelized TOUGH2 code is applied to develop a three-dimensional unsaturated-zone numerical model simulating flow of moisture, gas, and heat in the unsaturated zone of Yucca Mountain, Nevada, a potential repository for high-level radioactive waste. The modeling approach employs refined spatial discretization to represent the heterogeneous fractured tuffs of the system, using more than a million 3-D gridblocks. The problem of two-phase flow and heat transfer within the model domain leads to a total of 3,226,566 linear equations to be solved per Newton iteration. The simulation is conducted on a Cray T3E-900, a distributed-memory massively parallel computer. Simulation results indicate that the parallel computing technique, as implemented in the TOUGH2 code, is very efficient. The reliability and accuracy of the model results have been demonstrated by comparing them to those of small-scale (coarse-grid) models. These comparisons show that simulation results obtained with the refined grid provide more detailed predictions of the future flow conditions at the site, aiding in the assessment of proposed repository performance.

  14. Parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada.

    PubMed

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G S

    2003-01-01

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-1-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the 1-million-cell models produce higher-resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models. PMID:12714301

  15. Massively parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-08-31

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce higher-resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models.

  16. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Lichtner, P. C.; Mills, R. T.

    2014-01-01

    To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted.
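
    For reference, strong- and weak-scaling efficiencies in such analyses are conventionally defined as follows (these are the standard textbook definitions; the abstract does not spell out PFLOTRAN's exact normalization, so the reference process count P_ref is an assumption):

      E_{\text{strong}}(P) = \frac{P_{\text{ref}}\, T(P_{\text{ref}})}{P\, T(P)},
      \qquad
      E_{\text{weak}}(P) = \frac{T(P_{\text{ref}})}{T(P)} \quad \text{(work per process held fixed)}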

  17. A Queue Simulation Tool for a High Performance Scientific Computing Center

    NASA Technical Reports Server (NTRS)

    Spear, Carrie; McGalliard, James

    2007-01-01

    The NASA Center for Computational Sciences (NCCS) at the Goddard Space Flight Center provides high performance highly parallel processors, mass storage, and supporting infrastructure to a community of computational Earth and space scientists. Long running (days) and highly parallel (hundreds of CPUs) jobs are common in the workload. NCCS management structures batch queues and allocates resources to optimize system use and prioritize workloads. NCCS technical staff use a locally developed discrete event simulation tool to model the impacts of evolving workloads, potential system upgrades, alternative queue structures and resource allocation policies.
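
    A batch-queue model of this kind reduces to a small discrete event simulation: an event list ordered by time, jobs that wait until enough CPUs are free, and completions that release CPUs. The sketch below illustrates the idea with a FIFO policy and made-up job data; it is not the locally developed NCCS tool.

      // Minimal discrete-event sketch of a batch queue: jobs arrive, wait until
      // enough CPUs are free, run, and release their CPUs on completion.
      #include <queue>
      #include <vector>
      #include <cstdio>

      struct Job   { double arrival, runtime; int cpus; };
      struct Event { double time; std::size_t job; bool isCompletion; };
      struct Later { bool operator()(const Event& a, const Event& b) const { return a.time > b.time; } };

      int main() {
          const int totalCpus = 8;
          std::vector<Job> jobs = { {0.0, 5.0, 4}, {1.0, 3.0, 6}, {2.0, 2.0, 2} };  // made-up workload

          std::priority_queue<Event, std::vector<Event>, Later> events;
          for (std::size_t i = 0; i < jobs.size(); ++i)
              events.push({jobs[i].arrival, i, false});

          int freeCpus = totalCpus;
          std::queue<std::size_t> waiting;

          while (!events.empty()) {
              Event e = events.top(); events.pop();
              if (e.isCompletion) {
                  freeCpus += jobs[e.job].cpus;
                  std::printf("t=%.1f job %zu done\n", e.time, e.job);
              } else {
                  waiting.push(e.job);
              }
              // Start any queued jobs that now fit (simple FIFO policy).
              while (!waiting.empty() && jobs[waiting.front()].cpus <= freeCpus) {
                  std::size_t j = waiting.front(); waiting.pop();
                  freeCpus -= jobs[j].cpus;
                  events.push({e.time + jobs[j].runtime, j, true});
              }
          }
          return 0;
      }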

  18. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods.

    PubMed

    Lee, Anthony; Yau, Christopher; Giles, Michael B; Doucet, Arnaud; Holmes, Christopher C

    2010-12-01

    We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we find speedups from 35- to 500-fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data-rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276

  19. Recent progress in 3D EM/EM-PIC simulation with ARGUS and parallel ARGUS

    SciTech Connect

    Mankofsky, A.; Petillo, J.; Krueger, W.; Mondelli, A.; McNamara, B.; Philp, R.

    1994-12-31

    ARGUS is an integrated, 3-D, volumetric simulation model for systems involving electric and magnetic fields and charged particles, including materials embedded in the simulation region. The code offers the capability to carry out time domain and frequency domain electromagnetic simulations of complex physical systems. ARGUS offers a boolean solid model structure input capability that can include essentially arbitrary structures on the computational domain, and a modular architecture that allows multiple physics packages to access the same data structure and to share common code utilities. Physics modules are in place to compute electrostatic and electromagnetic fields, the normal modes of RF structures, and self-consistent particle-in-cell (PIC) simulation in either a time dependent mode or a steady state mode. The PIC modules include multiple particle species, the Lorentz equations of motion, and algorithms for the creation of particles by emission from material surfaces, injection onto the grid, and ionization. In this paper, we present an updated overview of ARGUS, with particular emphasis given to recent algorithmic and computational advances. These include a completely rewritten frequency domain solver which efficiently treats lossy materials and periodic structures, a parallel version of ARGUS with support for both shared-memory parallel vector (e.g., Cray) machines and distributed-memory massively parallel MIMD systems, and numerous new applications of the code.

  20. Parallel-vector algorithms for particle simulations on shared-memory multiprocessors

    SciTech Connect

    Nishiura, Daisuke; Sakaguchi, Hide

    2011-03-01

    Over the last few decades, the computational demands of massive particle-based simulations for both scientific and industrial purposes have been continuously increasing. Hence, considerable efforts are being made to develop parallel computing techniques on various platforms. In such simulations, particles freely move within a given space, and so on a distributed-memory system, load balancing, i.e., assigning an equal number of particles to each processor, is not guaranteed. However, shared-memory systems achieve better load balancing for particle models, but suffer from the intrinsic drawback of memory access competition, particularly during (1) pairing of contact candidates from among neighboring particles and (2) force summation for each particle. Here, novel algorithms are proposed to overcome these two problems. For the first problem, the key is a pre-conditioning process during which particle labels are sorted by a cell label in the domain to which the particles belong. Then, a list of contact candidates is constructed by pairing the sorted particle labels. For the latter problem, a table comprising the list indexes of the contact candidate pairs is created and used to sum the contact forces acting on each particle for all contacts according to Newton's third law. With just these methods, memory access competition is avoided without additional redundant procedures. The parallel efficiency and compatibility of these two algorithms were evaluated in discrete element method (DEM) simulations on four types of shared-memory parallel computers: a multicore multiprocessor computer, scalar supercomputer, vector supercomputer, and graphics processing unit. The computational efficiency of a DEM code was found to be drastically improved with our algorithms on all but the scalar supercomputer. Thus, the developed parallel algorithms are useful on shared-memory parallel computers with sufficient memory bandwidth.
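
    The pre-conditioning step described above amounts to sorting particle labels by cell label and then pairing within each sorted range, so that contact-candidate construction touches memory in cell order rather than at random. An illustrative sketch follows (the cell-neighbour bookkeeping and the force-summation table are omitted, and names are hypothetical):

      // Sketch of the pre-conditioning step: particle indices are sorted by the
      // label of the cell containing each particle, so particles in the same cell
      // end up adjacent and can be paired without scattered memory access.
      #include <algorithm>
      #include <utility>
      #include <vector>

      std::vector<int> sortByCell(const std::vector<int>& cellOfParticle) {
          std::vector<int> order(cellOfParticle.size());
          for (std::size_t i = 0; i < order.size(); ++i) order[i] = static_cast<int>(i);
          std::sort(order.begin(), order.end(),
                    [&](int a, int b) { return cellOfParticle[a] < cellOfParticle[b]; });
          return order;   // particle labels grouped by cell label
      }

      // Contact candidates within one cell: all pairs among the particles whose
      // sorted labels fall in [begin, end).  Pairs between neighbouring cells are
      // built the same way from the two sorted ranges.
      std::vector<std::pair<int, int>> pairsInCell(const std::vector<int>& order,
                                                   std::size_t begin, std::size_t end) {
          std::vector<std::pair<int, int>> pairs;
          for (std::size_t i = begin; i < end; ++i)
              for (std::size_t j = i + 1; j < end; ++j)
                  pairs.emplace_back(order[i], order[j]);
          return pairs;
      }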

  1. Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution

    SciTech Connect

    Hesse, M.; Birn, J.

    1989-01-01

    We investigate properties of the electric field component parallel to the magnetic field (E_parallel) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component. We emphasize particularly the spatial location of E_parallel, the concept of a diffusion zone and the role of E_parallel in accelerating electrons. We find a localization of the region of enhanced E_parallel in all space directions with a strong concentration in the z direction. We identify this region as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B_y implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B_y field is strong enough to force particles to follow field lines through the diffusion region. We estimate that for a typical net B_y field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the northern (southern) hemisphere should dominate for duskward (dawnward) net B_y. In addition, we observe a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance of bright spots in auroras. 12 refs., 9 figs.

  2. Parallel electric fields in a simulation of magnetotail reconnection and plasmoid evolution

    NASA Technical Reports Server (NTRS)

    Hesse, Michael; Birn, Joachim

    1989-01-01

    Properties of the electric field component parallel to the magnetic field (E(sub parallel)) in a three-dimensional MHD simulation of plasmoid formation and evolution in the magnetotail in the presence of a net dawn-dusk magnetic field component were observed. Particularly emphasized was the spatial location of E(sub parallel), the concept of a diffusion zone and the role of E(sub parallel) in accelerating electrons. A localization of the region of enhanced E(sub parallel) in all space directions with a strong concentration in the z direction was found. This region was identified as the diffusion zone, which plays a crucial role in reconnection theory through the local break-down of magnetic flux conservation. The presence of B(sub y) implies a north-south asymmetry of the injection of accelerated particles into the near-earth region, if the net B(sub y) field is strong enough to force particles to follow field lines through the diffusion region. It is estimated that for a typical net B(sub y) field this should affect the injection of electrons into the near-earth dawn region, so that precipitation into the Northern (Southern) Hemisphere should dominate for duskward (dawnward) net B(sub y). In addition, a spatial clottiness of the expected injection of adiabatic particles which could be related to the appearance of bright spots in auroras was observed.

  3. Vortex-induced vibration of two parallel risers: Experimental test and numerical simulation

    NASA Astrophysics Data System (ADS)

    Huang, Weiping; Zhou, Yang; Chen, Haiming

    2016-04-01

    The vortex-induced vibration of two identical rigidly mounted risers in a parallel arrangement was studied using Ansys-CFX and model tests. The vortex shedding and force were recorded to determine the effect of spacing on the two-degree-of-freedom oscillation of the risers. CFX was used to study a single riser and two parallel risers at spacings of 2-8D, considering the coupling effect. Because of the limited width of the water channel, only three riser spacings, 2D, 3D, and 4D, were tested to validate the characteristics of the two parallel risers by comparison with the numerical simulation. The results indicate that the lift force changes significantly as the spacing increases, and in the case of 3D spacing, the lift force of the two parallel risers reaches its maximum. The vortex shedding of the risers at 3D spacing shows that a variable velocity field with the same frequency as the vortex shedding is generated in the overlapped area, making the period of the drag force equal to that of the lift force. It can be concluded that the interaction between the two parallel risers is significant at small spacings, since the trajectory of each riser changes from an oval to a figure eight as the spacing is increased. The phase difference of the lift force between the two risers also varies with the spacing.

  4. Relevance of the parallel nonlinearity in gyrokinetic simulations of tokamak plasmas

    SciTech Connect

    Candy, J.; Waltz, R. E.; Parker, S. E.; Chen, Y.

    2006-07-15

    The influence of the parallel nonlinearity on transport in gyrokinetic simulations is assessed for values of ρ* which are typical of current experiments. Here, ρ* = ρ_s/a is the ratio of gyroradius, ρ_s, to plasma minor radius, a. The conclusion, derived from simulations with both GYRO [J. Candy and R. E. Waltz, J. Comput. Phys., 186, 585 (2003)] and GEM [Y. Chen and S. E. Parker, J. Comput. Phys., 189, 463 (2003)], is that no measurable effect of the parallel nonlinearity is apparent for ρ* < 0.012. This result is consistent with scaling arguments, which suggest that the parallel nonlinearity should be O(ρ*) smaller than the E×B nonlinearity. Indeed, for the plasma parameters under consideration, the magnitude of the parallel nonlinearity is a factor of 8ρ* smaller (for 0.00075 < ρ* < 0.012) than the other retained terms in the nonlinear gyrokinetic equation.

  5. A multi-transputer system for parallel Monte Carlo simulations of extensive air showers

    NASA Astrophysics Data System (ADS)

    Gils, H. J.; Heck, D.; Oehlschläger, J.; Schatz, G.; Thouw, T.; Merkel, A.

    1989-12-01

    A multiprocessor computer system has been brought into operation at the Kernforschungszentrum Karlsruhe. It is dedicated to Monte Carlo simulations of extensive air showers induced by ultra-high energy cosmic rays. The architecture consists of two independently working VMEbus systems each with a 68020 microprocessor as host computer and twelve T800 transputers for parallel processing. The two systems are linked via Ethernet for data exchange. The T800 transputers are equipped with 4 Mbyte RAM each, sufficient to run rather large codes. The host computers are operated under UNIX 5.3. On the transputers compilers for PARALLEL FORTRAN, C, and PASCAL are available. The simple modular architecture of this parallel computer reflects the single purpose for which it is intended. The hardware of the multiprocessor computer is described, as well as the way the user software is handled and distributed to the 24 working processors. The performance of the parallel computer is demonstrated by well-known benchmarks and by realistic Monte Carlo simulations of air showers. Comparisons with other types of microprocessors and with large universal computers are made. It is demonstrated that a cost reduction by more than a factor of 20 is achieved by this system as compared to a universal computer.

  6. Parallel-in-time implementation of transient stability simulations on a transputer network

    SciTech Connect

    La Scala, M.; Sblendorio, G.; Sbrizzai, R. (Dept. di Elettrotecnica ed Elettronica)

    1994-05-01

    The most time-consuming computer simulation in power system studies is the transient stability analysis. In recent years, parallel processing has been applied for time domain simulations of power system transient behavior. In this paper, a parallel implementation of an algorithm based on Shifted-Picard dynamic iterations is presented. The main idea is that a set of nonlinear Differential Algebraic Equations (DAEs), which describes the system, can be solved by the iterative solution of a linear set of DAEs. The time behavior of the linear set of differential equations can be obtained by the evaluation of the convolution integral. In the parallel-in-time implementation of the proposed algorithm, each processor is devoted to the evaluation of the complete set of variables relative to each time step. The quadrature formula, adopted for the integral evaluation, can be easily parallelized by using a number of processors equal to the number of time steps. The algorithm, implemented on a transputer network with 32 Inmos T800/20 transputers in a unidirectional ring topology, has been tested on standard power systems.
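
    The underlying iteration is of Picard (waveform-relaxation) type: each sweep solves a linear problem over the whole time interval, so every time step of iterate k+1 can be evaluated on its own processor once iterate k is known. Assuming a constant shift matrix A (the exact form used in the paper is not given in the abstract), one sweep for a system x' = f(x, t) can be written as

      \dot{x}^{(k+1)}(t) = A\,x^{(k+1)}(t) + \bigl[f\bigl(x^{(k)}(t),t\bigr) - A\,x^{(k)}(t)\bigr],
      \qquad
      x^{(k+1)}(t) = e^{At}x(0) + \int_0^t e^{A(t-s)}\bigl[f\bigl(x^{(k)}(s),s\bigr) - A\,x^{(k)}(s)\bigr]\,ds ,

    and the convolution integral on the right is what each processor evaluates independently for its assigned time step via the quadrature formula.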

  7. Parallel computers

    SciTech Connect

    Treleaven, P.

    1989-01-01

    This book presents an introduction to object-oriented, functional, and logic parallel computing on which the fifth generation of computer systems will be based. Coverage includes concepts for parallel computing languages, a parallel object-oriented system (DOOM) and its language (POOL), an object-oriented multilevel VLSI simulator using POOL, and implementation of lazy functional languages on parallel architectures.

  8. Parallel Solutions for Voxel-Based Simulations of Reaction-Diffusion Systems

    PubMed Central

    D'Agostino, Daniele; Pasquale, Giulia; Clematis, Andrea; Maj, Carlo; Mosca, Ettore; Milanesi, Luciano; Merelli, Ivan

    2014-01-01

    There is an increasing awareness of the pivotal role of noise in biochemical processes and of the effect of molecular crowding on the dynamics of biochemical systems. This awareness has given rise to a strong need for suitable and sophisticated algorithms for the simulation of biological phenomena that take into account both spatial effects and noise. However, the high computational effort characterizing simulation approaches, coupled with the necessity of simulating the models several times to achieve statistically relevant information on the model behaviours, makes such algorithms very time-consuming for studying real systems. So far, different parallelization approaches have been deployed to reduce the computational time required to simulate the temporal dynamics of biochemical systems using stochastic algorithms. In this work we discuss these aspects for the spatial TAU-leaping in crowded compartments (STAUCC) simulator, a voxel-based method for the stochastic simulation of reaction-diffusion processes which relies on the Sτ-DPP algorithm. In particular we present how the characteristics of the algorithm can be exploited for an effective parallelization on present heterogeneous HPC architectures. PMID:25045716
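
    For readers unfamiliar with tau-leaping, the core stochastic update can be sketched as below (a minimal, non-spatial illustration only; STAUCC's Sτ-DPP method additionally handles diffusion between voxels and molecular crowding, and the reaction system here is invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)

def tau_leap_step(x, tau, stoich, propensities, rates):
    """One tau-leaping step: each reaction fires a Poisson number of times in the interval tau."""
    a = propensities(x, rates)              # reaction propensities at the current state
    k = rng.poisson(a * tau)                # how many times each reaction fires during tau
    x_new = x + stoich.T @ k                # apply the stoichiometric changes
    return np.maximum(x_new, 0)             # crude guard against negative populations

# Example system: A + B -> C (rate c1) and C -> A + B (rate c2)
stoich = np.array([[-1, -1, +1],
                   [+1, +1, -1]])

def propensities(x, rates):
    return np.array([rates[0] * x[0] * x[1],   # forward reaction
                     rates[1] * x[2]])         # backward reaction

x = np.array([100, 80, 0])
for _ in range(200):
    x = tau_leap_step(x, 0.01, stoich, propensities, (0.005, 0.1))
print(x)
```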

  9. A Parallel, Finite-Volume Algorithm for Large-Eddy Simulation of Turbulent Flows

    NASA Technical Reports Server (NTRS)

    Bui, Trong T.

    1999-01-01

    A parallel, finite-volume algorithm has been developed for large-eddy simulation (LES) of compressible turbulent flows. This algorithm includes piecewise linear least-square reconstruction, trilinear finite-element interpolation, Roe flux-difference splitting, and second-order MacCormack time marching. Parallel implementation is done using the message-passing programming model. In this paper, the numerical algorithm is described. To validate the numerical method for turbulence simulation, LES of fully developed turbulent flow in a square duct is performed for a Reynolds number of 320 based on the average friction velocity and the hydraulic diameter of the duct. Direct numerical simulation (DNS) results are available for this test case, and the accuracy of this algorithm for turbulence simulations can be ascertained by comparing the LES solutions with the DNS results. The effects of grid resolution, upwind numerical dissipation, and subgrid-scale dissipation on the accuracy of the LES are examined. Comparison with DNS results shows that the standard Roe flux-difference splitting dissipation adversely affects the accuracy of the turbulence simulation. For accurate turbulence simulations, only 3-5 percent of the standard Roe flux-difference splitting dissipation is needed.
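
    The role of the upwind-dissipation term can be illustrated for a scalar conservation law (a simplified sketch under stated assumptions; the paper's solver treats the 3-D compressible equations, and the 3-5 percent figure is specific to its LES test case):

```python
import numpy as np

def roe_type_flux(uL, uR, flux, wave_speed, eps=0.05):
    """Scalar Roe-type numerical flux with an adjustable dissipation factor eps.
    eps = 1 recovers the standard upwind dissipation; a small eps mimics the
    reduced dissipation reported to be sufficient for accurate LES."""
    a = wave_speed(uL, uR)                                   # Roe-averaged wave speed
    return 0.5 * (flux(uL) + flux(uR)) - 0.5 * eps * np.abs(a) * (uR - uL)

# Example: Burgers' equation, f(u) = u^2/2, for which the Roe average is (uL + uR)/2
f = lambda u: 0.5 * u ** 2
a_roe = lambda uL, uR: 0.5 * (uL + uR)
print(roe_type_flux(1.0, 0.2, f, a_roe, eps=0.05))
```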

  10. Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

    SciTech Connect

    Qiang, J.; Ryne, R. D.; Habib, S.; Decyk, V.

    1999-11-13

    In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message-passing programming paradigm, along with dynamic load balancing. Implementing an object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure-based code. It also helps to encapsulate the details of the communication syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Important features of this code include symplectic integration with linear maps of external focusing elements and the use of z as the independent variable, as is typical in accelerators. The code was successfully applied to simulate beam transport through three superconducting sections in the APT linac design.

  11. N-body, parallel simulation using a Barnes-Hut algorithm: performance versus accuracy

    NASA Astrophysics Data System (ADS)

    Chonacky, Norman; Dobbins, Brian

    2009-03-01

    The Barnes-Hut method facilitates prioritizing two-body interactions in an N-body system according to their likely significance in calculating the system's dynamics. In particular, it allows a consistent segregation of two-body interactions into those that should be treated by direct calculation versus those that can be aggregated in subsets and then treated by mean-field approximations. In this paper we describe the principles of the Barnes-Hut method, its use in parallelized N-body simulations, and the performance/accuracy trade-offs it presents. We present the latter in the context of results from simulation cases: N-bodies interacting via a gravitational potential, and N-bodies interacting via a Lennard-Jones potential. These should be available in the near future to operate as part of the "Bootable Cluster CD" parallel computation environment of the National Computational Science Institute of the Shodor Educational Foundation.
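
    The segregation of interactions described above is governed by an opening-angle test; the following minimal 2-D quadtree sketch (illustrative only, not the code used in the paper) shows the criterion and the monopole, i.e. center-of-mass, approximation:

```python
import numpy as np

THETA = 0.5  # opening angle: aggregate a cell when (cell size / distance) < THETA

class Cell:
    """Quadtree cell storing the total mass and center of mass of its bodies."""
    def __init__(self, center, size, bodies):
        self.center, self.size = center, size
        self.mass = sum(m for _, m in bodies)
        self.com = sum(p * m for p, m in bodies) / self.mass
        self.children = []
        if len(bodies) > 1 and size > 1e-6:
            quads = {}
            for p, m in bodies:                       # assign each body to one quadrant
                key = (p[0] >= center[0], p[1] >= center[1])
                quads.setdefault(key, []).append((p, m))
            for (east, north), sub in quads.items():
                off = np.array([0.25 if east else -0.25, 0.25 if north else -0.25])
                self.children.append(Cell(center + size * off, size / 2, sub))

def accel(cell, p, eps=1e-3):
    """Acceleration at p (G = 1): direct descent for nearby cells, monopole for far cells."""
    d = cell.com - p
    r = np.linalg.norm(d)
    if r == 0.0:
        return np.zeros(2)                            # skip self-interaction
    if not cell.children or cell.size / r < THETA:
        return cell.mass * d / (r ** 2 + eps ** 2) ** 1.5
    return sum(accel(ch, p, eps) for ch in cell.children)

rng = np.random.default_rng(1)
bodies = [(p, 1.0) for p in rng.random((200, 2))]
root = Cell(np.array([0.5, 0.5]), 1.0, bodies)
print(accel(root, np.array([0.1, 0.9])))
```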

  12. Task parallel sensitivity analysis and parameter estimation of groundwater simulations through the SALSSA framework

    SciTech Connect

    Schuchardt, Karen L.; Agarwal, Khushbu; Chase, Jared M.; Rockhold, Mark L.; Freedman, Vicky L.; Elsethagen, Todd O.; Scheibe, Timothy D.; Chin, George; Sivaramakrishnan, Chandrika

    2010-07-15

    The Support Architecture for Large-Scale Subsurface Analysis (SALSSA) provides an extensible framework, a sophisticated graphical user interface, and an underlying data management system that simplifies the process of running subsurface models, tracking provenance information, and analyzing the model results. Initially, SALSSA supported two styles of job control: user-directed execution and monitoring of individual jobs, and load balancing of jobs across multiple machines to take advantage of many available workstations. Recent efforts in subsurface modelling have been directed at advancing simulators to take advantage of leadership-class supercomputers. We describe two approaches, current progress, and plans toward enabling efficient application of the subsurface simulator codes via the SALSSA framework: automating sensitivity analysis problems through task parallelism, and task-parallel parameter estimation using the PEST framework.
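
    The task-parallel sensitivity-analysis pattern amounts to farming out independent forward runs; a minimal sketch is shown below, using a hypothetical run_model stand-in rather than the actual SALSSA or PEST interfaces:

```python
from concurrent.futures import ProcessPoolExecutor
import itertools

def run_model(params):
    """Placeholder for one forward run of a subsurface simulator (hypothetical stand-in;
    in a framework such as SALSSA this would launch an external job and parse its output)."""
    permeability, porosity = params
    return {"params": params, "breakthrough_time": 1.0 / (permeability * porosity)}

if __name__ == "__main__":
    # Full-factorial sensitivity sweep expressed as independent tasks.
    permeabilities = [1e-12, 5e-12, 1e-11]
    porosities = [0.1, 0.2, 0.3]
    cases = list(itertools.product(permeabilities, porosities))
    with ProcessPoolExecutor() as pool:          # each case is an independent task
        results = list(pool.map(run_model, cases))
    for r in results:
        print(r["params"], r["breakthrough_time"])
```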

  13. Combining the vortex-in-cell and parallel fast multipole methods for efficient domain decomposition simulations

    NASA Astrophysics Data System (ADS)

    Cocle, Roger; Winckelmans, Grégoire; Daeninck, Goéric

    2008-11-01

    A new combination of vortex-in-cell and parallel fast multipole methods is presented which allows efficient parallel simulation of unbounded and half-unbounded vortical flows (flows with one flat wall). In the classical vortex-in-cell (VIC) method, the grid used to solve the Poisson equation is typically taken much larger than the vorticity field region, so as to be able to impose suitable far-field boundary conditions and thus approximate the truly unbounded (or half-unbounded) flow; an alternative is to assume periodicity. This approach leads to a solution that depends on the global grid size and, for large problems, to unmanageable memory and CPU requirements. The idea exploited here is to work on a domain that tightly contains the vorticity field and that can be decomposed into several subdomains on which the exact boundary conditions are obtained using the parallel fast multipole (PFM) method. This amounts to solving a 3-D Poisson equation without requiring any iteration between the subdomains (e.g., no Schwarz iteration is required): this is so because the PFM method has a global view of the entire vorticity field and satisfies the far-field condition. The solution obtained by this VIC-PFM combination then corresponds to the simulation of a truly unbounded (or half-unbounded) flow. It requires far less memory and leads to far better computational efficiency compared to simulations done using either (1) the VIC method alone, or (2) the vortex particle method with the PFM solver alone. 3-D unbounded flow validation results are presented: the instability, nonlinear evolution and decay of a vortex ring (first at a moderate Reynolds number using the sequential version of the method, then at a high Reynolds number using the parallel version); and the instability and nonlinear evolution of a two-vortex system in ground effect. Finally, a space-developing simulation of an aircraft vortex wake in ground effect is also presented.

  14. Construction of a parallel processor for simulating manipulators and other mechanical systems

    NASA Technical Reports Server (NTRS)

    Hannauer, George

    1991-01-01

    This report summarizes the results of NASA Contract NAS5-30905, awarded under phase 2 of the SBIR Program, for a demonstration of the feasibility of a new high-speed parallel simulation processor, called the Real-Time Accelerator (RTA). The principal goals were met, and EAI is now proceeding with phase 3: development of a commercial product. This product is scheduled for commercial introduction in the second quarter of 1992.

  15. SUPREM-DSMC: A New Scalable, Parallel, Reacting, Multidimensional Direct Simulation Monte Carlo Flow Code

    NASA Technical Reports Server (NTRS)

    Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas

    2000-01-01

    An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.

  16. Parallel molecular dynamics simulations of pressure-induced structural transformations in cadmium selenide nanocrystals

    NASA Astrophysics Data System (ADS)

    Lee, Nicholas Jabari Ouma

    Parallel molecular dynamics (MD) simulations are performed to investigate pressure-induced solid-to-solid structural phase transformations in cadmium selenide (CdSe) nanorods. The effects of the size and shape of nanorods on different aspects of structural phase transformations are studied. Simulations are based on interatomic potentials validated extensively by experiments. Simulations range from 10^5 to 10^6 atoms. These simulations are enabled by highly scalable algorithms executed on massively parallel Beowulf computing architectures. Pressure-induced structural transformations are studied using a hydrostatic pressure medium simulated by atoms interacting via a Lennard-Jones potential. Four single-crystal CdSe nanorods, each 44 Å in diameter but varying in length in the range between 44 Å and 600 Å, are studied independently in two sets of simulations. The first simulation is the downstroke simulation, where each rod is embedded in the pressure medium and subjected to increasing pressure, during which it undergoes a forward transformation from a 4-fold coordinated wurtzite (WZ) crystal structure to a 6-fold coordinated rocksalt (RS) crystal structure. In the second, so-called upstroke simulation, the pressure on the rods is decreased and a reverse transformation from 6-fold RS to a 4-fold coordinated phase is observed. The transformation pressure in the forward transformation depends on the nanorod size, with longer rods transforming at lower pressures closer to the bulk transformation pressure. Spatially resolved structural analyses, including pair distributions, atomic coordinations and bond-angle distributions, indicate that nucleation begins at the surface of the nanorods and spreads inward. The transformation results in a single RS domain, in agreement with experiments. The microscopic mechanism for the transformation is observed to be the same as for bulk CdSe. A nanorod size dependence is also found in the reverse structural transformations, with longer nanorods transforming more readily than smaller ones. Nucleation initiates at the center of the rod and grows outward.

  17. Spontaneous Hot Flow Anomalies at Quasi-Parallel Shocks: 2. Hybrid Simulations

    NASA Technical Reports Server (NTRS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFA) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process leading to a highly non-uniform plasma in the quasi-parallel magnetosheath including large scale density and magnetic field cavities.

  18. Spontaneous hot flow anomalies at quasi-parallel shocks: 2. Hybrid simulations

    NASA Astrophysics Data System (ADS)

    Omidi, N.; Zhang, H.; Sibeck, D.; Turner, D.

    2013-01-01

    Motivated by recent THEMIS observations, this paper uses 2.5-D electromagnetic hybrid simulations to investigate the formation of Spontaneous Hot Flow Anomalies (SHFAs) upstream of quasi-parallel bow shocks during steady solar wind conditions and in the absence of discontinuities. The results show the formation of a large number of structures along and upstream of the quasi-parallel bow shock. Their outer edges exhibit density and magnetic field enhancements, while their cores exhibit drops in density, magnetic field, solar wind velocity, and enhancements in ion temperature. Using virtual spacecraft in the simulation, we show that the signatures of these structures in the time series data are very similar to those of SHFAs seen in THEMIS data and conclude that they correspond to SHFAs. Examination of the simulation data shows that SHFAs form as the result of foreshock cavitons interacting with the bow shock. Foreshock cavitons in turn form due to the nonlinear evolution of ULF waves generated by the interaction of the solar wind with the backstreaming ions. Because foreshock cavitons are an inherent part of the shock dissipation process, the formation of SHFAs is also an inherent part of the dissipation process, leading to a highly nonuniform plasma in the quasi-parallel magnetosheath, including large-scale density and magnetic field cavities.

  19. Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

    NASA Technical Reports Server (NTRS)

    Morgan, Philip E.

    2004-01-01

    This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications." The discussion of Scalable High Performance Computing reports on three objectives: validate, assess scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems; and investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhance an electromagnetics code (CHARGE) to effectively model antenna problems; apply lessons learned from the high-order/spectral solution of swirling 3D jets to the electromagnetics project; transition a high-order fluids code, FDL3DI, to solve Maxwell's equations using compact differencing; develop and demonstrate improved radiation-absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

  20. Simulation study of a parallel processor with unbalanced loads. Master's thesis

    SciTech Connect

    Moore, T.S.

    1987-12-01

    The purpose of this thesis was twofold: first, to estimate the impact of unbalanced computational loads on a parallel-processing architecture via Monte Carlo simulation; and second, to investigate the impact of representing the dynamics of the parallel-processing problem via animated simulation. The study is constrained to the hypercube architecture, in which each node is connected in a predetermined topology and allowed to communicate with other nodes through calls to the operating system. Routing of messages through the network is fixed and specified within the operating system. Message transmission preempts nodal processing, causing internodal communications to complicate the concurrent operation of the network. Two independent variables are defined: 1) the degree of imbalance, which characterizes the nature or severity of the load imbalance, and 2) the degree of locality, which characterizes the node loadings with respect to node locations across the cube. A SLAM II simulation model of a generic 16-node hypercube was constructed in which each node processes a predetermined number of computational tasks and, following each task, sends a message to a single randomly chosen receiver node. An experiment was designed in which the independent variables, degree of imbalance and degree of locality, were varied across two computation-to-I/O ratios to determine their separate and interactive effects on the dependent variable, job speedup. ANOVA and regression techniques were used to estimate the relationship of load imbalance, locality, computation-to-I/O ratio, and their interactions to job speedup. Results show that load imbalance severely impacts a parallel processor's performance.

  21. Parallel implementation of three-dimensional molecular dynamic simulation for laser-cluster interaction

    SciTech Connect

    Holkundkar, Amol R.

    2013-11-15

    The objective of this article is to report the parallel implementation of the 3D molecular dynamic simulation code for laser-cluster interactions. The benchmarking of the code has been done by comparing the simulation results with some of the experiments reported in the literature. Scaling laws for the computational time are established by varying the number of processor cores and the number of macroparticles used. The capabilities of the code are highlighted by implementing various diagnostic tools. To study the dynamics of the laser-cluster interactions, the executable version of the code is available from the author.

  22. Parallel octree-based multiresolution mesh method for large-scale earthquake ground motion simulation

    NASA Astrophysics Data System (ADS)

    Kim, Eui Joong

    Large-scale ground motion simulation requires supercomputing systems in order to obtain reliable and useful results within reasonable elapsed time. In this study, we develop a framework for terascale ground motion simulations in highly heterogeneous basins. As part of the development, we present a parallel octree-based multiresolution finite element methodology for the elastodynamic wave propagation problem. The octree-based multiresolution finite element method reduces memory use significantly and improves overall computational performance. The framework comprises three parts: (1) an octree-based mesh generator, Euclid, developed by Tu and O'Hallaron; (2) a parallel mesh partitioner, ParMETIS, developed by Karypis et al.; and (3) a parallel octree-based multiresolution finite element solver, QUAKE, developed in this study. Realistic earthquake parameters, soil material properties, and sedimentary basin dimensions will produce extremely large meshes. The out-of-core version of the octree-based mesh generator, Euclid, overcomes the resulting severe memory limitations. By using a parallel, distributed-memory graph partitioning algorithm, ParMETIS partitions large meshes, overcoming the memory and cost problem. Despite the capability of the Octree-Based Multiresolution Mesh Method (OBM3), large problem sizes necessitate parallelism to handle large memory and work requirements. The parallel OBM3 elastic wave propagation code, QUAKE, has been developed to address these issues. The numerical methodology and the framework have been used to simulate the seismic response of both idealized systems and of the Greater Los Angeles basin to simple pulses and to a mainshock of the 1994 Northridge Earthquake, for frequencies of up to 1 Hz and a domain size of 80 km x 80 km x 30 km. In the idealized models, QUAKE shows good agreement with the analytical Green's function solutions. In the realistic models for the Northridge earthquake mainshock, QUAKE agrees qualitatively, to within at most a factor of 2.5, with the observational data. Through simulations for several models, ranging in size from 400,000 to 300 million degrees of freedom on the 512-processor Cray T3E and the 3000-processor HP-Compaq AlphaServer Cluster at the Pittsburgh Supercomputing Center, we achieve excellent performance and scalability.

  23. A generic simulation cell method for developing extensible, efficient and readable parallel computational models

    NASA Astrophysics Data System (ADS)

    Honkonen, I.

    2015-03-01

    I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring modification of existing code. This is an advantage for the development and testing of, e.g., geoscientific software as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. An implementation of the generic simulation cell method presented here, the generic simulation cell class (gensimcell), also includes support for parallel programming by allowing model developers to select which simulation variables of, e.g., a domain-decomposed model to transfer between processes via a Message Passing Interface (MPI) library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class requires a C++ compiler that supports a version of the language standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those who do are kindly requested to acknowledge and cite this work.

  24. Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

    SciTech Connect

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Syratchev, I.; /CERN

    2009-06-19

    In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).

  25. Massively parallel Monte Carlo for many-particle simulations on GPUs

    SciTech Connect

    Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.

    2013-12-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
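
    The checkerboard idea behind such massively parallel Monte Carlo schemes can be sketched as follows (a serial toy illustration, not the GPU implementation from the paper; the paper also addresses detailed-balance subtleties, such as shuffling cell origins, that are omitted here). Cells of one "color" are separated by at least one cell width, so disks confined to same-colored cells can be trial-moved concurrently without conflicting:

```python
import numpy as np

rng = np.random.default_rng(2)

L, CELL, SIGMA = 12.0, 3.0, 1.0          # box size, cell width (> disk diameter), disk diameter
ncell = int(L / CELL)

def overlaps(p, others):
    """True if a disk at p overlaps any disk in `others` (periodic images ignored for brevity)."""
    return any(np.linalg.norm(p - q) < SIGMA for q in others)

def sweep_checkerboard(disks, max_disp=0.3):
    """One checkerboard sweep: the four cell colors are visited in turn; all cells of one
    color are independent, so the inner loop could run in parallel, one thread block
    or rank per cell. Moves are kept inside a disk's own cell to preserve independence."""
    cell_of = lambda p: (int(p[0] // CELL) % ncell, int(p[1] // CELL) % ncell)
    for color in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        active = [i for i, p in enumerate(disks)
                  if (cell_of(p)[0] % 2, cell_of(p)[1] % 2) == color]
        for i in active:                                  # independent work within one color
            trial = disks[i] + rng.uniform(-max_disp, max_disp, size=2)
            if not (0 <= trial[0] < L and 0 <= trial[1] < L):
                continue
            if cell_of(trial) != cell_of(disks[i]):
                continue                                  # reject moves that leave the cell
            others = [disks[j] for j in range(len(disks)) if j != i]
            if not overlaps(trial, others):
                disks[i] = trial
    return disks

# Start from a non-overlapping grid of 16 disks, one per cell center
disks = np.array([[1.5 + 3.0 * i, 1.5 + 3.0 * j] for i in range(4) for j in range(4)])
for _ in range(10):
    disks = sweep_checkerboard(disks)
print(disks[:3])
```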

  26. Use of Parallel Micro-Platform for the Simulation the Space Exploration

    NASA Astrophysics Data System (ADS)

    Velasco Herrera, Victor Manuel; Velasco Herrera, Graciela; Rosano, Felipe Lara; Rodriguez Lozano, Salvador; Lucero Roldan Serrato, Karen

    The purpose of this work is to create a parallel micro-platform that simulates the virtual movements of a space exploration in 3D. One of the innovations presented in this design is the application of a lever mechanism for the transmission of movement. The development of such a robot is a challenging task, very different from that of industrial manipulators, owing to a totally different set of target requirements. This work presents the computer-aided study and simulation of the movement of this parallel manipulator. The model was developed using the computer-aided design platform Unigraphics, in which the geometric modeling of each of the components and of the final assembly (CAD), the generation of files for computer-aided manufacture (CAM) of each of the pieces, and the kinematic simulation of the system under different driving schemes were carried out. We used the MATLAB aerospace toolbox and created an adaptive control module to simulate the system.

  27. A novel parallel-rotation algorithm for atomistic Monte Carlo simulation of dense polymer systems

    NASA Astrophysics Data System (ADS)

    Santos, S.; Suter, U. W.; Müller, M.; Nievergelt, J.

    2001-06-01

    We develop and test a new elementary Monte Carlo move for use in the off-lattice simulation of polymer systems. This novel Parallel-Rotation algorithm (ParRot) permits moving very efficiently torsion angles that are deeply inside long chains in melts. The parallel-rotation move is extremely simple and is also demonstrated to be computationally efficient and appropriate for Monte Carlo simulation. The ParRot move does not affect the orientation of those parts of the chain outside the moving unit. The move consists of a concerted rotation around four adjacent skeletal bonds. No assumption is made concerning the backbone geometry other than that bond lengths and bond angles are held constant during the elementary move. Properly weighted sampling techniques are needed for ensuring detailed balance because the new move involves a correlated change in four degrees of freedom along the chain backbone. The ParRot move is supplemented with the classical Metropolis Monte Carlo, the Continuum-Configurational-Bias, and Reptation techniques in an isothermal-isobaric Monte Carlo simulation of melts of short and long chains. Comparisons are made with the capabilities of other Monte Carlo techniques to move the torsion angles in the middle of the chains. We demonstrate that ParRot constitutes a highly promising Monte Carlo move for the treatment of long polymer chains in the off-lattice simulation of realistic models of dense polymer systems.

  28. Switching to High Gear: Opportunities for Grand-scale Real-time Parallel Simulations

    SciTech Connect

    Perumalla, Kalyan S.

    2009-01-01

    The recent emergence of dramatically large computational power, spanning desktops with multi-core processors and multiple graphics cards to supercomputers with 10^5 processor cores, has suddenly resulted in simulation-based solutions trailing behind in the ability to fully tap the new computational capacity. Here, we motivate the need for switching parallel simulation research to a higher gear to exploit the new, immense levels of computational power. The potential for grand-scale real-time solutions is illustrated using preliminary results from prototypes in four example application areas: (a) state- or regional-scale vehicular mobility modeling, (b) very large-scale epidemic modeling, (c) modeling the propagation of wireless network signals in very large, cluttered terrains, and (d) country- or world-scale social behavioral modeling. We believe the stage is perfectly poised for the parallel/distributed simulation community to envision and formulate similar grand-scale, real-time simulation-based solutions in many application areas.

  29. Parallel, adaptive, multi-object trajectory integrator for space simulation applications

    NASA Astrophysics Data System (ADS)

    Atanassov, Atanas Marinov

    2014-10-01

    Computer simulation is a very helpful approach for improving results from space-borne experiments. Initial-value problems (IVPs) can be applied for modeling the dynamics of different objects: artificial Earth satellites, charged particles in magnetic and electric fields, charged or non-charged dust particles, and space debris. An integrator for systems of ordinary differential equations, based on embedded Runge-Kutta-Fehlberg methods of different orders, is developed. These methods enable evaluation of the local error. Instead of step-size control based on local error evaluation, an optimal integration method is selected. Integration, while meeting the required local error, proceeds with constant-sized steps. This optimal scheme selection reduces the amount of calculation needed for solving the IVPs. In addition, for implementation on a multi-core processor with thread-based parallelization, we describe how to solve multiple systems of IVPs efficiently in parallel. The proposed integrator allows the application of a different force model for every object in multi-satellite simulation models. Simultaneous application of the integrator toward different kinds of problems within one combined simulation model is also possible. The basic application of the integrator is solving mechanical IVPs in the context of simulation models and their application in complex multi-satellite space missions and as a design tool for experiments.
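
    The embedded-pair idea referred to above can be sketched with the simplest such pair, Heun(2)/Euler(1): one step yields two estimates whose difference approximates the local error, which can then be used to choose a constant step size (or scheme) once rather than adapting the step continuously. This is a loose illustration of the strategy in the abstract, not the author's code:

```python
import numpy as np

def heun_euler_step(f, t, y, h):
    """One step of the embedded Heun(2)/Euler(1) pair: returns the 2nd-order solution
    and an estimate of the local error (difference of the two embedded solutions)."""
    k1 = f(t, y)
    k2 = f(t + h, y + h * k1)
    y_high = y + 0.5 * h * (k1 + k2)      # 2nd-order (Heun) result
    y_low = y + h * k1                    # 1st-order (Euler) result
    return y_high, np.linalg.norm(y_high - y_low)

def integrate_fixed_step(f, t0, t1, y0, tol=1e-6):
    """Choose a constant step from the embedded error estimate at the initial state,
    then integrate with that fixed step (a simplified stand-in for the abstract's
    'optimal scheme selection' idea)."""
    y0 = np.asarray(y0, dtype=float)
    h = (t1 - t0) / 100.0
    while heun_euler_step(f, t0, y0, h)[1] > tol:
        h *= 0.5                          # refine until the local error meets the tolerance
    t, y = t0, y0
    while t < t1:
        h_step = min(h, t1 - t)
        y, _ = heun_euler_step(f, t, y, h_step)
        t += h_step
    return y

# Example: harmonic oscillator, y = [position, velocity]; after one period y ~ [1, 0]
f = lambda t, y: np.array([y[1], -y[0]])
print(integrate_fixed_step(f, 0.0, 2 * np.pi, [1.0, 0.0]))
```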

  30. Simulations of structural and dynamic anisotropy in nano-confined water between parallel graphite plates

    NASA Astrophysics Data System (ADS)

    Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan

    2012-11-01

    We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities, with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., the distance between the graphite plates), large pressures (on the order of 10 katm) and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm^-3, bubble formation and restructuring of the water layers are observed.

  31. Simulations of structural and dynamic anisotropy in nano-confined water between parallel graphite plates.

    PubMed

    Mosaddeghi, Hamid; Alavi, Saman; Kowsari, M. H.; Najafi, Bijan

    2012-11-14

    We use molecular dynamics simulations to study the structure, dynamics, and transport properties of nano-confined water between parallel graphite plates with separation distances (H) from 7 to 20 Å at different water densities, with an emphasis on anisotropies generated by confinement. The behavior of the confined water phase is compared to non-confined bulk water under similar pressure and temperature conditions. Our simulations show anisotropic structure and dynamics of the confined water phase in directions parallel and perpendicular to the graphite plate. The magnitude of these anisotropies depends on the slit width H. Confined water shows "solid-like" structure and slow dynamics for the water layers near the plates. The mean square displacements (MSDs) and velocity autocorrelation functions (VACFs) for directions parallel and perpendicular to the graphite plates are calculated. By increasing the confinement distance from H = 7 Å to H = 20 Å, the MSD increases and the behavior of the VACF indicates that the confined water changes from solid-like to liquid-like dynamics. If the initial density of the water phase is set up using geometric criteria (i.e., the distance between the graphite plates), large pressures (on the order of ~10 katm) and large pressure anisotropies are established within the water. By decreasing the density of the water between the confined plates to about 0.9 g cm^-3, bubble formation and restructuring of the water layers are observed. PMID:23163385

  32. GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

    PubMed Central

    Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

    2013-01-01

    Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including Windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358

  33. Parallel implementation of simulated annealing to reproduce multiple-point statistics

    NASA Astrophysics Data System (ADS)

    Peredo, Oscar; Ortiz, Julián M.

    2011-08-01

    This paper shows an innovative implementation of simulated annealing in the context of parallel computing. Details regarding the use of parallel computing through a cluster of processors, as well as the implementation decisions, are provided. Simulated annealing is presented aiming at the generation of stochastic realizations of categorical variables reproducing multiple-point statistics. The procedure starts with the use of a training image to determine the frequencies of occurrence of particular configurations of nodes and values. These frequencies are used as target statistics that must be matched by the stochastic images generated with the algorithm. The simulation process considers an initial random image of the spatial distribution of the categories. Nodes are perturbed randomly, and after each perturbation the mismatch between the target statistics and the current statistics of the image is calculated. The perturbation is accepted if the statistics are closer to the target, or conditionally rejected if not, based on the annealing schedule. The simulation was implemented using parallel processes with C++ and MPI. The message passing scheme was implemented using a speculative computation framework, by which, prior to making the decision of acceptance or rejection of a proposed perturbation, processes already start calculating the next possible perturbation at a second level: one process as if the perturbation on level one is accepted, and another as if the proposed perturbation is rejected. Additional levels can start their calculation as well, conditional on the second-level processes. Once a process reaches a decision as to whether to accept or reject the suggested perturbation, all processes within the branch incompatible with that decision are dropped. This allows a speed-up of up to log_n(p + 1), where n is the number of categories and p the number of processes simultaneously active. Examples are provided to demonstrate the improvements and speed-ups that can be achieved.
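
    The serial accept/reject core of the annealing procedure described above can be sketched as follows (the pattern statistic, parameters and cooling schedule are invented for the example, and the speculative parallel tree is not reproduced; real implementations also update the statistics incrementally rather than recomputing them each step):

```python
import numpy as np

rng = np.random.default_rng(3)

def pattern_frequencies(img):
    """Frequencies of 2x1 horizontal category patterns, a stand-in for the
    multiple-point statistics extracted from a training image."""
    freq = {}
    for pair in zip(img[:, :-1].ravel(), img[:, 1:].ravel()):
        freq[pair] = freq.get(pair, 0) + 1
    total = img.shape[0] * (img.shape[1] - 1)
    return {k: v / total for k, v in freq.items()}

def mismatch(freq, target):
    keys = set(freq) | set(target)
    return sum((freq.get(k, 0.0) - target.get(k, 0.0)) ** 2 for k in keys)

def anneal(img, target, n_cat=3, steps=5000, T0=0.01, cooling=0.999):
    """Plain simulated annealing: perturb one node, accept if the statistics get closer
    to the target, otherwise accept with probability exp(-increase / T)."""
    T = T0
    obj = mismatch(pattern_frequencies(img), target)
    for _ in range(steps):
        i, j = rng.integers(img.shape[0]), rng.integers(img.shape[1])
        old = img[i, j]
        img[i, j] = rng.integers(n_cat)
        new_obj = mismatch(pattern_frequencies(img), target)
        if new_obj <= obj or rng.random() < np.exp(-(new_obj - obj) / T):
            obj = new_obj                 # accepted
        else:
            img[i, j] = old               # conditional rejection
        T *= cooling
    return img, obj

training = rng.integers(0, 3, size=(30, 30))
target = pattern_frequencies(training)
_, final_obj = anneal(rng.integers(0, 3, size=(30, 30)), target)
print("final mismatch:", final_obj)
```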

  34. De Novo Ultrascale Atomistic Simulations On High-End Parallel Supercomputers

    SciTech Connect

    Nakano, A.; Kalia, R. K.; Nomura, K.; Sharma, A.; Vashishta, P.; Shimojo, F.; van Duin, A.; Goddard, W. A., III; Biswas, R.; Srivastava, D.; Yang, L. H.

    2006-09-04

    We present a de novo hierarchical simulation framework for first-principles-based predictive simulations of materials and their validation on high-end parallel supercomputers and geographically distributed clusters. In this framework, high-end chemically reactive and non-reactive molecular dynamics (MD) simulations explore a wide solution space to discover microscopic mechanisms that govern macroscopic material properties, into which highly accurate quantum mechanical (QM) simulations are embedded to validate the discovered mechanisms and quantify the uncertainty of the solution. The framework includes an embedded divide-and-conquer (EDC) algorithmic framework for the design of linear-scaling simulation algorithms with minimal bandwidth complexity and tight error control. The EDC framework also enables adaptive hierarchical simulation with automated model transitioning assisted by graph-based event tracking. A tunable hierarchical cellular decomposition parallelization framework then maps the O(N) EDC algorithms onto petaflops computers, while achieving performance tunability through a hierarchy of parameterized cell data/computation structures, as well as its implementation using hybrid Grid remote procedure call + message passing + threads programming. High-end computing platforms such as IBM BlueGene/L, SGI Altix 3000 and the NSF TeraGrid provide excellent test grounds for the framework. On these platforms, we have achieved unprecedented scales of quantum-mechanically accurate and well validated, chemically reactive atomistic simulations--1.06 billion-atom fast reactive force-field MD and 11.8 million-atom (1.04 trillion grid points) quantum-mechanical MD in the framework of the EDC density functional theory on adaptive multigrids--in addition to 134 billion-atom non-reactive space-time multiresolution MD, with parallel efficiency as high as 0.998 on 65,536 dual-processor BlueGene/L nodes. We have also achieved an automated execution of hierarchical QM/MD simulation on a Grid consisting of 6 supercomputer centers in the US and Japan (150 thousand processor-hours in total), in which the number of processors changes dynamically on demand and resources are allocated and migrated dynamically in response to faults. Furthermore, performance portability has been demonstrated on a wide range of platforms such as BlueGene/L, Altix 3000, and AMD Opteron-based Linux clusters.

  35. A parallelization scheme to simulate reactive transport in the subsurface environment with OGS#IPhreeqc

    NASA Astrophysics Data System (ADS)

    He, W.; Beyer, C.; Fleckenstein, J. H.; Jang, E.; Kolditz, O.; Naumov, D.; Kalbacher, T.

    2015-03-01

    This technical paper presents an efficient and performance-oriented method to model reactive mass transport processes in environmental and geotechnical subsurface systems. The open source scientific software packages OpenGeoSys and IPhreeqc have been coupled to combine their individual strengths and features in order to simulate coupled thermo-hydro-mechanical-chemical processes in porous and fractured media with simultaneous consideration of aqueous geochemical reactions. Furthermore, a flexible parallelization scheme using MPI (Message Passing Interface) grouping techniques has been implemented, which allows an optimized allocation of computer resources for the node-wise calculation of chemical reactions on the one hand, and for the underlying processes such as groundwater flow or solute transport on the other hand. The coupling interface and parallelization scheme have been tested and verified in terms of precision and performance.

  36. Parallel Tempering Monte Carlo Simulations of Spherical Fixed-Connectivity Model for Polymerized Membranes

    NASA Astrophysics Data System (ADS)

    Usui, Satoshi; Koibuchi, Hiroshi

    2015-12-01

    We study the first-order phase transition of the fixed-connectivity triangulated surface model using the Parallel Tempering Monte Carlo (PTMC) technique on relatively large lattices. From the PTMC results, we find that the transition is considerably stronger than the reported ones predicted by the conventional Metropolis MC (MMC) technique and the flat-histogram MC technique. We also confirm that the results of the PTMC on relatively smaller lattices are in good agreement with those known results. This implies that the PTMC is successfully used to simulate first-order phase transitions. The parallel computation in the PTMC is implemented by OpenMP, where the speed of the PTMC on multi-core CPUs is considerably faster than that on single-core CPUs.

  37. Superposition-Enhanced Estimation of Optimal Temperature Spacings for Parallel Tempering Simulations

    PubMed Central

    2014-01-01

    Effective parallel tempering simulations rely crucially on a properly chosen sequence of temperatures. While it is desirable to achieve a uniform exchange acceptance rate across neighboring replicas, finding a set of temperatures that achieves this end is often a difficult task, in particular for systems undergoing phase transitions. Here we present a method for determination of optimal replica spacings, which is based upon knowledge of local minima in the potential energy landscape. Working within the harmonic superposition approximation, we derive an analytic expression for the parallel tempering acceptance rate as a function of the replica temperatures. For a particular system and a given database of minima, we show how this expression can be used to determine optimal temperatures that achieve a desired uniform acceptance rate. We test our strategy for two atomic clusters that exhibit broken ergodicity, demonstrating that our method achieves uniform acceptance as well as significant efficiency gains. PMID:25512744
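
    For context, the replica-exchange acceptance rule whose rate the paper predicts analytically is the standard Metropolis swap criterion; the sketch below estimates acceptance for a geometric temperature ladder on a toy harmonic-like system (an illustration only, not the harmonic superposition expression derived in the paper):

```python
import numpy as np

rng = np.random.default_rng(4)

def swap_probability(E_i, E_j, T_i, T_j):
    """Standard parallel tempering (replica exchange) acceptance probability for swapping
    configurations between replicas at temperatures T_i and T_j (k_B = 1)."""
    delta = (1.0 / T_i - 1.0 / T_j) * (E_i - E_j)
    return 1.0 if delta >= 0.0 else float(np.exp(delta))

def mean_acceptance(T_i, T_j, c=50.0, n=20000):
    """Toy system with temperature-independent heat capacity: potential energy fluctuates
    roughly like E ~ Normal(mean = c*T, std = sqrt(c)*T), with c = (degrees of freedom)/2."""
    E_i = rng.normal(c * T_i, np.sqrt(c) * T_i, n)
    E_j = rng.normal(c * T_j, np.sqrt(c) * T_j, n)
    return np.mean([swap_probability(a, b, T_i, T_j) for a, b in zip(E_i, E_j)])

temps = 0.5 * 1.15 ** np.arange(8)          # geometric ladder, a common first guess
for Ti, Tj in zip(temps[:-1], temps[1:]):
    print(f"T = {Ti:.3f} <-> {Tj:.3f}: acceptance ~ {mean_acceptance(Ti, Tj):.2f}")
```

    For such a system the geometric ladder already yields nearly uniform acceptance; the point of the paper is that near phase transitions this simple picture breaks down and the superposition-based estimate becomes useful.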
358. LUsim: A Framework for Simulation-Based Performance Modeling and Prediction of Parallel Sparse LU Factorization

    SciTech Connect

    Univ. of California, San Diego; Li, Xiaoye Sherry; Cicotti, Pietro; Baden, Scott B.

    2008-04-15

    Sparse parallel factorization is among the most complicated and irregular algorithms to analyze and optimize. Performance depends both on system characteristics such as the floating point rate, the memory hierarchy, and the interconnect performance, as well as on input matrix characteristics such as the number and location of nonzeros. We present LUsim, a simulation framework for modeling the performance of sparse LU factorization. Our framework uses micro-benchmarks to calibrate the parameters of machine characteristics and additional tools to facilitate real-time performance modeling. We are using LUsim to analyze an existing parallel sparse LU factorization code, and to explore a latency-tolerant variant. We developed and validated a model of the factorization in SuperLU_DIST, then we modeled and implemented a new variant of slud, replacing a blocking collective communication phase with a non-blocking asynchronous point-to-point one. Our strategy realized a mean improvement of 11 percent over a suite of test matrices.

359. Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems

    SciTech Connect

    Martinez, E.; Monasterio, P. R.; Marian, J.

    2011-02-20

    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
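The chessboard decomposition mentioned in the spkMC record can be pictured with a short sketch: the lattice is divided into spatial domains, the domains are colored by parity, and in a given synchronous cycle only sites of the active color are updated, so neighboring domains never touch interacting boundary sites at the same time. The code below is a schematic serial illustration of that bookkeeping, not the published spkMC algorithm; the 2D lattice, the 2x2 coloring, and the fixed event rate are simplifying assumptions.

```python
# Schematic illustration of a chessboard (sublattice) decomposition for synchronous
# parallel kinetic Monte Carlo. Domains advance in lockstep; in each cycle only one
# sublattice color is active, so adjacent domains cannot update interacting sites together.
import random

L = 8                      # lattice is L x L sites (small, for illustration)
DOMAIN = 4                 # each spatial domain is DOMAIN x DOMAIN sites

def color(x, y):
    # 2x2 chessboard coloring of domains -> four non-interacting sublattices (0..3).
    return 2 * ((x // DOMAIN) % 2) + ((y // DOMAIN) % 2)

def synchronous_cycle(lattice, active_color, rate=0.3):
    """One synchronous time step: every site of the active sublattice either performs a
    real event (a flip here) or a 'null event' that only advances the local clock."""
    events = 0
    for x in range(L):
        for y in range(L):
            if color(x, y) != active_color:
                continue                       # site belongs to an inactive sublattice
            if random.random() < rate:         # real event: flip the site
                lattice[x][y] ^= 1
                events += 1
            # else: null event, nothing changes, but simulated time still advances
    return events

if __name__ == "__main__":
    lattice = [[0] * L for _ in range(L)]
    for cycle in range(12):
        c = cycle % 4                          # rotate through the four sublattice colors
        n = synchronous_cycle(lattice, c)
        print(f"cycle {cycle}: color {c}, {n} real events")
```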
360. Parallel Tempering Monte Carlo Simulations of Spherical Fixed-Connectivity Model for Polymerized Membranes

    NASA Astrophysics Data System (ADS)

    Usui, Satoshi; Koibuchi, Hiroshi

    2016-02-01

    We study the first order phase transition of the fixed-connectivity triangulated surface model using the Parallel Tempering Monte Carlo (PTMC) technique on relatively large lattices. From the PTMC results, we find that the transition is considerably stronger than the reported ones predicted by the conventional Metropolis MC (MMC) technique and the flat histogram MC technique. We also confirm that the results of the PTMC on relatively smaller lattices are in good agreement with the known results. This implies that the PTMC is successfully used to simulate first order phase transitions. The parallel computation in the PTMC is implemented by OpenMP, where the speed of the PTMC on multi-core CPUs is considerably faster than that on single-core CPUs.
361. Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P

    SciTech Connect

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Ben-Zvi, I.; Kewisch, J.

    2009-06-19

    SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.

362. Xyce parallel electronic simulator design: mathematical formulation, version 2.0

    SciTech Connect

    Hoekstra, Robert John; Waters, Lon J.; Hutchinson, Scott Alan; Keiter, Eric Richard; Russo, Thomas V.

    2004-06-01

    This document is intended to contain a detailed description of the mathematical formulation of Xyce, a massively parallel SPICE-style circuit simulator developed at Sandia National Laboratories. The target audience of this document is people in the role of 'service provider'. An example of such a person would be a linear solver expert who is spending a small fraction of his time developing solver algorithms for Xyce. Such a person probably is not an expert in circuit simulation, and would benefit from a description of the equations solved by Xyce. In this document, modified nodal analysis (MNA) is described in detail, with a number of examples. Issues that are unique to circuit simulation, such as voltage limiting, are also described in detail.
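To make the modified nodal analysis mentioned in the Xyce record concrete, here is a small, self-contained sketch that stamps the MNA matrix for a textbook circuit: a 5 V source driving two series resistors (1 kOhm and 2 kOhm). The circuit, its values, and the helper function names are invented for illustration and have nothing to do with Xyce's internals; the sketch only shows the standard resistor and voltage-source stamps.

```python
# Minimal modified nodal analysis (MNA) sketch for an invented example circuit:
# a 5 V source at node 1 feeding R1 = 1 kOhm (node 1 -> node 2) and R2 = 2 kOhm (node 2 -> ground).
# Unknown vector x = [v1, v2, i_vsrc]; this is a generic textbook formulation, not Xyce code.
import numpy as np

n_nodes = 2                                  # non-ground nodes
A = np.zeros((n_nodes + 1, n_nodes + 1))     # extra row/column for the voltage-source current
b = np.zeros(n_nodes + 1)

def stamp_resistor(A, n1, n2, R):
    """Stamp a resistor between nodes n1 and n2 (0 means ground, otherwise 1-based)."""
    g = 1.0 / R
    if n1: A[n1 - 1, n1 - 1] += g
    if n2: A[n2 - 1, n2 - 1] += g
    if n1 and n2:
        A[n1 - 1, n2 - 1] -= g
        A[n2 - 1, n1 - 1] -= g

def stamp_vsource(A, b, row, n_plus, n_minus, volts):
    """Stamp an ideal voltage source; 'row' is its extra MNA equation index."""
    if n_plus:
        A[row, n_plus - 1] += 1.0
        A[n_plus - 1, row] += 1.0
    if n_minus:
        A[row, n_minus - 1] -= 1.0
        A[n_minus - 1, row] -= 1.0
    b[row] = volts

stamp_resistor(A, 1, 2, 1e3)                 # R1 between node 1 and node 2
stamp_resistor(A, 2, 0, 2e3)                 # R2 between node 2 and ground
stamp_vsource(A, b, n_nodes, 1, 0, 5.0)      # 5 V source from node 1 to ground

x = np.linalg.solve(A, b)
print(f"v1 = {x[0]:.3f} V, v2 = {x[1]:.3f} V, source current = {x[2]*1e3:.3f} mA")
```

Running the sketch gives v1 = 5 V, v2 = 3.333 V, and a source branch current of about -1.667 mA, matching the hand calculation for this divider.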
363. Hardware Description Language for Optical Processing (hadlop): A Simulation Environment for Parallel Optoelectronic Architectures

    NASA Astrophysics Data System (ADS)

    Grimm, Guido; Fey, Dietmar; Degenkolb, Marko; Erhart, Werner

    1998-09-01

    We present a simulation environment for parallel optoelectronic data-processing systems, and we especially consider the fusion of optoelectronic integrated circuits and optical interconnection modules. hadlop, which stands for hardware description language for optical processing, is a simulator that works at the digital design level. So far, hadlop has allowed algorithm and architecture studies for smart-pixel systems. We have just begun to extend the capabilities of hadlop toward an automatic synthesis tool for three-dimensional optoelectronic VLSI circuits. A hadlop architecture will then be the basis for the automatic generation of detailed construction plans that consider the interaction between optical interconnection modules and optoelectronic integrated circuits. The simulation system is freeware and is available through the Internet at http://www2.informatik.uni-jena.de/pope/HADLOP/hadlop.html.

364. The generic simulation cell method for developing extensible, efficient and readable parallel computational models

    NASA Astrophysics Data System (ADS)

    Honkonen, I.

    2014-07-01

    I present a method for developing extensible and modular computational models without sacrificing serial or parallel performance or source code readability. By using a generic simulation cell method I show that it is possible to combine several distinct computational models to run in the same computational grid without requiring any modification of existing code. This is an advantage for the development and testing of computational modeling software, as each submodel can be developed and tested independently and subsequently used without modification in a more complex coupled program. Support for parallel programming is also provided by allowing users to select which simulation variables to transfer between processes via a Message Passing Interface library. This allows the communication strategy of a program to be formalized by explicitly stating which variables must be transferred between processes for the correct functionality of each submodel and the entire program. The generic simulation cell class presented here requires a C++ compiler that supports variadic templates, which were standardized in 2011 (C++11). The code is available at https://github.com/nasailja/gensimcell for everyone to use, study, modify and redistribute; those that do are kindly requested to cite this work.

365. Application of Discrete Event Control to the Insertion Task of Electric Line Using 6-Link Electro-Hydraulic Manipulators with Dual Arm

    NASA Astrophysics Data System (ADS)

    Ahn, Kyoungkwan; Yokota, Shinichi

    Uninterrupted power supply has become indispensable during the maintenance of active electric power lines as a result of today's highly information-oriented society and the increasing demand on electric utilities. The maintenance task carries the risk of electric shock and the danger of falling from high places. It is therefore necessary to realize an autonomous robot system using electro-hydraulic manipulators, because hydraulic manipulators have the advantage of electric insulation. Meanwhile, it is relatively difficult to realize autonomous assembly tasks, particularly when manipulating flexible objects such as electric lines. In this report, a discrete event control system is introduced for the automatic assembly of electric lines into sleeves, a typical task on active electric power lines. In the implementation of the discrete event control system, LVQNN (learning vector quantization neural network) is applied to the insertion task of electric lines into sleeves. In order to apply the proposed control system to the unknown environment, virtual learning data for the LVQNN were generated by fuzzy inference. Experimental results with two types of electric lines and sleeves confirm that the proposed discrete event control and neural network learning algorithm are very effective for the insertion of electric lines into sleeves, a typical active electric power maintenance task.
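The LVQNN used in the preceding record is a prototype-based classifier; the sketch below shows the basic LVQ1 update rule (pull the nearest prototype toward a correctly classified sample, push it away from a misclassified one) on made-up 2D data. It is a generic illustration of learning vector quantization, not the authors' controller, and the class labels and parameters are invented; the fuzzy-inference data generation step is not reproduced.

```python
# Generic LVQ1 sketch: prototype vectors are attracted to samples of their own class
# and repelled by samples of other classes. Data and labels are synthetic.
import numpy as np

rng = np.random.default_rng(0)

# Two synthetic classes in 2D (labels such as "insertion succeeded" vs "insertion jammed" are invented).
class0 = rng.normal(loc=[0.0, 0.0], scale=0.3, size=(100, 2))
class1 = rng.normal(loc=[1.5, 1.0], scale=0.3, size=(100, 2))
X = np.vstack([class0, class1])
y = np.array([0] * 100 + [1] * 100)

# One prototype per class, initialized near a few samples of each class.
prototypes = np.array([class0[:5].mean(axis=0), class1[:5].mean(axis=0)])
proto_labels = np.array([0, 1])

def lvq1_train(X, y, prototypes, proto_labels, lr=0.05, epochs=20):
    P = prototypes.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(X)):
            d = np.linalg.norm(P - X[i], axis=1)
            w = int(np.argmin(d))                 # index of the winning (nearest) prototype
            if proto_labels[w] == y[i]:
                P[w] += lr * (X[i] - P[w])        # attract: correct classification
            else:
                P[w] -= lr * (X[i] - P[w])        # repel: wrong classification
        lr *= 0.9                                  # decay the learning rate each epoch
    return P

prototypes = lvq1_train(X, y, prototypes, proto_labels)
dists = np.linalg.norm(X[:, None, :] - prototypes[None, :, :], axis=2)
pred = proto_labels[np.argmin(dists, axis=1)]
print("training accuracy:", (pred == y).mean())
```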
366. Adaptive Flow Simulation of Turbulence in Subject-Specific Abdominal Aortic Aneurysm on Massively Parallel Computers

    NASA Astrophysics Data System (ADS)

    Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles

    2007-11-01

    Flow within the healthy human vascular system is typically laminar, but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions focal to (and immediately downstream of) the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation, making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANSS)) suspect. At the other extreme, direct numerical simulation (DNS), while fully appropriate, can lead to large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). To produce simulations in a clinically relevant time frame requires: (1) an adaptive meshing technique that closely matches the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, (2) efficient solution algorithms, and (3) excellent scaling on massively parallel computers. In this presentation we will demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using a stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.
367. The Acceleration of Thermal Protons at Parallel Collisionless Shocks: Three-Dimensional Hybrid Simulations

    SciTech Connect

    Guo, Fan; Giacalone, Joe

    2013-08-20

    We present three-dimensional hybrid simulations of collisionless shocks that propagate parallel to the background magnetic field to study the acceleration of protons that forms a high-energy tail on the distribution. We focus on the initial acceleration of thermal protons and compare it with results from one-dimensional simulations. We find that for both one- and three-dimensional simulations, particles that end up in the high-energy tail of the distribution later in the simulation gained their initial energy right at the shock. This confirms previous results but is the first to demonstrate this using fully three-dimensional fields. The result is not consistent with the "thermal leakage" model. We also show that the gyrocenters of protons in the three-dimensional simulation can drift away from the magnetic field lines on which they started, due to the removal of ignorable coordinates that exist in one- and two-dimensional simulations. Our study clarifies the injection problem for diffusive shock acceleration.
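Hybrid and particle-in-cell codes like those referenced above advance ions with a particle pusher; the classic choice is the Boris scheme, sketched below for a single particle in uniform fields. The uniform fields, unit charge-to-mass ratio, and step size are illustration-only assumptions; a real hybrid code would interpolate the electromagnetic fields from a grid and push many particles per cell.

```python
# Minimal Boris particle pusher, a standard mover for ions in electric and magnetic fields.
# Uniform fields and normalized units are assumed purely for illustration.
import numpy as np

def boris_push(x, v, E, B, qm, dt):
    """Advance position x and velocity v (3-vectors) by one step dt; qm is charge/mass."""
    v_minus = v + 0.5 * qm * E * dt
    t = 0.5 * qm * B * dt
    s = 2.0 * t / (1.0 + np.dot(t, t))
    v_prime = v_minus + np.cross(v_minus, t)
    v_plus = v_minus + np.cross(v_prime, s)
    v_new = v_plus + 0.5 * qm * E * dt
    return x + v_new * dt, v_new

if __name__ == "__main__":
    x = np.zeros(3)
    v = np.array([1.0, 0.0, 0.0])
    E = np.zeros(3)
    B = np.array([0.0, 0.0, 1.0])          # uniform field along z gives circular gyration in x-y
    for _ in range(100):
        x, v = boris_push(x, v, E, B, qm=1.0, dt=0.1)
    print("speed after 100 steps (conserved to round-off):", np.linalg.norm(v))
```

The rotation step conserves the particle speed exactly in a pure magnetic field, which is one reason the scheme is favored for long shock-acceleration runs.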
368. A three-phase series-parallel resonant converter -- analysis, design, simulation and experimental results

    SciTech Connect

    Bhat, A. K. S.; Zheng, L.

    1995-12-31

    A three-phase dc-to-dc series-parallel resonant converter is proposed and its operating modes for a 180 degree wide gating pulse scheme are explained. A detailed analysis of the converter using a constant current model and the Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of a 1 kW converter is given. SPICE simulation results for the designed converter and experimental results for a 500 W converter are presented to verify the performance of the proposed converter for varying load conditions. The converter operates in lagging PF mode for the entire load range and requires only a narrow variation in switching frequency.

369. Forced-convection boiling tests performed in parallel simulated LMR fuel assemblies

    SciTech Connect

    Rose, S. D.; Carbajo, J. J.; Levin, A. E.; Lloyd, D. B.; Montgomery, B. H.; Wantland, J. L.

    1985-04-21

    Forced-convection tests have been carried out using parallel simulated Liquid Metal Reactor fuel assemblies in an engineering-scale sodium loop, the Thermal-Hydraulic Out-of-Reactor Safety facility. The tests, performed under single- and two-phase conditions, have shown that for low forced-convection flow there is significant flow augmentation by thermal convection, an important phenomenon under degraded shutdown heat removal conditions in an LMR. The power and flows required for boiling and dryout to occur are much higher than decay heat levels. The experimental evidence supports analytical results showing that heat removal from an LMR is possible even with a degraded shutdown heat removal system.

370. Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop

    SciTech Connect

    Ghosh, K. K.

    2011-11-07

    Conclusions of this presentation are: (1) Open|SpeedShop (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) large codes benefit from uninstrumented execution; (3) many experiments can be run in a short time (one might need multiple shots, e.g., usertime for caller-callee, hwcsamp for hardware counters); (4) a decent idea of a code's performance is easily obtained; (5) statistical sampling calls for a decent number of samples; and (6) hardware counter data is very useful for micro-analysis but can be tricky to analyze.
371. Visualization of parallel molecular dynamics simulation on a remote visualization platform

    SciTech Connect

    Lee, T. Y.; Raghavendra, C. S.; Nicholas, J. B.

    1994-09-01

    Visualization requires high performance computers. In order to use these shared high performance computers located at national centers, the authors need an environment for remote visualization. Remote visualization is a special process that uses computing resources and data that are physically distributed over long distances. In their experimental environment, a parallel raytracer is designed for the rendering task. It allows one to efficiently visualize molecular dynamics simulations represented by three-dimensional ball-and-stick models. Different issues encountered in creating their platform are discussed, such as I/O, load balancing, and data distribution.

372. LISP based simulation generators for modeling complex space processes

    NASA Technical Reports Server (NTRS)

    Tseng, Fan T.; Schroer, Bernard J.; Dwan, Wen-Shing

    1987-01-01

    The development of a simulation assistant for modeling discrete event processes is presented. Included are an overview of the system, a description of the simulation generators, and a sample process generated using the simulation assistant.
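Since the record above, like this collection as a whole, centers on discrete event simulation, a minimal event-queue kernel is sketched here: a priority queue keyed on event time, a clock that jumps from event to event, and handlers that may schedule further events. The single-server queueing model with exponential arrivals and service is invented for illustration and is not the NASA simulation assistant described in the record.

```python
# Minimal discrete-event simulation kernel: a time-ordered event queue driving a
# single-server queue with exponential interarrival and service times (illustrative model).
import heapq, random

class Simulator:
    def __init__(self):
        self.now = 0.0
        self._events = []          # heap of (time, sequence, handler, payload)
        self._seq = 0              # tie-breaker so equal-time events stay ordered

    def schedule(self, delay, handler, payload=None):
        heapq.heappush(self._events, (self.now + delay, self._seq, handler, payload))
        self._seq += 1

    def run(self, until):
        while self._events and self._events[0][0] <= until:
            self.now, _, handler, payload = heapq.heappop(self._events)
            handler(self, payload)

# --- model: single server with Poisson arrivals (assumed rates) ---
queue_len, busy, served = 0, False, 0

def arrival(sim, _):
    global queue_len
    queue_len += 1
    sim.schedule(random.expovariate(1.0), arrival)        # next customer arrives
    if not busy:
        start_service(sim)

def start_service(sim):
    global queue_len, busy
    busy, queue_len = True, queue_len - 1
    sim.schedule(random.expovariate(1.25), departure)     # service slightly faster than arrivals

def departure(sim, _):
    global busy, served
    busy, served = False, served + 1
    if queue_len > 0:
        start_service(sim)

if __name__ == "__main__":
    sim = Simulator()
    sim.schedule(random.expovariate(1.0), arrival)
    sim.run(until=1000.0)
    print(f"simulated time: {sim.now:.1f}, customers served: {served}")
```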
373. Evaluating the performance of parallel subsurface simulators: An illustrative example with PFLOTRAN

    PubMed Central

    Hammond, G. E.; Lichtner, P. C.; Mills, R. T.

    2014-01-01

    To better inform the subsurface scientist on the expected performance of parallel simulators, this work investigates the performance of the reactive multiphase flow and multicomponent biogeochemical transport code PFLOTRAN as it is applied to several realistic modeling scenarios run on the Jaguar supercomputer. After a brief introduction to the code's parallel layout and code design, PFLOTRAN's parallel performance (measured through strong and weak scalability analyses) is evaluated in the context of conceptual model layout, software and algorithmic design, and known hardware limitations. PFLOTRAN scales well (with regard to strong scaling) for three realistic problem scenarios: (1) in situ leaching of copper from a mineral ore deposit within a 5-spot flow regime, (2) transient flow and solute transport within a regional doublet, and (3) a real-world problem involving uranium surface complexation within a heterogeneous and extremely dynamic variably saturated flow field. Weak scalability is discussed in detail for the regional doublet problem, and several difficulties with its interpretation are noted. PMID:25506097

374. Simulation of optical devices using parallel finite-difference time-domain method

    NASA Astrophysics Data System (ADS)

    Li, Kang; Kong, Fanmin; Mei, Liangmo; Liu, Xin

    2005-11-01

    This paper presents a new parallel finite-difference time-domain (FDTD) numerical method in a low-cost network environment to simulate optical waveguide characteristics. A PC-motherboard-based cluster is used, as it is relatively low-cost, reliable and has high computing performance. Four clusters are networked by fast Ethernet technology. Due to the simple nature of the FDTD algorithm, a native Ethernet packet communication mechanism is used to reduce the overhead of the communication between adjacent clusters. To validate the method, a microcavity ring resonator based on semiconductor waveguides is chosen as an instance of FDTD parallel computation. The speed-up rate under different division densities is calculated. From the result we can conclude that when the decomposing size reaches a certain point, a good parallel computing speed-up will be maintained. This simulation shows that, through overlapping of computation and communication and control of the decomposing size, the overhead of communicating the shared data can be overcome. The result indicates that the implementation can achieve significant speed-up for the FDTD algorithm. This will enable us to tackle larger real electromagnetic problems with low-cost PC clusters.
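The FDTD method referenced above advances interleaved electric and magnetic field components on a staggered (Yee) grid; the one-dimensional sketch below shows the core update loop in normalized units with a simple hard source. It is a generic serial illustration (grid size, source position, and frequency are arbitrary), not the clustered parallel implementation of the record; in a domain-decomposed run each subdomain would exchange its boundary field values with neighbors after every update.

```python
# One-dimensional FDTD sketch on a staggered Yee grid, in normalized units where the
# Courant number is 0.5. Grid size, source location and frequency are arbitrary choices.
import numpy as np

nz, nsteps, courant = 400, 800, 0.5
ez = np.zeros(nz)          # electric field at integer grid points
hy = np.zeros(nz - 1)      # magnetic field at half-integer points

for n in range(nsteps):
    # Update H from the spatial difference of E (interior of the grid).
    hy += courant * (ez[1:] - ez[:-1])
    # Update E from the spatial difference of H.
    ez[1:-1] += courant * (hy[1:] - hy[:-1])
    # Hard sinusoidal source in the middle of the grid (illustrative excitation).
    ez[nz // 2] = np.sin(2.0 * np.pi * 0.02 * n)

print("peak |Ez| after", nsteps, "steps:", np.abs(ez).max())
```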
375. Parallel lattice Boltzmann simulation of bubble rising and coalescence in viscous flows

    NASA Astrophysics Data System (ADS)

    Shi, Dongyan; Wang, Zhikai

    2015-07-01

    A parallel three-dimensional lattice Boltzmann scheme for multicomponent immiscible fluids is proposed to simulate bubble rising and coalescence processes in viscous flows. The lattice Boltzmann scheme is based on the free-energy model and is parallelized in the shared-memory model using OpenMP. The bubble interface is described by a diffuse interface method solving the Cahn-Hilliard equation, and both the surface tension force and the buoyancy are introduced in the form of a discrete body force. To avoid the numerical instability caused by interface deformation, an 18-point finite difference scheme is utilized to calculate the first- and second-order space derivatives. The correctness of the parallel scheme in handling three-dimensional interfaces is verified by the Laplace law and the dynamic characteristics of an isolated bubble in stationary flows. Subsequently, the effects of the initial relative position, together with the size ratio, on bubble-bubble interaction are studied. The results show that the present scheme can effectively describe the bubble interface dynamics, even when rupture and restructuring occur. In addition to the repulsion and coalescence phenomena due to the relative position, the size ratio also plays an insignificant role in bubble deformation and trajectory.
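As a pocket-sized illustration of the lattice Boltzmann machinery behind codes like the one above, the sketch below runs a single-component D2Q9 BGK model (stream, compute moments, relax toward the local equilibrium) on a small periodic grid. The free-energy multicomponent physics, Cahn-Hilliard coupling, and OpenMP parallelization of the record are not reproduced; the grid size, relaxation time, and initial density bump are arbitrary.

```python
# Single-component D2Q9 lattice Boltzmann (BGK) sketch on a periodic grid.
# Streaming uses np.roll; collision relaxes toward the standard second-order equilibrium.
import numpy as np

nx, ny, tau, nsteps = 64, 64, 0.8, 200
w = np.array([4/9] + [1/9] * 4 + [1/36] * 4)                  # D2Q9 weights
e = np.array([[0,0],[1,0],[0,1],[-1,0],[0,-1],[1,1],[-1,1],[-1,-1],[1,-1]])

# Initial condition: uniform density with a small Gaussian bump, zero velocity.
x, y = np.meshgrid(np.arange(nx), np.arange(ny), indexing="ij")
rho = 1.0 + 0.1 * np.exp(-((x - nx/2)**2 + (y - ny/2)**2) / 20.0)
u = np.zeros((nx, ny, 2))
mass0 = rho.sum()

def equilibrium(rho, u):
    eu = np.tensordot(u, e.T, axes=([2], [0]))                # (nx, ny, 9): e_i . u
    usq = np.sum(u**2, axis=2)[..., None]
    return w * rho[..., None] * (1 + 3*eu + 4.5*eu**2 - 1.5*usq)

f = equilibrium(rho, u)
for step in range(nsteps):
    # Streaming: shift each population along its lattice velocity (periodic boundaries).
    for i in range(9):
        f[..., i] = np.roll(np.roll(f[..., i], e[i, 0], axis=0), e[i, 1], axis=1)
    # Macroscopic moments.
    rho = f.sum(axis=2)
    u = np.tensordot(f, e, axes=([2], [0])) / rho[..., None]
    # BGK collision.
    f += (equilibrium(rho, u) - f) / tau

print("mass conserved:", np.isclose(f.sum(), mass0))
```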
376. Parallel Simulation of Three-Dimensional Free-Surface Fluid Flow Problems

    SciTech Connect

    Baer, Thomas A.; Subia, Samuel R.; Sackinger, Philip A.

    2000-01-18

    We describe parallel simulations of viscous, incompressible, free surface, Newtonian fluid flow problems that include dynamic contact lines. The Galerkin finite element method was used to discretize the fully-coupled governing conservation equations, and a "pseudo-solid" mesh mapping approach was used to determine the shape of the free surface. In this approach, the finite element mesh is allowed to deform to satisfy quasi-static solid mechanics equations subject to geometric or kinematic constraints on the boundaries. As a result, nodal displacements must be included in the set of problem unknowns. Issues concerning the proper constraints along the solid-fluid dynamic contact line in three dimensions are discussed. Parallel computations are carried out for an example taken from the coating flow industry, flow in the vicinity of a slot coater edge. This is a three-dimensional free-surface problem possessing a contact line that advances at the web speed in one region but transitions to static behavior in another part of the flow domain. Discussion focuses on parallel speedups for fixed problem size, a class of problems of immediate practical importance.

377. Parallel computing simulation of electrical excitation and conduction in the 3D human heart

    PubMed

    Di Yu; Dongping Du; Hui Yang; Yicheng Tu

    2014-01-01

    A correctly beating heart is important to ensure adequate circulation of blood throughout the body. Normal heart rhythm is produced by the orchestrated conduction of electrical signals throughout the heart. Cardiac electrical activity is the resulting function of a series of complex biochemical-mechanical reactions, which involve transportation and bio-distribution of ionic flows through a variety of biological ion channels. Cardiac arrhythmias are caused by the direct alteration of ion channel activity that results in changes in the AP waveform. In this work, we developed a whole-heart simulation model with the use of massive parallel computing with GPGPU and OpenGL. The simulation algorithm was implemented in several different versions for the purpose of comparison, including one conventional CPU version and two GPU versions based on the Nvidia CUDA platform. OpenGL was utilized for the visualization/interaction platform because it is open source, lightweight and universally supported by various operating systems. The experimental results show that the GPU-based simulation outperforms the conventional CPU-based approach and significantly improves the speed of simulation. By adopting modern computer architecture, this investigation enables real-time simulation and visualization of electrical excitation and conduction in the large and complicated 3D geometry of a real-world human heart. PMID:25570947

378. Large-Scale Modeling of Epileptic Seizures: Scaling Properties of Two Parallel Neuronal Network Simulation Algorithms

    PubMed Central

    Pesce, Lorenzo L.; Lee, Hyong C.; Stevens, Rick L.

    2013-01-01

    Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069
379. L-PICOLA: A parallel code for fast dark matter simulation

    NASA Astrophysics Data System (ADS)

    Howlett, C.; Manera, M.; Percival, W. J.

    2015-09-01

    Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-parallel code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-body simulation. Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the simulation and simulate the past lightcone at run-time, with optional replication of the simulation volume. Through comparisons to fully non-linear N-body simulations we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5% respectively on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys. L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.

380. PCSIM: A Parallel Simulation Environment for Neural Circuits Fully Integrated with Python

    PubMed Central

    Pecevski, Dejan; Natschläger, Thomas; Schuch, Klaus

    2008-01-01

    The Parallel Circuit SIMulator (PCSIM) is a software package for simulation of neural circuits. It is primarily designed for distributed simulation of large-scale networks of spiking point neurons. Although its computational core is written in C++, PCSIM's primary interface is implemented in the Python programming language, which is a powerful programming environment and allows the user to easily integrate the neural circuit simulator with data analysis and visualization tools to manage the full neural modeling life cycle. The main focus of this paper is to describe PCSIM's full integration into Python and the benefits thereof. In particular we will investigate how the automatically generated bidirectional interface and PCSIM's object-oriented modular framework enable the user to adopt a hybrid modeling approach: using and extending PCSIM's functionality either employing pure Python or C++ and thus combining the advantages of both worlds. Furthermore, we describe several supplementary PCSIM packages written in pure Python and tailored towards setting up and analyzing neural simulations. PMID:19543450
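To give a flavor of the kind of spiking point-neuron models that simulators such as PCSIM distribute across processes, here is a tiny leaky integrate-and-fire network written directly in Python/NumPy. It deliberately does not use the PCSIM API (whose calls are not shown in the record); the network size, parameters and random connectivity are arbitrary choices made only for this sketch.

```python
# Tiny leaky integrate-and-fire (LIF) network sketch: Euler integration of membrane
# potentials, threshold spiking, and random sparse excitatory connectivity (all assumed).
import numpy as np

rng = np.random.default_rng(1)
n, dt, steps = 200, 0.1, 5000          # neurons, time step (ms), number of steps
tau, v_rest, v_thresh, v_reset = 20.0, 0.0, 1.0, 0.0
w = (rng.random((n, n)) < 0.1) * 0.02  # 10% connection probability, fixed weight

v = rng.random(n) * v_thresh           # random initial membrane potentials
spike_count = np.zeros(n)

for t in range(steps):
    i_ext = 0.06 + 0.02 * rng.random(n)          # noisy constant drive (illustrative)
    spiking = v >= v_thresh
    spike_count += spiking
    v[spiking] = v_reset                          # reset neurons that just crossed threshold
    i_syn = w @ spiking.astype(float)             # synaptic input from this step's spikes
    v += dt * (-(v - v_rest) / tau + i_ext + i_syn)

print(f"mean firing rate: {spike_count.mean() / (steps * dt) * 1000.0:.1f} Hz")
```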
381. Large-scale modeling of epileptic seizures: scaling properties of two parallel neuronal network simulation algorithms

    PubMed

    Pesce, Lorenzo L.; Lee, Hyong C.; Hereld, Mark; Visser, Sid; Stevens, Rick L.; Wildeman, Albert; van Drongelen, Wim

    2013-01-01

    Our limited understanding of the relationship between the behavior of individual neurons and large neuronal networks is an important limitation in current epilepsy research and may be one of the main causes of our inadequate ability to treat it. Addressing this problem directly via experiments is impossibly complex; thus, we have been developing and studying medium-large-scale simulations of detailed neuronal networks to guide us. Flexibility in the connection schemas and a complete description of the cortical tissue seem necessary for this purpose. In this paper we examine some of the basic issues encountered in these multiscale simulations. We have determined the detailed behavior of two such simulators on parallel computer systems. The observed memory and computation-time scaling behavior for a distributed memory implementation were very good over the range studied, both in terms of network sizes (2,000 to 400,000 neurons) and processor pool sizes (1 to 256 processors). Our simulations required between a few megabytes and about 150 gigabytes of RAM and lasted between a few minutes and about a week, well within the capability of most multinode clusters. Therefore, simulations of epileptic seizures on networks with millions of cells should be feasible on current supercomputers. PMID:24416069
382. A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

    SciTech Connect

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.

    2013-02-15

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within about 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60x, 100x, and 220x, respectively.
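The validation clusters in the record above start from a Plummer model; the short sketch below samples star positions from that distribution by inverting its enclosed-mass profile, in units where the total mass and scale radius are one. It is a generic textbook sampler (positions only, no velocities) and is not part of the published code.

```python
# Sample star positions from a Plummer sphere by inverse-transform sampling of the
# enclosed-mass profile M(r) = r^3 / (r^2 + 1)^(3/2), with total mass and scale radius set to 1.
import numpy as np

def sample_plummer_positions(n, rng):
    m = rng.random(n)                            # enclosed mass fraction, uniform in (0, 1)
    r = 1.0 / np.sqrt(m ** (-2.0 / 3.0) - 1.0)   # invert M(r) = m for the radius
    # Isotropic angles -> Cartesian coordinates.
    cos_theta = rng.uniform(-1.0, 1.0, n)
    phi = rng.uniform(0.0, 2.0 * np.pi, n)
    sin_theta = np.sqrt(1.0 - cos_theta ** 2)
    return np.column_stack((r * sin_theta * np.cos(phi),
                            r * sin_theta * np.sin(phi),
                            r * cos_theta))

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    pos = sample_plummer_positions(100_000, rng)
    radii = np.linalg.norm(pos, axis=1)
    # The half-mass radius of a Plummer model is about 1.305 scale radii; the sample should agree.
    print("sampled half-mass radius:", np.median(radii))
```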
383. A Parallel Monte Carlo Code for Simulating Collisional N-body Systems

    NASA Astrophysics Data System (ADS)

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.

    2013-02-01

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N ~ 10^7 particles. Our code is based on the Hénon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10^5 to 10^7. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within about 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10^5, 128 for N = 10^6 and 256 for N = 10^7. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60, 100, and 220, respectively.
384. Numerical simulation via parallel-distributed computing of energy absorption by metal deformation

    SciTech Connect

    Plaskacz, E. J.; Kulak, R. F.

    1995-07-01

    Collapsible steering column designs are credited with saving tens of thousands of lives since their introduction in the late 1960's. The collapsible steering column is a safety feature designed to absorb energy and protect the driver in a head-on collision. One of the most frequently used design concepts employs two telescoping metal tubes that slide over one another as the occupant impacts the steering wheel. Hardened steel ball bearings are embedded in a plastic sleeve located between the two tubes. There are two primary mechanisms for energy absorption during steering column collapse. One is the friction between the bearing and tube surfaces. Another is the gouging of the tubes' surfaces by the bearings. Current analytical models are unable to adequately capture the physics behind this process. In this paper we will present an overview of a parallel finite element code, currently under development, that can be used to simulate the highly nonlinear response of this energy absorbing mechanism. Our parallel algorithms are constructed on a message-passing foundation. The actual message-passing implementation used was the Argonne-developed p4 package. However, other message-passing libraries can easily be accommodated, as they are largely identical in function and differ only in syntax. Once the algorithm is restructured as a set of processes communicating through messages, the program can run on systems as diverse as a uniprocessor workstation, multiprocessors with and without shared memory, a group of workstations that communicate over a local network, or any combination of the above. Benchmarks of the parallel code performance on networks of workstations and the IBM SP1 parallel supercomputer will be discussed.

385. Simulation/Emulation Techniques: Compressing Schedules With Parallel (HW/SW) Development

    NASA Technical Reports Server (NTRS)

    Mangieri, Mark L.; Hoang, June

    2014-01-01

    NASA has always been in the business of balancing new technologies and techniques to achieve human space travel objectives. NASA's Kedalion engineering analysis lab has been validating and using many contemporary avionics HW/SW development and integration techniques, which represent new paradigms to NASA's heritage culture. Kedalion has validated many of the Orion HW/SW engineering techniques borrowed from the adjacent commercial aircraft avionics solution space, inserting new techniques and skills into the Multi-Purpose Crew Vehicle (MPCV) Orion program. Using contemporary agile techniques, commercial-off-the-shelf (COTS) products, early rapid prototyping, in-house expertise and tools, and extensive use of simulators and emulators, NASA has achieved cost-effective paradigms that are currently serving the Orion program effectively. Elements of long-lead custom hardware on the Orion program have necessitated early use of simulators and emulators in advance of deliverable hardware to achieve parallel design and development on a compressed schedule.
386. Massively parallel kinetic Monte Carlo simulations of charge carrier transport in organic semiconductors

    NASA Astrophysics Data System (ADS)

    van der Kaap, N. J.; Koster, L. J. A.

    2016-02-01

    A parallel, lattice-based kinetic Monte Carlo simulation is developed that runs on a GPGPU board and includes Coulomb-like particle-particle interactions. The performance of this computationally expensive problem is improved by modifying the interaction potential due to nearby particle moves, instead of fully recalculating it. This modification is achieved by adding dipole correction terms that represent the particle move. Exact evaluation of these terms is guaranteed by representing all interactions as 32-bit floating point numbers, where only the integers between -2^22 and 2^22 are used. We validate our method by modelling the charge transport in disordered organic semiconductors, including Coulomb interactions between charges. Performance is mainly governed by the particle density in the simulation volume, and improves for increasing densities. Our method allows calculations on large volumes including particle-particle interactions, which is important in the field of organic semiconductors.

387. Parallel adaptive fluid-structure interaction simulation of explosions impacting on building structures

    SciTech Connect

    Deiterding, Ralf; Wood, Stephen L.

    2013-01-01

    We pursue a level set approach to couple an Eulerian shock-capturing fluid solver with space-time refinement to an explicit solid dynamics solver for large deformations and fracture. The coupling algorithms considering recursively finer fluid time steps as well as overlapping solver updates are discussed in detail. Our ideas are implemented in the AMROC adaptive fluid solver framework and are used for effective fluid-structure coupling to the general purpose solid dynamics code DYNA3D. Besides simulations verifying the coupled fluid-structure solver and assessing its parallel scalability, the detailed structural analysis of a reinforced concrete column under blast loading and the simulation of a prototypical blast explosion in a realistic multistory building are presented.
  389. Xyce parallel electronic simulator users' guide, version 6.0

    SciTech Connect

    Keiter, Eric Richard; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Verley, Jason C.; Fixel, Deborah A.; Coffey, Todd Stirling; Pawlowski, Roger Patrick; Warrender, Christina E.; Baur, David G.

    2013-08-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state of the art in the following areas: the capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors), including support for most popular parallel and serial computers; a differential-algebraic-equation (DAE) formulation, which better isolates the device model package from solver algorithms and allows one to develop new types of analysis without requiring the implementation of analysis-specific device models; device models that are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and object-oriented code design and implementation using modern coding practices. Xyce is a parallel code in the most general sense of the phrase, a message-passing parallel implementation, which allows it to run efficiently on a wide range of computing platforms. These include serial, shared-memory and distributed-memory parallel platforms.
    Attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows.

  390. SDA 7: A modular and parallel implementation of the simulation of diffusional association software

    PubMed

    Martinez, Michael; Bruce, Neil J; Romanowska, Julia; Kokh, Daria B; Ozboyaci, Musa; Yu, Xiaofeng; Öztürk, Mehmet Ali; Richter, Stefan; Wade, Rebecca C

    2015-08-01

    The simulation of diffusional association (SDA) Brownian dynamics software package has been widely used in the study of biomacromolecular association. Initially developed to calculate bimolecular protein-protein association rate constants, it has since been extended to study electron transfer rates, to predict the structures of biomacromolecular complexes, to investigate the adsorption of proteins to inorganic surfaces, and to simulate the dynamics of large systems containing many biomacromolecular solutes, allowing the study of concentration-dependent effects. These extensions have led to a number of divergent versions of the software. In this article, we report the development of the latest version of the software (SDA 7). This release was developed to consolidate the existing codes into a single framework, while improving the parallelization of the code to better exploit modern multicore shared-memory computer architectures. It is built using a modular object-oriented programming scheme, to allow for easy maintenance and extension of the software, and includes new features, such as flexible solute representations. We discuss a number of application examples, which describe some of the methods available in the release, and provide benchmarking data to demonstrate the parallel performance. PMID:26123630

  391. Parallel numerical simulation of the ultrasonic waves in a prestressed formation

    PubMed

    Chen, Hao; Wang, Xiuming; Lin, Weijun

    2006-12-22

    Formation stress prediction plays an important role in petroleum production. Understanding ultrasonic wave propagation in a stress-induced anisotropic formation will help us to find an efficient method to correctly predict formation stress or formation pore pressure. In this work, a parallel 3D finite-difference time domain (FDTD) method is developed to simulate elastic wave propagation in pre-stressed formations. A perfectly matched layer (PML) is used as an absorbing boundary condition.
    The acceleration ratio, i.e. the ratio of the total CPU computation time to the elapsed time of the program run, is tested on the ShenTeng 6800 supercomputer at the Supercomputing Center of the Chinese Academy of Sciences (CAS). It shows that the acceleration factor of the parallel FDTD program is considerably high even if the domain is only divided in one direction. With the total computational model size fixed, the acceleration factor is 3.0 on 8 CPUs and 13.8 on 64 CPUs. The velocities under various static stresses are obtained by processing the array data calculated with the FDTD using Prony's method. The linear relation between velocity and the applied pre-stress is in agreement with that predicted by acoustoelasticity theory. Results from the numerical simulation confirm the reciprocity principle and the superposition principle. PMID:16806377

  392. Parallel simulation of HGMS of weakly magnetic nanoparticles in irrotational flow of inviscid fluid

    TOXLINE Toxicology Bibliographic Information

    Hournkumnuard K; Dolwithayakul B; Chantrapornchai C

    1965-01-01

    The process of high gradient magnetic separation (HGMS) using a microferromagnetic wire for capturing weakly magnetic nanoparticles in the irrotational flow of inviscid fluid is simulated by using a parallel algorithm developed based on OpenMP. The two-dimensional problem of particle transport under the influences of magnetic force and fluid flow is considered in an annular domain surrounding the wire, with inner radius equal to that of the wire and outer radius equal to various multiples of the wire radius. The differential equations governing particle transport are solved numerically as an initial and boundary value problem by using the finite-difference method. The concentration distribution of the particles around the wire is investigated and compared with some previously reported results, showing good agreement between them. The results show the feasibility of accumulating weakly magnetic nanoparticles in specific regions on the wire surface, which is useful for applications in biomedical and environmental work. The speedup of the parallel simulation ranges from 1.8 to 21 depending on the number of threads, the domain problem size and the number of iterations. With the nature of computing in the application and current multicore technology, it is observed that 4-8 threads are sufficient to obtain the optimized speedup.
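The speedup figures quoted in the two records above translate directly into parallel efficiency, E = S/p. The short calculation below uses the explicitly paired figures from the FDTD record (3.0 on 8 CPUs, 13.8 on 64 CPUs); the HGMS record quotes a 1.8-21 speedup range without pairing it to thread counts, so no efficiency is computed for it.

    # Parallel efficiency E = S/p from the speedups reported above.  The
    # (CPUs, speedup) pairs are the figures quoted in the FDTD record; any
    # other pairing would be an assumption.
    def efficiency(p, speedup):
        return speedup / p

    for p, s in [(8, 3.0), (64, 13.8)]:
        print(f"{p} CPUs: speedup {s}, efficiency {efficiency(p, s):.1%}")
    # prints efficiencies of 37.5% (8 CPUs) and about 21.6% (64 CPUs)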
  393. Parallel simulation of HGMS of weakly magnetic nanoparticles in irrotational flow of inviscid fluid

    PubMed

    Hournkumnuard, Kanok; Dolwithayakul, Banpot; Chantrapornchai, Chantana

    2014-01-01

    The process of high gradient magnetic separation (HGMS) using a microferromagnetic wire for capturing weakly magnetic nanoparticles in the irrotational flow of inviscid fluid is simulated by using a parallel algorithm developed based on OpenMP. The two-dimensional problem of particle transport under the influences of magnetic force and fluid flow is considered in an annular domain surrounding the wire, with inner radius equal to that of the wire and outer radius equal to various multiples of the wire radius. The differential equations governing particle transport are solved numerically as an initial and boundary value problem by using the finite-difference method. The concentration distribution of the particles around the wire is investigated and compared with some previously reported results, showing good agreement between them. The results show the feasibility of accumulating weakly magnetic nanoparticles in specific regions on the wire surface, which is useful for applications in biomedical and environmental work. The speedup of the parallel simulation ranges from 1.8 to 21 depending on the number of threads, the domain problem size and the number of iterations. With the nature of computing in the application and current multicore technology, it is observed that 4-8 threads are sufficient to obtain the optimized speedup. PMID:24955411

  394. MDSLB: A new static load balancing method for parallel molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Wu, Yun-Long; Xu, Xin-Hai; Yang, Xue-Jun; Zou, Shun; Ren, Xiao-Guang

    2014-02-01

    Large-scale parallelization of molecular dynamics simulations is facing challenges which seriously affect the simulation efficiency, among which the load imbalance problem is the most critical. In this paper, we propose a new molecular dynamics static load balancing method (MDSLB). By analyzing the characteristics of the short-range force of molecular dynamics programs running in parallel, we divide the short-range force into three kinds of force models, and then package the computations of each force model into many tiny computational units called "cell loads", which provide the basic data structures for our load balancing method. In MDSLB, the spatial region is separated into sub-regions called "local domains", and the cell loads of each local domain are allocated to every processor in turn. Compared with dynamic load balancing methods, MDSLB can guarantee load balance by executing the algorithm only once at program startup, without migrating the loads dynamically. We implement MDSLB in the OpenFOAM software and test it on the TianHe-1A supercomputer with 16 to 512 processors. Experimental results show that MDSLB can save 34%-64% of the execution time in load-imbalanced cases.
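A sketch of the static scheme described in the MDSLB record: the "cell loads" of every local domain are dealt out to the processors in turn, once, at start-up, with no later migration. The data layout and the one-unit-per-cell-load cost model are assumptions for illustration.

    # Static load balancing in the spirit of MDSLB: deal the cell loads of
    # each local domain out to the processors in round-robin order, once, at
    # program start-up.  The data layout is an illustrative assumption.
    from collections import defaultdict

    def static_allocate(local_domains, n_procs):
        """local_domains: {domain_id: [cell_load, ...]} -> {proc: [cell_load, ...]}"""
        allocation = defaultdict(list)
        for domain_id in sorted(local_domains):
            for i, cell_load in enumerate(local_domains[domain_id]):
                allocation[i % n_procs].append(cell_load)
        return allocation

    if __name__ == "__main__":
        domains = {d: [f"d{d}c{c}" for c in range(5 + 3 * d)] for d in range(3)}
        alloc = static_allocate(domains, n_procs=4)
        for proc, loads in sorted(alloc.items()):
            print(f"proc {proc}: {len(loads)} cell loads")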
  395. pWeb: A High-Performance, Parallel-Computing Framework for Web-Browser-Based Medical Simulation

    PubMed

    Halic, Tansel; Ahn, Woojin; De, Suvranu

    2014-01-01

    This work presents pWeb, a new language and compiler for parallelization of client-side compute-intensive web applications such as surgical simulations. The recently introduced HTML5 standard has enabled creating unprecedented applications on the web. The low performance of the web browser, however, remains the bottleneck of computationally intensive applications, including visualization of complex scenes, real-time physical simulations and image processing, compared to native ones. The new proposed language is built upon web workers for multithreaded programming in HTML5. The language provides fundamental functionalities of parallel programming languages as well as the fork/join parallel model, which is not supported by web workers. The language compiler automatically generates an equivalent parallel script that complies with the HTML5 standard. A case study on realistic rendering for surgical simulations demonstrates enhanced performance with a compact set of instructions. PMID:24732497

  396. Parallel grid library with adaptive mesh refinement for development of highly scalable simulations

    NASA Astrophysics Data System (ADS)

    Honkonen, I.; von Alfthan, S.; Sandroos, A.; Janhunen, P.; Palmroth, M.

    2012-04-01

    As the single CPU core performance is saturating while the number of cores in the fastest supercomputers increases exponentially, the parallel performance of simulations on distributed memory machines is crucial. At the same time, utilizing efficiently the large number of available cores presents a challenge, especially in simulations with run-time adaptive mesh refinement. We have developed a generic grid library (dccrg) aimed at finite volume simulations that is easy to use and scales well up to tens of thousands of cores.
    The grid has several attractive features: it 1) allows an arbitrary C++ class or structure to be used as cell data; 2) provides a simple interface for adaptive mesh refinement during a simulation; 3) encapsulates the details of MPI communication when updating the data of neighboring cells between processes; and 4) provides a simple interface to run-time load balancing, e.g. domain decomposition, through the Zoltan library. Dccrg is freely available for anyone to use, study and modify under the GNU Lesser General Public License v3. We will present the implementation of dccrg, simple and advanced usage examples and scalability results on various supercomputers and problems.

  397. Parallel simulation of particle transport in an advection field applied to volcanic explosive eruptions

    NASA Astrophysics Data System (ADS)

    Künzli, Pierre; Tsunematsu, Kae; Albuquerque, Paul; Falcone, Jean-Luc; Chopard, Bastien; Bonadonna, Costanza

    2016-04-01

    Volcanic ash transport and dispersal models typically describe particle motion via a turbulent velocity field. Particles are advected inside this field from the moment they leave the vent of the volcano until they deposit on the ground. Several techniques exist to simulate particles in an advection field, such as finite-difference Eulerian, Lagrangian-puff or pure Lagrangian techniques. In this paper, we present a new flexible simulation tool called TETRAS (TEphra TRAnsport Simulator) based on a hybrid Eulerian-Lagrangian model. This scheme offers the advantages of being numerically stable with no numerical diffusion and easily parallelizable. It also allows us to output particle atmospheric concentration or ground mass load at any given time. The model is validated using the advection-diffusion analytical equation. We also obtained a good agreement with field observations of the tephra deposit associated with the 2450 BP Pululagua (Ecuador) and the 1996 Ruapehu (New Zealand) eruptions. As this kind of model can lead to computationally intensive simulations, a parallelization on a distributed memory architecture was developed. A related performance model, taking into account load imbalance, is proposed and its accuracy tested.
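The hybrid Eulerian-Lagrangian idea in the TETRAS record (particles advected and diffused individually, concentrations obtained by binning them onto a grid when output is needed) can be sketched as follows; the wind field, diffusivity, grid extent and particle count are made-up values, not the model's.

    # Miniature hybrid Eulerian-Lagrangian transport: particles are advected
    # by a wind field and diffused by a random walk (Lagrangian part), and
    # concentration is obtained by binning onto a grid (Eulerian output).
    import numpy as np

    rng = np.random.default_rng(42)

    def wind(x, y):
        """Placeholder advection field (uniform wind with a little shear)."""
        return np.stack([5.0 + 0.01 * y, 0.5 * np.ones_like(x)], axis=-1)

    def step(pos, dt=1.0, diffusivity=10.0):
        adv = wind(pos[:, 0], pos[:, 1]) * dt
        dif = rng.normal(0.0, np.sqrt(2.0 * diffusivity * dt), size=pos.shape)
        return pos + adv + dif

    def concentration(pos, extent=2000.0, nbins=20):
        hist, _, _ = np.histogram2d(pos[:, 0], pos[:, 1], bins=nbins,
                                    range=[[0, extent], [-extent / 2, extent / 2]])
        return hist  # particles per cell; scale by mass and cell volume as needed

    if __name__ == "__main__":
        particles = np.zeros((10000, 2))       # all released at the vent (origin)
        for _ in range(100):
            particles = step(particles)
        grid = concentration(particles)
        print("max particles in a cell:", int(grid.max()))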
  398. MaMiCo: Software design for parallel molecular-continuum flow simulations

    NASA Astrophysics Data System (ADS)

    Neumann, Philipp; Flohr, Hanno; Arora, Rahul; Jarmatz, Piet; Tchipev, Nikola; Bungartz, Hans-Joachim

    2016-03-01

    The macro-micro-coupling tool (MaMiCo) was developed to ease the development of and modularize molecular-continuum simulations, retaining sequential and parallel performance. We demonstrate the functionality and performance of MaMiCo by coupling the spatially adaptive Lattice Boltzmann framework waLBerla with four molecular dynamics (MD) codes: the light-weight Lennard-Jones-based implementation SimpleMD, the node-level optimized software ls1 mardyn, and the community codes ESPResSo and LAMMPS. We detail interface implementations to connect each solver with MaMiCo. The coupling for each waLBerla-MD setup is validated in three-dimensional channel flow simulations which are solved by means of a state-based coupling method. We provide sequential and strong scaling measurements for the four molecular-continuum simulations. The overhead of MaMiCo is found to come at 10%-20% of the total (MD) runtime. The measurements further show that scalability of the hybrid simulations is reached on up to 500 Intel SandyBridge, and more than 1000 AMD Bulldozer compute cores.

  399. Super-Earths and dynamical stability of planetary systems: first parallel GPU simulations using GENGA

    NASA Astrophysics Data System (ADS)

    Elser, S.; Grimm, S. L.; Stadel, J. G.

    2013-08-01

    We report on the stability of hypothetical super-Earths in the habitable zone of known multiplanetary systems. Most of them have not yet been studied in detail concerning the existence of additional low-mass planets. The new N-body code GENGA developed at the University of Zürich allows us to perform numerous N-body simulations in parallel on graphics processing units. With this numerical tool, we can study the stability of orbits of hypothetical planets in the semimajor axis and eccentricity parameter space in high resolution. Massless test particle simulations give good predictions on the extension of the stable region and show that HIP 14810 and HD 37124 do not provide stable orbits in the habitable zone. Based on these simulations, we carry out simulations of 10 M⊕ planets in several systems (HD 11964, HD 47186, HD 147018, HD 163607, HD 168443, HD 187123, HD 190360, HD 217107 and HIP 57274).
    They provide more exact information about orbits at the location of mean motion resonances and at the edges of the stability zones. Besides the stability of orbits, we study the secular evolution of the planets to constrain probable locations of hypothetical planets. Assuming that planetary systems are in general closely packed, we find that, apart from HD 168443, all of the systems can harbour 10 M⊕ planets in the habitable zone.

  400. 6th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments

    SciTech Connect

    Schulz, M; Trinitis, C

    2007-07-09

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Remarkable progress in CPU architecture (multi- and many-core, SMT, transactional memory, virtualization support, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in parallel algorithms, simulation techniques, and software integration from multiple disciplines. In its 6th year, ParSim continues to build a bridge between computer science and the application disciplines and to help foster cooperation between the different fields. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a shorter turn-around time. This offers the unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables the participants to present and discuss their work within the scope of both the session and the host conference. This year, ten papers with authors in ten countries were submitted to ParSim, and after a quick yet thorough review process we decided to accept three of them for publication and presentation during the ParSim session. These three papers show the use of simulation in a range of different application fields including earthquake and turbulence simulation. At the same time, they also address computer science aspects and discuss different parallelization strategies, programming models and environments, as well as scalability. We are confident that this provides an attractive program and that ParSim will yet again be an informal setting for lively discussions and for fostering new collaborations. Several people contributed to this event. Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Thomas Herault and Franck Cappello, the PC chairs, for their support to continue the ParSim series at EuroPVM/MPI 2007.
    We would also like to thank the numerous reviewers, who provided us with their reviews in such a short amount of time (in most cases in just a few days) and thereby helped us to maintain the tight schedule. Last, but certainly not least, we would like to thank all those who took the time to submit papers and hence made this event possible in the first place. We are confident that this session will fulfill its purpose to provide new insights from both the engineering and the computer science side and encourage interdisciplinary exchange of ideas and cooperation. We hope that this will continue ParSim's tradition at EuroPVM/MPI.

  401. Application of a 3D, Adaptive, Parallel, MHD Code to Supernova Remnant Simulations

    NASA Astrophysics Data System (ADS)

    Kominsky, P.; Drake, R. P.; Powell, K. G.

    2001-05-01

    We at Michigan have a computational model, BATS-R-US, which incorporates several modern features that make it suitable for calculations of supernova remnant evolution. In particular, it is a three-dimensional MHD model, using a method called the Multiscale Adaptive Upwind Scheme for MagnetoHydroDynamics (MAUS-MHD). It incorporates a data structure that allows for adaptive refinement of the mesh, even in massively parallel calculations. Its advanced Godunov method, a solution-adaptive, upwind, high-resolution scheme, incorporates a new, flux-based approach to the Riemann solver with improved numerical properties. This code has been successfully applied to several problems, including the simulation of comets and of planetary magnetospheres, in the 3D context of the Heliosphere. The code was developed under a NASA computational grand challenge grant to run very rapidly on parallel platforms.
    It is also now being used to study time-dependent systems such as the transport of particles and energy from solar coronal mass ejections to the Earth. We are in the process of modifying this code so that it can accommodate the very strong shocks present in supernova remnants. Our test case simulates the explosion of a star of 1.4 solar masses with an energy of 1 foe, in a uniform background medium. We have performed runs of 250,000 to 1 million cells on 8 nodes of an Origin 2000. These relatively coarse grids do not allow fine details of instabilities to become visible. Nevertheless, the macroscopic evolution of the shock is simulated well, with the forward and reverse shocks visible in velocity profiles. We will show our work to date. This work was supported by NASA through its GSRP program.

  402. Forced-to-natural convection transition tests in parallel simulated liquid metal reactor fuel assemblies

    SciTech Connect

    Levin, A. E.; Montgomery, B. H.

    1990-01-01

    The Thermal-Hydraulic Out of Reactor Safety (THORS) Program at Oak Ridge National Laboratory (ORNL) had as its objective the testing of simulated, electrically heated liquid metal reactor (LMR) fuel assemblies in an engineering-scale sodium loop. Between 1971 and 1985, the THORS Program operated 11 simulated fuel bundles in conditions covering a wide range of normal and off-normal conditions. The last test series in the Program, THORS-SHRS Assembly 1, employed two parallel, 19-pin, full-length, simulated fuel assemblies of a design consistent with the large LMR (Large Scale Prototype Breeder -- LSPB) under development at that time. These bundles were installed in the THORS Facility, allowing single- and parallel-bundle testing in thermal-hydraulic conditions up to and including sodium boiling and dryout. As the name SHRS (Shutdown Heat Removal System) implies, a major objective of the program was testing under conditions expected during low-power reactor operation, including low-flow forced convection, natural convection, and forced-to-natural convection transition at various powers. The THORS-SHRS Assembly 1 experimental program was divided up into four phases. Phase 1 included preliminary and shakedown tests, including the collection of baseline steady-state thermal-hydraulic data. Phase 2 comprised natural convection testing. Forced convection testing was conducted in Phase 3. The final phase of testing included forced-to-natural convection transition tests. Phases 1, 2, and 3 have been discussed in previous papers. The fourth phase is described in this paper.
    3 refs., 2 figs.

  403. 8th International Special Session on Current Trends in Numerical Simulation for Parallel Engineering Environments

    SciTech Connect

    Trinitis, C; Bader, M; Schulz, M

    2009-06-09

    In today's world, the use of parallel programming and architectures is essential for simulating practical problems in engineering and related disciplines. Significant progress in CPU architecture (multi- and many-core CPUs, SMT, transactional memory, virtualization support, shared caches, etc.), system scalability, and interconnect technology continues to provide new opportunities, as well as new challenges for both system architects and software developers. These trends are paralleled by progress in algorithms, simulation techniques, and software integration from multiple disciplines. In its 8th year, ParSim continues to build a bridge between application disciplines and computer science and to help foster closer cooperation between these fields. Since its successful introduction in 2002, ParSim has established itself as an integral part of the EuroPVM/MPI conference series. In contrast to traditional conferences, emphasis is put on the presentation of up-to-date results with a short turn-around time. We believe that this offers a unique opportunity to present new aspects in this dynamic field and discuss them with a wide, interdisciplinary audience. The EuroPVM/MPI conference series, as one of the prime events in parallel computation, serves as an ideal surrounding for ParSim. This combination enables participants to present and discuss their work within the scope of both the session and the host conference. This year, five papers from authors in five countries were submitted to ParSim, and we selected three of them. They cover a range of different application fields including mechanical engineering, material science, and structural engineering simulations. We are confident that this resulted in an attractive special session and that this will be an informal setting for lively discussions as well as for fostering new collaborations. Several people contributed to this event. Thanks go to Jack Dongarra, the EuroPVM/MPI general chair, and to Jan Westerholm, Juha Fagerholm and Jussi Heikonen, the PC chairs, for their encouragement and support to continue the ParSim series at EuroPVM/MPI 2009. We would also like to thank the numerous reviewers, who provided us with their reviews in such a short amount of time (in most cases in just a few days) and thereby helped us to maintain the tight schedule. Last, but certainly not least, we would like to thank all those who took the time to submit papers and hence made this event possible in the first place.
    We are confident that this session will fulfill its purpose to provide new insights from both the engineering and the computer science side and encourage interdisciplinary exchange of ideas and cooperation, and that it will continue ParSim's tradition at EuroPVM/MPI.

  404. Accelerating Dust Storm Simulation by Balancing Task Allocation in Parallel Computing Environment

    NASA Astrophysics Data System (ADS)

    Gui, Z.; Yang, C.; Xia, J.; Huang, Q.; Yu, M.

    2013-12-01

    Dust storms have serious negative impacts on the environment, human health, and assets. The continuing global climate change has increased the frequency and intensity of dust storms in the past decades. To better understand and predict the distribution, intensity and structure of dust storms, a series of dust storm models have been developed, such as the Dust Regional Atmospheric Model (DREAM), the NMM meteorological module (NMM-dust) and the Chinese Unified Atmospheric Chemistry Environment for Dust (CUACE/Dust). The development and application of these models have contributed significantly to both scientific research and our daily life. However, dust storm simulation is a data- and computing-intensive process. Normally, a simulation for a single dust storm event may take several days or hours to run. This seriously impacts the timeliness of prediction and potential applications. To speed up the process, high-performance computing is widely adopted. By partitioning a large study area into small subdomains according to their geographic location and executing them on different computing nodes in a parallel fashion, the computing performance can be significantly improved. Since spatiotemporal correlations exist in the geophysical process of dust storm simulation, each subdomain allocated to a node needs to communicate with other geographically adjacent subdomains to exchange data. Inappropriate allocations may introduce imbalanced task loads and unnecessary communications among computing nodes. Therefore, the task allocation method is the key factor which may impact the feasibility of the parallelization. The allocation algorithm needs to carefully leverage the computing cost and communication cost for each computing node to minimize the total execution time and reduce the overall communication cost for the entire system. This presentation introduces two algorithms for such allocation and compares them with an evenly distributed allocation method. Specifically, 1) in order to get optimized solutions, a quadratic programming based modeling method is proposed. This algorithm performs well with a small number of computing tasks. However, its efficiency decreases significantly as the subdomain number and computing node number increase. 2) To compensate for the performance decrease on large-scale tasks, a K-means clustering based algorithm is introduced. Instead of aiming for optimized solutions, this method can get relatively good feasible solutions within acceptable time. However, it may introduce imbalanced communication for nodes or node-isolated subdomains. This research shows that both algorithms have their own strengths and weaknesses for task allocation. A combination of the two algorithms is under study to obtain better performance. Keywords: Scheduling; Parallel Computing; Load Balance; Optimization; Cost Model
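The K-means-based allocation mentioned above (cluster subdomains by geographic location so that each computing node receives a spatially compact group) can be sketched with plain Lloyd iterations in NumPy. The synthetic subdomain grid is an assumption, and no cluster-size balancing is enforced, which is exactly the weakness the record points out.

    # K-means style task allocation: cluster subdomain centres by location so
    # that each computing node gets a geographically compact set of
    # subdomains, keeping most halo exchanges on-node.  Plain Lloyd
    # iterations on synthetic coordinates; cluster sizes are not balanced.
    import numpy as np

    def kmeans_allocate(centres, n_nodes, iters=50, seed=0):
        rng = np.random.default_rng(seed)
        means = centres[rng.choice(len(centres), n_nodes, replace=False)]
        for _ in range(iters):
            # assign every subdomain to the nearest node "centre"
            d = np.linalg.norm(centres[:, None, :] - means[None, :, :], axis=-1)
            labels = d.argmin(axis=1)
            # move each node centre to the mean of its subdomains
            for k in range(n_nodes):
                if np.any(labels == k):
                    means[k] = centres[labels == k].mean(axis=0)
        return labels

    if __name__ == "__main__":
        nx = ny = 8                               # an 8x8 grid of subdomains
        centres = np.array([(i + 0.5, j + 0.5) for i in range(nx) for j in range(ny)])
        labels = kmeans_allocate(centres, n_nodes=4)
        for k in range(4):
            print(f"node {k}: {np.sum(labels == k)} subdomains")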
  405. Multiple-Instruction, Multiple-Data Path Computers: Parallel Processing Impact on Flight Simulation Software. Final Report.

    ERIC Educational Resources Information Center

    Lord, Robert E.; And Others

    The purpose of this study was to evaluate the parallel processing impact of multiple-instruction multiple-data path (MIMD) computers on flight simulation software. Basic mathematical functions and arithmetic expressions from typical flight simulation software were selected and run on an MIMD computer to evaluate the improvement in execution time

  406. Scalable parallel programming for high performance seismic simulation on petascale heterogeneous supercomputers

    NASA Astrophysics Data System (ADS)

    Zhou, Jun

    The 1994 Northridge earthquake in Los Angeles, California, killed 57 people, injured over 8,700 and caused an estimated $20 billion in damage. Petascale simulations are needed in California and elsewhere to provide society with a better understanding of the rupture and wave dynamics of the largest earthquakes at shaking frequencies required to engineer safe structures. As heterogeneous supercomputing infrastructures are becoming more common, numerical developments in earthquake system research are particularly challenged by the dependence on accelerator elements to enable "the Big One" simulations with higher frequency and finer resolution. Reducing time to solution and power consumption are two primary focus areas today for the enabling technology of fault rupture dynamics and seismic wave propagation in realistic 3D models of the crust's heterogeneous structure. This dissertation presents scalable parallel programming techniques for high performance seismic simulation running on petascale heterogeneous supercomputers. A real-world earthquake simulation code, AWP-ODC, one of the most advanced earthquake codes to date, was chosen as the base code in this research, and the testbed is based on Titan at Oak Ridge National Laboratory, the world's largest heterogeneous supercomputer. The research work is primarily related to architecture study, computation performance tuning and software system scalability.
    An earthquake simulation workflow has also been developed to support efficient production sets of simulations. The highlights of the technical development are an aggressive performance optimization focusing on data locality and a notable data communication model that hides the data communication latency. This development results in optimal computation efficiency and throughput for the 13-point stencil code on heterogeneous systems, which can be extended to general high-order stencil codes. Started from scratch, the hybrid CPU/GPU version of the AWP-ODC code is now ready for real-world petascale earthquake simulations. This GPU-based code has demonstrated excellent weak scaling up to the full Titan scale and achieved 2.3 PetaFLOPs sustained computation performance in single precision. The production simulation demonstrated the first 0-10 Hz deterministic rough fault simulation. Using the accelerated AWP-ODC, the Southern California Earthquake Center (SCEC) has recently created the physics-based probabilistic seismic hazard analysis model of the Los Angeles region, CyberShake 14.2, as of the time of the dissertation writing. The tensor-valued wavefield code based on this GPU research has dramatically reduced time-to-solution, making a statewide hazard model a goal reachable with existing heterogeneous supercomputers.

  407. Massively parallel simulation with DOE's ASCI supercomputers: an overview of the Los Alamos Crestone project

    SciTech Connect

    Weaver, R. P.; Gittings, M. L.

    2004-01-01

    The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages, and is used for problems in which radiation transport of energy is important. The goal of these massively parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to ~10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [1], [2], [3] and [4].
    Currently, the largest simulations we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using ~2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations. Key features of any massively parallel system include the processors, the disks, the interconnection between processors, the operating system, libraries for message passing and parallel I/O, and other fundamental units of the system. We will give an overview of the current status of the Crestone Project codes SAGE and RAGE. These codes are intended for general applications without tuning of algorithms or parameters. We have run a wide variety of physical applications, from millimeter-scale laboratory laser experiments to multikilometer-scale asteroid impacts into the Pacific Ocean to parsec-scale galaxy formation. Examples of these simulations will be shown. The goal of our effort is to avoid ad hoc models and attempt to rely on first-principles physics. In addition to the large effort on developing parallel code physics packages, a substantial effort in the project is devoted to improving the computer science and software quality engineering (SQE) of the Project codes, as well as a sizable effort on the verification and validation (V&V) of the resulting codes. Examples of these efforts for our project will be discussed.

  408. Parallel Algorithms for Monte Carlo Particle Transport Simulation on Exascale Computing Architectures

    NASA Astrophysics Data System (ADS)

    Romano, Paul Kollath

    Monte Carlo particle transport methods are being considered as a viable option for high-fidelity simulation of nuclear reactors. While Monte Carlo methods offer several potential advantages over deterministic methods, there are a number of algorithmic shortcomings that would prevent their immediate adoption for full-core analyses. In this thesis, algorithms are proposed both to ameliorate the degradation in parallel efficiency typically observed for large numbers of processors and to offer a means of decomposing large tally data that will be needed for reactor analysis. A nearest-neighbor fission bank algorithm was proposed and subsequently implemented in the OpenMC Monte Carlo code. A theoretical analysis of the communication pattern shows that the expected cost is O(sqrt(N)), whereas traditional fission bank algorithms are O(N) at best. The algorithm was tested on two supercomputers, the Intrepid Blue Gene/P and the Titan Cray XK7, and demonstrated nearly linear parallel scaling up to 163,840 processor cores on a full-core benchmark problem.
    An algorithm for reducing network communication arising from tally reduction was analyzed and implemented in OpenMC. The proposed algorithm groups only particle histories on a single processor into batches for tally purposes; in doing so it prevents all network communication for tallies until the very end of the simulation. The algorithm was tested, again on a full-core benchmark, and shown to reduce network communication substantially. A model was developed to predict the impact of load imbalances on the performance of domain decomposed simulations. The analysis demonstrated that load imbalances in domain decomposed simulations arise from two distinct phenomena: non-uniform particle densities and non-uniform spatial leakage. The dominant performance penalty for domain decomposition was shown to come from these physical effects rather than insufficient network bandwidth or high latency. The model predictions were verified with measured data from simulations in OpenMC on a full-core benchmark problem. Finally, a novel algorithm for decomposing large tally data was proposed, analyzed, and implemented/tested in OpenMC. The algorithm relies on disjoint sets of compute processes and tally servers. The analysis showed that for a range of parameters relevant to LWR analysis, the tally server algorithm should perform with minimal overhead. Tests were performed on Intrepid and Titan and demonstrated that the algorithm did indeed perform well over a wide range of parameters. (Copies available exclusively from MIT Libraries, libraries.mit.edu/docs - docs@mit.edu)
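The O(sqrt(N)) communication cost quoted for the nearest-neighbour fission bank can be illustrated with a toy experiment: sample how many fission sites land on each rank and count how many must cross a rank boundary to restore an even, in-order partition. This is only a numerical illustration of the scaling argument, not OpenMC's algorithm, and the random-rank site model is an assumption.

    # Toy check of the nearest-neighbour fission bank scaling: when N fission
    # sites land on p ranks at random, the number of sites that must move
    # across the worst inter-rank boundary to restore an even, in-order
    # partition grows like sqrt(N), not like N.
    import numpy as np

    def neighbour_traffic(n_sites, n_ranks, rng):
        counts = rng.multinomial(n_sites, [1.0 / n_ranks] * n_ranks)
        achieved = np.cumsum(counts)                      # where each rank's sites end
        target = np.arange(1, n_ranks + 1) * n_sites // n_ranks
        # sites crossing each inter-rank boundary under an even repartition
        return np.abs(achieved[:-1] - target[:-1]).max()

    if __name__ == "__main__":
        rng = np.random.default_rng(1)
        p = 64
        for n in (10**4, 10**5, 10**6, 10**7):
            traffic = np.mean([neighbour_traffic(n, p, rng) for _ in range(20)])
            print(f"N = {n:8d}: ~{traffic:8.1f} sites past the worst boundary "
                  f"({traffic / np.sqrt(n):.2f} * sqrt(N))")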
  409. A package of Linux scripts for the parallelization of Monte Carlo simulations

    NASA Astrophysics Data System (ADS)

    Badal, Andreu; Sempau, Josep

    2006-09-01

    Despite the fact that fast computers are nowadays available at low cost, there are many situations where obtaining a reasonably low statistical uncertainty in a Monte Carlo (MC) simulation involves a prohibitively large amount of time. This limitation can be overcome by having recourse to parallel computing. Most tools designed to facilitate this approach require modification of the source code and the installation of additional software, which may be inconvenient for some users. We present a set of tools, named clonEasy, that implement a parallelization scheme of a MC simulation that is free from these drawbacks. In clonEasy, which is designed to run under Linux, a set of "clone" CPUs is governed by a "master" computer by taking advantage of the capabilities of the Secure Shell (ssh) protocol. Any Linux computer on the Internet that can be ssh-accessed by the user can be used as a clone. A key ingredient for the parallel calculation to be reliable is the availability of an independent string of random numbers for each CPU. Many generators, such as RANLUX, RANECU or the Mersenne Twister, can readily produce these strings by initializing them appropriately and, hence, they are suitable to be used with clonEasy. This work was primarily motivated by the need to find a straightforward way to parallelize PENELOPE, a code for MC simulation of radiation transport that (in its current 2005 version) employs the generator RANECU, which uses a combination of two multiplicative linear congruential generators (MLCGs). Thus, this paper is focused on this class of generators and, in particular, we briefly present an extension of RANECU that increases its period up to ~5×10 and we introduce seedsMLCG, a tool that provides the information necessary to initialize disjoint sequences of an MLCG to feed different CPUs. This program, in combination with clonEasy, allows one to run PENELOPE in parallel easily, without requiring specific libraries or significant alterations of the sequential code.

    Program summary 1
    Title of program: clonEasy
    Catalogue identifier: ADYD_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADYD_v1_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, Northern Ireland
    Computer for which the program is designed and others in which it is operable: Any computer with a Unix-style shell (bash), support for the Secure Shell protocol and a FORTRAN compiler
    Operating systems under which the program has been tested: Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1)
    Compilers: GNU FORTRAN g77 (Linux); g95 (Linux); Intel Fortran Compiler 7.1 (Linux)
    Programming language used: Linux shell (bash) script, FORTRAN 77
    No. of bits in a word: 32
    No. of lines in distributed program, including test data, etc.: 1916
    No. of bytes in distributed program, including test data, etc.: 18 202
    Distribution format: tar.gz
    Nature of the physical problem: There are many situations where a Monte Carlo simulation involves a huge amount of CPU time. The parallelization of such calculations is a simple way of obtaining a relatively low statistical uncertainty using a reasonable amount of time.
    Method of solution: The presented collection of Linux scripts and auxiliary FORTRAN programs implement Secure Shell-based communication between a "master" computer and a set of "clones". The aim of this communication is to execute a code that performs a Monte Carlo simulation on all the clones simultaneously. The code is unique, but each clone is fed with a different set of random seeds. Hence, clonEasy effectively permits the parallelization of the calculation.
    Restrictions on the complexity of the program: clonEasy can only be used with programs that produce statistically independent results using the same code, but with a different sequence of random numbers. Users must choose the initialization values for the random number generator on each computer and combine the output from the different executions. A FORTRAN program to combine the final results is also provided.
    Typical running time: The execution time of each script largely depends on the number of computers that are used, the actions that are to be performed and, to a lesser extent, on the network connection bandwidth.
Unusual features of the program:Any computer on the Internet with a Secure Shell client/server program installed can be used as a node of a virtual computer cluster for <span class="hlt">parallel</span> calculations with the sequential source code. The simplicity of the <span class="hlt">parallelization</span> scheme makes the use of this package a straightforward task, which does not require installing any additional libraries. Program summary 2Title of program:seedsMLCG Catalogue identifier:ADYE_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADYE_v1_0 Program obtainable from:CPC Program Library, Queen's University of Belfast, Northern Ireland Computer for which the program is designed and others in which it is operable:Any computer with a FORTRAN compiler Operating systems under which the program has been tested:Linux (RedHat 8.0, SuSe 8.1, Debian Woody 3.1), MS Windows (2000, XP) Compilers:GNU FORTRAN g77 (Linux and Windows); g95 (Linux); Intel Fortran Compiler 7.1 (Linux); Compaq Visual Fortran 6.1 (Windows) Programming language used:FORTRAN 77 No. of bits in a word:32 Memory required to execute with typical data:500 kilobytes No. of lines in distributed program, including test data, etc.:492 No. of bytes in distributed program, including test data, etc.:5582 Distribution format:tar.gz Nature of the physical problem:Statistically independent results from different runs of a Monte Carlo code can be obtained using uncorrelated sequences of random numbers on each execution. Multiplicative linear congruential generators (MLCG), or other generators that are based on them such as RANECU, can be adapted to produce these sequences. Method of solution:For a given MLCG, the presented program calculates initialization values that produce disjoint, consecutive sequences of pseudo-random numbers. The calculated values initiate the generator in distant positions of the random number cycle and can be used, for instance, on a <span class="hlt">parallel</span> <span class="hlt">simulation</span>. The values are found using the formula S_J = (a^J S_0) MOD m, which gives the random value that will be generated after J iterations of the MLCG. Restrictions on the complexity of the program:The 32-bit length restriction for the integer variables in standard FORTRAN 77 limits the produced seeds to be separated a distance smaller than 2^31, when the distance J is expressed as an integer value. The program allows the user to input the distance as a power of 10 for the purpose of efficiently splitting the sequence of generators with a very long period. Typical running time:The execution time depends on the parameters of the used MLCG and the distance between the generated seeds.
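A minimal sketch of the skip-ahead formula quoted above, computing S_J = (a^J S_0) MOD m by modular exponentiation. The constants below are the usual L'Ecuyer parameters of the two MLCGs combined in RANECU; treat them as an assumption rather than values quoted from the record.

```python
# Skip-ahead for an MLCG S_{n+1} = (a * S_n) mod m: the state after `jump`
# iterations is (a^jump mod m) * S_0 mod m, so disjoint seed blocks can be
# generated directly with modular exponentiation (Python's three-argument pow).
A1, M1 = 40014, 2147483563   # assumed L'Ecuyer constants of RANECU's first MLCG
A2, M2 = 40692, 2147483399   # and of its second MLCG

def skip_ahead(seed: int, jump: int, a: int, m: int) -> int:
    """Seed that the MLCG would reach after `jump` iterations."""
    return (pow(a, jump, m) * seed) % m

def disjoint_seeds(seed1: int, seed2: int, n_cpus: int, spacing: int):
    """One (seed1, seed2) pair per CPU, spaced `spacing` draws apart."""
    return [(skip_ahead(seed1, i * spacing, A1, M1),
             skip_ahead(seed2, i * spacing, A2, M2)) for i in range(n_cpus)]

if __name__ == "__main__":
    for pair in disjoint_seeds(12345, 67890, n_cpus=4, spacing=10**12):
        print(pair)
```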
The generation of 10^6 seeds separated 10^12 units in the sequential cycle, for one of the MLCGs found in the RANECU generator, takes 3 s on a 2.4 GHz Intel Pentium 4 using the g77 compiler.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2003APS..DPPFP1114S','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2003APS..DPPFP1114S"><span id="translatedtitle">MPI <span class="hlt">parallelization</span> of Vlasov codes for the <span class="hlt">simulation</span> of nonlinear laser-plasma interactions</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.</p> <p>2003-10-01</p> <p>The <span class="hlt">simulation</span> of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas requires nonlinear kinetic models and massive <span class="hlt">parallelization</span>. We use Message Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov-Poisson system of equations on an 8 node dual processor MAC G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. Recurrent communication patterns for 2D FFTs involve global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. Discretized momentum advection equations have a double loop structure with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for <span class="hlt">parallel</span> computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA [2] V. K. Decyk, Computers in Physics, 7, 418 (1993) [3] Sonnendrucker et al., JCP 149, 201 (1998) [4] Begue et al., JCP 151, 458 (1999)</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2007APS..4CF.B3005P','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2007APS..4CF.B3005P"><span id="translatedtitle"><span class="hlt">Parallelizing</span> and Optimizing <span class="hlt">Simulations</span> of Nonneutral Plasma Instabilities in a Malmberg-Penning Trap</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Powell, Melissa; Mason, Grant; Spencer, Ross</p> <p>2007-10-01</p> <p>A Malmberg-Penning trap is a cylindrical apparatus which confines non-neutral plasma (electrons only) with an axial magnetic field and negative electric potentials on both ends. It is a simple system for studying basic plasma behavior, so simple that theory and experiment ought to agree. Theory predicts that a hollow plasma density profile is unstable, and experiments agree. However, the experimental growth rate of the m = 1 diocotron mode of the instability is much larger than the theoretical growth rate, by a factor of around 2-4. We are collaborating with Travis Mitchell's experimental research group at the University of Delaware to find the cause for this discrepancy by recreating experimental conditions in our <span class="hlt">simulation</span>. The growth rates of our <span class="hlt">simulation</span> test cases have remained less than half the growth rates of Mitchell's experiments.
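Returning to the Vlasov record above, the semi-Lagrangian idea behind its split advection steps can be sketched in one dimension as follows. This toy version (constant advection speed, linear interpolation, single process) only illustrates the backward-characteristic interpolation, not the parallel MPI implementation described in that record.

```python
# Rough sketch of one semi-Lagrangian advection sub-step: follow the
# characteristics backwards over dt and interpolate the field at the
# departure points. 1D, periodic boundary, linear interpolation.
import numpy as np

def semi_lagrangian_step(f, grid, speed, dt, length):
    """Advance f(x) by dt under constant advection speed on a periodic grid."""
    departure = (grid - speed * dt) % length          # backward characteristic feet
    grid_ext = np.append(grid, length)                # close the periodic interval
    f_ext = np.append(f, f[0])
    return np.interp(departure, grid_ext, f_ext)

if __name__ == "__main__":
    n, length = 128, 2 * np.pi
    x = np.linspace(0.0, length, n, endpoint=False)
    f = np.exp(-10 * (x - np.pi) ** 2)                # initial bump
    for _ in range(100):
        f = semi_lagrangian_step(f, x, speed=1.0, dt=0.05, length=length)
    print(f.max())                                     # the bump is advected, not destroyed
```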
I will report the results of <span class="hlt">parallelizing</span> the <span class="hlt">simulation</span> to increase the number of particles to 2 billion. We also optimize the code by converting the field solver from a two grid to a three grid multigrid solver in order to increase the number of grid points.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014APS..MARM27008W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014APS..MARM27008W"><span id="translatedtitle">Large-scale massively <span class="hlt">parallel</span> atomistic <span class="hlt">simulations</span> of short pulse laser interaction with metals</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wu, Chengping; Zhigilei, Leonid; Computational Materials Group Team</p> <p>2014-03-01</p> <p>Taking advantage of petascale supercomputing architectures, large-scale massively <span class="hlt">parallel</span> atomistic <span class="hlt">simulations</span> (10^8-10^9 atoms) are performed to study the microscopic mechanisms of short pulse laser interaction with metals. The results of the <span class="hlt">simulations</span> reveal a complex picture of highly non-equilibrium processes responsible for material modification and/or ejection. At low laser fluences below the ablation threshold, fast melting and resolidification occur under conditions of extreme heating and cooling rates resulting in surface microstructure modification. At higher laser fluences in the spallation regime, the material is ejected by the relaxation of laser-induced stresses and proceeds through the nucleation, growth and percolation of multiple voids in the sub-surface region of the irradiated target. At a fluence of ~ 2.5 times the spallation threshold, the top part of the target reaches the conditions for an explosive decomposition into vapor and small droplets, marking the transition to the phase explosion regime of laser ablation. The dynamics of plume formation and the characteristics of the ablation plume are obtained from the <span class="hlt">simulations</span> and compared with the results of time-resolved plume imaging experiments. Financial support for this work was provided by NSF (DMR-0907247 and CMMI-1301298) and AFOSR (FA9550-10-1-0541). Computational support was provided by the OLCF (MAT048) and XSEDE (TG-DMR110090).</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22068805','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22068805"><span id="translatedtitle">Effect of <span class="hlt">parallel</span> currents on drift-interchange turbulence: Comparison of <span class="hlt">simulation</span> and experiment</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>D'Ippolito, D. A.; Russell, D. A.; Myra, J. R.; Thakur, S. C.; Tynan, G. R.; Holland, C.</p> <p>2012-10-15</p> <p>Two-dimensional (2D) turbulence <span class="hlt">simulations</span> are reported in which the balancing of the <span class="hlt">parallel</span> and perpendicular currents is modified by changing the axial boundary condition (BC) to vary the sheath conductivity. The <span class="hlt">simulations</span> are carried out using the 2D scrape-off-layer turbulence (SOLT) code.
The results are compared with recent experiments on the controlled shear de-correlation experiment (CSDX) in which the axial BC was modified by changing the composition of the end plate. Reasonable qualitative agreement is found between the <span class="hlt">simulations</span> and the experiment. When an insulating axial BC is used, broadband turbulence is obtained and an inverse cascade occurs down to low frequencies and long spatial scales. Robust sheared flows are obtained. By contrast, employing a conducting BC at the plate resulted in coherent (drift wave) modes rather than broadband turbulence, with weaker inverse cascade, and smaller zonal flows. The dependence of the two instability mechanisms (rotationally driven interchange mode and drift waves) on the axial BC is also discussed.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2010PhPl...17g3107W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2010PhPl...17g3107W"><span id="translatedtitle">Three-dimensional <span class="hlt">parallel</span> UNIPIC-3D code for <span class="hlt">simulations</span> of high-power microwave devices</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan</p> <p>2010-07-01</p> <p>This paper introduces a self-developed, three-dimensional <span class="hlt">parallel</span> fully electromagnetic particle <span class="hlt">simulation</span> code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, numerical results agree well with theoretical ones. This code can be used to <span class="hlt">simulate</span> the high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator, etc. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user's interface to create the complex geometric structures of the <span class="hlt">simulated</span> HPM devices, which can be automatically meshed by UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. 
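The relativistic Newton-Lorentz particle update mentioned in the UNIPIC-3D record above is most commonly implemented as a Boris rotation in PIC codes; the abstract does not say which pusher UNIPIC-3D uses, so the sketch below is only a generic illustration (SI units, with u = gamma*v).

```python
# Hedged sketch of a relativistic Boris push: half electric kick, magnetic
# rotation, half electric kick, then a position update with the new velocity.
import numpy as np

C = 299_792_458.0  # speed of light, m/s

def boris_push(x, u, E, B, q, m, dt):
    """One time step of position x and reduced momentum u = gamma*v."""
    qmdt2 = q * dt / (2.0 * m)
    u_minus = u + qmdt2 * E                       # first half electric kick
    gamma = np.sqrt(1.0 + np.dot(u_minus, u_minus) / C**2)
    t = qmdt2 * B / gamma                         # magnetic rotation vector
    s = 2.0 * t / (1.0 + np.dot(t, t))
    u_prime = u_minus + np.cross(u_minus, t)
    u_new = u_minus + np.cross(u_prime, s)        # rotation about B
    u_new = u_new + qmdt2 * E                     # second half electric kick
    gamma_new = np.sqrt(1.0 + np.dot(u_new, u_new) / C**2)
    return x + dt * u_new / gamma_new, u_new

if __name__ == "__main__":
    q, m = -1.602176634e-19, 9.1093837015e-31     # electron
    x, u = np.zeros(3), np.array([1.0e8, 0.0, 0.0])
    E, B = np.array([0.0, 0.0, 1.0e5]), np.array([0.0, 0.1, 0.0])
    for _ in range(10):
        x, u = boris_push(x, u, E, B, q, m, dt=1.0e-12)
    print(x, u)
```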
For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices, the numerical results computed from these two codes agree well with each other.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://hdl.handle.net/2060/19880008905','NASA-TRS'); return false;" href="http://hdl.handle.net/2060/19880008905"><span id="translatedtitle">Experiences with serial and <span class="hlt">parallel</span> algorithms for channel routing using <span class="hlt">simulated</span> annealing</span></a></p> <p><a target="_blank" href="http://ntrs.nasa.gov/search.jsp">NASA Technical Reports Server (NTRS)</a></p> <p>Brouwer, Randall Jay</p> <p>1988-01-01</p> <p>Two algorithms for channel routing using <span class="hlt">simulated</span> annealing are presented. <span class="hlt">Simulated</span> annealing is an optimization methodology which allows the solution process to back up out of local minima that may be encountered by inappropriate selections. By properly controlling the annealing process, it is very likely that the optimal solution to an NP-complete problem such as channel routing may be found. The algorithm presented proposes very relaxed restrictions on the types of allowable transformations, including overlapping nets. By freeing that restriction and controlling overlap situations with an appropriate cost function, the algorithm becomes very flexible and can be applied to many extensions of channel routing. The selection of the transformation utilizes a number of heuristics, still retaining the pseudorandom nature of <span class="hlt">simulated</span> annealing. The algorithm was implemented as a serial program for a workstation, and a <span class="hlt">parallel</span> program designed for a hypercube computer. The details of the serial implementation are presented, including many of the heuristics used and some of the resulting solutions.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/21389136','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/21389136"><span id="translatedtitle">Three-dimensional <span class="hlt">parallel</span> UNIPIC-3D code for <span class="hlt">simulations</span> of high-power microwave devices</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Wang Jianguo; Chen Zaigao; Wang Yue; Zhang Dianhui; Qiao Hailiang; Fu Meiyan; Yuan Yuan; Liu Chunliang; Li Yongdong; Wang Hongguang</p> <p>2010-07-15</p> <p>This paper introduces a self-developed, three-dimensional <span class="hlt">parallel</span> fully electromagnetic particle <span class="hlt">simulation</span> code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, numerical results agree well with theoretical ones. This code can be used to <span class="hlt">simulate</span> the high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator, etc. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. 
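The acceptance rule at the heart of the simulated-annealing channel router described above can be sketched generically as follows; the toy cost function, move generator, and cooling schedule are placeholders, not the heuristics or overlap-penalty cost of that work.

```python
# Minimal simulated-annealing skeleton: accept uphill moves with Boltzmann
# probability so the search can back out of local minima.
import math
import random

def anneal(state, cost, neighbor, t_start=10.0, t_end=1e-3, cooling=0.95, sweeps=100):
    best, best_cost = state, cost(state)
    current, current_cost = state, best_cost
    temperature = t_start
    while temperature > t_end:
        for _ in range(sweeps):
            candidate = neighbor(current)
            delta = cost(candidate) - current_cost
            if delta <= 0 or random.random() < math.exp(-delta / temperature):
                current, current_cost = candidate, current_cost + delta
                if current_cost < best_cost:
                    best, best_cost = current, current_cost
        temperature *= cooling              # geometric cooling schedule
    return best, best_cost

if __name__ == "__main__":
    # Toy problem: order a list to minimize the sum of adjacent differences.
    def cost(order): return sum(abs(a - b) for a, b in zip(order, order[1:]))
    def neighbor(order):
        i, j = random.sample(range(len(order)), 2)
        new = list(order)
        new[i], new[j] = new[j], new[i]
        return new
    print(anneal(random.sample(range(100), 20), cost, neighbor))
```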
Users can use the graphical user's interface to create the complex geometric structures of the <span class="hlt">simulated</span> HPM devices, which can be automatically meshed by UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices; the numerical results computed from these two codes agree well with each other.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016IAUS..312...79V','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016IAUS..312...79V"><span id="translatedtitle"><span class="hlt">Simulation</span> of disc-bulge-halo galaxies using <span class="hlt">parallel</span> GPU based codes</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Veles, O.; Berczik, P.; Just, A.</p> <p>2016-02-01</p> <p>We compare the performance of the very popular Tree-GPU code BONSAI with the older Particle-(Multi)Mesh code SUPERBOX. Both codes were run on the same hardware using the GPU acceleration for the force calculation. SUPERBOX is a particle-mesh code with high resolution sub-grid and a higher order NGP (nearest grid point) force-calculation scheme. In our research, we are aiming to demonstrate that the new <span class="hlt">parallel</span> version of SUPERBOX is capable of performing high resolution <span class="hlt">simulations</span> of the interaction in a composite disc-bulge-halo galaxy. We describe the improvement of performance and scalability of SUPERBOX particularly for the Kepler cluster (NVIDIA K20 GPU). A comparison was made with the very popular and publicly available Tree-GPU code BONSAI†.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014ChPhL..31k5201W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014ChPhL..31k5201W"><span id="translatedtitle"><span class="hlt">Simulation</span> of the Quasi-Monoenergetic Protons Generation by <span class="hlt">Parallel</span> Laser Pulses Interaction with Foils</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wang, Wei-Quan; Yin, Yan; Zou, De-Bin; Yu, Tong-Pu; Yang, Xiao-Hu; Xu, Han; Yu, Ming-Yang; Ma, Yan-Yun; Zhuo, Hong-Bin; Shao, Fu-Qiu</p> <p>2014-11-01</p> <p>A new scheme of radiation pressure acceleration for generating high-quality protons by using two overlapping-<span class="hlt">parallel</span> laser pulses is proposed. Particle-in-cell <span class="hlt">simulation</span> shows that the overlapping of two pulses with identical Gaussian profiles in space and trapezoidal profiles in the time domain can result in a composite light pulse with a spatial profile suitable for stable acceleration of protons to high energies. At ~2.46×10^21 W/cm^2 intensity of the combination light pulse, a quasi-monoenergetic proton beam with peak energy ~200 MeV/nucleon, energy spread <15%, and divergence angle <4° is obtained, which is appropriate for tumor therapy.
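As a rough illustration of the nearest-grid-point (NGP) assignment named in the SUPERBOX record above (SUPERBOX itself uses a higher-order variant on nested sub-grids), a plain NGP mass deposit might look like this:

```python
# Sketch of NGP mass assignment for a particle-mesh code: each particle's mass
# is deposited in the cell that contains it (grid points at cell centres).
import numpy as np

def ngp_density(positions, masses, n_cells, box_size):
    """Deposit particle masses on a periodic 3D grid and return the density."""
    cell = box_size / n_cells
    idx = np.floor(positions / cell).astype(int) % n_cells    # periodic wrap
    rho = np.zeros((n_cells, n_cells, n_cells))
    np.add.at(rho, (idx[:, 0], idx[:, 1], idx[:, 2]), masses)
    return rho / cell**3                                       # mass per volume

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    pos = rng.random((10_000, 3)) * 50.0      # particles in a 50-unit box
    rho = ngp_density(pos, np.ones(10_000), n_cells=32, box_size=50.0)
    print(rho.sum() * (50.0 / 32) ** 3)       # recovers the total mass, 10000
```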
The proton beam quality can be controlled by adjusting the incidence points of two laser pulses.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014snam.conf04304F','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014snam.conf04304F"><span id="translatedtitle">Hybrid <span class="hlt">parallel</span> strategy for the <span class="hlt">simulation</span> of fast transient accidental situations at reactor scale</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Faucher, V.; Galon, P.; Beccantini, A.; Crouzet, F.; Debaud, F.; Gautier, T.</p> <p>2014-06-01</p> <p>This contribution is dedicated to the latest methodological developments implemented in the fast transient dynamics software EUROPLEXUS (EPX) to <span class="hlt">simulate</span> the mechanical response of fully coupled fluid-structure systems to accidental situations to be considered at reactor scale, among which the Loss of Coolant Accident, the Core Disruptive Accident and the Hydrogen Explosion. Time integration is explicit and the search for reference solutions within the safety framework prevents any simplification and approximations in the coupled algorithm: for instance, all kinematic constraints are dealt with using Lagrange Multipliers, yielding a complex flow chart when non-permanent constraints such as unilateral contact or immersed fluid-structure boundaries are considered. The <span class="hlt">parallel</span> acceleration of the solution process is then achieved through a hybrid approach, based on a weighted domain decomposition for distributed memory computing and the use of the KAAPI library for self-balanced shared memory processing inside subdomains.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/372178','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/372178"><span id="translatedtitle">A three-phase series-<span class="hlt">parallel</span> resonant converter -- analysis, design, <span class="hlt">simulation</span>, and experimental results</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Bhat, A.K.S.; Zheng, R.L.</p> <p>1996-07-01</p> <p>A three-phase dc-to-dc series-<span class="hlt">parallel</span> resonant converter is proposed and its operating modes for a 180° wide gating pulse scheme are explained. A detailed analysis of the converter using a constant current model and the Fourier series approach is presented. Based on the analysis, design curves are obtained and a design example of a 1-kW converter is given. SPICE <span class="hlt">simulation</span> results for the designed converter and experimental results for a 500-W converter are presented to verify the performance of the proposed converter for varying load conditions.
The converter operates in lagging power factor (PF) mode for the entire load range and requires a narrow variation in switching frequency, to adequately regulate the output power.</p> </li> </ol> <ol class="result-class" start="421"> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2014EPJWC..6702098R','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2014EPJWC..6702098R"><span id="translatedtitle"><span class="hlt">Parallel</span> numerical <span class="hlt">simulation</span> of oscillating airfoil NACA0015 in the channel due to flutter instability</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Řidký, Václav; Šidlof, Petr</p> <p>2014-03-01</p> <p>The work is devoted to 3D and 2D <span class="hlt">parallel</span> numerical computation of pressure and velocity fields around an elastically supported airfoil self-oscillating due to interaction with the airflow. Numerical solution is computed in the OpenFOAM package, an open-source software package based on finite volume method. Movement of airfoil is described by translation and rotation, identified from experimental data. A new boundary condition for the 2DOF motion of the airfoil was implemented. The results of numerical <span class="hlt">simulations</span> (velocity) are compared with data measured in a wind tunnel, where a physical model of NACA0015 airfoil was mounted and tuned to exhibit the flutter instability.
The experimental results were obtained previously in the Institute of Thermomechanics by interferographic measurements in a subsonic wind tunnel in Nový Knín.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22043417','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22043417"><span id="translatedtitle">The role of the electron convection term for the <span class="hlt">parallel</span> electric field and electron acceleration in MHD <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Matsuda, K.; Terada, N.; Katoh, Y.; Misawa, H.</p> <p>2011-08-15</p> <p>There has been a great concern about the origin of the <span class="hlt">parallel</span> electric field in the frame of fluid equations in the auroral acceleration region. This paper proposes a new method to <span class="hlt">simulate</span> magnetohydrodynamic (MHD) equations that include the electron convection term and shows its efficiency with <span class="hlt">simulation</span> results in one dimension. We apply a third-order semi-discrete central scheme to investigate the characteristics of the electron convection term including its nonlinearity. At a steady state discontinuity, the sum of the ion and electron convection terms balances with the ion pressure gradient. We find that the electron convection term works like the gradient of the negative pressure and reduces the ion sound speed or amplifies the sound mode when <span class="hlt">parallel</span> current flows. The electron convection term enables us to describe a situation in which a <span class="hlt">parallel</span> electric field and <span class="hlt">parallel</span> electron acceleration coexist, which is impossible for ideal or resistive MHD.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1185588','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1185588"><span id="translatedtitle">Improving the Performance of the Extreme-scale <span class="hlt">Simulator</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Engelmann, Christian; Naughton III, Thomas J</p> <p>2014-01-01</p> <p>Investigating the performance of <span class="hlt">parallel</span> applications at scale on future high-performance computing (HPC) architectures and the performance impact of different architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale <span class="hlt">Simulator</span> (xSim) is a <span class="hlt">simulation</span>-based toolkit for investigating the performance of <span class="hlt">parallel</span> applications at scale. xSim scales to millions of <span class="hlt">simulated</span> Message Passing Interface (MPI) processes. The overhead introduced by a <span class="hlt">simulation</span> tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the <span class="hlt">parallel</span> <span class="hlt">discrete</span> <span class="hlt">event</span> <span class="hlt">simulation</span> management overhead and (2) a new <span class="hlt">simulated</span> MPI message matching algorithm to reduce the oversubscription management overhead.
The results clearly show a significant performance improvement, such as by reducing the <span class="hlt">simulation</span> overhead for running the NAS <span class="hlt">Parallel</span> Benchmark suite inside the <span class="hlt">simulator</span> from 1,020% to 238% for the conjugate gradient (CG) benchmark and from 102% to 0% for the embarrassingly <span class="hlt">parallel</span> (EP) benchmark, as well as from 37,511% to 13,808% for CG and from 3,332% to 204% for EP with accurate process failure <span class="hlt">simulation</span>.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/10108404','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/10108404"><span id="translatedtitle">Implementation of a <span class="hlt">parallel</span> algorithm for thermo-chemical nonequilibrium flow <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Wong, C.C.; Blottner, F.G.; Payne, J.L.; Soetrisno, M.</p> <p>1995-01-01</p> <p>Massively <span class="hlt">parallel</span> (MP) computing is considered to be the future direction of high performance computing. When engineers apply this new MP computing technology to solve large-scale problems, one major interest is what is the maximum problem size that an MP computer can handle. To determine the maximum size, it is important to address the code scalability issue. Scalability implies whether the code can provide an increase in performance proportional to an increase in problem size. If the size of the problem increases, by utilizing more computer nodes, the ideal elapsed time to <span class="hlt">simulate</span> a problem should not increase much. Hence one important task in the development of the MP computing technology is to ensure scalability. A scalable code is an efficient code. In order to obtain good scaled performance, it is necessary to first have the code optimized for a single node performance before proceeding to a large-scale <span class="hlt">simulation</span> with a large number of computer nodes. This paper will discuss the implementation of a massively <span class="hlt">parallel</span> computing strategy and the process of optimization to improve the scaled performance. Specifically, we will look at domain decomposition, resource management in the code, communication overhead, and problem mapping. By incorporating these improvements and adopting an efficient MP computing strategy, an efficiency of about 85% and 96%, respectively, has been achieved using 64 nodes on MP computers for both perfect gas and chemically reactive gas problems.
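The overhead and efficiency figures quoted in the xSim and scalability records above reduce to two simple ratios; the helper below, with made-up sample numbers rather than measurements from either paper, shows how such percentages are computed.

```python
# Small illustrative helpers for two common figures of merit.
def overhead_percent(t_simulated: float, t_native: float) -> float:
    """Extra wall time introduced by running inside the simulator, in percent."""
    return 100.0 * (t_simulated - t_native) / t_native

def parallel_efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Speedup divided by processor count (1.0 means ideal scaling)."""
    return (t_serial / t_parallel) / n_procs

if __name__ == "__main__":
    print(f"{overhead_percent(t_simulated=338.0, t_native=100.0):.0f}% overhead")
    print(f"{parallel_efficiency(t_serial=640.0, t_parallel=11.8, n_procs=64):.2f} efficiency")
```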
A comparison of the performance between MP computers and a vectorized computer, such as Cray-YMP, will also be presented.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/1165004','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/1165004"><span id="translatedtitle">Acceleration of the matrix multiplication of Radiance three phase daylighting <span class="hlt">simulations</span> with <span class="hlt">parallel</span> computing on heterogeneous hardware of personal computer</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Zuo, Wangda; McNeil, Andrew; Wetter, Michael; Lee, Eleanor S.</p> <p>2013-05-23</p> <p>Building designers are increasingly relying on complex fenestration systems to reduce energy consumed for lighting and HVAC in low energy buildings. Radiance, a lighting <span class="hlt">simulation</span> program, has been used to conduct daylighting <span class="hlt">simulations</span> for complex fenestration systems. Depending on the configurations, the <span class="hlt">simulation</span> can take hours or even days using a personal computer. This paper describes how to accelerate the matrix multiplication portion of a Radiance three-phase daylight <span class="hlt">simulation</span> by conducting <span class="hlt">parallel</span> computing on heterogeneous hardware of a personal computer. The algorithm was optimized and the computational part was implemented in <span class="hlt">parallel</span> using OpenCL. The speed of new approach was evaluated using various daylighting <span class="hlt">simulation</span> cases on a multicore central processing unit and a graphics processing unit. Based on the measurements and analysis of the time usage for the Radiance daylighting <span class="hlt">simulation</span>, further speedups can be achieved by using fast I/O devices and storing the data in a binary format.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/979295','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/979295"><span id="translatedtitle"><span class="hlt">Parallel</span> Adaptive <span class="hlt">Simulation</span> of Weak and Strong Transverse-Wave Structures in H2-O2 Detonations</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Deiterding, Ralf</p> <p>2010-01-01</p> <p>Two- and three-dimensional <span class="hlt">simulation</span> results are presented that investigate at great detail the temporal evolution of Mach reflection sub-structure patterns intrinsic to gaseous detonation waves. High local resolution is achieved by utilizing a distributed memory <span class="hlt">parallel</span> shock-capturing finite volume code that employs block-structured dynamic mesh adaptation. The computational approach, the implemented <span class="hlt">parallelization</span> strategy, and the software design are discussed.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2006OcMod..14..139F','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2006OcMod..14..139F"><span id="translatedtitle">An unstructured-grid, finite-volume, nonhydrostatic, <span class="hlt">parallel</span> coastal ocean <span class="hlt">simulator</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Fringer, O. 
B.; Gerritsen, M.; Street, R. L.</p> <p></p> <p>A finite-volume formulation is presented that solves the three-dimensional, nonhydrostatic Navier-Stokes equations with the Boussinesq approximation on an unstructured, staggered, z-level grid, with the goal of <span class="hlt">simulating</span> nonhydrostatic processes in the coastal ocean with grid resolutions of tens of meters. In particular, the code has been developed to <span class="hlt">simulate</span> the nonlinear, nonhydrostatic internal wave field in the littoral ocean. The method is based on the formulation developed by Casulli, in that the free-surface and vertical diffusion are semi-implicit, thereby removing stability limitations associated with the surface gravity wave and vertical diffusion terms. The remaining terms in the momentum equations are discretized explicitly with the second-order Adams-Bashforth method, while the pressure-correction method is employed for the nonhydrostatic pressure in order to achieve overall second-order temporal accuracy. Advection of momentum is accomplished with an Eulerian discretization which conserves momentum in cells that do not contain the free surface, and scalar advection is discretized in a way that ensures consistency with continuity, thereby ensuring local and global mass conservation using a velocity field that conserves volume on a local and global basis. The nonhydrostatic pressure field is solved efficiently using a block-Jacobi preconditioner, and while stability is limited by the internal gravity wave speed and vertical advection of momentum, applications requiring relatively small time steps due to accuracy or stability constraints are run efficiently on <span class="hlt">parallel</span> computers, since the present formulation is written entirely with the message-passing interface (MPI). The ParMETIS libraries are employed in order to achieve a load-balanced <span class="hlt">parallel</span> partitioning that minimizes interprocessor communication, and the grid is reordered to optimize per-processor performance by limiting cache misses while accessing arrays in memory. Test cases demonstrate the ability of the code to efficiently and accurately compute the nonhydrostatic lock exchange and internal waves in idealized as well as real domains, and we evaluate the <span class="hlt">parallel</span> efficiency of the code using up to 32 processors.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/servlets/purl/890856','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/servlets/purl/890856"><span id="translatedtitle">A Multi-Bunch, Three-Dimensional, Strong-Strong Beam-Beam <span class="hlt">Simulation</span> Code for <span class="hlt">Parallel</span> Computers</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Cai, Y.; Kabel, A.C.; /SLAC</p> <p>2005-05-11</p> <p>For <span class="hlt">simulating</span> the strong-strong beam-beam effect, using Particle-In-Cell codes has become one of the methods of choice. While the two-dimensional problem is readily treatable using PC-class machines, the three-dimensional problem, i.e., a problem encompassing hourglass and phase-averaging effects, requires the use of <span class="hlt">parallel</span> processors. In this paper, we introduce a strong-strong code NIMZOVICH, which was specifically designed for <span class="hlt">parallel</span> processors and which is optimally used for many bunches and parasitic crossings. 
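The second-order Adams-Bashforth step used for the explicit terms of the ocean model above is easy to state in isolation; the sketch below integrates a scalar test equation with it and is not drawn from that simulator's source.

```python
# AB2 time stepping: y_{n+1} = y_n + dt * (1.5 * f_n - 0.5 * f_{n-1}),
# bootstrapped with one forward Euler step.
import numpy as np

def ab2_integrate(f, y0, dt, n_steps):
    """Integrate y' = f(y) with second-order Adams-Bashforth."""
    y = np.asarray(y0, dtype=float)
    f_prev = f(y)
    y = y + dt * f_prev                     # bootstrap with one Euler step
    history = [np.array(y0, dtype=float), y.copy()]
    for _ in range(n_steps - 1):
        f_curr = f(y)
        y = y + dt * (1.5 * f_curr - 0.5 * f_prev)
        f_prev = f_curr
        history.append(y.copy())
    return np.array(history)

if __name__ == "__main__":
    # Test on y' = -y, whose exact solution is exp(-t).
    traj = ab2_integrate(lambda y: -y, y0=[1.0], dt=0.01, n_steps=100)
    print(traj[-1], np.exp(-1.0))
```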
We describe the <span class="hlt">parallelization</span> scheme and give some benchmarking results.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2013JChPh.139v4706N','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2013JChPh.139v4706N"><span id="translatedtitle"><span class="hlt">Parallel</span> kinetic Monte Carlo <span class="hlt">simulation</span> framework incorporating accurate models of adsorbate lateral interactions</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Nielsen, Jens; d'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail</p> <p>2013-12-01</p> <p>Ab initio kinetic Monte Carlo (KMC) <span class="hlt">simulations</span> have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These <span class="hlt">simulations</span> necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for <span class="hlt">simulating</span> catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce <span class="hlt">parallelization</span> with OpenMP. We further benchmark our framework by <span class="hlt">simulating</span> a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. 
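A rejection-free kinetic Monte Carlo step of the kind a lattice KMC package such as Zacros performs can be sketched as follows; the rates are arbitrary, and the lattice bookkeeping and lateral-interaction-dependent rate updates of the real code are omitted.

```python
# Minimal Gillespie/BKL-style KMC step: pick one event with probability
# proportional to its rate, then draw an exponentially distributed waiting time.
import math
import random

def kmc_step(rates):
    """Return (event_index, time_increment) for one KMC step."""
    total = sum(rates)
    target = random.random() * total
    chosen, running = len(rates) - 1, 0.0
    for i, r in enumerate(rates):
        running += r
        if running >= target:
            chosen = i
            break
    dt = -math.log(1.0 - random.random()) / total   # exponential waiting time
    return chosen, dt

if __name__ == "__main__":
    rates = [2.0, 0.5, 0.1]                 # e.g. adsorption, desorption, reaction
    t, counts = 0.0, [0, 0, 0]
    for _ in range(10_000):
        event, dt = kmc_step(rates)
        counts[event] += 1
        t += dt
    print(t, counts)                        # counts come out roughly proportional to the rates
```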
We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2016CoPhC.200...57J','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2016CoPhC.200...57J"><span id="translatedtitle"><span class="hlt">Parallel</span> implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics <span class="hlt">simulations</span></span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji</p> <p>2016-03-01</p> <p>Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer <span class="hlt">simulations</span> and data analyses, including molecular dynamics (MD) <span class="hlt">simulations</span>. In this study, we develop hybrid (MPI+OpenMP) <span class="hlt">parallelization</span> schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD <span class="hlt">simulations</span>. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to <span class="hlt">simulate</span> one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/24329081','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/24329081"><span id="translatedtitle"><span class="hlt">Parallel</span> kinetic Monte Carlo <span class="hlt">simulation</span> framework incorporating accurate models of adsorbate lateral interactions.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Nielsen, Jens; d'Avezac, Mayeul; Hetherington, James; Stamatakis, Michail</p> <p>2013-12-14</p> <p>Ab initio kinetic Monte Carlo (KMC) <span class="hlt">simulations</span> have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. 
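The reason the volumetric decompositions above need all-to-all transposes is that a 3D FFT is three passes of 1D FFTs, one along each axis; the single-process NumPy check below only demonstrates that equivalence, not the distributed communication pattern.

```python
# A 3D FFT as successive 1D FFTs along each axis. In a distributed code the
# array must be re-distributed (transposed) between passes so that each axis
# is locally contiguous when it is transformed.
import numpy as np

rng = np.random.default_rng(1)
data = rng.random((8, 8, 8))

step = np.fft.fft(data, axis=0)       # FFT along the locally held axis,
step = np.fft.fft(step, axis=1)       # then all-to-all transpose and FFT the next axis,
step = np.fft.fft(step, axis=2)       # then transpose again and FFT the last axis.

print(np.allclose(step, np.fft.fftn(data)))   # True: same result as the direct 3D FFT
```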
These <span class="hlt">simulations</span> necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for <span class="hlt">simulating</span> catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce <span class="hlt">parallelization</span> with OpenMP. We further benchmark our framework by <span class="hlt">simulating</span> a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion. PMID:24329081</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/22253805','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/22253805"><span id="translatedtitle"><span class="hlt">Parallel</span> kinetic Monte Carlo <span class="hlt">simulation</span> framework incorporating accurate models of adsorbate lateral interactions</span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Nielsen, Jens; DAvezac, Mayeul; Hetherington, James; Stamatakis, Michail</p> <p>2013-12-14</p> <p>Ab initio kinetic Monte Carlo (KMC) <span class="hlt">simulations</span> have been successfully applied for over two decades to elucidate the underlying physico-chemical phenomena on the surfaces of heterogeneous catalysts. These <span class="hlt">simulations</span> necessitate detailed knowledge of the kinetics of elementary reactions constituting the reaction mechanism, and the energetics of the species participating in the chemistry. The information about the energetics is encoded in the formation energies of gas and surface-bound species, and the lateral interactions between adsorbates on the catalytic surface, which can be modeled at different levels of detail. The majority of previous works accounted for only pairwise-additive first nearest-neighbor interactions. More recently, cluster-expansion Hamiltonians incorporating long-range interactions and many-body terms have been used for detailed estimations of catalytic rate [C. Wu, D. J. Schmidt, C. Wolverton, and W. F. Schneider, J. Catal. 286, 88 (2012)]. 
In view of the increasing interest in accurate predictions of catalytic performance, there is a need for general-purpose KMC approaches incorporating detailed cluster expansion models for the adlayer energetics. We have addressed this need by building on the previously introduced graph-theoretical KMC framework, and we have developed Zacros, a FORTRAN2003 KMC package for <span class="hlt">simulating</span> catalytic chemistries. To tackle the high computational cost in the presence of long-range interactions we introduce <span class="hlt">parallelization</span> with OpenMP. We further benchmark our framework by <span class="hlt">simulating</span> a KMC analogue of the NO oxidation system established by Schneider and co-workers [J. Catal. 286, 88 (2012)]. We show that taking into account only first nearest-neighbor interactions may lead to large errors in the prediction of the catalytic rate, whereas for accurate estimates thereof, one needs to include long-range terms in the cluster expansion.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.osti.gov/scitech/biblio/5501769','SCIGOV-STC'); return false;" href="http://www.osti.gov/scitech/biblio/5501769"><span id="translatedtitle">Knowledge-based environment for hierarchical modeling and <span class="hlt">simulation</span></span></a></p> <p><a target="_blank" href="http://www.osti.gov/scitech">SciTech Connect</a></p> <p>Kim, Taggon.</p> <p>1988-01-01</p> <p>This dissertation develops a knowledge-based environment for hierarchical modeling and <span class="hlt">simulation</span> of <span class="hlt">discrete-event</span> systems as the major part of a longer, ongoing research project in artificial intelligence and distributed <span class="hlt">simulation</span>. In developing the environment, a knowledge representation framework for modeling and <span class="hlt">simulation</span>, which unifies structural and behavioral knowledge of <span class="hlt">simulation</span> models, is proposed by incorporating knowledge-representation schemes in artificial intelligence within <span class="hlt">simulation</span> models. The knowledge base created using the framework is composed of a structural knowledge base called entity structure base and a behavioral knowledge base called model base. The DEVS-Scheme, a realization of DEVS (<span class="hlt">Discrete</span> <span class="hlt">Event</span> System Specifiation) formalism in a LISP-based, object-oriented environment, is extended to facilitate the specification of behavioral knowledge of models, especially for kernel models that are suited to model massively <span class="hlt">parallel</span> computer architectures. 
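To make the DEVS formalism referenced above concrete, a very small atomic model (state, time advance, output, internal transition) and a root loop are sketched below; the knowledge-base machinery and coupled-model handling of DEVS-Scheme are not represented.

```python
# Minimal atomic-DEVS sketch: a generator that emits a 'job' every `period`
# time units, driven by a trivial root simulation loop.
class Generator:
    def __init__(self, period: float):
        self.period = period
        self.count = 0                   # state

    def time_advance(self) -> float:     # ta(s)
        return self.period

    def output(self):                    # lambda(s), evaluated just before delta_int
        return ("job", self.count)

    def internal_transition(self):       # delta_int(s)
        self.count += 1

def run(model, end_time: float):
    t = 0.0
    while True:
        t_next = t + model.time_advance()
        if t_next > end_time:
            break
        print(f"t={t_next:4.1f}  output={model.output()}")
        model.internal_transition()
        t = t_next

if __name__ == "__main__":
    run(Generator(period=2.5), end_time=10.0)
```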
The ESP Scheme, a realization of entity structure formalism in a frame-theoretic representation, is extended to represent structural knowledge of models and to manage it in the structural knowledge base.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://adsabs.harvard.edu/abs/2015PhRvE..92a3303W','NASAADS'); return false;" href="http://adsabs.harvard.edu/abs/2015PhRvE..92a3303W"><span id="translatedtitle">Comparing Monte Carlo methods for finding ground states of Ising spin glasses: Population annealing, <span class="hlt">simulated</span> annealing, and <span class="hlt">parallel</span> tempering</span></a></p> <p><a target="_blank" href="http://adsabs.harvard.edu/abstract_service.html">NASA Astrophysics Data System (ADS)</a></p> <p>Wang, Wenlong; Machta, Jonathan; Katzgraber, Helmut G.</p> <p>2015-07-01</p> <p>Population annealing is a Monte Carlo algorithm that marries features from <span class="hlt">simulated</span>-annealing and <span class="hlt">parallel</span>-tempering Monte Carlo. As such, it is ideal to overcome large energy barriers in the free-energy landscape while minimizing a Hamiltonian. Thus, population-annealing Monte Carlo can be used as a heuristic to solve combinatorial optimization problems. We illustrate the capabilities of population-annealing Monte Carlo by computing ground states of the three-dimensional Ising spin glass with Gaussian disorder, while comparing to <span class="hlt">simulated</span>-annealing and <span class="hlt">parallel</span>-tempering Monte Carlo. Our results suggest that population annealing Monte Carlo is significantly more efficient than <span class="hlt">simulated</span> annealing but comparable to <span class="hlt">parallel</span>-tempering Monte Carlo for finding spin-glass ground states.</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/26274303','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/26274303"><span id="translatedtitle">Comparing Monte Carlo methods for finding ground states of Ising spin glasses: Population annealing, <span class="hlt">simulated</span> annealing, and <span class="hlt">parallel</span> tempering.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Wang, Wenlong; Machta, Jonathan; Katzgraber, Helmut G</p> <p>2015-07-01</p> <p>Population annealing is a Monte Carlo algorithm that marries features from <span class="hlt">simulated</span>-annealing and <span class="hlt">parallel</span>-tempering Monte Carlo. As such, it is ideal to overcome large energy barriers in the free-energy landscape while minimizing a Hamiltonian. Thus, population-annealing Monte Carlo can be used as a heuristic to solve combinatorial optimization problems. We illustrate the capabilities of population-annealing Monte Carlo by computing ground states of the three-dimensional Ising spin glass with Gaussian disorder, while comparing to <span class="hlt">simulated</span>-annealing and <span class="hlt">parallel</span>-tempering Monte Carlo. Our results suggest that population annealing Monte Carlo is significantly more efficient than <span class="hlt">simulated</span> annealing but comparable to <span class="hlt">parallel</span>-tempering Monte Carlo for finding spin-glass ground states. 
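The resampling step that distinguishes population annealing from plain simulated annealing, as discussed in the two records above, can be sketched as follows; multinomial resampling is used here for brevity, and the replica energies are arbitrary placeholders rather than spin-glass data.

```python
# When the inverse temperature is raised from beta_old to beta_new, each
# replica is kept with weight proportional to exp(-(beta_new - beta_old) * E),
# which enriches the population in low-energy configurations.
import numpy as np

def resample(n_replicas, energies, beta_old, beta_new, population_size, rng):
    """Return indices of the replicas kept for the next temperature."""
    weights = np.exp(-(beta_new - beta_old) * np.asarray(energies))
    weights /= weights.sum()
    return rng.choice(n_replicas, size=population_size, p=weights)

if __name__ == "__main__":
    rng = np.random.default_rng(7)
    energies = rng.normal(0.0, 1.0, size=1000)          # placeholder replica energies
    kept = resample(1000, energies, beta_old=1.0, beta_new=1.2,
                    population_size=1000, rng=rng)
    print(np.mean(energies[kept]), np.mean(energies))   # resampled mean energy is lower
```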
PMID:26274303</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.ncbi.nlm.nih.gov/pubmed/23833331','PUBMED'); return false;" href="http://www.ncbi.nlm.nih.gov/pubmed/23833331"><span id="translatedtitle">A <span class="hlt">parallel</span> overset-curvilinear-immersed boundary framework for <span class="hlt">simulating</span> complex 3D incompressible flows.</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pubmed">PubMed</a></p> <p>Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis</p> <p>2013-04-01</p> <p>We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to <span class="hlt">simulate</span> a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient <span class="hlt">parallel</span> computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by <span class="hlt">simulating</span> the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331</p> </li> <li> <p><a target="_blank" onclick="trackOutboundLink('http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3699968','PMC'); return false;" href="http://www.pubmedcentral.nih.gov/articlerender.fcgi?tool=pmcentrez&artid=3699968"><span id="translatedtitle">A <span class="hlt">parallel</span> overset-curvilinear-immersed boundary framework for <span class="hlt">simulating</span> complex 3D incompressible flows</span></a></p> <p><a target="_blank" href="http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?DB=pmc">PubMed Central</a></p> <p>Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis</p> <p>2013-01-01</p> <p>We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to <span class="hlt">simulate</span> a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. 

Simulated Wake Characteristics Data for Closely Spaced Parallel Runway Operations Analysis

    NASA Technical Reports Server (NTRS)

    Guerreiro, Nelson M.; Neitzke, Kurt W.

    2012-01-01

    A simulation experiment was performed to generate and compile wake characteristics data relevant to the evaluation and feasibility analysis of closely spaced parallel runway (CSPR) operational concepts. While the experiment in this work is not tailored to any particular operational concept, the generated data applies to the broader class of CSPR concepts, where a trailing aircraft on a CSPR approach is required to stay ahead of the wake vortices generated by a lead aircraft on an adjacent CSPR. Data for wake age, circulation strength, and wake altitude change at various lateral offset distances from the wake-generating lead aircraft approach path were compiled for a set of nine aircraft spanning the full range of FAA and ICAO wake classifications. A total of 54 scenarios were simulated to generate data related to key parameters that determine wake behavior. Of particular interest are wake age characteristics that can be used to evaluate both time- and distance-based in-trail separation concepts for all aircraft wake-class combinations.
A simple first-order difference model was developed to enable the computation of wake parameter estimates for aircraft models having weight, wingspan and speed characteristics similar to those of the nine aircraft modeled in this work.

Computational aeroacoustic simulation of flow-induced cavity noise using parallel computers

    NASA Astrophysics Data System (ADS)

    Shieh, Chingwei M.; Morris, Philip J.

    1998-11-01

    A parallel, multiblock, high-order accurate code has been developed for cavity noise prediction. An implicit, second-order time accurate, dual time-stepping algorithm that has shown promise in viscous aeroacoustic simulations has been implemented for the long-time integration. The accuracy of the solution obtained with this method is comparable to that of typical explicit computational aeroacoustic algorithms, but it eliminates the stringent time step requirement that numerical stability imposes on such schemes. Inner fictitious subiterations are performed with a four-stage Runge-Kutta method, with multigrid and implicit residual smoothing implemented to accelerate convergence. To account for the turbulent nature of the flow, a one-equation turbulence model has been used in the analysis. Far-field acoustic data are extrapolated from the near-field flow solution with the use of the Ffowcs Williams and Hawkings equation. Sound generation has been simulated from two-dimensional cavities of various length-to-depth ratios in the subsonic flow regime. The mechanisms for cavity noise generation are discussed.
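
    The dual time-stepping approach summarized above can be sketched for a scalar model equation: each physical time step advances an implicit second-order (BDF2) discretization whose residual is driven to zero by explicit multi-stage pseudo-time sub-iterations. The right-hand side, step sizes, stage coefficients and iteration counts below are illustrative assumptions, not those of the cited cavity-noise code.

        # Dual time stepping for du/dt = f(u): BDF2 in physical time, multi-stage pseudo-time sub-iterations.
        import numpy as np

        def f(u):                                   # hypothetical right-hand side
            return -u + np.sin(u)

        def bdf2_residual(u, u_n, u_nm1, dt):       # implicit second-order (BDF2) residual R(u)
            return (3.0 * u - 4.0 * u_n + u_nm1) / (2.0 * dt) - f(u)

        def physical_step(u_n, u_nm1, dt, dtau=0.05, subiters=50):
            u = u_n                                 # start the sub-iterations from the old state
            alphas = (0.25, 1.0 / 3.0, 0.5, 1.0)    # four-stage pseudo-time marching coefficients
            for _ in range(subiters):
                u0 = u
                for a in alphas:
                    u = u0 - a * dtau * bdf2_residual(u, u_n, u_nm1, dt)
            return u                                # approximate root of R(u) = 0

        dt, u_prev, u_curr = 0.1, 1.0, 1.0          # start-up: first two time levels taken equal
        for step in range(50):
            u_prev, u_curr = u_curr, physical_step(u_curr, u_prev, dt)
        print("u(t = 5) =", u_curr)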

Progress on H5Part: A Portable High Performance Parallel Data Interface for Electromagnetics Simulations

    SciTech Connect

    Adelmann, Andreas; Gsell, Achim; Oswald, Benedikt; Schietinger, Thomas; Bethel, Wes; Shalf, John; Siegerist, Cristina; Stockinger, Kurt

    2007-06-22

    Significant problems facing all experimental and computational sciences arise from growing data size and complexity. Common to all these problems is the need to perform efficient data I/O on diverse computer architectures. In our scientific application, the largest parallel particle simulations generate vast quantities of six-dimensional data. Such a simulation run produces data for an aggregate data size up to several TB per run. Motivated by the need to address data I/O and access challenges, we have implemented H5Part, an open source data I/O API that simplifies the use of the Hierarchical Data Format v5 library (HDF5). HDF5 is an industry standard for high performance, cross-platform data storage and retrieval that runs on all contemporary architectures from large parallel supercomputers to laptops. H5Part, which is oriented to the needs of the particle physics and cosmology communities, provides support for parallel storage and retrieval of particles, structured and, in the future, unstructured meshes. In this paper, we describe recent work focusing on I/O support for particles and structured meshes and provide data showing performance on modern supercomputer architectures like the IBM POWER 5.
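
    The per-step particle layout that such HDF5-based interfaces provide can be illustrated with plain h5py rather than the H5Part API itself; the file name, the "Step#n" group naming and the attributes below are assumptions made for the sketch.

        # Storing per-time-step particle arrays in an HDF5 file (illustrative layout, not the H5Part API).
        import numpy as np
        import h5py

        n_particles, n_steps = 1000, 5
        with h5py.File("particles.h5", "w") as f:
            for step in range(n_steps):
                g = f.create_group(f"Step#{step}")          # one group per output step
                for name in ("x", "y", "z", "px", "py", "pz"):
                    g.create_dataset(name, data=np.random.rand(n_particles))
                g.attrs["time"] = 0.01 * step               # per-step metadata

        with h5py.File("particles.h5", "r") as f:           # read back one property of one step
            x3 = f["Step#3/x"][...]
            print(x3.shape, float(f["Step#3"].attrs["time"]))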

Radiation hydrodynamics using characteristics on adaptive decomposed domains for massively parallel star formation simulations

    NASA Astrophysics Data System (ADS)

    Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.

    2016-02-01

    We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics, which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin and the optically thick regimes, and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.

Monte Carlo Simulations of Nonlinear Particle Acceleration in Parallel Trans-relativistic Shocks

    SciTech Connect

    Ellison, Donald C.; Warren, Donald C.; Bykov, Andrei M.

    2013-10-10

    We present results from a Monte Carlo simulation of a parallel collisionless shock undergoing particle acceleration. Our simulation, which contains parameterized scattering and a particular thermal leakage injection model, calculates the feedback between accelerated particles ahead of the shock, which influence the shock precursor and 'smooth' the shock, and thermal particle injection. We show that there is a transition between nonrelativistic shocks, where the acceleration efficiency can be extremely high and the nonlinear compression ratio can be substantially greater than the Rankine-Hugoniot value, and fully relativistic shocks, where diffusive shock acceleration is less efficient and the compression ratio remains at the Rankine-Hugoniot value. This transition occurs in the trans-relativistic regime and, for the particular parameters we use, occurs around a shock Lorentz factor γ0 = 1.5. We also find that nonlinear shock smoothing dramatically reduces the acceleration efficiency presumed to occur with large-angle scattering in ultra-relativistic shocks. Our ability to seamlessly treat the transition from ultra-relativistic to trans-relativistic to nonrelativistic shocks may be important for evolving relativistic systems, such as gamma-ray bursts and Type Ibc supernovae. We expect a substantial evolution of shock accelerated spectra during this transition, from soft early on to much harder when the blast-wave shock becomes nonrelativistic.

Implementation of a blade element UH-60 helicopter simulation on a parallel computer architecture in real-time

    NASA Technical Reports Server (NTRS)

    Moxon, Bruce C.; Green, John A.

    1990-01-01

    A high-performance platform for the development of real-time helicopter flight simulations is described, based on a simulation development and analysis platform that combines a parallel simulation development and analysis environment with a scalable multiprocessor computer system.
Simulation functional decomposition is covered, including the sequencing and data dependency of simulation modules and the mapping of simulation functions to multiple processors. The multiprocessor-based implementation of a blade-element simulation of the UH-60 helicopter is presented, and a prototype developed for a TC2000 computer is generalized in order to arrive at a portable multiprocessor software architecture. It is pointed out that the proposed approach, coupled with a pilot's station, creates a setting in which simulation engineers, computer scientists, and pilots can work together in the design and evaluation of advanced real-time helicopter simulations.

On Deciding between Conservative and Optimistic Approaches on Massively Parallel Platforms

    SciTech Connect

    Carothers, Christopher D.; Perumalla, Kalyan S.

    2010-01-01

    Over 5000 publications on parallel discrete event simulation (PDES) have appeared in the literature to date. Nevertheless, few articles have focused on empirical studies of PDES performance on large supercomputer-based systems. This gap is bridged here by undertaking a parameterized performance study on thousands of processor cores of a Blue Gene supercomputing system. In contrast to theoretical insights from analytical studies, our study is based on actual implementation in software, incurring the actual messaging and computational overheads for both conservative and optimistic synchronization approaches of PDES. Complex and counter-intuitive effects are uncovered and analyzed, with different event timestamp distributions and available levels of concurrency in the synthetic benchmark models. The results are intended to provide guidance to the PDES community in terms of how the synchronization protocols behave at high processor core counts on state-of-the-art supercomputing systems.
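
    A minimal, single-process sketch of the conservative (null-message) synchronization that such studies compare against optimistic rollback is given below; the two-LP topology, fixed lookahead and ping-pong workload are arbitrary illustrative choices rather than the benchmark models of the record.

        # Two logical processes synchronized conservatively via timestamp guarantees (null messages).
        import heapq

        LOOKAHEAD = 1.0

        class LP:
            def __init__(self, name):
                self.name, self.clock = name, 0.0
                self.events = []              # pending events: (timestamp, payload)
                self.channel_time = 0.0       # timestamp guarantee received from the peer
                self.peer = None

            def send(self, ts, payload):      # real or null message raises the peer's guarantee
                self.peer.channel_time = max(self.peer.channel_time, ts)
                if payload is not None:
                    heapq.heappush(self.peer.events, (ts, payload))

            def step(self):
                safe_until = self.channel_time          # no future message can carry a smaller timestamp
                progressed = False
                while self.events and self.events[0][0] <= safe_until:
                    ts, payload = heapq.heappop(self.events)
                    self.clock = ts
                    self.send(ts + LOOKAHEAD, payload + 1)   # schedule a reply in the future
                    progressed = True
                if not progressed:
                    self.send(self.clock + LOOKAHEAD, None)  # null message: a promise of silence
                return progressed

        a, b = LP("A"), LP("B")
        a.peer, b.peer = b, a
        heapq.heappush(a.events, (0.5, 0))    # seed event
        for _ in range(20):                   # round-robin execution stands in for parallel LPs
            a.step(); b.step()
        print("final clocks:", a.clock, b.clock)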

Automated integration of genomic physical mapping data via parallel simulated annealing

    SciTech Connect

    Slezak, T.

    1994-06-01

    The Human Genome Center at the Lawrence Livermore National Laboratory (LLNL) is nearing closure on a high-resolution physical map of human chromosome 19. We have built automated tools to assemble 15,000 fingerprinted cosmid clones into 800 contigs with minimal spanning paths identified. These islands are being ordered, oriented, and spanned by a variety of other techniques, including fluorescence in situ hybridization (FISH) at 3 levels of resolution, ECO restriction fragment mapping across all contigs, and a multitude of different hybridization and PCR techniques to link cosmid, YAC, AC, PAC, and P1 clones. The FISH data provide us with partial order and distance data as well as orientation. We made the observation that map builders need a much rougher presentation of data than do map readers; the former wish to see raw data since these can expose errors or interesting biology. We further noted that by ignoring our length and distance data we could simplify our problem into one that could be readily attacked with optimization techniques. The data integration problem could then be seen as an M x N ordering of our N cosmid clones which "intersect" M larger objects, by defining "intersection" to mean either contig/map membership or hybridization results. Clearly, the goal of making an integrated map is now to rearrange the N cosmid clone "columns" such that the number of gaps on the object "rows" is minimized. Our FISH partially-ordered cosmid clones provide us with a set of constraints that cannot be violated by the rearrangement process. We solved the optimization problem via simulated annealing performed on a network of 40+ Unix machines in parallel, using a server/client model built on explicit socket calls. For current maps we can create a map in about 4 hours on the parallel net versus 4+ days on a single workstation. Our biologists are now using this software on a daily basis to guide their efforts toward final closure.
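
    A toy version of the underlying optimization, simulated annealing over column permutations to minimize the number of gaps in each row, is sketched below; the FISH partial-order constraints and the distributed server/client execution are omitted, and all sizes and annealing parameters are illustrative.

        # Simulated annealing that permutes "clone" columns to minimize gaps within each "object" row.
        import random
        import numpy as np

        rng = np.random.default_rng(1)
        M, N = 30, 80
        members = rng.random((M, N)) < 0.1        # membership matrix: row object contains column clone

        def gaps(order):
            total = 0
            for row in members[:, order]:         # count runs of members broken by non-members
                idx = np.flatnonzero(row)
                if idx.size > 1:
                    total += int(np.sum(np.diff(idx) > 1))
            return total

        def anneal(order, T=5.0, cooling=0.999, steps=20000):
            cost = gaps(order)
            for _ in range(steps):
                i, j = random.sample(range(N), 2)
                order[i], order[j] = order[j], order[i]        # propose a column swap
                new_cost = gaps(order)
                if new_cost <= cost or random.random() < np.exp((cost - new_cost) / T):
                    cost = new_cost                            # accept
                else:
                    order[i], order[j] = order[j], order[i]    # reject: undo the swap
                T *= cooling
            return order, cost

        order, cost = anneal(list(range(N)))
        print("gaps after annealing:", cost)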

Direct numerical simulation of instabilities in parallel flow with spherical roughness elements

    NASA Technical Reports Server (NTRS)

    Deanna, R. G.

    1992-01-01

    Results from a direct numerical simulation of laminar flow over a flat surface with spherical roughness elements using a spectral-element method are given. The numerical simulation approximates roughness as a cellular pattern of identical spheres protruding from a smooth wall. Periodic boundary conditions on the domain's horizontal faces simulate an infinite array of roughness elements extending in the streamwise and spanwise directions, which implies the parallel-flow assumption and results in a closed domain. A body force, designed to yield the horizontal Blasius velocity in the absence of roughness, sustains the flow. Instabilities above a critical Reynolds number reveal negligible oscillations in the recirculation regions behind each sphere and in the free stream, high-amplitude oscillations in the layer directly above the spheres, and a mean profile with an inflection point near the sphere's crest. The inflection point yields an unstable layer above the roughness (where U''(y) is less than 0) and a stable region within the roughness (where U''(y) is greater than 0). Evidently, the instability begins when the low-momentum or wake region behind an element, being the region most affected by disturbances (purely numerical in this case), goes unstable and moves. In compressible flow with periodic boundaries, this motion sends disturbances to all regions of the domain. In the unstable layer just above the inflection point, the disturbances grow while being carried downstream with a propagation speed equal to the local mean velocity; they do not grow amid the low-energy region near the roughness patch. The most amplified disturbance eventually arrives at the next roughness element downstream, perturbing its wake and inducing a global response at a frequency governed by the streamwise spacing between spheres and the mean velocity of the most amplified layer.

Particle simulation on radio frequency stabilization of flute modes in a tandem mirror. I. Parallel antenna

    SciTech Connect

    Kadoya, Y.; Abe, H.

    1988-04-01

    A two- and one-half-dimensional electromagnetic particle code (PS2M) (H. Abe and S. Nakajima, J. Phys. Soc. Jpn. 53, xxx (1987)) is used to study how an electric field applied parallel to the magnetic field affects the radio frequency stabilization of flute modes in a tandem mirror plasma. The parallel electric field E∥ perturbs the electron velocity v∥ parallel to the magnetic field and also induces a perpendicular magnetic field perturbation B⊥. The unstable growth of the flute mode in the absence of such a radio frequency electric field is first studied as a basis for comparison. The ponderomotive force originating from the time-averaged product ⟨v∥B⊥⟩ is then shown to stabilize the flute modes. The stabilizing wave power threshold, the frequency dependency, and the dependence on ∇|E∥| all agree with the theoretical predictions.

Scalability of Parallel Spatial Direct Numerical Simulations on Intel Hypercube and IBM SP1 and SP2

    NASA Technical Reports Server (NTRS)

    Joslin, Ronald D.; Hanebutte, Ulf R.; Zubair, Mohammad

    1995-01-01

    The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube and IBM SP1 and SP2 parallel computers is documented. Spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows are computed with the PSDNS code. The feasibility of using the PSDNS to perform transition studies on these computers is examined. The results indicate that the PSDNS approach can effectively be parallelized on a distributed-memory parallel machine by remapping the distributed data structure during the course of the calculation. Scalability information is provided to estimate computational costs to match the actual costs relative to changes in the number of grid points. By increasing the number of processors, slower-than-linear speedups are achieved with optimized (machine-dependent library) routines, because the computational cost is dominated by the FFT routine, which yields less than ideal speedups. By using appropriate compile options and optimized library routines on the SP1, the serial code achieves 52-56 Mflops on a single node of the SP1 (45 percent of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a "real world" simulation that consists of 1.7 million grid points.
One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP supercomputer. For the same simulation, 32 nodes of the SP1 and SP2 are required to reach the performance of a Cray C-90. A 32-node SP1 (SP2) configuration is 2.9 (4.6) times faster than a Cray Y/MP for this simulation, while the hypercube is roughly 2 times slower than the Y/MP for this application. KEY WORDS: spatial direct numerical simulations; incompressible viscous flows; spectral methods; finite differences; parallel computing.

Mesoscale Simulations of Particulate Flows with Parallel Distributed Lagrange Multiplier Technique

    SciTech Connect

    Kanarska, Y.

    2010-03-24

    Fluid particulate flows are common phenomena in nature and industry. Modeling of such flows at micro and macro levels, as well as establishing relationships between these approaches, is needed to understand properties of the particulate matter. We propose a computational technique based on the direct numerical simulation of the particulate flows. The numerical method is based on the distributed Lagrange multiplier technique following the ideas of Glowinski et al. (1999). Each particle is explicitly resolved on an Eulerian grid as a separate domain, using solid volume fractions. The fluid equations are solved through the entire computational domain; however, Lagrange multiplier constraints are applied inside the particle domain such that the fluid within any volume associated with a solid particle moves as an incompressible rigid body. Mutual forces for the fluid-particle interactions are internal to the system. Particles interact with the fluid via fluid dynamic equations, resulting in implicit fluid-rigid-body coupling relations that produce realistic fluid flow around the particles (i.e., no-slip boundary conditions). The particle-particle interactions are implemented using explicit force-displacement interactions for frictional inelastic particles similar to the DEM method of Cundall et al. (1979), with some modifications using the volume of the overlapping region as an input to the contact forces. The method is flexible enough to handle arbitrary particle shapes and size distributions. A parallel implementation of the method is based on the SAMRAI (Structured Adaptive Mesh Refinement Application Infrastructure) library, which allows handling of large numbers of rigid particles and enables local grid refinement. Accuracy and convergence of the presented method have been tested against known solutions for a falling sphere as well as by examining fluid flows through stationary particle beds (periodic and cubic packing).
To evaluate code performance and validate the particle contact physics algorithm, we performed simulations of a representative experiment conducted at the University of California at Berkeley for pebble flow through a narrow opening.

A Three Dimensional Parallel Time Accurate Turbopump Simulation Procedure Using Overset Grid Systems

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin; Chan, William; Kwak, Dochan

    2001-01-01

    The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems. To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up and non-uniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD-to-solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability will be presented along with the performance of parallel versions of the code.

A Three-Dimensional Parallel Time-Accurate Turbopump Simulation Procedure Using Overset Grid System

    NASA Technical Reports Server (NTRS)

    Kiris, Cetin; Chan, William; Kwak, Dochan

    2002-01-01

    The objective of the current effort is to provide a computational framework for design and analysis of the entire fuel supply system of a liquid rocket engine, including high-fidelity unsteady turbopump flow analysis. This capability is needed to support the design of pump sub-systems for advanced space transportation vehicles that are likely to involve liquid propulsion systems.
To date, computational tools for design/analysis of turbopump flows are based on relatively lower fidelity methods. An unsteady, three-dimensional viscous flow analysis tool involving stationary and rotational components for the entire turbopump assembly has not been available for real-world engineering applications. The present effort provides developers with information such as transient flow phenomena at start up and nonuniform inflows, and will eventually impact on system vibration and structures. In the proposed paper, the progress toward the capability of complete simulation of the turbo-pump for a liquid rocket engine is reported. The Space Shuttle Main Engine (SSME) turbo-pump is used as a test case for evaluation of the hybrid MPI/Open-MP and MLP versions of the INS3D code. CAD-to-solution auto-scripting capability is being developed for turbopump applications. The relative motion of the grid systems for the rotor-stator interaction was obtained using overset grid techniques. Unsteady computations for the SSME turbo-pump, which contains 114 zones with 34.5 million grid points, are carried out on Origin 3000 systems at NASA Ames Research Center. Results from these time-accurate simulations with moving boundary capability are presented along with the performance of parallel versions of the code.

Parallelized computation for computer simulation of electrocardiograms using personal computers with multi-core CPU and general-purpose GPU

    PubMed

    Shen, Wenfeng; Wei, Daming; Xu, Weimin; Zhu, Xin; Yuan, Shizhong

    2010-10-01

    Biological computations like electrocardiological modelling and simulation usually require high-performance computing environments. This paper introduces an implementation of parallel computation for computer simulation of electrocardiograms (ECGs) in a personal computer environment with an Intel Core 2 Quad Q6600 CPU and a GeForce 8800GT GPU, with software support by OpenMP and CUDA. It was tested in three parallelization device setups: (a) a four-core CPU without a general-purpose GPU, (b) a general-purpose GPU plus 1 core of CPU, and (c) a four-core CPU plus a general-purpose GPU. To effectively take advantage of a multi-core CPU and a general-purpose GPU, an algorithm based on load-prediction dynamic scheduling was developed and applied to setting (c). In the simulation with 1600 time steps, the speedup of the parallel computation as compared to the serial computation was 3.9 in setting (a), 16.8 in setting (b), and 20.0 in setting (c). This study demonstrates that a current PC with a multi-core CPU and a general-purpose GPU provides a good environment for parallel computations in biological modelling and simulation studies. PMID:20674066
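
    The load-prediction idea can be sketched as follows: each device's share of the next time step is predicted from its measured throughput on earlier steps, so that the CPU and GPU portions finish at roughly the same time. The "devices" below are plain Python placeholders and every number is an illustrative stand-in for the real OpenMP/CUDA kernels; the sketch also runs the two portions one after the other, whereas a real implementation would overlap them.

        # Predict each device's share of the next step from measured throughput (illustrative only).
        import time

        def cpu_solve(n_cells):                  # placeholder for the multi-core CPU portion
            time.sleep(n_cells * 2e-6)

        def gpu_solve(n_cells):                  # placeholder for the GPU portion
            time.sleep(n_cells * 5e-7)

        N_CELLS = 200_000
        share_gpu = 0.5                          # initial guess: split the model evenly

        for step in range(10):
            n_gpu = int(N_CELLS * share_gpu)
            n_cpu = N_CELLS - n_gpu

            t0 = time.perf_counter(); gpu_solve(n_gpu); t_gpu = time.perf_counter() - t0
            t0 = time.perf_counter(); cpu_solve(n_cpu); t_cpu = time.perf_counter() - t0

            thr_gpu, thr_cpu = n_gpu / t_gpu, n_cpu / t_cpu      # cells per second on each device
            share_gpu = thr_gpu / (thr_gpu + thr_cpu)            # predicted split for the next step
            print(f"step {step}: gpu share = {share_gpu:.2f}")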

Parallel Higher-order Finite Element Method for Accurate Field Computations in Wakefield and PIC Simulations

    SciTech Connect

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Limborg, C.; Ng, C.; Prudencio, E.; Schussman, G.; Uplenchwar, R.; Ko, K.

    2009-06-19

    Over the past years, SLAC's Advanced Computations Department (ACD), under SciDAC sponsorship, has developed a suite of 3D (2D) parallel higher-order finite element (FE) codes, T3P (T2P) and Pic3P (Pic2P), aimed at accurate, large-scale simulation of wakefields and particle-field interactions in radio-frequency (RF) cavities of complex shape. The codes are built on the FE infrastructure that supports SLAC's frequency domain codes, Omega3P and S3P, to utilize conformal tetrahedral (triangular) meshes, higher-order basis functions and quadratic geometry approximation. For time integration, they adopt an unconditionally stable implicit scheme. Pic3P (Pic2P) extends T3P (T2P) to treat charged-particle dynamics self-consistently using the PIC (particle-in-cell) approach, the first such implementation on a conformal, unstructured grid using Whitney basis functions. Examples from applications to the International Linear Collider (ILC), Positron Electron Project-II (PEP-II), Linac Coherent Light Source (LCLS) and other accelerators will be presented to compare the accuracy and computational efficiency of these codes versus their counterparts using structured grids.

Simulation and instability investigation of the flow around a cylinder between two parallel walls

    NASA Astrophysics Data System (ADS)

    Dou, Hua-Shu; Ben, An-Qing

    2015-04-01

    The two-dimensional flows around a cylinder between two parallel walls at Re = 40 and Re = 100 are simulated with computational fluid dynamics (CFD). The governing equations are the Navier-Stokes equations. They are discretized with the finite volume method (FVM) and the solution is iterated with the PISO algorithm. The calculated results are compared with numerical results in the literature, and good agreement is obtained. After that, the mechanism of the formation of the Karman vortex street is investigated and the instability of the entire flow field is analyzed with the energy gradient theory. It is found that the two eddies attached at the rear of the cylinder have no effect on the flow instability for steady flow, i.e., they do not contribute to the formation of the Karman vortex street. The formation of the Karman vortex street originates from the combination of the interaction of the two shear layers at the two lateral sides of the cylinder and the absolute instability in the cylinder wake.
For the flow with the Karman vortex street, the initial instability occurs in a vortex region downstream in the wake, and the center of a vortex loses its stability first. For pressure-driven flow, it is confirmed that the inflection point on the time-averaged velocity profile leads to the instability. It is concluded that the energy gradient theory is potentially applicable to studying flow stability and to revealing the mechanism of turbulent transition.

A Parallel 2D Numerical Simulation of Tumor Cells Necrosis by Local Hyperthermia

    NASA Astrophysics Data System (ADS)

    Reis, R. F.; Loureiro, F. S.; Lobosco, M.

    2014-03-01

    Hyperthermia has been widely used in cancer treatment to destroy tumors. The main idea of hyperthermia is to heat a specific region, such as a tumor, so that above a threshold temperature the tumor cells are destroyed. This can be accomplished by many heat supply techniques, and the use of magnetic nanoparticles that generate heat when an alternating magnetic field is applied has emerged as a promising technique. In the present paper, the Pennes bioheat transfer equation is adopted to model the thermal tumor ablation in the context of magnetic nanoparticles. Numerical simulations are carried out considering different injection sites for the nanoparticles in an attempt to achieve better hyperthermia conditions. An explicit finite difference method is employed to solve the equations. However, a large amount of computation is required for this purpose. Therefore, this work also presents an initial attempt to improve performance using OpenMP, a parallel programming API. Experimental results were quite encouraging: speedups around 35 were obtained on a 64-core machine.
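
    A small explicit finite-difference sketch of the 2D Pennes bioheat equation with a localized source standing in for the nanoparticle injection site is given below; the tissue properties, source strength, grid and time step are illustrative assumptions rather than the values of the cited study, and the OpenMP parallelization is not reproduced.

        # Explicit finite differences for the 2D Pennes bioheat equation with a heated sub-region.
        import numpy as np

        nx = ny = 101
        dx, dt = 1e-3, 0.05                              # 1 mm spacing, 0.05 s step (explicitly stable here)
        rho, c, k = 1000.0, 4200.0, 0.5                  # tissue density, heat capacity, conductivity
        wb, rho_b, c_b, T_a = 5e-4, 1000.0, 4200.0, 37.0 # blood perfusion parameters and arterial temperature
        T = np.full((ny, nx), 37.0)

        Q = np.zeros_like(T)
        Q[45:56, 45:56] = 5e5                            # W/m^3, hypothetical nanoparticle heating region

        for step in range(2000):                         # 100 s of simulated heating
            lap = (np.roll(T, 1, 0) + np.roll(T, -1, 0) +
                   np.roll(T, 1, 1) + np.roll(T, -1, 1) - 4.0 * T) / dx**2
            T += dt * (k * lap + wb * rho_b * c_b * (T_a - T) + Q) / (rho * c)
            T[0, :] = T[-1, :] = T[:, 0] = T[:, -1] = 37.0   # body-temperature boundary

        print("peak temperature after 100 s: %.1f C" % T.max())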

Monte Carlo simulation of photoelectron energization in parallel electric fields: Electroglow on Uranus

    SciTech Connect

    Singhal, R. P.; Bhardwaj, A.

    1991-09-01

    A Monte Carlo simulation of photoelectron energization and energy degradation in H2 gas in the presence of parallel electric fields has been carried out. Numerical yield spectra, which contain information about the electron energy degradation process and can be used to calculate the yield for any inelastic event, are obtained. The variation of yield spectra with incident electron energy, electric field, pitch angle, and cutoff limit has been studied. The yield function is employed to determine the photoelectron fluxes. H2 Lyman and Werner band excitation rates and integrated column intensity are computed for three different electric field profiles taking various low-energy cutoff limits. It is found that an electric field profile with a peak value of 4 mV/m at a neutral number density of 3×10^10 cm^-3 produces enhanced volume emission rates of H2 bands (λ < 1100 Å), explaining about 20% of the observed electroglow emission on Uranus. The effect of solar zenith angle and solar cycle variation on the peak excitation rate is discussed.

Supporting the Development of Resilient Message Passing Applications using Simulation

    SciTech Connect

    Naughton, Thomas J., III; Engelmann, Christian; Vallee, Geoffroy R.; Boehm, Swen

    2014-01-01

    An emerging aspect of high-performance computing (HPC) hardware/software co-design is investigating performance under failure. The work in this paper extends the Extreme-scale Simulator (xSim), which was designed for evaluating the performance of message passing interface (MPI) applications on future HPC architectures, with fault-tolerant MPI extensions proposed by the MPI Fault Tolerance Working Group. xSim permits running MPI applications with millions of concurrent MPI ranks, while observing application performance in a simulated extreme-scale system using a lightweight parallel discrete event simulation. The newly added features offer user-level failure mitigation (ULFM) extensions at the simulated MPI layer to support algorithm-based fault tolerance (ABFT). The presented solution permits investigating performance under failure and failure handling of ABFT solutions. The newly enhanced xSim is the very first performance tool that supports ULFM and ABFT.

A heterogeneous and parallel computing framework for high-resolution hydrodynamic simulations

    NASA Astrophysics Data System (ADS)

    Smith, Luke; Liang, Qiuhua

    2015-04-01

    Shock-capturing hydrodynamic models are now widely applied in the context of flood risk assessment and forecasting, accurately capturing the behaviour of surface water over ground and within rivers. Such models are generally explicit in their numerical basis, and can be computationally expensive; this has prohibited full use of high-resolution topographic data for complex urban environments, now easily obtainable through airborne altimetric surveys (LiDAR). As processor clock speed advances have stagnated in recent years, further computational performance gains are largely dependent on the use of parallel processing. Heterogeneous computing architectures (e.g. graphics processing units or compute accelerator cards) provide a cost-effective means of achieving high throughput in cases where the same calculation is performed with a large input dataset.
In recent years this technique has been applied successfully for flood risk mapping, such as within the national surface water flood risk assessment for the United Kingdom. We present a flexible software framework for hydrodynamic simulations across multiple processors of different architectures, within multiple computer systems, enabled using the OpenCL and Message Passing Interface (MPI) libraries. A finite-volume Godunov-type scheme is implemented using the HLLC approach to solving the Riemann problem, with optional extension to second-order accuracy in space and time using the MUSCL-Hancock approach. The framework is successfully applied on personal computers and a small cluster to provide considerable improvements in performance. The most significant performance gains were achieved across two servers, each containing four NVIDIA GPUs, with a mix of K20, M2075 and C2050 devices. Advantages are found with respect to decreased parametric sensitivity, and thus in reducing uncertainty, for a major fluvial flood within a large catchment during 2005 in Carlisle, England. Simulations for the three-day event could be performed on a 2 m grid within a few hours. In the context of a rapid pluvial flood event in Newcastle upon Tyne during 2012, the technique allows simulation of inundation for a 31 km2 area of the city centre in less than an hour on a 2 m grid; however, further grid refinement is required to fully capture important smaller flow pathways. Good agreement between the model and observed inundation is achieved for a variety of dam failure, slow fluvial inundation, rapid pluvial inundation, and defence breach scenarios in the UK.

Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    SciTech Connect

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0 to t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1,M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested.
We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence, and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and a HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time / parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long-time, high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel.
Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step.
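
    The recasting of time integration as a root-finding problem can be illustrated with a compact sketch: a cheap coarse propagator supplies the initial trajectory, and a Parareal-style correction drives F(X) = [x_i - f(x_{i-1})] toward zero, with the fine propagations within each iteration independent of one another and hence parallelizable. The harmonic-oscillator model, the coarse/fine step sizes and the simple correction used here are illustrative stand-ins for the quasi-Newton schemes of the paper.

        # Trajectory as a root-finding problem, solved with a Parareal-style corrected iteration.
        import numpy as np

        def verlet(x, v, dt, nsub):                  # velocity-Verlet propagator for x'' = -x
            for _ in range(nsub):
                a = -x
                x += v * dt + 0.5 * a * dt * dt
                v += 0.5 * (a - x) * dt              # new acceleration after the position update is -x
            return x, v

        M, H = 40, 0.25                              # number of time slices, slice length
        fine = lambda s: verlet(*s, dt=H / 100, nsub=100)   # expensive propagator f over one slice
        coarse = lambda s: verlet(*s, dt=H, nsub=1)         # cheap preconditioning propagator

        X = [(1.0, 0.0)]                             # initial condition x(0) = 1, v(0) = 0
        for i in range(M):                           # serial coarse sweep gives the initial guess
            X.append(coarse(X[i]))

        for k in range(8):                           # iterate toward F(X) = [x_i - f(x_{i-1})] = 0
            F = [fine(X[i]) for i in range(M)]       # independent: these could run in parallel
            G_old = [coarse(X[i]) for i in range(M)]
            for i in range(M):                       # cheap serial correction sweep
                X[i + 1] = tuple(np.add(coarse(X[i]), np.subtract(F[i], G_old[i])))

        print("x(10) =", X[-1][0], " exact =", np.cos(10.0))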

Comparison of elastic and rigid blade-element rotor models using parallel processing technology for piloted simulations

    NASA Technical Reports Server (NTRS)

    Hill, Gary; Du Val, Ronald W.; Green, John A.; Huynh, Loc C.

    1991-01-01

    A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. A simulation development and analysis tool, FLIGHTLAB, was used to implement these models in real time using parallel processing technology. Pilot comments and qualitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled.
  463. Comparisons of elastic and rigid blade-element rotor models using parallel processing technology for piloted simulations

    NASA Technical Reports Server (NTRS)

    Hill, Gary; Duval, Ronald W.; Green, John A.; Huynh, Loc C.

    1991-01-01

    A piloted comparison of rigid and aeroelastic blade-element rotor models was conducted at the Crew Station Research and Development Facility (CSRDF) at Ames Research Center. A simulation development and analysis tool, FLIGHTLAB, was used to implement these models in real time using parallel processing technology. Pilot comments and quantitative analysis performed both on-line and off-line confirmed that elastic degrees of freedom significantly affect perceived handling qualities. Trim comparisons show improved correlation with flight test data when elastic modes are modeled. The results demonstrate the efficiency with which the mathematical modeling sophistication of existing simulation facilities can be upgraded using parallel processing, and the importance of these upgrades to simulation fidelity.

  464. Parallel processing of real-time dynamic systems simulation on OSCAR (Optimally SCheduled Advanced multiprocessoR)

    NASA Technical Reports Server (NTRS)

    Kasahara, Hironori; Honda, Hiroki; Narita, Seinosuke

    1989-01-01

    Parallel processing of real-time dynamic systems simulation on a multiprocessor system named OSCAR is presented. In the simulation of dynamic systems, generally, the same calculations are repeated every time step. However, the Do-all and Do-across techniques cannot be applied to parallel processing of the simulation, since there exist data dependencies from the end of an iteration to the beginning of the next iteration, and furthermore data input and data output are required every sampling time period. Therefore, parallelism inside the calculation required for a single time step, or a large basic block which consists of arithmetic assignment statements, must be used. In the proposed method, near fine grain tasks, each of which consists of one or more floating point operations, are generated to extract the parallelism from the calculation and are assigned to processors by optimal static scheduling at compile time, in order to reduce the large run time overhead caused by the use of near fine grain tasks. The practicality of the scheme is demonstrated on OSCAR (Optimally SCheduled Advanced multiprocessoR), which has been developed to exploit the advantages of static scheduling algorithms to the maximum extent.
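As a loose illustration of compile-time static scheduling of near fine grain tasks, the sketch below assigns a small task graph to processors with a simple earliest-finish-time list heuristic. The task graph, costs, processor count, and the heuristic itself are invented for the example and are not the scheduling algorithms actually used in OSCAR.

    # Hypothetical static list scheduler: tasks with known costs and dependencies
    # are mapped to processors before execution, mimicking compile-time scheduling.
    def static_schedule(tasks, deps, cost, n_proc):
        """tasks: list of ids; deps: {task: set of predecessors}; cost: {task: seconds}."""
        finish = {}                           # task -> finish time
        proc_free = [0.0] * n_proc            # next free time on each processor
        schedule = {p: [] for p in range(n_proc)}
        remaining = set(tasks)
        while remaining:
            ready = [t for t in remaining if deps[t].issubset(finish)]
            # pick the ready task whose predecessors finish earliest
            t = min(ready, key=lambda t: max((finish[d] for d in deps[t]), default=0.0))
            earliest_start = max((finish[d] for d in deps[t]), default=0.0)
            p = min(range(n_proc), key=lambda p: max(proc_free[p], earliest_start))
            start = max(proc_free[p], earliest_start)
            finish[t] = start + cost[t]
            proc_free[p] = finish[t]
            schedule[p].append((t, start, finish[t]))
            remaining.remove(t)
        return schedule

    print(static_schedule(["a", "b", "c", "d"],
                          {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}},
                          {"a": 1.0, "b": 2.0, "c": 1.5, "d": 0.5}, n_proc=2))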
  465. Roe-Einfeldt-Osher scheme as applied to the mathematical simulation of accretion disks on parallel computers

    NASA Astrophysics Data System (ADS)

    Lugovsky, A. Yu.; Popov, Yu. P.

    2015-08-01

    The Roe-Einfeldt-Osher scheme, which has third-order accuracy, is considered. Its advantages over the first-order accurate Roe scheme are demonstrated, and its choice for the simulation of accretion disk flows is justified. The Roe-Einfeldt-Osher scheme is shown to be efficient as applied to the simulation of real-world problems on parallel computers. Results of simulations of flows in accretion disks in two and three dimensions are presented. The limited capabilities of two-dimensional disk models are noted.

  466. Parallel Processing of Numerical Tsunami Simulations on a High Performance Cluster based on the GDAL Library

    NASA Astrophysics Data System (ADS)

    Schroeder, Matthias; Jankowski, Cedric; Hammitzsch, Martin; Wächter, Joachim

    2014-05-01

    Thousands of numerical tsunami simulations allow the computation of inundation and run-up along the coast for vulnerable areas over time. A so-called Matching Scenario Database (MSDB) [1] contains this large number of simulations in text file format. In order to visualize these wave propagations, the scenarios have to be reprocessed automatically. In the TRIDEC project, funded by the Seventh Framework Programme of the European Union, a Virtual Scenario Database (VSDB) and a Matching Scenario Database (MSDB) were established, amongst others, by the working group of the University of Bologna (UniBo) [1]. One part of TRIDEC was the development of a new generation of Decision Support System (DSS) for Tsunami Early Warning Systems (TEWS) [2]. A working group of the GFZ German Research Centre for Geosciences was responsible for developing the Command and Control User Interface (CCUI) as the central software application which supports operator activities, incident management and message dissemination. For the integration and visualization in the CCUI, the numerical tsunami simulations from the MSDB must be converted into the shapefile format. The usage of shapefiles enables a much easier integration into standard Geographic Information Systems (GIS); the CCUI itself is based on two widely used open source products (the GeoTools library and uDig), which provide shapefile integration out of the box. In this case, several thousand tsunami variations were processed for an example area around the Western Iberian margin. Due to the mass of data, only a program-controlled process was conceivable. In order to optimize the computing effort and operating time, an existing GFZ High Performance Computing Cluster (HPC) was used. Thus, geospatial software capable of parallel processing was sought. The FOSS tool Geospatial Data Abstraction Library (GDAL/OGR) was used to match the coordinates with the wave heights and to generate the different shapefiles for certain time steps. The shapefiles afterwards contain lines for visualizing the isochrones of the wave propagation and, moreover, data about the maximum wave height and the Estimated Time of Arrival (ETA) at the coast. Our contribution shows the entire workflow and the visualization results of the processing for the example region, the Western Iberian ocean margin. [1] Armigliato A., Pagnoni G., Zaniboni F., Tinti S. (2013), Database of tsunami scenario simulations for Western Iberia: a tool for the TRIDEC Project Decision Support System for tsunami early warning, Vol. 15, EGU2013-5567, EGU General Assembly 2013, Vienna (Austria). [2] Löwe, P., Wächter, J., Hammitzsch, M., Lendholt, M., Häner, R. (2013): The Evolution of Service-oriented Disaster Early Warning Systems in the TRIDEC Project, 23rd International Ocean and Polar Engineering Conference - ISOPE-2013, Anchorage (USA).
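The per-time-step conversion described in the record above is embarrassingly parallel, since each time step yields an independent shapefile. The sketch below is a hypothetical reconstruction of that pattern with the GDAL/OGR Python bindings and a multiprocessing pool; the file names, the "lon lat height" column layout, the EPSG:4326 spatial reference, and the output field name are assumptions made for the example, not the formats used by the TRIDEC databases.

    # Convert one ASCII wave-height dump per time step into a point shapefile,
    # with the independent time steps handled by separate worker processes.
    from multiprocessing import Pool
    from osgeo import ogr, osr

    def convert_step(step):
        src, dst = f"eta_{step:04d}.txt", f"eta_{step:04d}.shp"   # assumed naming scheme
        driver = ogr.GetDriverByName("ESRI Shapefile")
        ds = driver.CreateDataSource(dst)
        srs = osr.SpatialReference()
        srs.ImportFromEPSG(4326)                      # WGS84 lon/lat (assumed)
        layer = ds.CreateLayer("wave_height", srs, ogr.wkbPoint)
        layer.CreateField(ogr.FieldDefn("height", ogr.OFTReal))
        with open(src) as fh:
            for line in fh:                           # assumed "lon lat height" columns
                lon, lat, height = map(float, line.split())
                feat = ogr.Feature(layer.GetLayerDefn())
                point = ogr.Geometry(ogr.wkbPoint)
                point.AddPoint(lon, lat)
                feat.SetGeometry(point)
                feat.SetField("height", height)
                layer.CreateFeature(feat)
                feat = None                           # release the feature
        ds = None                                     # close and flush the data source
        return dst

    if __name__ == "__main__":
        with Pool() as pool:                          # one independent task per time step
            print(pool.map(convert_step, range(10)))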
  467. Efficient parallel seismic simulations including topography and 3-D material heterogeneities on locally refined composite grids

    NASA Astrophysics Data System (ADS)

    Petersson, Anders; Rodgers, Arthur

    2010-05-01

    The finite difference method on a uniform Cartesian grid is a highly efficient and easy to implement technique for solving the elastic wave equation in seismic applications. However, the spacing in a uniform Cartesian grid is fixed throughout the computational domain, whereas the resolution requirements in realistic seismic simulations are usually higher near the surface than at depth. This can be seen from the well-known formula h ≈ L/P, which relates the grid spacing h to the wave length L and the required number of grid points per wavelength P for obtaining an accurate solution. The compressional and shear wave lengths in the earth generally increase with depth and are often a factor of ten larger below the Moho discontinuity (at about 30 km depth) than in sedimentary basins near the surface. A uniform grid must have a grid spacing based on the small wave lengths near the surface, which results in over-resolving the solution at depth. As a result, the number of points in a uniform grid is unnecessarily large. In the wave propagation project (WPP) code, we address the over-resolution-at-depth issue by generalizing our previously developed single grid finite difference scheme to work on a composite grid consisting of a set of structured rectangular grids of different spacings, with hanging nodes on the grid refinement interfaces. The computational domain in a regional seismic simulation often extends to a depth of 40-50 km. Hence, using a refinement ratio of two, we need about three grid refinements from the bottom of the computational domain to the surface to keep the local grid size in approximate parity with the local wave lengths. The challenge of the composite grid approach is to find a stable and accurate method for coupling the solution across the grid refinement interface. Of particular importance is the treatment of the solution at the hanging nodes, i.e., the fine grid points which are located in between coarse grid points. WPP implements a new, energy conserving coupling procedure for the elastic wave equation at grid refinement interfaces. When used together with our single grid finite difference scheme, it results in a method which is provably stable, without artificial dissipation, for arbitrary heterogeneous isotropic elastic materials. The new coupling procedure is based on satisfying the summation-by-parts principle across refinement interfaces. From a practical standpoint, an important advantage of the proposed method is the absence of tunable numerical parameters, which are seldom appreciated by application experts. In WPP, the composite grid discretization is combined with a curvilinear grid approach that enables accurate modeling of free surfaces on realistic (non-planar) topography. The overall method satisfies the summation-by-parts principle and is stable under a CFL time step restriction. A feature of great practical importance is that WPP automatically generates the composite grid based on the user-provided topography and the depths of the grid refinement interfaces. The WPP code has been verified extensively, for example using the method of manufactured solutions, by solving Lamb's problem, by solving various layer over half-space problems and comparing to semi-analytic (FK) results, and by simulating scenario earthquakes where results from other seismic simulation codes are available. WPP has also been validated against seismographic recordings of moderate earthquakes. WPP performs well on large parallel computers and has been run on up to 32,768 processors using about 26 billion grid points (78 billion DOF) and 41,000 time steps. WPP is an open source code that is available under the GNU General Public License.
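A small worked example of the h ≈ L/P rule quoted above shows why coarsening with depth pays off; the velocities, peak frequency, and points-per-wavelength value below are illustrative choices, not WPP defaults.

    # The minimum wavelength L = v_min / f_max sets the grid spacing h = L / P,
    # so slow near-surface material demands a much finer grid than fast deep material.
    def required_spacing(v_min_m_s, f_max_hz, points_per_wavelength):
        wavelength = v_min_m_s / f_max_hz          # shortest wavelength to resolve
        return wavelength / points_per_wavelength  # h = L / P

    P = 8                                          # grid points per wavelength (assumed)
    h_surface = required_spacing(v_min_m_s=500.0,  f_max_hz=2.0, points_per_wavelength=P)
    h_deep    = required_spacing(v_min_m_s=4500.0, f_max_hz=2.0, points_per_wavelength=P)
    print(h_surface, h_deep, h_deep / h_surface)   # ~31 m vs ~281 m: ~9x coarser at depth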
  468. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations

    SciTech Connect

    Bylaska, Eric J.; Weare, Jonathan Q.; Weare, John H.

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1..M} = 0, for the trajectory variables. The root finding problem is solved using a variety of optimization techniques, including quasi-Newton and preconditioned quasi-Newton optimization schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations such preconditioners are not required to obtain reasonable convergence, and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and an HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow TCP/IP networks. Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way, these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet.
    The algorithms can also be used with programs that are already parallel. By using these algorithms we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 seconds per time step to 6.9 seconds per time step.

  469. Extending molecular simulation time scales: Parallel in time integrations for high-level quantum chemistry and complex force representations.

    PubMed

    Bylaska, Eric J; Weare, Jonathan Q; Weare, John H

    2013-08-21

    Parallel in time simulation algorithms are presented and applied to conventional molecular dynamics (MD) and ab initio molecular dynamics (AIMD) models of realistic complexity. Assuming that a forward time integrator, f (e.g., the Verlet algorithm), is available to propagate the system from time t_i (trajectory positions and velocities x_i = (r_i, v_i)) to time t_{i+1} (x_{i+1}) by x_{i+1} = f_i(x_i), the dynamics problem spanning an interval from t_0…t_M can be transformed into a root finding problem, F(X) = [x_i - f(x_{i-1})]_{i=1..M} = 0, for the trajectory variables. The root finding problem is solved using a variety of root finding techniques, including quasi-Newton and preconditioned quasi-Newton schemes that are all unconditionally convergent. The algorithms are parallelized by assigning a processor to each time-step entry in the columns of F(X). The relation of this approach to other recently proposed parallel in time methods is discussed, and the effectiveness of various approaches to solving the root finding problem is tested. We demonstrate that more efficient dynamical models based on simplified interactions or coarsening time-steps provide preconditioners for the root finding problem. However, for MD and AIMD simulations, such preconditioners are not required to obtain reasonable convergence and their cost must be considered in the performance of the algorithm. The parallel in time algorithms developed are tested by applying them to MD and AIMD simulations of size and complexity similar to those encountered in present day applications. These include a 1000 Si atom MD simulation using Stillinger-Weber potentials, and an HCl + 4H2O AIMD simulation at the MP2 level. The maximum speedup (serial execution time/parallel execution time) obtained by parallelizing the Stillinger-Weber MD simulation was nearly 3.0. For the AIMD MP2 simulations, the algorithms achieved speedups of up to 14.3. The parallel in time algorithms can be implemented in a distributed computing environment using very slow transmission control protocol/Internet protocol networks.
    Scripts written in Python that make calls to a precompiled quantum chemistry package (NWChem) are demonstrated to provide an actual speedup of 8.2 for a 2.5 ps AIMD simulation of HCl + 4H2O at the MP2/6-31G* level. Implemented in this way these algorithms can be used for long time high-level AIMD simulations at a modest cost using machines connected by very slow networks such as WiFi, or in different time zones connected by the Internet. The algorithms can also be used with programs that are already parallel. Using these algorithms, we are able to reduce the cost of a MP2/6-311++G(2d,2p) simulation that had reached its maximum possible speedup in the parallelization of the electronic structure calculation from 32 s/time step to 6.9 s/time step. PMID:23968079

  470. GENESIS: a hybrid-parallel and multi-scale molecular dynamics simulator with enhanced sampling algorithms for biomolecular and cellular simulations

    PubMed Central

    Jung, Jaewoon; Mori, Takaharu; Kobayashi, Chigusa; Matsunaga, Yasuhiro; Yoda, Takao; Feig, Michael; Sugita, Yuji

    2015-01-01

    GENESIS (Generalized-Ensemble Simulation System) is a new software package for molecular dynamics (MD) simulations of macromolecules. It has two MD simulators, called ATDYN and SPDYN. ATDYN is parallelized based on an atomic decomposition algorithm for the simulations of all-atom force-field models as well as coarse-grained Go-like models. SPDYN is highly parallelized based on a domain decomposition scheme, allowing large-scale MD simulations on supercomputers. Hybrid schemes combining OpenMP and MPI are used in both simulators to target modern multicore computer architectures. Key advantages of GENESIS are (1) the highly parallel performance of SPDYN for very large biological systems consisting of more than one million atoms and (2) the availability of various REMD algorithms (T-REMD, REUS, multi-dimensional REMD for both all-atom and Go-like models under the NVT, NPT, NPAT, and NPγT ensembles). The former is achieved by a combination of the midpoint cell method and the efficient three-dimensional Fast Fourier Transform algorithm, where the domain decomposition space is shared in real-space and reciprocal-space calculations. Other features in SPDYN, such as avoiding concurrent memory access, reducing communication times, and usage of parallel input/output files, also contribute to the performance. We show the REMD simulation results of a mixed (POPC/DMPC) lipid bilayer as a real application using GENESIS. GENESIS is released as free software under the GPLv2 licence and can be easily modified for the development of new algorithms and molecular models. WIREs Comput Mol Sci 2015, 5:310–323. doi: 10.1002/wcms.1220 PMID:26753008
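As a pointer to what the replica-exchange (T-REMD) machinery listed above actually decides at each exchange attempt, the sketch below implements the standard Metropolis swap criterion between two neighbouring temperature replicas. It uses reduced units with k_B = 1 and invented energies; it is a textbook illustration, not GENESIS source code.

    # Standard T-REMD swap test between replicas i and j (reduced units, k_B = 1).
    import math, random

    def try_exchange(E_i, T_i, E_j, T_j, rng=random.random):
        """Return True if the configurations of the two replicas should be swapped."""
        log_ratio = (1.0 / T_i - 1.0 / T_j) * (E_i - E_j)
        return log_ratio >= 0.0 or rng() < math.exp(log_ratio)

    # neighbouring replicas on a temperature ladder (illustrative energies)
    print(try_exchange(E_i=-120.3, T_i=1.00, E_j=-118.9, T_j=1.05))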
  471. SCORPIO: A Scalable Two-Phase Parallel I/O Library With Application To A Large Scale Subsurface Simulator

    SciTech Connect

    Sreepathi, Sarat; Sripathi, Vamsi; Mills, Richard T; Hammond, Glenn; Mahinthakumar, Kumar

    2013-01-01

    Inefficient parallel I/O is known to be a major bottleneck among scientific applications employed on supercomputers as the number of processor cores grows into the thousands. Our prior experience indicated that parallel I/O libraries such as HDF5 that rely on MPI-IO do not scale well beyond 10K processor cores, especially on parallel file systems (like Lustre) with a single point of resource contention. Our previous optimization efforts for a massively parallel multi-phase and multi-component subsurface simulator (PFLOTRAN) led to a two-phase I/O approach at the application level, where a set of designated processes participate in the I/O process by splitting the I/O operation into a communication phase and a disk I/O phase. The designated I/O processes are created by splitting the MPI global communicator into multiple sub-communicators. The root process in each sub-communicator is responsible for performing the I/O operations for the entire group and then distributing the data to the rest of the group. This approach resulted in over 25X speedup in HDF I/O read performance and 3X speedup in write performance for PFLOTRAN at over 100K processor cores on the ORNL Jaguar supercomputer. This research describes the design and development of a general purpose parallel I/O library, SCORPIO (SCalable block-ORiented Parallel I/O), that incorporates our optimized two-phase I/O approach. The library provides a simplified higher level abstraction to the user, sitting atop existing parallel I/O libraries (such as HDF5), and implements optimized I/O access patterns that can scale on larger numbers of processors. Performance results with standard benchmark problems and PFLOTRAN indicate that our library is able to maintain the same speedups as before with the added flexibility of being applicable to a wider range of I/O intensive applications.
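The communicator-splitting pattern in the SCORPIO record above can be sketched in a few lines with mpi4py. The example below writes plain NumPy files instead of HDF5 to stay short, and the group size of four ranks per writer is an arbitrary choice for illustration; it shows only the gather-then-write structure, not the SCORPIO API.

    # Two-phase write: split the global communicator into groups, gather each
    # group's data on its root (communication phase), and let only the group
    # roots touch the file system (disk I/O phase).
    from mpi4py import MPI
    import numpy as np

    GROUP_SIZE = 4                                   # ranks per designated writer (assumed)

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    color = rank // GROUP_SIZE                       # which I/O group this rank belongs to
    io_comm = comm.Split(color, key=rank)            # sub-communicator for the group

    local = np.full(1000, rank, dtype=np.float64)    # stand-in for this rank's field data

    # Phase 1: communication - gather the group's blocks on the group root
    blocks = io_comm.gather(local, root=0)

    # Phase 2: disk I/O - only group roots write
    if io_comm.Get_rank() == 0:
        np.save(f"field_group{color:03d}.npy", np.concatenate(blocks))
    io_comm.Free()

Run under MPI (for example, mpiexec -n 8 python two_phase_io.py, where the script name is just a placeholder), each group of four ranks funnels its data through a single writer; SCORPIO applies the same idea beneath a higher level interface on top of libraries such as HDF5.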
  472. A Framework for Parallel Unstructured Grid Generation for Complex Aerodynamic Simulations

    NASA Technical Reports Server (NTRS)

    Zagaris, George; Pirzadeh, Shahyar Z.; Chrisochoides, Nikos

    2009-01-01

    A framework for parallel unstructured grid generation targeting both shared memory multi-processors and distributed memory architectures is presented. The two fundamental building blocks of the framework are (1) the Advancing-Partition (AP) method used for domain decomposition and (2) the Advancing Front (AF) method used for mesh generation. Starting from the surface mesh of the computational domain, the AP method is applied recursively to generate a set of sub-domains. Next, the sub-domains are meshed in parallel using the AF method. The recursive nature of the domain decomposition naturally maps to a divide-and-conquer algorithm which exhibits inherent parallelism. For the parallel implementation, the Master/Worker pattern is employed to dynamically balance the varying workloads of each task on the set of available CPUs. Performance results obtained with this approach are presented and discussed in detail, as well as future work and improvements.

  473. LPIC++ a parallel one-dimensional relativistic electromagnetic Particle-In-Cell code for simulating laser-plasma-interaction

    NASA Astrophysics Data System (ADS)

    Pfund, R. E. W.; Lichters, R.; Meyer-ter-Vehn, J.

    1998-02-01

    We report on a recently developed electromagnetic relativistic 1D3V (one spatial, three velocity dimensions) Particle-In-Cell code for simulating laser-plasma interaction at normal and oblique incidence. The code is written in C++ and easy to extend. The data structure is characterized by the use of chained lists for the grid cells as well as for the particles belonging to one cell. The parallel version of the code is based on PVM. It splits the grid into several spatial domains, each belonging to one processor. Since particles can cross boundaries of cells as well as domains, the processor loads will generally change in time. This is counteracted by adjusting the domain sizes dynamically, for which the use of chained lists has proven to be very convenient. Moreover, an option for restarting the simulation from intermediate stages of the time evolution has been implemented, even in the parallel version. The code will be published and distributed freely.

  474. cuTauLeaping: a GPU-powered tau-leaping stochastic simulator for massive parallel analyses of biological systems.

    PubMed

    Nobile, Marco S; Cazzaniga, Paolo; Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo

    2014-01-01

    Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on the GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
  475. cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems

    PubMed Central

    Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo

    2014-01-01

    Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on the GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
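The structure cuTauLeaping exploits, namely many completely independent stochastic runs, can be illustrated on the CPU in a few lines. The sketch below runs independent tau-leaping trajectories of a toy reversible isomerisation A <-> B with a multiprocessing pool; the reaction system, rate constants, leap size, and run counts are invented for the example, and nothing here reflects the CUDA kernels of cuTauLeaping itself.

    # Many independent tau-leaping trajectories of A <-> B, one per worker task.
    from multiprocessing import Pool
    import numpy as np

    K1, K2, TAU, STEPS = 0.1, 0.05, 0.1, 1000        # rate constants, leap size, leap count

    def tau_leap_run(seed):
        rng = np.random.default_rng(seed)
        a_count, b_count = 1000, 0
        for _ in range(STEPS):
            prop_fwd, prop_back = K1 * a_count, K2 * b_count   # reaction propensities
            n_fwd = rng.poisson(prop_fwd * TAU)                # firings of A -> B in the leap
            n_back = rng.poisson(prop_back * TAU)              # firings of B -> A in the leap
            a_count = max(a_count - n_fwd + n_back, 0)
            b_count = max(b_count + n_fwd - n_back, 0)
        return a_count, b_count

    if __name__ == "__main__":
        with Pool() as pool:                         # one independent trajectory per seed
            results = pool.map(tau_leap_run, range(64))
        print(np.mean([a for a, _ in results]))      # approaches K2/(K1+K2)*1000, about 333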
  476. Infrastructure for distributed enterprise simulation

    SciTech Connect

    Johnson, M.M.; Yoshimura, A.S.; Goldsby, M.E.

    1998-01-01

    Traditional discrete-event simulations employ an inherently sequential algorithm and are run on a single computer. However, the demands of many real-world problems exceed the capabilities of sequential simulation systems. Often the capacity of a computer's primary memory limits the size of the models that can be handled, and in some cases parallel execution on multiple processors could significantly reduce the simulation time. This paper describes the development of an Infrastructure for Distributed Enterprise Simulation (IDES) - a large-scale portable parallel simulation framework developed to support Sandia National Laboratories' mission in stockpile stewardship. IDES is based on the Breathing-Time-Buckets synchronization protocol, and maps a message-based model of distributed computing onto an object-oriented programming model. IDES is portable across heterogeneous computing architectures, including single-processor systems, networks of workstations and multi-processor computers with shared or distributed memory. The system provides a simple and sufficient application programming interface that can be used by scientists to quickly model large-scale, complex enterprise systems. In the background and without involving the user, IDES is capable of making dynamic use of idle processing power available throughout the enterprise network. 16 refs., 14 figs.
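For readers unfamiliar with the Breathing-Time-Buckets protocol named above, the sketch below is a much simplified, sequential illustration of its central idea: events are processed optimistically, the minimum timestamp of the newly generated events defines an event horizon, and only events below that horizon are committed in the current cycle. A real implementation runs the optimistic phase in parallel across processors and must retract work beyond the horizon; none of that machinery is shown, and the data structures are invented for the example.

    # One simplified Breathing-Time-Buckets cycle over a global event heap.
    import heapq

    def btb_cycle(pending, handler):
        """pending: heap of (time, name); handler(time, name) -> list of new (time, name)."""
        horizon = float("inf")
        committed, generated = [], []
        while pending and pending[0][0] < horizon:
            t, name = heapq.heappop(pending)
            new_events = handler(t, name)          # optimistic processing
            generated.extend(new_events)
            for new_t, _ in new_events:
                horizon = min(horizon, new_t)      # tighten the event horizon
            committed.append((t, name))
        for item in generated:                     # release new events for the next cycle
            heapq.heappush(pending, item)
        return committed, horizon

    # toy demo: every event schedules a follow-up 0.7 time units later
    events = [(0.0, "a"), (0.2, "b"), (0.9, "c")]
    heapq.heapify(events)
    print(btb_cycle(events, lambda t, name: [(t + 0.7, name)]))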
  477. A Parallel Adaptive Finite Element Method for the Simulation of Photon Migration with the Radiative-Transfer-Based Model

    PubMed Central

    Lu, Yujie; Chatziioannou, Arion F.

    2009-01-01

    Whole-body optical molecular imaging of mouse models in preclinical research has been developing rapidly in recent years. In this context, it is essential and necessary to develop novel simulation methods of light propagation for optical imaging, especially when a priori knowledge, a large-volume domain and a wide range of optical properties need to be considered in the reconstruction algorithm. In this paper, we propose a three-dimensional parallel adaptive finite element method with simplified spherical harmonics (SPN) approximation to simulate optical photon propagation in large volumes of heterogeneous tissues. The simulation speed is significantly improved by a posteriori parallel adaptive mesh refinement and dynamic mesh repartitioning. Compared with the diffusion equation and the Monte Carlo methods, the SPN method shows improved performance and the necessity of high-order approximation in heterogeneous domains. Optimal solver selection and time-cost analysis in real mouse geometry further improve the performance of the proposed algorithm and show the superiority of the proposed parallel adaptive framework for whole-body optical molecular imaging in murine models. PMID:20052300

  478. Large-eddy simulation of the Rayleigh-Taylor instability on a massively parallel computer

    SciTech Connect

    Amala, P.A.K.

    1995-03-01

    A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively parallel computer. Programming models on massively parallel computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively parallel computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data parallel code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data parallel approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community: the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.
  479. Molecular simulation workflows as parallel algorithms: the execution engine of Copernicus, a distributed high-performance computing platform.

    PubMed

    Pronk, Sander; Pouya, Iman; Lundborg, Magnus; Rotskoff, Grant; Wesén, Björn; Kasson, Peter M; Lindahl, Erik

    2015-06-01

    Computational chemistry and other simulation fields are critically dependent on computing resources, but few problems scale efficiently to the hundreds of thousands of processors available in current supercomputers, particularly for molecular dynamics. This has turned into a bottleneck as new hardware generations primarily provide more processing units rather than making individual units much faster, which simulation applications are addressing by increasingly focusing on sampling with algorithms such as free-energy perturbation, Markov state modeling, metadynamics, or milestoning. All of these rely on combining results from multiple simulations into a single observation. They are potentially powerful approaches that aim to predict experimental observables directly, but this comes at the expense of added complexity in selecting sampling strategies and keeping track of dozens to thousands of simulations and their dependencies. Here, we describe how the distributed execution framework Copernicus allows the expression of such algorithms in generic workflows: dataflow programs. Because dataflow algorithms explicitly state the dependencies of each constituent part, algorithms only need to be described on a conceptual level, after which the execution is maximally parallel. The fully automated execution facilitates the optimization of these algorithms with adaptive sampling, where undersampled regions are automatically detected and targeted without user intervention. We show how several such algorithms can be formulated for computational chemistry problems, and how they are executed efficiently with many loosely coupled simulations using either distributed or parallel resources with Copernicus. PMID:26575558
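To make the dataflow idea concrete, the sketch below is a deliberately tiny, generic executor: each task declares which results it depends on, and anything whose inputs are available runs concurrently. The task names, the thread pool, and the example flow are all invented for illustration; this is not the Copernicus API.

    # Generic dataflow execution: submit every task whose dependencies are satisfied.
    from concurrent.futures import ThreadPoolExecutor, wait, FIRST_COMPLETED

    def run_dataflow(tasks):
        """tasks: {name: (function, [dependency names])}; returns {name: result}."""
        results, running = {}, {}
        with ThreadPoolExecutor() as pool:
            while len(results) < len(tasks):
                for name, (fn, deps) in tasks.items():
                    if name not in results and name not in running \
                            and all(d in results for d in deps):
                        running[name] = pool.submit(fn, *[results[d] for d in deps])
                done, _ = wait(running.values(), return_when=FIRST_COMPLETED)
                for name in [n for n, fut in running.items() if fut in done]:
                    results[name] = running.pop(name).result()
        return results

    # e.g. two independent "simulations" feeding an "analysis" step
    flow = {"sim_a": (lambda: 1.0, []),
            "sim_b": (lambda: 2.0, []),
            "analysis": (lambda a, b: (a + b) / 2, ["sim_a", "sim_b"])}
    print(run_dataflow(flow))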
  480. Real-time dynamic simulation of the Cassini spacecraft using DARTS. Part 2: Parallel/vectorized real-time implementation

    NASA Technical Reports Server (NTRS)

    Fijany, A.; Roberts, J. A.; Jain, A.; Man, G. K.

    1993-01-01

    Part 1 of this paper presented the requirements for the real-time simulation of the Cassini spacecraft along with some discussion of the DARTS algorithm. Here, in Part 2, we discuss the development and implementation of the parallel/vectorized DARTS algorithm and architecture for real-time simulation. Development of fast algorithms and architectures for real-time hardware-in-the-loop simulation of spacecraft dynamics is motivated by the fact that it represents a hard real-time problem, in the sense that the correctness of the simulation depends on both the numerical accuracy and the exact timing of the computation. For a given model fidelity, the computation must be completed within a predefined time period. Further reduction in computation time allows increasing the fidelity of the model (i.e., inclusion of more flexible modes) and of the integration routine.

  481. A domain decomposition parallel processing algorithm for molecular dynamics simulations of polymers

    NASA Astrophysics Data System (ADS)

    Brown, David; Clarke, Julian H. R.; Okuda, Motoi; Yamazaki, Takao

    1994-10-01

    We describe in this paper a domain decomposition molecular dynamics algorithm for use on distributed memory parallel computers which is capable of handling systems containing rigid bond constraints and three- and four-body potentials as well as non-bonded potentials. The algorithm has been successfully implemented on the Fujitsu 1024 processor element AP1000 machine. The performance has been compared with and benchmarked against the alternative cloning method of parallel processing [D. Brown, J.H.R. Clarke, M. Okuda and T. Yamazaki, J. Chem. Phys., 100 (1994) 1684] and results obtained using other scalar and vector machines. Two parallel versions of the SHAKE algorithm, which solves the bond length constraints problem, have been compared with regard to optimising the performance of this procedure.
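The geometric core of the domain decomposition strategy in the record above is simply assigning particles to the processor that owns their region of space. The sketch below bins the particles of a periodic box into a regular grid of domains with NumPy; the box size, domain counts, and particle data are illustrative, and halo exchange, bonded terms, and the parallel SHAKE constraint solver discussed in the paper are deliberately omitted.

    # Bin particles of a periodic box into a regular grid of spatial domains.
    import numpy as np

    def assign_domains(positions, box_length, domains_per_side):
        """positions: (N, 3) array in [0, box_length); returns an owner domain per particle."""
        cell = box_length / domains_per_side
        idx3 = np.floor(positions / cell).astype(int) % domains_per_side
        # flatten the 3-D domain coordinates into a single owner index per particle
        return (idx3[:, 0] * domains_per_side + idx3[:, 1]) * domains_per_side + idx3[:, 2]

    rng = np.random.default_rng(0)
    pos = rng.uniform(0.0, 20.0, size=(1000, 3))       # illustrative box of 1000 particles
    owners = assign_domains(pos, box_length=20.0, domains_per_side=4)
    print(np.bincount(owners, minlength=64))           # particle count per domain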
  482. Coupled models and parallel simulations for three-dimensional full-Stokes ice sheet modeling

    SciTech Connect

    Zhang, Huai; Ju, Lili; Gunzburger, Max; Ringler, Todd; Price, Stephen

    2011-01-01

    A three-dimensional full-Stokes computational model is considered for determining the dynamics, temperature, and thickness of ice sheets. The governing thermomechanical equations consist of the three-dimensional full-Stokes system with nonlinear rheology for the momentum, an advective-diffusion energy equation for temperature evolution, and a mass conservation equation for ice thickness changes. Here, we discuss the variable resolution meshes, the finite element discretizations, and the parallel algorithms employed by the model components. The solvers are integrated through a well-designed coupler for the exchange of parametric data between components. The discretization utilizes high-quality, variable-resolution centroidal Voronoi Delaunay triangulation meshing and existing parallel solvers. We demonstrate the gridding technology, discretization schemes, and the efficiency and scalability of the parallel solvers through computational experiments using both simplified geometries arising from benchmark test problems and a realistic Greenland ice sheet geometry.

  483. Modeling and simulation of a 6-DOF parallel platform for telescope secondary mirror

    NASA Astrophysics Data System (ADS)

    Yue, Zhongyu; Ye, Yu; Gu, Bozhong

    2014-07-01

    The 6-DOF parallel platform in this paper is a kind of Stewart platform. It can be used as a supporting structure for a telescope secondary mirror. In order to adapt to the special dynamic environment of the telescope secondary mirror and to be installed in an extremely narrow space, a unique parallel platform is designed. PSS and SPS Stewart platforms are analyzed and compared, and the PSS Stewart platform is then chosen for detailed design. The virtual prototyping model of the parallel platform is built and used for the analysis and calculation of the multi-body dynamics. With the help of ANSYS, the finite element model of the platform is built and the analysis is performed. According to the above analysis, the experimental prototype of the platform is built.
  484. LightForce Photon-Pressure Collision Avoidance: Updated Efficiency Analysis Utilizing a Highly Parallel Simulation Approach

    NASA Astrophysics Data System (ADS)

    Stupl, J.; Faber, N.; Foster, C.; Yang, F.; Nelson, B.; Aziz, J.; Nuttall, A.; Henze, C.; Levit, C.

    2014-09-01

    This paper provides an updated efficiency analysis of the LightForce space debris collision avoidance scheme. LightForce aims to prevent collisions on warning by utilizing photon pressure from ground based, commercial off the shelf lasers. Past research has proven that a few ground-based systems consisting of 10 kW class lasers directed by 1.5 m telescopes with adaptive optics could lower the expected number of collisions in Low Earth Orbit (LEO) by an order of magnitude. Our simulation approach utilizes the entire Two Line Element (TLE) catalogue in LEO for a given day as initial input. Least-squares fitting of a TLE time series is used for an improved orbit estimate. We then calculate the probability of collision for all LEO objects in the catalogue for a time step of the simulation. The conjunctions that exceed a threshold probability of collision are then engaged by a simulated network of laser ground stations. After those engagements, the perturbed orbits are used to re-assess the probability of collision and evaluate the efficiency. This paper describes new simulations with three updated aspects: 1) By utilizing a highly parallel simulation approach employing hundreds of processors, we have extended our analysis to a much broader dataset, and the simulation time is extended to one year. 2) We analyze not only the efficiency of LightForce on conjunctions that naturally occur, but also take into account conjunctions caused by orbit perturbations due to LightForce engagements. 3) We use a new simulation approach that regularly updates the LightForce engagement strategy, as it would be updated during actual operations. In this paper we present our simulation approach to parallelizing the efficiency analysis, its computational performance, and the resulting expected efficiency of the LightForce collision avoidance system.
  485. Neurite, a Finite Difference Large Scale Parallel Program for the Simulation of Electrical Signal Propagation in Neurites under Mechanical Loading

    PubMed Central

    García-Grajales, Julián A.; Rucabado, Gabriel; García-Dopico, Antonio; Peña, José-María; Jérusalem, Antoine

    2015-01-01

    With the growing body of research on traumatic brain injury and spinal cord injury, computational neuroscience has recently focused its modeling efforts on neuronal functional deficits following mechanical loading. However, in most of these efforts, cell damage is generally only characterized by purely mechanistic criteria, functions of quantities such as stress, strain or their corresponding rates. The modeling of functional deficits in neurites as a consequence of macroscopic mechanical insults has been rarely explored. In particular, a quantitative mechanically based model of electrophysiological impairment in neuronal cells, Neurite, has only very recently been proposed. In this paper, we present the implementation details of this model: a finite difference parallel program for simulating electrical signal propagation along neurites under mechanical loading. Following the application of a macroscopic strain at a given strain rate produced by a mechanical insult, Neurite is able to simulate the resulting neuronal electrical signal propagation, and thus the corresponding functional deficits. The simulation of the coupled mechanical and electrophysiological behaviors requires computationally expensive calculations that increase in complexity as the network of the simulated cells grows. The solvers implemented in Neurite (explicit and implicit) were therefore parallelized using graphics processing units in order to reduce the burden of the simulation costs of large scale scenarios. Cable Theory and Hodgkin-Huxley models were implemented to account for the electrophysiological passive and active regions of a neurite, respectively, whereas a coupled mechanical model accounting for the neurite mechanical behavior within its surrounding medium was adopted as a link between electrophysiology and mechanics. This paper provides the details of the parallel implementation of Neurite, along with three different application examples: a long myelinated axon, a segmented dendritic tree, and a damaged axon. The capabilities of the program to deal with large scale scenarios, segmented neuronal structures, and functional deficits under mechanical loading are specifically highlighted. PMID:25680098
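As a pointer to the electrophysiological building block named above, the sketch below takes explicit finite-difference steps of the passive cable equation (tau dV/dt = lambda^2 d2V/dx2 - V) with sealed ends. The Hodgkin-Huxley channels and the mechanical coupling that Neurite adds are omitted, and all parameters are normalised illustration values rather than fitted neurite constants.

    # Explicit finite-difference update of a normalised passive cable.
    import numpy as np

    def passive_cable_step(v, dt, dx, lam2=1.0, tau=1.0):
        """One explicit step of tau*dV/dt = lam2 * d2V/dx2 - V with sealed (zero-flux) ends."""
        v_ext = np.pad(v, 1, mode="edge")              # reflecting boundaries
        d2v = (v_ext[2:] - 2.0 * v + v_ext[:-2]) / dx**2
        return v + (dt / tau) * (lam2 * d2v - v)

    dx, dt = 0.1, 0.001                                # dt kept below tau*dx^2/(2*lam2) for stability
    v = np.zeros(200)
    v[0] = 1.0                                         # initial depolarisation at one end
    for _ in range(5000):
        v = passive_cable_step(v, dt, dx)
    print(v[:5])                                       # the perturbation decays along the cable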
  486. K-means clustering for optimal partitioning and dynamic load balancing of parallel hierarchical N-body simulations

    NASA Astrophysics Data System (ADS)

    Marzouk, Youssef M.; Ghoniem, Ahmed F.

    2005-08-01

    A number of complex physical problems can be approached through N-body simulation, from fluid flow at high Reynolds number to gravitational astrophysics and molecular dynamics. In all these applications, direct summation is prohibitively expensive for large N and thus hierarchical methods are employed for fast summation. This work introduces new algorithms, based on k-means clustering, for partitioning parallel hierarchical N-body interactions. We demonstrate that the number of particle-cluster interactions and the order at which they are performed are directly affected by partition geometry. Weighted k-means partitions minimize the sum of clusters' second moments and create well-localized domains, and thus reduce the computational cost of N-body approximations by enabling the use of lower-order approximations and fewer cells. We also introduce compatible techniques for dynamic load balancing, including adaptive scaling of cluster volumes and adaptive redistribution of cluster centroids. We demonstrate the performance of these algorithms by constructing a parallel treecode for vortex particle simulations, based on the serial variable-order Cartesian code developed by Lindsay and Krasny [Journal of Computational Physics 172 (2) (2001) 879-907]. The method is applied to vortex simulations of a transverse jet. Results show outstanding parallel efficiencies even at high concurrencies, with velocity evaluation errors maintained at or below their serial values; on a realistic distribution of 1.2 million vortex particles, we observe a parallel efficiency of 98% on 1024 processors. Excellent load balance is achieved even in the face of several obstacles, such as an irregular, time-evolving particle distribution containing a range of length scales and the continual introduction of new vortex particles throughout the domain. Moreover, results suggest that k-means yields a more efficient partition of the domain than a global oct-tree.
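The partitioning idea in the record above can be illustrated with plain Lloyd iterations over particle positions using per-particle work weights, which yields compact, load-aware domains. The sketch below is exactly that and nothing more: the adaptive volume scaling and centroid redistribution described in the paper are not reproduced, and the particle cloud and weights are synthetic.

    # Weighted k-means (Lloyd) partitioning of a particle cloud into k domains.
    import numpy as np

    def weighted_kmeans(points, weights, k, iters=20, seed=0):
        rng = np.random.default_rng(seed)
        centers = points[rng.choice(len(points), size=k, replace=False)]
        for _ in range(iters):
            # assign each particle to its nearest centre
            d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
            owner = d2.argmin(axis=1)
            # move each centre to the weighted centroid of its particles
            for c in range(k):
                mask = owner == c
                if mask.any():
                    w = weights[mask]
                    centers[c] = (points[mask] * w[:, None]).sum(axis=0) / w.sum()
        return owner, centers

    rng = np.random.default_rng(1)
    pts = rng.normal(size=(2000, 3))                   # illustrative vortex-particle cloud
    wts = rng.uniform(0.5, 1.5, size=2000)             # per-particle work estimates (assumed)
    owner, centers = weighted_kmeans(pts, wts, k=8)
    print(np.bincount(owner))                          # particles per partition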
  487. Parallelized Multi-Worm Algorithm for Large Scale Quantum Monte-Carlo simulations

    NASA Astrophysics Data System (ADS)

    Suzuki, Takafumi; Masaki-Kato, Akiko; Harada, Kenji; Todo, Synge; Kawashima, Naoki

    2014-03-01

    The quantum Monte Carlo (QMC) calculation is a powerful and accurate method for quantum many-body interacting systems. In this study, we present a new algorithm for the worldline Monte Carlo method based on the Feynman path integral. While the worm algorithm (WA) has been used widely because of its broader range of applicability, the parallelization of the WA is not straightforward. We present a general QMC algorithm based on the directed-loop algorithm with domain decomposition. This new algorithm is referred to as the Parallelized Multi-Worm Algorithm (PMWA). In the PMWA, a large number of worms are introduced by controlling a fictitious transverse field. For a benchmark, we applied the PMWA to the hardcore Bose-Hubbard model on the square lattice, and computed the system-size dependence of the Bose-condensation order parameter up to L^2 = 10240^2 by using 3200 processors. The benchmark results showed high parallelization efficiency. This indicates that the PMWA is well suited to parallelization on a distributed-memory computer.

  488. A Scalable Parallel Wideband MLFMA for Efficient Electromagnetic Simulations on Large Scale Clusters

    SciTech Connect

    Melapudi, Vikram; Balasubramaniam, Shanker; Seal, Sudip K; Aluru, Srinivas

    2011-01-01

    The development of the multilevel fast multipole algorithm (MLFMA) and its multiscale variants has enabled the use of integral equation (IE) based solvers to compute scattering from complicated structures. Development of s