Science.gov

Sample records for parallel tree code

  1. Parallelized tree-code for clusters of personal computers

    NASA Astrophysics Data System (ADS)

    Viturro, H. R.; Carpintero, D. D.

    2000-02-01

    We present a tree-code for integrating the equations of the motion of collisionless systems, which has been fully parallelized and adapted to run in several PC-based processors simultaneously, using the well-known PVM message passing library software. SPH algorithms, not yet included, may be easily incorporated to the code. The code is written in ANSI C; it can be freely downloaded from a public ftp site. Simulations of collisions of galaxies are presented, with which the performance of the code is tested.

  2. A parallel TreeSPH code for galaxy formation

    NASA Astrophysics Data System (ADS)

    Lia, Cesario; Carraro, Giovanni

    2000-05-01

    We describe a new implementation of a parallel TreeSPH code with the aim of simulating galaxy formation and evolution. The code has been parallelized using shmem, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Super-computing Center (Bologna, Italy).1 The code combines the smoothed particle hydrodynamics (SPH) method for solving hydrodynamical equations with the popular Barnes & Hut tree-code to perform gravity calculation with an N×logN scaling, and it is based on the scalar TreeSPH code developed by Carraro et al. Parallelization is achieved by distributing particles along processors according to a workload criterion. Benchmarks, in terms of load balance and scalability, of the code are analysed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 2×104 particles on 8 processors. The code results balance at more than the 95per cent level. Increasing the number of processors, the load balance slightly worsens. The deviation from perfect scalability for increasing number of processors is almost negligible up to 32 processors. Finally, we present a simulation of the formation of an X-ray galaxy cluster in a flat cold dark matter cosmology, using 2×105 particles and 32 processors, and compare our results with Evrard's P3M-SPH simulations. Additionally we have incorporated radiative cooling, star formation, feedback from SNe of types II and Ia, stellar winds and UV flux from massive stars, and an algorithm to follow the chemical enrichment of the interstellar medium. Simulations with some of these ingredients are also presented.

  3. FLY MPI-2: a parallel tree code for LSS

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Comparato, M.; Antonuccio-Delogu, V.

    2006-04-01

    New version program summaryProgram title: FLY 3.1 Catalogue identifier: ADSC_v2_0 Licensing provisions: yes Program summary URL:http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0 Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland No. of lines in distributed program, including test data, etc.: 158 172 No. of bytes in distributed program, including test data, etc.: 4 719 953 Distribution format: tar.gz Programming language: Fortran 90, C Computer: Beowulf cluster, PC, MPP systems Operating system: Linux, Aix RAM: 100M words Catalogue identifier of previous version: ADSC_v1_0 Journal reference of previous version: Comput. Phys. Comm. 155 (2003) 159 Does the new version supersede the previous version?: yes Nature of problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force Solution method: FLY is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986) Reasons for the new version: The new version of FLY is implemented by using the MPI-2 standard: the distributed version 3.1 was developed by using the MPICH2 library on a PC Linux cluster. Today the FLY performance allows us to consider the FLY code among the most powerful parallel codes for tree N-body simulations. Another important new feature regards the availability of an interface with hydrodynamical Paramesh based codes. Simulations must follow a box large enough to accurately represent the power spectrum of fluctuations on very large scales so that we may hope to compare them meaningfully with real data. The number of particles then sets the mass resolution of the simulation, which we would like to make as fine as possible. The idea to build an interface between two codes, that have different and complementary cosmological tasks, allows us to execute complex cosmological simulations with FLY, specialized for DM evolution, and a code specialized for hydrodynamical components that uses a Paramesh block

  4. An implementation of a tree code on a SIMD, parallel computer

    NASA Technical Reports Server (NTRS)

    Olson, Kevin M.; Dorband, John E.

    1994-01-01

    We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.

  5. An implementation of a tree code on a SIMD, parallel computer

    NASA Astrophysics Data System (ADS)

    Olson, Kevin M.; Dorband, John E.

    1994-08-01

    We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.

  6. FLY. A parallel tree N-body code for cosmological simulations

    NASA Astrophysics Data System (ADS)

    Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.

    2003-10-01

    FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones). Program summary Title of program: FLY Catalogue identifier: ADSC Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3 Programming language used: Fortran 90, C Memory required to execute with typical data: about 100 Mwords with 2 million-particles Number of bits in a word: 32 Number of processors used: parallel program. The user can select the number of processors >=1 Has the code been vectorized or parallelized?: parallelized Number of bytes in distributed program, including test data, etc.: 4615604 Distribution format: tar gzip file Keywords: Parallel tree N-body code for cosmological simulations Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force. Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986). Restrictions on the complexity of the program: The program uses the leapfrog integrator schema, but could be changed by the user. Typical running time: 50 seconds for each time-step, running a 2-million-particles simulation on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor. Unusual features of the program: FLY

  7. EvoL: the new Padova Tree-SPH parallel code for cosmological simulations. I. Basic code: gravity and hydrodynamics

    NASA Astrophysics Data System (ADS)

    Merlin, E.; Buonomo, U.; Grassi, T.; Piovan, L.; Chiosi, C.

    2010-04-01

    Context. We present the new release of the Padova N-body code for cosmological simulations of galaxy formation and evolution, EvoL. The basic Tree + SPH code is presented and analysed, together with an overview of the software architectures. Aims: EvoL is a flexible parallel Fortran95 code, specifically designed for simulations of cosmological structure formations on cluster, galactic and sub-galactic scales. Methods: EvoL is a fully Lagrangian self-adaptive code, based on the classical oct-tree by Barnes & Hut (1986, Nature, 324, 446) and on the smoothed particle hydrodynamics algorithm (SPH, Lucy 1977, AJ, 82, 1013). It includes special features like adaptive softening lengths with correcting extra-terms, and modern formulations of SPH and artificial viscosity. It is designed to be run in parallel on multiple CPUs to optimise the performance and save computational time. Results: We describe the code in detail, and present the results of a number of standard hydrodynamical tests.

  8. Giant impacts during planet formation: Parallel tree code simulations using smooth particle hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, Randi L.

    There is both theoretical and observational evidence that giant planets collided with objects ≥ Mearth during their evolution. These impacts may play a key role in giant planet formation. This paper describes impacts of a ˜ Earth-mass object onto a suite of proto-giant-planets, as simulated using an SPH parallel tree code. We run 6 simulations, varying the impact angle and evolutionary stage of the proto-Jupiter. We find that it is possible for an impactor to free some mass from the core of the proto-planet it impacts through direct collision, as well as to make physical contact with the core yet escape partially, or even completely, intact. None of the 6 cases we consider produced a solid disk or resulted in a net decrease in the core mass of the pinto-planet (since the mass decrease due to disruption was outweighed by the increase due to the addition of the impactor's mass to the core). However, we suggest parameters which may have these effects, and thus decrease core mass and formation time in protoplanetary models and/or create satellite systems. We find that giant impacts can remove significant envelope mass from forming giant planets, leaving only 2 MEarth of gas, similar to Uranus and Neptune. They can also create compositional inhomogeneities in planetary cores, which creates differences in planetary thermal emission characteristics.

  9. Giant Impacts During Planet Formation: Parallel Tree Code Simulations Using Smooth Particle Hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, R.; Bodenheimer, P.; Asphaug, E.

    2000-12-01

    There is both theoretical and observational evidence that giant planets collided with objects with mass >= Mearth during their evolution. These impacts may help shorten planetary formation timescales by changing the opacity of the planetary atmosphere to allow quicker cooling. They may also redistribute heavy metals within giant planets, affect the core/envelope mass ratio, and help determine the ratio of emitted to absorbed energy within giant planets. Thus, the researchers propose to simulate the impact of a ~ Earth-mass object onto a proto-giant-planet with SPH. Results of the SPH collision models will be input into a steady-state planetary evolution code and the effect of impacts on formation timescales, core/envelope mass ratios, density profiles, and thermal emissions of giant planets will be quantified. The collision will be modelled using a modified version of an SPH routine which simulates the collision of two polytropes. The Saumon-Chabrier and Tillotson equations of state will replace the polytropic equation of state. The parallel tree algorithm of Olson & Packer will be used for the domain decomposition and neighbor search necessary to calculate pressure and self-gravity efficiently. This work is funded by the NASA Graduate Student Researchers Program.

  10. TPM: Tree-Particle-Mesh code

    NASA Astrophysics Data System (ADS)

    Bode, Paul

    2013-05-01

    TPM carries out collisionless (dark matter) cosmological N-body simulations, evolving a system of N particles as they move under their mutual gravitational interaction. It combines aspects of both Tree and Particle-Mesh algorithms. After the global PM forces are calculated, spatially distinct regions above a given density contrast are located; the tree code calculates the gravitational interactions inside these denser objects at higher spatial and temporal resolution. The code is parallel and uses MPI for message passing.

  11. Efficient tree codes on SIMD computer architectures

    NASA Astrophysics Data System (ADS)

    Olson, Kevin M.

    1996-11-01

    This paper describes changes made to a previous implementation of an N -body tree code developed for a fine-grained, SIMD computer architecture. These changes include (1) switching from a balanced binary tree to a balanced oct tree, (2) addition of quadrupole corrections, and (3) having the particles search the tree in groups rather than individually. An algorithm for limiting errors is also discussed. In aggregate, these changes have led to a performance increase of over a factor of 10 compared to the previous code. For problems several times larger than the processor array, the code now achieves performance levels of ~ 1 Gflop on the Maspar MP-2 or roughly 20% of the quoted peak performance of this machine. This percentage is competitive with other parallel implementations of tree codes on MIMD architectures. This is significant, considering the low relative cost of SIMD architectures.

  12. Parallelization of the SIR code

    NASA Astrophysics Data System (ADS)

    Thonhofer, S.; Bellot Rubio, L. R.; Utz, D.; Jurčak, J.; Hanslmeier, A.; Piantschitsch, I.; Pauritsch, J.; Lemmerer, B.; Guttenbrunner, S.

    A high-resolution 3-dimensional model of the photospheric magnetic field is essential for the investigation of small-scale solar magnetic phenomena. The SIR code is an advanced Stokes-inversion code that deduces physical quantities, e.g. magnetic field vector, temperature, and LOS velocity, from spectropolarimetric data. We extended this code by the capability of directly using large data sets and inverting the pixels in parallel. Due to this parallelization it is now feasible to apply the code directly on extensive data sets. Besides, we included the possibility to use different initial model atmospheres for the inversion, which enhances the quality of the results.

  13. PARAVT: Parallel Voronoi Tessellation code

    NASA Astrophysics Data System (ADS)

    Gonzalez, Roberto E.

    2016-01-01

    We present a new open source code for massive parallel computation of Voronoi tessellations(VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition take into account consistent boundary computation between tasks, and support periodic conditions. In addition, the code compute neighbors lists, Voronoi density and Voronoi cell volumes for each particle, and can compute density on a regular grid.

  14. Parallel Tree-SPH: A Tool for Galaxy Formation

    NASA Astrophysics Data System (ADS)

    Lia, C.; Carraro, G.

    We describe a new implementation of a parallel Tree-SPH code with the aim of simulating galaxy formation and evolution. The code has been parallelized using SHMEM, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Super-computing Center (Bologna, Italy). The code combines the smoothed particle hydrodynamics (SPH) method to solve hydrodynamical equations with the popular Barnes and Hut (1986) tree-code to perform gravity calculation with a N × log N scaling, and it is based on the scalar Tree-SPH code developed by Carraro et al. (1998). Parallelization is achieved by distributing particles along processors according to a workload criterion. Benchmarks of the code, in terms of load balance and scalability, are analysed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 2 × 10^4 particles on eight processors. The code turns out to be balanced at more than 95% level. If the number of processors is increased, the load balance worsens slightly. The deviation from perfect scalability at increasing number of processors is negligible up to 64 processors. Additionally we have incorporated radiative cooling, star formation, feedback and an algorithm to follow the chemical enrichment of the interstellar medium.

  15. Parallel search of strongly ordered game trees

    SciTech Connect

    Marsland, T.A.; Campbell, M.

    1982-12-01

    The alpha-beta algorithm forms the basis of many programs that search game trees. A number of methods have been designed to improve the utility of the sequential version of this algorithm, especially for use in game-playing programs. These enhancements are based on the observation that alpha beta is most effective when the best move in each position is considered early in the search. Trees that have this so-called strong ordering property are not only of practical importance but possess characteristics that can be exploited in both sequential and parallel environments. This paper draws upon experiences gained during the development of programs which search chess game trees. Over the past decade major enhancements of the alpha beta algorithm have been developed by people building game-playing programs, and many of these methods will be surveyed and compared here. The balance of the paper contains a study of contemporary methods for searching chess game trees in parallel, using an arbitrary number of independent processors. To make efficient use of these processors, one must have a clear understanding of the basic properties of the trees actually traversed when alpha-beta cutoffs occur. This paper provides such insights and concludes with a brief description of a refinement to a standard parallel search algorithm for this problem. 33 references.

  16. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.

  17. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  18. National Combustion Code Parallel Performance Enhancements

    NASA Technical Reports Server (NTRS)

    Quealy, Angela; Benyo, Theresa (Technical Monitor)

    2002-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. The unstructured grid, reacting flow code uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC code to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This report describes recent parallel processing modifications to NCC that have improved the parallel scalability of the code, enabling a two hour turnaround for a 1.3 million element fully reacting combustion simulation on an SGI Origin 2000.

  19. OVERAERO-MPI: Parallel Overset Aeroelasticity Code

    NASA Technical Reports Server (NTRS)

    Gee, Ken; Rizk, Yehia M.

    1999-01-01

    An overset modal structures analysis code was integrated with a parallel overset Navier-Stokes flow solver to obtain a code capable of static aeroelastic computations. The new code was used to compute the static aeroelastic deformation of an arrow-wing-body geometry and a complex, full aircraft configuration. For the simple geometry, the results were similar to the results obtained with the ENSAERO code and the PVM version of OVERAERO. The full potential of this code suite was illustrated in the complex, full aircraft computations.

  20. Bitplane Image Coding With Parallel Coefficient Processing.

    PubMed

    Auli-Llinas, Francesc; Enfedaque, Pablo; Moure, Juan C; Sanchez, Victor

    2016-01-01

    Image coding systems have been traditionally tailored for multiple instruction, multiple data (MIMD) computing. In general, they partition the (transformed) image in codeblocks that can be coded in the cores of MIMD-based processors. Each core executes a sequential flow of instructions to process the coefficients in the codeblock, independently and asynchronously from the others cores. Bitplane coding is a common strategy to code such data. Most of its mechanisms require sequential processing of the coefficients. The last years have seen the upraising of processing accelerators with enhanced computational performance and power efficiency whose architecture is mainly based on the single instruction, multiple data (SIMD) principle. SIMD computing refers to the execution of the same instruction to multiple data in a lockstep synchronous way. Unfortunately, current bitplane coding strategies cannot fully profit from such processors due to inherently sequential coding task. This paper presents bitplane image coding with parallel coefficient (BPC-PaCo) processing, a coding method that can process many coefficients within a codeblock in parallel and synchronously. To this end, the scanning order, the context formation, the probability model, and the arithmetic coder of the coding engine have been re-formulated. The experimental results suggest that the penalization in coding performance of BPC-PaCo with respect to the traditional strategies is almost negligible. PMID:26441420

  1. Parallel applications of the USNRC consolidated code

    NASA Astrophysics Data System (ADS)

    Gan, Jun; Downar, Thomas J.; Mahaffy, John H.; Uhle, Jennifer L.

    2001-07-01

    The United States Nuclear Regulatory Commission has developed the thermal-hydraulic analysis code TRAC-M to consolidate the capabilities of its suite of reactor safety analysis codes. One of the requirements for the new consolidated code is that it supports parallel computations to extend code functionality and to improve execution speed. A flexible request driven Exterior Communication Interface (ECI) was developed at Penn State University for use with the consolidated code and has enabled distributed parallel computing. This paper reports the application of TRAC-M and the ECI at Purdue University to a series of practical nuclear reactor problems. The performance of the consolidated code is studied on a shared memory machine, DEC Alpha 8400, in which a Large Break Loss of Coolant Accident (LBLOCA) analysis is applied for the safety analysis of the new generation reactor, AP600. The problem demonstrates the importance of balancing the computational for practical applications. Other computational platforms are also examined, to include the implementation of Linux and Windows OS on multiprocessor PCs. In general, the parallel performance on UNIX and Linux platforms is found to be the most stable and efficient.

  2. GRADSPMHD: A parallel MHD code based on the SPH formalism

    NASA Astrophysics Data System (ADS)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇ṡB→ constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture. Catalogue identifier: AERP_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 620503 No. of bytes in distributed program, including test data, etc.: 19837671 Distribution format: tar.gz Programming language: FORTRAN 90/MPI. Computer: HPC cluster. Operating system: Unix. Has the code been vectorized or parallelized?: Yes, parallelized using MPI. RAM: ˜30 MB for a

  3. Generalized parallelization methodology for video coding

    NASA Astrophysics Data System (ADS)

    Leung, Kwong-Keung; Yung, Nelson H. C.

    1998-12-01

    This paper describes a generalized parallelization methodology for mapping video coding algorithms onto a multiprocessing architecture, through systematic task decomposition, scheduling and performance analysis. It exploits data parallelism inherent in the coding process and performs task scheduling base on task data size and access locality with the aim to hide as much communication overhead as possible. Utilizing Petri-nets and task graphs for representation and analysis, the method enables parallel video frame capturing, buffering and encoding without extra communication overhead. The theoretical speedup analysis indicates that this method offers excellent communication hiding, resulting in system efficiency well above 90%. A H.261 video encoder has been implemented on a TMS320C80 system using this method, and its performance was measured. The theoretical and measured performances are similar in that the measured speedup of the H.261 is 3.67 and 3.76 on four PP for QCIF and 352 X 240 video, respectively. They correspond to frame rates of 30.7 frame per second (fps) and 9.25 fps, and system efficiency of 91.8% and 94% respectively. As it is, this method is particularly efficient for platforms with small number of parallel processors.

  4. Parallel object-oriented decision tree system

    DOEpatents

    Kamath; Chandrika , Cantu-Paz; Erick

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.

  5. Performance studies of the parallel VIM code

    SciTech Connect

    Shi, B.; Blomquist, R.N.

    1996-05-01

    In this paper, the authors evaluate the performance of the parallel version of the VIM Monte Carlo code on the IBM SPx at the High Performance Computing Research Facility at ANL. Three test problems with contrasting computational characteristics were used to assess effects in performance. A statistical method for estimating the inefficiencies due to load imbalance and communication is also introduced. VIM is a large scale continuous energy Monte Carlo radiation transport program and was parallelized using history partitioning, the master/worker approach, and p4 message passing library. Dynamic load balancing is accomplished when the master processor assigns chunks of histories to workers that have completed a previously assigned task, accommodating variations in the lengths of histories, processor speeds, and worker loads. At the end of each batch (generation), the fission sites and tallies are sent from each worker to the master process, contributing to the parallel inefficiency. All communications are between master and workers, and are serial. The SPx is a scalable 128-node parallel supercomputer with high-performance Omega switches of 63 {micro}sec latency and 35 MBytes/sec bandwidth. For uniform and reproducible performance, they used only the 120 identical regular processors (IBM RS/6000) and excluded the remaining eight planet nodes, which may be loaded by other`s jobs.

  6. A parallel and modular deformable cell Car-Parrinello code

    NASA Astrophysics Data System (ADS)

    Cavazzoni, Carlo; Chiarotti, Guido L.

    1999-12-01

    We have developed a modular parallel code implementing the Car-Parrinello [Phys. Rev. Lett. 55 (1985) 2471] algorithm including the variable cell dynamics [Europhys. Lett. 36 (1994) 345; J. Phys. Chem. Solids 56 (1995) 510]. Our code is written in Fortran 90, and makes use of some new programming concepts like encapsulation, data abstraction and data hiding. The code has a multi-layer hierarchical structure with tree like dependences among modules. The modules include not only the variables but also the methods acting on them, in an object oriented fashion. The modular structure allows easier code maintenance, develop and debugging procedures, and is suitable for a developer team. The layer structure permits high portability. The code displays an almost linear speed-up in a wide range of number of processors independently of the architecture. Super-linear speed up is obtained with a "smart" Fast Fourier Transform (FFT) that uses the available memory on the single node (increasing for a fixed problem with the number of processing elements) as temporary buffer to store wave function transforms. This code has been used to simulate water and ammonia at giant planet conditions for systems as large as 64 molecules for ˜50 ps.

  7. Adaptive Dynamic Event Tree in RAVEN code

    SciTech Connect

    Alfonsi, Andrea; Rabiti, Cristian; Mandelli, Diego; Cogliati, Joshua Joseph; Kinoshita, Robert Arthur

    2014-11-01

    RAVEN is a software tool that is focused on performing statistical analysis of stochastic dynamic systems. RAVEN has been designed in a high modular and pluggable way in order to enable easy integration of different programming languages (i.e., C++, Python) and coupling with other applications (system codes). Among the several capabilities currently present in RAVEN, there are five different sampling strategies: Monte Carlo, Latin Hyper Cube, Grid, Adaptive and Dynamic Event Tree (DET) sampling methodologies. The scope of this paper is to present a new sampling approach, currently under definition and implementation: an evolution of the DET me

  8. Parallel solid mechanics codes at Sandia National Laboratories

    SciTech Connect

    McGlaun, M.

    1994-08-01

    Computational physicists at Sandia National Laboratories have moved their production codes to distributed memory parallel computers. The codes include the multi-material CTH Eulerian code, structural mechanics code. This presentation discusses our experiences moving the codes to parallel computers and experiences running the codes. Moving large production codes onto parallel computers require developing parallel algorithms, parallel data bases and parallel support tools. We rewrote the Eulerian CTH code for parallel computers. We were able to move both ALEGRA and PRONTO to parallel computers with only a modest number of modifications. We restructured the restart and graphics data bases to make them parallel and minimize the I/O to the parallel computer. We developed mesh decomposition tools to divide a rectangular or arbitrary connectivity mesh into sub-meshes. The sub-meshes map to processors and minimize the communication between processors. We developed new visualization tools to process the very large, parallel data bases. This presentation also discusses our experiences running these codes on Sandia`s 1840 compute node Intel Paragon, 1024 processor nCUBE and networked workstations. The parallel version of CTH uses the Paragon and nCUBE for production calculations. The ALEGRA and PRONTO codes are moving off networked workstations onto the Paragon and nCUBE massively parallel computers.

  9. HELENA code with incompressible parallel plasma flow

    NASA Astrophysics Data System (ADS)

    Throumoulopoulos, George; Poulipoulis, George; Konz, Christian; EFDA ITM-TF Team

    2015-11-01

    It has been established that plasma rotation in connection to both zonal and equilibrium flow can play a role in the transitions to the advanced confinement regimes in tokamaks, as the L-H transition and the formation of Internal Transport Barriers. For incompressible rotation the equilibrium is governed by a generalized Grad-Shafranov (GGS) equation and a decoupled Bernoulli-type equation for the pressure. For parallel flow the GGS equation can be transformed to one identical in form with the usual Grad-Shafranov equation. In the present study on the basis of the latter equation we have extended HELENA, an equilibrium fixed boundary solver integrated in the ITM-TF modeling infrastructure. The extended code solves the GGS equation for a variety of the two free-surface-function terms involved for arbitrary Afvén Mach functions. We have constructed diverted-boundary equilibria pertinent to ITER and examined their characteristics, in particular as concerns the impact of rotation. It turns out that the rotation affects noticeably the pressure and toroidal current density with the impact on the current density being stronger in the parallel direction than in the toroidal one. Also, the linear stability of the equilibria constructed is examined This work has been carried out within the framework of the EUROfuion Consortium and has received funding from the National Programme for the Controlled Thermonuclear Fusion, Hellenic Republic.

  10. Portable, parallel, reusable Krylov space codes

    SciTech Connect

    Smith, B.; Gropp, W.

    1994-12-31

    Krylov space accelerators are an important component of many algorithms for the iterative solution of linear systems. Each Krylov space method has it`s own particular advantages and disadvantages, therefore it is desirable to have a variety of them available all with an identical, easy to use, interface. A common complaint application programmers have with available software libraries for the iterative solution of linear systems is that they require the programmer to use the data structures provided by the library. The library is not able to work with the data structures of the application code. Hence, application programmers find themselves constantly recoding the Krlov space algorithms. The Krylov space package (KSP) is a data-structure-neutral implementation of a variety of Krylov space methods including preconditioned conjugate gradient, GMRES, BiCG-Stab, transpose free QMR and CGS. Unlike all other software libraries for linear systems that the authors are aware of, KSP will work with any application codes data structures, in Fortran or C. Due to it`s data-structure-neutral design KSP runs unchanged on both sequential and parallel machines. KSP has been tested on workstations, the Intel i860 and Paragon, Thinking Machines CM-5 and the IBM SP1.

  11. Parafrase restructuring of FORTRAN code for parallel processing

    NASA Technical Reports Server (NTRS)

    Wadhwa, Atul

    1988-01-01

    Parafrase transforms a FORTRAN code, subroutine by subroutine, into a parallel code for a vector and/or shared-memory multiprocessor system. Parafrase is not a compiler; it transforms a code and provides information for a vector or concurrent process. Parafrase uses a data dependency to reveal parallelism among instructions. The data dependency test distinguishes between recurrences and statements that can be directly vectorized or parallelized. A number of transformations are required to build a data dependency graph.

  12. Identifying failure in a tree network of a parallel computer

    DOEpatents

    Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.

    2010-08-24

    Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.

  13. Petascale Parallelization of the Gyrokinetic Toroidal Code

    SciTech Connect

    Ethier, Stephane; Adams, Mark; Carter, Jonathan; Oliker, Leonid

    2010-05-01

    The Gyrokinetic Toroidal Code (GTC) is a global, three-dimensional particle-in-cell application developed to study microturbulence in tokamak fusion devices. The global capability of GTC is unique, allowing researchers to systematically analyze important dynamics such as turbulence spreading. In this work we examine a new radial domain decomposition approach to allow scalability onto the latest generation of petascale systems. Extensive performance evaluation is conducted on three high performance computing systems: the IBM BG/P, the Cray XT4, and an Intel Xeon Cluster. Overall results show that the radial decomposition approach dramatically increases scalability, while reducing the memory footprint - allowing for fusion device simulations at an unprecedented scale. After a decade where high-end computing (HEC) was dominated by the rapid pace of improvements to processor frequencies, the performance of next-generation supercomputers is increasingly differentiated by varying interconnect designs and levels of integration. Understanding the tradeoffs of these system designs is a key step towards making effective petascale computing a reality. In this work, we examine a new parallelization scheme for the Gyrokinetic Toroidal Code (GTC) [?] micro-turbulence fusion application. Extensive scalability results and analysis are presented on three HEC systems: the IBM BlueGene/P (BG/P) at Argonne National Laboratory, the Cray XT4 at Lawrence Berkeley National Laboratory, and an Intel Xeon cluster at Lawrence Livermore National Laboratory. Overall results indicate that the new radial decomposition approach successfully attains unprecedented scalability to 131,072 BG/P cores by overcoming the memory limitations of the previous approach. The new version is well suited to utilize emerging petascale resources to access new regimes of physical phenomena.

  14. Parallel Algorithms for Graph Optimization using Tree Decompositions

    SciTech Connect

    Weerapurage, Dinesh P; Sullivan, Blair D; Groer, Christopher S

    2013-01-01

    Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of required dynamic programming tables and excessive running times of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree-decomposition based approach to solve maximum weighted independent set. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.

  15. Parallel Algorithms for Graph Optimization using Tree Decompositions

    SciTech Connect

    Sullivan, Blair D; Weerapurage, Dinesh P; Groer, Christopher S

    2012-06-01

    Although many $\\cal{NP}$-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the necessary dynamic programming tables and excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree decomposition-based approach to solve the maximum weighted independent set problem. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.

  16. Parallel Continuous Flow: A Parallel Suffix Tree Construction Tool for Whole Genomes

    PubMed Central

    Farreras, Montse

    2014-01-01

    Abstract The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatic domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. Also the methodologies required to analyze these data have become more complex everyday, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method for the suffix tree construction of the entire human genome, about 3GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes. PMID:24597675

  17. Parallel family trees for transfer matrices in the Potts model

    NASA Astrophysics Data System (ADS)

    Navarro, Cristobal A.; Canfora, Fabrizio; Hitschfeld, Nancy; Navarro, Gonzalo

    2015-02-01

    The computational cost of transfer matrix methods for the Potts model is related to the question in how many ways can two layers of a lattice be connected? Answering the question leads to the generation of a combinatorial set of lattice configurations. This set defines the configuration space of the problem, and the smaller it is, the faster the transfer matrix can be computed. The configuration space of generic (q , v) transfer matrix methods for strips is in the order of the Catalan numbers, which grows asymptotically as O(4m) where m is the width of the strip. Other transfer matrix methods with a smaller configuration space indeed exist but they make assumptions on the temperature, number of spin states, or restrict the structure of the lattice. In this paper we propose a parallel algorithm that uses a sub-Catalan configuration space of O(3m) to build the generic (q , v) transfer matrix in a compressed form. The improvement is achieved by grouping the original set of Catalan configurations into a forest of family trees, in such a way that the solution to the problem is now computed by solving the root node of each family. As a result, the algorithm becomes exponentially faster than the Catalan approach while still highly parallel. The resulting matrix is stored in a compressed form using O(3m ×4m) of space, making numerical evaluation and decompression to be faster than evaluating the matrix in its O(4m ×4m) uncompressed form. Experimental results for different sizes of strip lattices show that the parallel family trees (PFT) strategy indeed runs exponentially faster than the Catalan Parallel Method (CPM), especially when dealing with dense transfer matrices. In terms of parallel performance, we report strong-scaling speedups of up to 5.7 × when running on an 8-core shared memory machine and 28 × for a 32-core cluster. The best balance of speedup and efficiency for the multi-core machine was achieved when using p = 4 processors, while for the cluster

  18. Parallel Spectral Transform Shallow Water Model: A runtime-tunable parallel benchmark code

    SciTech Connect

    Worley, P.H.; Foster, I.T.

    1994-05-01

    Fairness is an important issue when benchmarking parallel computers using application codes. The best parallel algorithm on one platform may not be the best on another. While it is not feasible to reevaluate parallel algorithms and reimplement large codes whenever new machines become available, it is possible to embed algorithmic options into codes that allow them to be ``tuned`` for a paticular machine without requiring code modifications. In this paper, we describe a code in which such an approach was taken. PSTSWM was developed for evaluating parallel algorithms for the spectral transform method in atmospheric circulation models. Many levels of runtime-selectable algorithmic options are supported. We discuss these options and our evaluation methodology. We also provide empirical results from a number of parallel machines, indicating the importance of tuning for each platform before making a comparison.

  19. Multiple wavelet-tree-based image coding and robust transmission

    NASA Astrophysics Data System (ADS)

    Cao, Lei; Chen, Chang Wen

    2004-10-01

    In this paper, we present techniques based on multiple wavelet-tree coding for robust image transmission. The algorithm of set partitioning in hierarchical trees (SPIHT) is a state-of-the-art technique for image compression. This variable length coding (VLC) technique, however, is extremely sensitive to channel errors. To improve the error resilience capability and in the meantime to keep the high source coding efficiency through VLC, we propose to encode each wavelet tree or a group of wavelet trees using SPIHT algorithm independently. Instead of encoding the entire image as one bitstream, multiple bitstreams are generated. Therefore, error propagation is limited within individual bitstream. Two methods based on subsampling and human visual sensitivity are proposed to group the wavelet trees. The multiple bitstreams are further protected by the rate compatible puncture convolutional (RCPC) codes. Unequal error protection are provided for both different bitstreams and different bit segments inside each bitstream. We also investigate the improvement of error resilience through error resilient entropy coding (EREC) and wavelet tree coding when channels are slightly corruptive. A simple post-processing technique is also proposed to alleviate the effect of residual errors. We demonstrate through simulations that systems with these techniques can achieve much better performance than systems transmitting a single bitstream in noisy environments.

  20. Memory Scalability and Efficiency Analysis of Parallel Codes

    SciTech Connect

    Janjusic, Tommy; Kartsaklis, Christos

    2015-01-01

    Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful optimization consideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application s memory footprint as a function of scalability, which we coined memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application s overall memory foot-print (application data- structures, libraries, etc.).

  1. Shot level parallelization of a seismic inversion code using PVM

    SciTech Connect

    Versteeg, R.J.; Gockenback, M.; Symes, W.W.; Kern, M.

    1994-12-31

    This paper presents experience with parallelization using PVM of DSO, a seismic inversion code developed in The Rice Inversion Project. It focuses on one aspect: trying to run efficiently on a cluster of 4 workstations. The authors use a coarse grain parallelism in which they dynamically distribute the shots over the available machines in the cluster. The modeling and migration of their code is parallelized very effectively by this strategy; they have reached a overall performance of 104 Mflops using a configuration of one manager with 3 workers, a speedup of 2.4 versus the serial version, which according to Amdahl`s law is optimal given the current design of their code. Further speedup is currently limited by the non parallelized part of their code optimization, linear algebra and i(o).

  2. The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation

    NASA Astrophysics Data System (ADS)

    Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru

    1999-09-01

    We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to accommodate on vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 105 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 107 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.

  3. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  4. Distributed Inference in Tree Networks Using Coding Theory

    NASA Astrophysics Data System (ADS)

    Kailkhura, Bhavya; Vempaty, Aditya; Varshney, Pramod K.

    2015-07-01

    In this paper, we consider the problem of distributed inference in tree based networks. In the framework considered in this paper, distributed nodes make a 1-bit local decision regarding a phenomenon before sending it to the fusion center (FC) via intermediate nodes. We propose the use of coding theory based techniques to solve this distributed inference problem in such structures. Data is progressively compressed as it moves towards the FC. The FC makes the global inference after receiving data from intermediate nodes. Data fusion at nodes as well as at the FC is implemented via error correcting codes. In this context, we analyze the performance for a given code matrix and also design the optimal code matrices at every level of the tree. We address the problems of distributed classification and distributed estimation separately and develop schemes to perform these tasks in tree networks. The proposed schemes are of practical significance due to their simple structure. We study the asymptotic inference performance of our schemes for two different classes of tree networks: fixed height tree networks, and fixed degree tree networks. We show that the proposed schemes are asymptotically optimal under certain conditions.

  5. Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

    NASA Technical Reports Server (NTRS)

    O'Connell, Matthew; Karman, Steve L.

    2015-01-01

    Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.

  6. A parallel hashed oct-tree N-body algorithm

    SciTech Connect

    Warren, M.S.; Salmon, J.K.

    1993-03-29

    We report on an efficient adaptive N-body method which we have recently designed and implemented. The algorithm computers the forces on an arbitrary distribution of bodies in a time which scales as N log N with particle number. The accuracy of the force calculations is analytically bounded, and can be adjusted via a user defined parameter between a few percent relative accuracy, down to machine arithmetic accuracy. Instead of using pointers to indicate the topology of the tree, we identify each possible cell with a key. The mapping of keys into memory locations is achieved via a hash table. This allows us to access data in an efficient manner across multiple processors using a virtual shared memory model. Performance of the parallel program is measured on the 512 processor Intel Touchstone Delta system. We also comment on a number of wide-ranging applications which can benefit from application of this type of algorithm.

  7. Parallelization of a Monte Carlo particle transport simulation code

    NASA Astrophysics Data System (ADS)

    Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

    2010-05-01

    We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.

  8. Parallel Scaling Characteristics of Selected NERSC User ProjectCodes

    SciTech Connect

    Skinner, David; Verdier, Francesca; Anand, Harsh; Carter,Jonathan; Durst, Mark; Gerber, Richard

    2005-03-05

    This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.

  9. A Data Parallel Multizone Navier-Stokes Code

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

    1995-01-01

    We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.

  10. An Expert System for the Development of Efficient Parallel Code

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  11. Parallelizing the MARS15 Code with MPI for shielding applications

    SciTech Connect

    Mikhail A. Kostin and Nikolai V. Mokhov

    2004-05-12

    The MARS15 Monte Carlo code capabilities to deal with time-consuming deep penetration shielding problems and other computationally tough tasks in accelerator, detector and shielding applications, have been enhanced by a parallel processing option. It has been developed, implemented and tested on the Fermilab Accelerator Division Linux cluster and network of Sun workstations. The code uses MPI. It is scalable and demonstrates good performance. The general architecture of the code, specific uses of message passing, and effects of a scheduling on the performance and fault tolerance are described.

  12. New Parallel computing framework for radiation transport codes

    SciTech Connect

    Kostin, M.A.; Mokhov, N.V.; Niita, K.; /JAERI, Tokai

    2010-09-01

    A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.

  13. Parallelizing a DNA simulation code for the Cray MTA-2.

    PubMed

    Bokhari, Shahid H; Glaser, Matthew A; Jordan, Harry F; Lansac, Yves; Sauer, Jon R; Van Zeghbroeck, Bart

    2002-01-01

    The Cray MTA-2 (Multithreaded Architecture) is an unusual parallel supercomputer that promises ease of use and high performance. We describe our experience on the MTA-2 with a molecular dynamics code, SIMU-MD, that we are using to simulate the translocation of DNA through a nanopore in a silicon based ultrafast sequencer. Our sequencer is constructed using standard VLSI technology and consists of a nanopore surrounded by Field Effect Transistors (FETs). We propose to use the FETs to sense variations in charge as a DNA molecule translocates through the pore and thus differentiate between the four building block nucleotides of DNA. We were able to port SIMU-MD, a serial C code, to the MTA with only a modest effort and with good performance. Our porting process needed neither a parallelism support platform nor attention to the intimate details of parallel programming and interprocessor communication, as would have been the case with more conventional supercomputers. PMID:15838145

  14. LUDWIG: A parallel Lattice-Boltzmann code for complex fluids

    NASA Astrophysics Data System (ADS)

    Desplat, Jean-Christophe; Pagonabarraga, Ignacio; Bladon, Peter

    2001-03-01

    This paper describes Ludwig, a versatile code for the simulation of Lattice-Boltzmann (LB) models in 3D on cubic lattices. In fact, Ludwig is not a single code, but a set of codes that share certain common routines, such as I/O and communications. If Ludwig is used as intended, a variety of complex fluid models with different equilibrium free energies are simple to code, so that the user may concentrate on the physics of the problem, rather than on parallel computing issues. Thus far, Ludwig's main application has been to symmetric binary fluid mixtures. We first explain the philosophy and structure of Ludwig which is argued to be a very effective way of developing large codes for academic consortia. Next we elaborate on some parallel implementation issues such as parallel I/O, and the use of MPI to achieve full portability and good efficiency on both MPP and SMP systems. Finally, we describe how to implement generic solid boundaries, and look in detail at the particular case of a symmetric binary fluid mixture near a solid wall. We present a novel scheme for the thermodynamically consistent simulation of wetting phenomena, in the presence of static and moving solid boundaries, and check its performance.

  15. Parallelization of a three-dimensional compressible transition code

    NASA Technical Reports Server (NTRS)

    Erlebacher, G.; Hussaini, M. Y.; Bokhari, Shahid H.

    1990-01-01

    The compressible, three-dimensional, time-dependent Navier-Stokes equations are solved on a 20 processor Flex/32 computer. The code is a parallel implementation of an existing code operational on the Cray-2 at NASA Ames, which performs direct simulations of the initial stages of the transition process of wall-bounded flow at supersonic Mach numbers. Spectral collocation in all three spatial directions (Fourier along the plate and Chebyshev normal to it) ensures high accuracy of the flow variables. By hiding most of the parallelism in low-level routines, the casual user is shielded from most of the nonstandard coding constructs. Speedups of 13 out of a maximum of 16 are achieved on the largest computational grids.

  16. Parallel Processing of Distributed Video Coding to Reduce Decoding Time

    NASA Astrophysics Data System (ADS)

    Tonomura, Yoshihide; Nakachi, Takayuki; Fujii, Tatsuya; Kiya, Hitoshi

    This paper proposes a parallelized DVC framework that treats each bitplane independently to reduce the decoding time. Unfortunately, simple parallelization generates inaccurate bit probabilities because additional side information is not available for the decoding of subsequent bitplanes, which degrades encoding efficiency. Our solution is an effective estimation method that can calculate the bit probability as accurately as possible by index assignment without recourse to side information. Moreover, we improve the coding performance of Rate-Adaptive LDPC (RA-LDPC), which is used in the parallelized DVC framework. This proposal selects a fitting sparse matrix for each bitplane according to the syndrome rate estimation results at the encoder side. Simulations show that our parallelization method reduces the decoding time by up to 35[%] and achieves a bit rate reduction of about 10[%].

  17. Compressed data organization for high throughput parallel entropy coding

    NASA Astrophysics Data System (ADS)

    Said, Amir; Mahfoodh, Abo-Talib; Yea, Sehoon

    2015-09-01

    The difficulty of parallelizing entropy coding is increasingly limiting the data throughputs achievable in media compression. In this work we analyze what are the fundamental limitations, using finite-state-machine models for identifying the best manner of separating tasks that can be processed independently, while minimizing compression losses. This analysis confirms previous works showing that effective parallelization is feasible only if the compressed data is organized in a proper way, which is quite different from conventional formats. The proposed new formats exploit the fact that optimal compression is not affected by the arrangement of coded bits, but it goes further in exploiting the decreasing cost of data processing and memory. Additional advantages include the ability to use, within this framework, increasingly more complex data modeling techniques, and the freedom to mix different types of coding. We confirm the parallelization effectiveness using coding simulations that run on multi-core processors, and show how throughput scales with the number of cores, and analyze the additional bit-rate overhead.

  18. Advances in Parallel Electromagnetic Codes for Accelerator Science and Development

    SciTech Connect

    Ko, Kwok; Candel, Arno; Ge, Lixin; Kabel, Andreas; Lee, Rich; Li, Zenghai; Ng, Cho; Rawat, Vineet; Schussman, Greg; Xiao, Liling; /SLAC

    2011-02-07

    Over a decade of concerted effort in code development for accelerator applications has resulted in a new set of electromagnetic codes which are based on higher-order finite elements for superior geometry fidelity and better solution accuracy. SLAC's ACE3P code suite is designed to harness the power of massively parallel computers to tackle large complex problems with the increased memory and solve them at greater speed. The US DOE supports the computational science R&D under the SciDAC project to improve the scalability of ACE3P, and provides the high performance computing resources needed for the applications. This paper summarizes the advances in the ACE3P set of codes, explains the capabilities of the modules, and presents results from selected applications covering a range of problems in accelerator science and development important to the Office of Science.

  19. Boltzmann Transport Code Update: Parallelization and Integrated Design Updates

    NASA Technical Reports Server (NTRS)

    Heinbockel, J. H.; Nealy, J. E.; DeAngelis, G.; Feldman, G. A.; Chokshi, S.

    2003-01-01

    The on going efforts at developing a web site for radiation analysis is expected to result in an increased usage of the High Charge and Energy Transport Code HZETRN. It would be nice to be able to do the requested calculations quickly and efficiently. Therefore the question arose, "Could the implementation of parallel processing speed up the calculations required?" To answer this question two modifications of the HZETRN computer code were created. The first modification selected the shield material of Al(2219) , then polyethylene and then Al(2219). The modified Fortran code was labeled 1SSTRN.F. The second modification considered the shield material of CO2 and Martian regolith. This modified Fortran code was labeled MARSTRN.F.

  20. Decoding of QOSTBC concatenates RS code using parallel interference cancellation

    NASA Astrophysics Data System (ADS)

    Yan, Zhenghang; Lu, Yilong; Ma, Maode; Yang, Yuhang

    2010-02-01

    Comparing with orthogonal space time block code (OSTBC), quasi orthogonal space time block code (QOSTBC) can achieve high transmission rate with partial diversity. In this paper, we present a QOSTBC concatenated Reed-Solomon (RS) error correction code structure. At the receiver, pairwise detection and error correction are first implemented. The decoded data are regrouped. Parallel interference cancellation (PIC) and dual orthogonal space time block code (OSTBC) maximum likelihood decoding are deployed to the regrouped data. The pure concatenated scheme is shown to have higher diversity order and have better error performance at high signal-to-noise ratio (SNR) scenario than both QOSTBC and OSTBC schemes. The PIC and dual OSTBC decoding algorithm can further obtain more than 1.3 dB gains than the pure concatenated scheme at 10-6 bit error probability.

  1. athena: Tree code for second-order correlation functions

    NASA Astrophysics Data System (ADS)

    Kilbinger, Martin; Bonnett, Christopher; Coupon, Jean

    2014-02-01

    athena is a 2d-tree code that estimates second-order correlation functions from input galaxy catalogues. These include shear-shear correlations (cosmic shear), position-shear (galaxy-galaxy lensing) and position-position (spatial angular correlation). Written in C, it includes a power-spectrum estimator implemented in Python; this script also calculates the aperture-mass dispersion. A test data set is available.

  2. Petascale electronic structure code with a new parallel eigensolver

    NASA Astrophysics Data System (ADS)

    Briggs, Emil; Lu, Wenchang; Hodak, Miroslav; Li, Yan; Kelley, Ct; Bernholc, Jerzy

    2015-03-01

    We describe recent developments within the Real Space Multigrid (RMG) electronic structure code. RMG uses real-space grids, a multigrid pre-conditioner, and subspace diagonalization to solve the Kohn-Sham equations. It is designed for use on massively parallel computers and has shown excellent scalability and performance, reaching 6.5 PFLOPS on 18k Cray compute nodes with 288k CPU cores and 18k GPUs. For large problems, the diagonalization becomes computationally dominant and a novel, highly parallel eigensolver was developed that makes efficient use of a large number of nodes. Test results for a range of problem sizes are presented, which execute up to 3.5 times faster than standard eigensolvers such as Scalapack. RMG is now an open source code, running on Linux, Windows and MacIntosh systems. It may be downloaded at .

  3. Parallelization of Finite Element Analysis Codes Using Heterogeneous Distributed Computing

    NASA Technical Reports Server (NTRS)

    Ozguner, Fusun

    1996-01-01

    Performance gains in computer design are quickly consumed as users seek to analyze larger problems to a higher degree of accuracy. Innovative computational methods, such as parallel and distributed computing, seek to multiply the power of existing hardware technology to satisfy the computational demands of large applications. In the early stages of this project, experiments were performed using two large, coarse-grained applications, CSTEM and METCAN. These applications were parallelized on an Intel iPSC/860 hypercube. It was found that the overall speedup was very low, due to large, inherently sequential code segments present in the applications. The overall execution time T(sub par), of the application is dependent on these sequential segments. If these segments make up a significant fraction of the overall code, the application will have a poor speedup measure.

  4. Parallelization of KENO-Va Monte Carlo code

    NASA Astrophysics Data System (ADS)

    Ramón, Javier; Peña, Jorge

    1995-07-01

    KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo Method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code: one for shared memory machines and other for distributed memory systems using the message-passing interface PVM have been generated. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared memory version. A FDDI network of 6 HP9000/735 was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.

  5. Parallelization of the Legendre Transform for a Geodynamics Code

    NASA Astrophysics Data System (ADS)

    Lokavarapu, H. V.; Matsui, H.; Heien, E. M.

    2014-12-01

    Calypso is a geodynamo code designed to model magnetohydrodynamics of a Boussinesq fluid in a rotating spherical shell, such as the outer core of Earth. The code has been shown to scale well on computer clusters capable of computing at the order of millions of core hours. Depending on the resolution and time requirements, simulations may require weeks to years of clock time for specific target problems. A significant portion of the code execution time is spent transforming computed quantities between physical values and spherical harmonic coefficients, equivalent to a series of linear algebra operations. Intermixing C and Fortran code has opened the door to the parallel computing platform, Cuda and its associated libraries. We successfully implemented the parallelization of the scaling of the Legendre polynomials by both Schmidt Normalization coefficients, and a set of weighting coefficients; however, the expected speedup was not realized. Specifically, the original version of Calypso 1.1 computes the Legendre transform approximately four seconds faster than the Cuda-enabled modified version. By profiling the code, we determined that the time taken to transfer the data from host memory to GPU memory does not compare to the number of computations happening within the GPU. Nevertheless, by utilizing techniques such as memory coalescing, cached memory, pinned memory, dynamic parallelism, asynchronous calls, and overlapped memory transfers with computations, the likelihood of a speedup increases. Moreover, ideally the generation of the Legendre polynomial coefficients, Schmidt Normalization Coefficients, and the set of weights should not only be parallelized but be computed on-the-fly within the GPU. The end result is that we reduce the number of memory transfers from host to GPU, increase the number of parallelized computations on the GPU, and decrease the number of serial computations on the CPU. Also, the time taken to transform physical values to spherical

  6. Dependent video coding using a tree representation of pixel dependencies

    NASA Astrophysics Data System (ADS)

    Amati, Luca; Valenzise, Giuseppe; Ortega, Antonio; Tubaro, Stefano

    2011-09-01

    Motion-compensated prediction induces a chain of coding dependencies between pixels in video. In principle, an optimal selection of encoding parameters (motion vectors, quantization parameters, coding modes) should take into account the whole temporal horizon of a GOP. However, in practical coding schemes, these choices are made on a frame-by-frame basis, thus with a possible loss of performance. In this paper we describe a tree-based model for pixelwise coding dependencies: each pixel in a frame is the child of a pixel in a previous reference frame. We show that some tree structures are more favorable than others from a rate-distortion perspective, e.g., because they entail a large descendance of pixels which are well predicted from a common ancestor. In those cases, a higher quality has to be assigned to pixels at the top of such trees. We promote the creation of these structures by adding a special discount term to the conventional Lagrangian cost adopted at the encoder. The proposed model can be implemented through a double-pass encoding procedure. Specifically, we devise heuristic cost functions to drive the selection of quantization parameters and of motion vectors, which can be readily implemented into a state-of-the-art H.264/AVC encoder. Our experiments demonstrate that coding efficiency is improved for video sequences with low motion, while there are no apparent gains for more complex motion. We argue that this is due to both the presence of complex encoder features not captured by the model, and to the complexity of the source to be encoded.

  7. Fast parallel algorithms and enumeration techniques for partial k-trees

    SciTech Connect

    Narayanan, C.

    1989-01-01

    Recent research by several authors have resulted in systematic way of developing linear-time sequential algorithms for a host of problem: on a fairly general class of graphs variously known as bounded decomposable graphs, graphs of bounded treewidth, partial k-trees, etc. Partial k-trees arise in a variety of real-life applications such as network reliability, VLSI design and database systems and hence fast sequential algorithms on these graphs have been found to be desirable. The linear-time methodologies were independently developed by Bern, Lawler, and Wong ((10)), Arnborg and Proskurowski ((6)), Bodlaender ((14)), and Courcelle ((25)). Wimer ((89)) significantly extended the work of Bern, Lawler and Wong. All of these approaches share the common thread of using dynamic programming on a tree structure. In particular the methodology of Wimer uses a parse-tree as the data structure. The methodologies claim linear-time algorithms on partial k-trees for fixed k, for a number of combinatorial optimization problems given the tree structure as input. It is known that obtaining the tree structure is NP-hard. This dissertation investigates three important classes of problems: (1) Developing parallel algorithms for constructing a k-tree embedding, finding a tree decomposition and most notably obtaining a parse-tree for a partial k-tree. (2) Developing parallel algorithms for parse-tree computations, testing isomorphism of k-trees, and finding a 2-tree embedding of a cactus. (3) Obtaining techniques for counting vertex/edge subsets satisfying a certain property in some classes of partial k-trees. The parallel algorithms the author has developed are in class NC and are either new or improve upon the existing results of Bodlaender (13). The difference equations he has obtained for counting certain sub-graphs are not known in the literature so far.

  8. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent in three-dimensional boundary-layer flows are computed with the PS-DNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

  9. Composing Data Parallel Code for a SPARQL Graph Engine

    SciTech Connect

    Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste; Haglin, David J.; Feo, John

    2013-09-08

    Big data analytics process large amount of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph based representation provides several benefits, such as the possibility to perform in memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which requires more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.

  10. Performance of a parallel thermal-hydraulics code TEMPEST

    SciTech Connect

    Fann, G.I.; Trent, D.S.

    1996-11-01

    The authors describe the parallelization of the Tempest thermal-hydraulics code. The serial version of this code is used for production quality 3-D thermal-hydraulics simulations. Good speedup was obtained with a parallel diagonally preconditioned BiCGStab non-symmetric linear solver, using a spatial domain decomposition approach for the semi-iterative pressure-based and mass-conserved algorithm. The test case used here to illustrate the performance of the BiCGStab solver is a 3-D natural convection problem modeled using finite volume discretization in cylindrical coordinates. The BiCGStab solver replaced the LSOR-ADI method for solving the pressure equation in TEMPEST. BiCGStab also solves the coupled thermal energy equation. Scaling performance of 3 problem sizes (221220 nodes, 358120 nodes, and 701220 nodes) are presented. These problems were run on 2 different parallel machines: IBM-SP and SGI PowerChallenge. The largest problem attains a speedup of 68 on an 128 processor IBM-SP. In real terms, this is over 34 times faster than the fastest serial production time using the LSOR-ADI solver.

  11. 1 CFR 21.23 - Parallel citations of Code and Federal Register.

    Code of Federal Regulations, 2011 CFR

    2011-01-01

    ... § 21.23 Parallel citations of Code and Federal Register. For parallel reference, the Code of Federal Regulations and the Federal Register may be cited in the following forms, as appropriate: ___ CFR ___ (___ FR... 1 General Provisions 1 2011-01-01 2011-01-01 false Parallel citations of Code and Federal...

  12. 1 CFR 21.23 - Parallel citations of Code and Federal Register.

    Code of Federal Regulations, 2014 CFR

    2014-01-01

    ... § 21.23 Parallel citations of Code and Federal Register. For parallel reference, the Code of Federal Regulations and the Federal Register may be cited in the following forms, as appropriate: ___ CFR ___ (___ FR... 1 General Provisions 1 2014-01-01 2012-01-01 true Parallel citations of Code and Federal...

  13. 1 CFR 21.23 - Parallel citations of Code and Federal Register.

    Code of Federal Regulations, 2010 CFR

    2010-01-01

    ... § 21.23 Parallel citations of Code and Federal Register. For parallel reference, the Code of Federal Regulations and the Federal Register may be cited in the following forms, as appropriate: ___ CFR ___ (___ FR... 1 General Provisions 1 2010-01-01 2010-01-01 false Parallel citations of Code and Federal...

  14. 1 CFR 21.23 - Parallel citations of Code and Federal Register.

    Code of Federal Regulations, 2012 CFR

    2012-01-01

    ... § 21.23 Parallel citations of Code and Federal Register. For parallel reference, the Code of Federal Regulations and the Federal Register may be cited in the following forms, as appropriate: ___ CFR ___ (___ FR... 1 General Provisions 1 2012-01-01 2012-01-01 false Parallel citations of Code and Federal...

  15. 1 CFR 21.23 - Parallel citations of Code and Federal Register.

    Code of Federal Regulations, 2013 CFR

    2013-01-01

    ... § 21.23 Parallel citations of Code and Federal Register. For parallel reference, the Code of Federal Regulations and the Federal Register may be cited in the following forms, as appropriate: ___ CFR ___ (___ FR... 1 General Provisions 1 2013-01-01 2012-01-01 true Parallel citations of Code and Federal...

  16. A GPU accelerated Barnes-Hut tree code for FLASH4

    NASA Astrophysics Data System (ADS)

    Lukat, Gunther; Banerjee, Robi

    2016-05-01

    We present a GPU accelerated CUDA-C implementation of the Barnes Hut (BH) tree code for calculating the gravitational potential on octree adaptive meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh refinement (AMR) code framework and therefore fully MPI parallel. We describe the algorithm and present test results that demonstrate its accuracy and performance in comparison to the algorithms available in the current FLASH4 version. We use a MacLaurin spheroid to test the accuracy of our new implementation and use spherical, collapsing cloud cores with effective AMR to carry out performance tests also in comparison with previous gravity solvers. Depending on the setup and the GPU/CPU ratio, we find a speedup for the gravity unit of at least a factor of 3 and up to 60 in comparison to the gravity solvers implemented in the FLASH4 code. We find an overall speedup factor for full simulations of at least factor 1.6 up to a factor of 10.

  17. Development of parallel DEM for the open source code MFIX

    SciTech Connect

    Gopalakrishnan, Pradeep; Tafti, Danesh

    2013-02-01

    The paper presents the development of a parallel Discrete Element Method (DEM) solver for the open source code, Multiphase Flow with Interphase eXchange (MFIX) based on the domain decomposition method. The performance of the code was evaluated by simulating a bubbling fluidized bed with 2.5 million particles. The DEM solver shows strong scalability up to 256 processors with an efficiency of 81%. Further, to analyze weak scaling, the static height of the fluidized bed was increased to hold 5 and 10 million particles. The results show that global communication cost increases with problem size while the computational cost remains constant. Further, the effects of static bed height on the bubble hydrodynamics and mixing characteristics are analyzed.

  18. Development of a massively parallel parachute performance prediction code

    SciTech Connect

    Peterson, C.W.; Strickland, J.H.; Wolfe, W.P.; Sundberg, W.D.; McBride, D.D.

    1997-04-01

    The Department of Energy has given Sandia full responsibility for the complete life cycle (cradle to grave) of all nuclear weapon parachutes. Sandia National Laboratories is initiating development of a complete numerical simulation of parachute performance, beginning with parachute deployment and continuing through inflation and steady state descent. The purpose of the parachute performance code is to predict the performance of stockpile weapon parachutes as these parachutes continue to age well beyond their intended service life. A new massively parallel computer will provide unprecedented speed and memory for solving this complex problem, and new software will be written to treat the coupled fluid, structure and trajectory calculations as part of a single code. Verification and validation experiments have been proposed to provide the necessary confidence in the computations.

  19. Development and Application of a Parallel MHD code

    NASA Astrophysics Data System (ADS)

    Peterkin, , Jr.

    1997-08-01

    Over the past few years, we (In collaboration with S. Colella, M. H. Frese, D. E. Lileikis and U. Shumlak.) have built a general purpose, portable, scalable three-dimensional finite volume magnetohydrodynamic code, called uc(mach3,) based on an arbitrary Lagrangian-Eulerian fluid algorithm to simulate time-dependent MHD phenomena for real materials. The physical domain of integration on which uc(mach3) works is decomposed into a patchwork of rectangular logical blocks that represent hexadedral physical subdomains. This block domain decomposition technique gives us a natural framework in which to implement coarse parallelization via message passing with the single program, multiple data (SPMD) model. Portability is achieved by using a parallel library that is separate from the physics code. At present, we are using the Message Passing Interface (MPI) because it is one of the industry standards, and because its Derived Data Type supports the sending and receiving of data with an arbitrary stride in memory. This feature is consistent with the manner in which boundary data is exchanged between connected block domains via ghost cells in the serial version of uc(mach3.) In this talk, we discuss the details of the uc(mach3) algorithm. In addition, we present results from some simple test problems as well as from complex 3-D time-dependent simulations including magnetoplasmadynamic thrusters, fast z-pinches, and magnetic flux compression generators.

  20. GPU-based parallel clustered differential pulse code modulation

    NASA Astrophysics Data System (ADS)

    Wu, Jiaji; Li, Wenze; Kong, Wanqiu

    2015-10-01

    Hyperspectral remote sensing technology is widely used in marine remote sensing, geological exploration, atmospheric and environmental remote sensing. Owing to the rapid development of hyperspectral remote sensing technology, resolution of hyperspectral image has got a huge boost. Thus data size of hyperspectral image is becoming larger. In order to reduce their saving and transmission cost, lossless compression for hyperspectral image has become an important research topic. In recent years, large numbers of algorithms have been proposed to reduce the redundancy between different spectra. Among of them, the most classical and expansible algorithm is the Clustered Differential Pulse Code Modulation (CDPCM) algorithm. This algorithm contains three parts: first clusters all spectral lines, then trains linear predictors for each band. Secondly, use these predictors to predict pixels, and get the residual image by subtraction between original image and predicted image. Finally, encode the residual image. However, the process of calculating predictors is timecosting. In order to improve the processing speed, we propose a parallel C-DPCM based on CUDA (Compute Unified Device Architecture) with GPU. Recently, general-purpose computing based on GPUs has been greatly developed. The capacity of GPU improves rapidly by increasing the number of processing units and storage control units. CUDA is a parallel computing platform and programming model created by NVIDIA. It gives developers direct access to the virtual instruction set and memory of the parallel computational elements in GPUs. Our core idea is to achieve the calculation of predictors in parallel. By respectively adopting global memory, shared memory and register memory, we finally get a decent speedup.

  1. Improved video coding efficiency exploiting tree-based pixelwise coding dependencies

    NASA Astrophysics Data System (ADS)

    Valenzise, Giuseppe; Ortega, Antonio

    2010-01-01

    In a conventional hybrid video coding scheme, the choice of encoding parameters (motion vectors, quantization parameters, etc.) is carried out by optimizing frame by frame the output distortion for a given rate budget. While it is well known that motion estimation naturally induces a chain of dependencies among pixels, this is usually not explicitly exploited in the coding process in order to improve overall coding efficiency. Specifically, when considering a group of pictures with an IPPP... structure, each pixel of the first frame can be thought of as the root of a tree whose children are the pixels of the subsequent frames predicted by it. In this work, we demonstrate the advantages of such a representation by showing that, in some situations, the best motion vector is not the one that minimizes the energy of the prediction residual, but the one that produces a better tree structure, e.g., one that can be globally more favorable from a rate-distortion perspective. In this new structure, pixel with a larger descendance are allocated extra rate to produce higher quality predictors. As a proof of concept, we verify this assertion by assigning the quantization parameter in a video sequence in such a way that pixels with a larger number of descendants are coded with a higher quality. In this way we are able to improve RD performance by nearly 1 dB. Our preliminary results suggest that a deeper understanding of the temporal dependencies can potentially lead to substantial gains in coding performance.

  2. Recent developments in DYNSUB: New models, code optimization and parallelization

    SciTech Connect

    Daeubler, M.; Trost, N.; Jimenez, J.; Sanchez, V.

    2013-07-01

    DYNSUB is a high-fidelity coupled code system consisting of the reactor simulator DYN3D and the sub-channel code SUBCHANFLOW. It describes nuclear reactor core behavior with pin-by-pin resolution for both steady-state and transient scenarios. In the course of the coupled code system's active development, super-homogenization (SPH) and generalized equivalence theory (GET) discontinuity factors may be computed with and employed in DYNSUB to compensate pin level homogenization errors. Because of the largely increased numerical problem size for pin-by-pin simulations, DYNSUB has bene fitted from HPC techniques to improve its numerical performance. DYNSUB's coupling scheme has been structurally revised. Computational bottlenecks have been identified and parallelized for shared memory systems using OpenMP. Comparing the elapsed time for simulating a PWR core with one-eighth symmetry under hot zero power conditions applying the original and the optimized DYNSUB using 8 cores, overall speed up factors greater than 10 have been observed. The corresponding reduction in execution time enables a routine application of DYNSUB to study pin level safety parameters for engineering sized cases in a scientific environment. (authors)

  3. Time-Dependent, Parallel Neutral Particle Transport Code System.

    Energy Science and Technology Software Center (ESTSC)

    2009-09-10

    Version 00 PARTISN (PARallel, TIme-Dependent SN) is the evolutionary successor to CCC-547/DANTSYS. The PARTISN code package is a modular computer program package designed to solve the time-independent or dependent multigroup discrete ordinates form of the Boltzmann transport equation in several different geometries. The modular construction of the package separates the input processing, the transport equation solving, and the post processing (or edit) functions into distinct code modules: the Input Module, the Solver Module, and themore » Edit Module, respectively. PARTISN is the evolutionary successor to the DANTSYSTM code system package. The Input and Edit Modules in PARTISN are very similar to those in DANTSYS. However, unlike DANTSYS, the Solver Module in PARTISN contains one, two, and three-dimensional solvers in a single module. In addition to the diamond-differencing method, the Solver Module also has Adaptive Weighted Diamond-Differencing (AWDD), Linear Discontinuous (LD), and Exponential Discontinuous (ED) spatial differencing methods. The spatial mesh may consist of either a standard orthogonal mesh or a block adaptive orthogonal mesh. The Solver Module may be run in parallel for two and three dimensional problems. One can now run 1-D problems in parallel using Energy Domain Decomposition (triggered by Block 5 input keyword npeg>0). EDD can also be used in 2-D/3-D with or without our standard Spatial Domain Decomposition. Both the static (fixed source or eigenvalue) and time-dependent forms of the transport equation are solved in forward or adjoint mode. In addition, PARTISN now has a probabilistic mode for Probability of Initiation (static) and Probability of Survival (dynamic) calculations. Vacuum, reflective, periodic, white, or inhomogeneous boundary conditions are solved. General anisotropic scattering and inhomogeneous sources are permitted. PARTISN solves the transport equation on orthogonal (single level or block-structured AMR) grids in 1-D

  4. An Optimal Parallel Algorithm for Constructing a Spanning Tree on Circular Permutation Graphs

    NASA Astrophysics Data System (ADS)

    Honma, Hirotoshi; Honma, Saki; Masuyama, Shigeru

    The spanning tree problem is to find a tree that connects all the vertices of G. This problem has many applications, such as electric power systems, computer network design and circuit analysis. Klein and Stein demonstrated that a spanning tree can be found in O(log n) time with O(n + m) processors on the CRCW PRAM. In general, it is known that more efficient parallel algorithms can be developed by restricting classes of graphs. Circular permutation graphs properly contain the set of permutation graphs as a subclass and are first introduced by Rotem and Urrutia. They provided O(n2.376) time recognition algorithm. Circular permutation graphs and their models find several applications in VLSI layout. In this paper, we propose an optimal parallel algorithm for constructing a spanning tree on circular permutation graphs. It runs in O(log n) time with O(n/ log n) processors on the EREW PRAM.

  5. JOSEPHINE: A parallel SPH code for free-surface flows

    NASA Astrophysics Data System (ADS)

    Cherfils, J. M.; Pinon, G.; Rivoalen, E.

    2012-07-01

    JOSEPHINE is a parallel Smoothed Particle Hydrodynamics program, designed to solve unsteady free-surface flows. The adopted numerical scheme is efficient and has been validated on a first case, where a liquid drop is stretched over the time. Boundary conditions can also be modelled, as it is demonstrated in a second case: the collapse of a water column. Results show good agreement with both reference numerical solutions and experiments. The use of parallelism allows significant reduction of the computational time, even more with large number of particles. JOSEPHINE has been written so that any untrained developers can handle it easily and implement new features. Catalogue identifier: AELV_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AELV_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 5139 No. of bytes in distributed program, including test data, etc.: 22 833 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors, tested on a Xeon W3520, 2.67 GHz. Operating system: Any system with a Fortran 90 compiler and MPI, tested on Debian Linux. Has the code been vectorised or parallelised?: The code has been parallelised but has not been explicitly vectorised. RAM: Dependent upon the number of particles. Classification: 4.12 Nature of problem:JOSEPHINE is designed to solve unsteady incompressible flows with a free-surface and large deformations. Solution method:JOSEPHINE is an implementation of Smoothed Particle Hydrodynamics. SPH is a Lagrangian mesh free particle method, thus, no explicit tracking procedure is required to catch the free surface. Incompressibility is satisfied using a weakly compressible model. Boundary conditions at walls are enforced by means of the ghost particles

  6. Mitigation of parallel RF potentials using TOPICA code

    NASA Astrophysics Data System (ADS)

    Milanesio, Daniele; Maggiora, Riccardo

    2011-10-01

    The design of an Ion Cyclotron (IC) launcher is not only driven by its coupling properties, but also by its capability of maintaining low parallel electric fields, in order first to provide good power transfer to plasma and then to reduce the impurities production. However, due to the impossibility to verify the antenna performances before the starting of the operations, advanced numerical simulation tools are the only alternative to carry out reliable design. With this in mind, it should be clear that the adoption of a code, such as TOPICA, able to precisely take into account a realistic antenna geometry and an accurate plasma description, is extremely important to achieve these goals. Because of the recently introduced features that allow to compute the electric field and RF potential distribution everywhere inside the antenna enclosure and in the plasma column, the TOPICA code appears to be the best candidate in helping to understand which elements may have a not negligible impact on the antenna design. The present work reports a detailed analysis of antenna concepts and their further optimization in order to mitigate RF potentials; the evaluation of the effect of different plasma loadings is included as well.

  7. Use of Hilbert Curves in Parallelized CUDA code: Interaction of Interstellar Atoms with the Heliosphere

    NASA Astrophysics Data System (ADS)

    Destefano, Anthony; Heerikhuisen, Jacob

    2015-04-01

    Fully 3D particle simulations can be a computationally and memory expensive task, especially when high resolution grid cells are required. The problem becomes further complicated when parallelization is needed. In this work we focus on computational methods to solve these difficulties. Hilbert curves are used to map the 3D particle space to the 1D contiguous memory space. This method of organization allows for minimized cache misses on the GPU as well as a sorted structure that is equivalent to an octal tree data structure. This type of sorted structure is attractive for uses in adaptive mesh implementations due to the logarithm search time. Implementations using the Message Passing Interface (MPI) library and NVIDIA's parallel computing platform CUDA will be compared, as MPI is commonly used on server nodes with many CPU's. We will also compare static grid structures with those of adaptive mesh structures. The physical test bed will be simulating heavy interstellar atoms interacting with a background plasma, the heliosphere, simulated from fully consistent coupled MHD/kinetic particle code. It is known that charge exchange is an important factor in space plasmas, specifically it modifies the structure of the heliosphere itself. We would like to thank the Alabama Supercomputer Authority for the use of their computational resources.

  8. Description of a parallel, 3D, finite element, hydrodynamics-diffusion code

    SciTech Connect

    Milovich, J L; Prasad, M K; Shestakov, A I

    1999-04-11

    We describe a parallel, 3D, unstructured grid finite element, hydrodynamic diffusion code for inertial confinement fusion (ICF) applications and the ancillary software used to run it. The code system is divided into two entities, a controller and a stand-alone physics code. The code system may reside on different computers; the controller on the user's workstation and the physics code on a supercomputer. The physics code is composed of separate hydrodynamic, equation-of-state, laser energy deposition, heat conduction, and radiation transport packages and is parallelized for distributed memory architectures. For parallelization, a SPMD model is adopted; the domain is decomposed into a disjoint collection of subdomains, one per processing element (PE). The PEs communicate using MPI. The code is used to simulate the hydrodynamic implosion of a spherical bubble.

  9. Implementation of a 3D mixing layer code on parallel computers

    NASA Technical Reports Server (NTRS)

    Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.

    1995-01-01

    This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. The code was then converted for use on parallel computers using the conventional message-passing technique, while we have not been able to compile the code with the present version of HPF compilers.

  10. The Study of Address Tree Coding Based on the Maximum Matching Algorithm in Courier Business

    NASA Astrophysics Data System (ADS)

    Zhou, Shumin; Tang, Bin; Li, Wen

    As an important component of EMS monitoring system, address is different from user name with great uncertainty because there are many ways to represent it. Therefore, address standardization is a difficult task. Address tree coding has been trying to resolve that issue for many years. Zip code, as its most widely used algorithm, can only subdivide the address down to a designated post office, not the recipients' address. This problem needs artificial identification method to be accurately delivered. This paper puts forward a new encoding algorithm of the address tree - the maximum matching algorithm to solve the problem. This algorithm combines the characteristics of the address tree and the best matching theory, and brings in the associated layers of tree nodes to improve the matching efficiency. Taking the variability of address into account, the thesaurus of address tree should be updated timely by increasing new nodes automatically through intelligent tools.

  11. Punctured Parallel and Serial Concatenated Convolutional Codes for BPSK/QPSK Channels

    NASA Technical Reports Server (NTRS)

    Acikel, Omer Fatih

    1999-01-01

    As available bandwidth for communication applications becomes scarce, bandwidth-efficient modulation and coding schemes become ever important. Since their discovery in 1993, turbo codes (parallel concatenated convolutional codes) have been the center of the attention in the coding community because of their bit error rate performance near the Shannon limit. Serial concatenated convolutional codes have also been shown to be as powerful as turbo codes. In this dissertation, we introduce algorithms for designing bandwidth-efficient rate r = k/(k + 1),k = 2, 3,..., 16, parallel and rate 3/4, 7/8, and 15/16 serial concatenated convolutional codes via puncturing for BPSK/QPSK (Binary Phase Shift Keying/Quadrature Phase Shift Keying) channels. Both parallel and serial concatenated convolutional codes have initially, steep bit error rate versus signal-to-noise ratio slope (called the -"cliff region"). However, this steep slope changes to a moderate slope with increasing signal-to-noise ratio, where the slope is characterized by the weight spectrum of the code. The region after the cliff region is called the "error rate floor" which dominates the behavior of these codes in moderate to high signal-to-noise ratios. Our goal is to design high rate parallel and serial concatenated convolutional codes while minimizing the error rate floor effect. The design algorithm includes an interleaver enhancement procedure and finds the polynomial sets (only for parallel concatenated convolutional codes) and the puncturing schemes that achieve the lowest bit error rate performance around the floor for the code rates of interest.

  12. Bit-parallel ASCII code artificial numeric keypad

    SciTech Connect

    Hale, G.M.

    1981-03-01

    Seven integrated circuits and a voltage regulator are combined with twelve reed relays to allow the ASCII encoded numerals 0 through 9 and characters ''.'' and R or S to momentarily close switches to an applications device, simulating keypad switch closures. This invention may be used as a PARALLEL TLL (Transistor Transistor Logic) data acqusition interface to a standard Hewlett-Packard HP-97 Calculator modified with a cable.

  13. Efficient VLSI parallel algorithm for Delaunay triangulation on orthogonal tree network in two and three dimensions

    SciTech Connect

    Saena, S.; Bhatt, P.C.P.; Prasad, V.C. )

    1990-03-01

    In this paper, a parallel algorithm for two- and three-dimensional Delaunay triangulation on an orthogonal tree network is described. The worst case time complexity of this algorithm is O(log {sup 2} N) in two dimensions and O(m {sup 1/2} log N) in three dimensions with N input points and m as the number of tetrahedra in tiangulation. The AT {sup 2} VLSI complexity on Thompson's logarithmic delay model is O(N {sup 2} log {sup 6} N) in two dimensions and O(m {sup 2} N log {sup 4} N) in three dimensions.

  14. Fully Parallel Electrical Impedance Tomography Using Code Division Multiplexing.

    PubMed

    Tsoeu, M S; Inggs, M R

    2016-06-01

    Electrical Impedance Tomography (EIT) has been dominated by the use of Time Division Multiplexing (TDM) and Frequency Division Multiplexing (FDM) as methods of achieving orthogonal injection of excitation signals. Code Division Multiplexing (CDM), presented in this paper is an alternative that eliminates temporal data inconsistencies of TDM for fast changing systems. Furthermore, this approach eliminates data inconsistencies that arise in FDM when frequency bands of current injecting electrodes are chosen over frequencies that have large changes in the imaged object's impedance. To the authors knowledge no fully functional wideband system or simulation platform using simultaneous injection of Gold codes currents has been reported. In this paper, we formulate, simulate and develop a fully functional pseudo-random (Gold) code driven EIT system with 15 excitation currents and 16 separate voltage measurement electrodes. In the work we verify the use of CDM as a multiplexing modality in simultaneous injection EIT, using a prototype system with an overall bandwidth of 15 kHz, and attainable speed of 462 frames/s using codes with a period of 31 chips. Simulations and experiments are performed using the Electrical Impedance and Diffuse Optics Reconstruction Software (EIDORS). We also propose the use of image processing on reconstructed images to establish their quality quantitatively without access to raw reconstruction data. The results of this study show that CDM can be successfully used in EIT, and gives results of similar visual quality to TDM and FDM. Achieved performance shows average position error of 3.5% and size error of 6.2%. PMID:26731774

  15. On the Reversibility of Parallel Insertion, and Its Relation to Comma Codes

    NASA Astrophysics Data System (ADS)

    Cui, Bo; Kari, Lila; Seki, Shinnosuke

    This paper studies conditions under which the operation of parallel insertion can be reversed by parallel deletion, i.e., when does the equality (L_1 Leftarrow L_2) Rightarrow L_2 = L_1 hold for languages L 1 and L 2. We obtain a complete characterization of the solutions in the special case when both languages involved are singleton words. We also define comma codes, a family of codes with the property that, if L 2 is a comma code, then the above equation holds for any language L_1 subseteq {Σ}^*. Lastly, we generalize the notion of comma codes to that of comma intercodes of index m. Besides several properties, we prove that the families of comma intercodes of index m form an infinite proper inclusion hierarchy, the first element which is a subset of the family of infix codes, and the last element of which is a subset of the family of bifix codes.

  16. Virtual Simulator: An infrastructure for design and performance-prediction of massively parallel codes

    NASA Astrophysics Data System (ADS)

    Perumalla, K.; Fujimoto, R.; Pande, S.; Karimabadi, H.; Driscoll, J.; Omelchenko, Y.

    2005-12-01

    Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is hard to predict. Efficient development of massively parallel codes remains a computational challenge. For example, almost none of the kinetic codes in use in space physics today have dynamic load balancing capability. Here we present a new infrastructure for design and prediction of parallel codes. Performance prediction is useful to analyze, understand and experiment with different partitioning schemes, multiple modeling alternatives and so on, without having to run the application on supercomputers. Instrumentation of the model (with least perturbance to performance) is useful to glean key metrics and understand application-level behavior. Unfortunately, traditional approaches to virtual execution and instrumentation are limited by either slow execution speed or low resolution or both. We present a new framework that provides a high-resolution framework that provides a virtual CPU abstraction (with a full thread context per CPU), yet scales to thousands of virtual CPUs. The tool, called PDES2, presents different levels of modeling interfaces, from general purpose parallel simulations to parallel grid-based particle-in-cell (PIC) codes. The tool itself runs on multiple processors in order to accommodate the high-resolution by distributing the virtual execution across processors. Validation experiments of PIC models in the framework using a 1-D hybrid shock application show close agreement of results from virtual executions with results from actual supercomputer runs. The utility of this tool is further illustrated through an application to a parallel global hybrid code.

  17. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  18. Load-balancing techniques for a parallel electromagnetic particle-in-cell code

    SciTech Connect

    PLIMPTON,STEVEN J.; SEIDEL,DAVID B.; PASIK,MICHAEL F.; COATS,REBECCA S.

    2000-01-01

    QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER.

  19. Codes for a priority queue on a parallel data bus. [Deep Space Network

    NASA Technical Reports Server (NTRS)

    Wallis, D. E.; Taylor, H.

    1979-01-01

    Some codes for arbitration of priorities among subsystem computers or peripheral device controllers connected to a parallel data bus are described. At arbitration time, several subsystems present wire-OR, parallel code words to the bus, and the central computer can identify the subsystem of highest priority and determine which of two or more transmission services the subsystem requires. A mathematical discussion of the optimality of the codes with regard to the number of subsystems that may participate in the scheme for a given number of wires is presented along with the number of services that each subsystem may request.

  20. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  1. ANNarchy: a code generation approach to neural simulations on parallel hardware.

    PubMed

    Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into an efficient C++code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  2. Fast parallel implementation of multidimensional data-domain FORTRAN codes on distributed-memory processor arrays

    NASA Astrophysics Data System (ADS)

    Reale, F.; Barbera, M.; Sciortino, S.

    1992-11-01

    We illustrate a general and straightforward approach to develop FORTRAN parallel two-dimensional data-domain applications on distributed-memory systems, such as those based on transputers. We have aimed at achieving flexibility for different processor topologies and processor numbers, non-homogeneous processor configurations and coarse load-balancing. We have assumed a master-slave architecture as basic programming model in the framework of a domain decomposition approach. After developing a library of high-level general network and communication routines, based on low-level system-dependent libraries, we have used it to parallelize some specific applications: an elementary 2-D code, useful as a pattern and guide for other more complex applications, and a 2-D hydrodynamic code for astrophysical studies. Code parallelization is achieved by splitting the original code into two independent codes, one for the master and the other for the slaves, and then by adding coordinated calls to network setting and message-passing routines into the programs. The parallel applications have been implemented on a Meiko Computing Surface hosted by a SUN 4 workstation and running CSTools software package. After the basic network and communication routines were developed, the task of parallelizing the 2-D hydrodynamic code took approximately 12 man hours. The parallel efficiency of the code ranges between 98% and 58% on arrays between 2 and 20 T800 transputers, on a relatively small computational mesh (≈3000 cells). Arrays consisting of a limited number of faster Intel i860 processors achieve a high parallel efficiency on large computational grids (> 10000 grid points) with performances in the class of minisupercomputers.

  3. A parallelization approach to the COBRA-TF thermal-hydraulic subchannel code

    NASA Astrophysics Data System (ADS)

    Ramos, Enrique; Abarca, Agustín; Roman, Jose E.; Miró, Rafael

    2014-06-01

    In order to reduce the response time when simulating large reactors in detail, we have developed a parallel version of the thermal-hydraulic subchannel code COBRA-TF, with standard message passing technology (MPI). The parallelization is oriented to reactor cells, so it is best suited for models consisting of many cells. The generation of the Jacobian is parallelized, in such a way that each processor is in charge of generating the data associated to a subset of cells. Also, the solution of the linear system of equations is done in parallel, using the PETSc toolkit.

  4. Object-Oriented Parallel Particle-in-Cell Code for Beam Dynamics Simulation in Linear Accelerators

    SciTech Connect

    Qiang, J.; Ryne, R.D.; Habib, S.; Decky, V.

    1999-11-13

    In this paper, we present an object-oriented three-dimensional parallel particle-in-cell code for beam dynamics simulation in linear accelerators. A two-dimensional parallel domain decomposition approach is employed within a message passing programming paradigm along with a dynamic load balancing. Implementing object-oriented software design provides the code with better maintainability, reusability, and extensibility compared with conventional structure based code. This also helps to encapsulate the details of communications syntax. Performance tests on SGI/Cray T3E-900 and SGI Origin 2000 machines show good scalability of the object-oriented code. Some important features of this code also include employing symplectic integration with linear maps of external focusing elements and using z as the independent variable, typical in accelerators. A successful application was done to simulate beam transport through three superconducting sections in the APT linac design.

  5. Parallelization of the MAAP-A code neutronics/thermal hydraulics coupling

    SciTech Connect

    Froehle, P.H.; Wei, T.Y.C.; Weber, D.P.; Henry, R.E.

    1998-12-31

    A major new feature, one-dimensional space-time kinetics, has been added to a developmental version of the MAAP code through the introduction of the DIF3D-K module. This code is referred to as MAAP-A. To reduce the overall job time required, a capability has been provided to run the MAAP-A code in parallel. The parallel version of MAAP-A utilizes two machines running in parallel, with the DIF3D-K module executing on one machine and the rest of the MAAP-A code executing on the other machine. Timing results obtained during the development of the capability indicate that reductions in time of 30--40% are possible. The parallel version can be run on two SPARC 20 (SUN OS 5.5) workstations connected through the ethernet. MPI (Message Passing Interface standard) needs to be implemented on the machines. If necessary the parallel version can also be run on only one machine. The results obtained running in this one-machine mode identically match the results obtained from the serial version of the code.

  6. Evaluating In-Clique and Topological Parallelism Strategies for Junction Tree-Based Bayesian Inference Algorithm on the Cray XMT

    SciTech Connect

    Chin, George; Choudhury, Sutanay; Kangas, Lars J.; McFarlane, Sally A.; Marquez, Andres

    2011-09-01

    Long viewed as a strong statistical inference technique, Bayesian networks have emerged to be an important class of applications for high-performance computing. We have applied an architecture-conscious approach to parallelizing the Lauritzen-Spiegelhalter Junction Tree algorithm for exact inferencing in Bayesian networks. In optimizing the Junction Tree algorithm, we have implemented both in-clique and topological parallelism strategies to best leverage the fine-grained synchronization and massive-scale multithreading of the Cray XMT architecture. Two topological techniques were developed to parallelize the evidence propagation process through the Bayesian network. One technique involves performing intelligent scheduling of junction tree nodes based on its topology and relative size. The second technique involves decomposing the junction tree into a much finer tree-like representation to offer much more opportunities for parallelism. We evaluate these optimizations on five different Bayesian networks and report our findings and observations. Another important contribution of this paper is to demonstrate the application of massive-scale multithreading for load balancing and use of implicit parallelism-based compiler optimizations in designing scalable inferencing algorithms.

  7. Pelegant : a parallel accelerator simulation code for electron generation and tracking.

    SciTech Connect

    Wang, Y.; Borland, M. D.; Accelerator Systems Division

    2006-01-01

    elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.

  8. A portable, parallel, object-oriented Monte Carlo neutron transport code in C++

    SciTech Connect

    Lee, S.R.; Cummings, J.C.; Nolen, S.D. |

    1997-05-01

    We have developed a multi-group Monte Carlo neutron transport code using C++ and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and {alpha}-eigenvalues and is portable to and runs parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities of MC++ are discussed, along with physics and performance results on a variety of hardware, including all Accelerated Strategic Computing Initiative (ASCI) hardware. Current parallel performance indicates the ability to compute {alpha}-eigenvalues in seconds to minutes rather than hours to days. Future plans and the implementation of a general transport physics framework are also discussed.

  9. PARALLEL IMPLEMENTATION OF THE TOPAZ OPACITY CODE: ISSUES IN LOAD-BALANCING

    SciTech Connect

    Sonnad, V; Iglesias, C A

    2008-05-12

    The TOPAZ opacity code explicitly includes configuration term structure in the calculation of bound-bound radiative transitions. This approach involves myriad spectral lines and requires the large computational capabilities of parallel processing computers. It is important, however, to make use of these resources efficiently. For example, an increase in the number of processors should yield a comparable reduction in computational time. This proportional 'speedup' indicates that very large problems can be addressed with massively parallel computers. Opacity codes can readily take advantage of parallel architecture since many intermediate calculations are independent. On the other hand, since the different tasks entail significantly disparate computational effort, load-balancing issues emerge so that parallel efficiency does not occur naturally. Several schemes to distribute the labor among processors are discussed.

  10. Omega3P: A Parallel Finite-Element Eigenmode Analysis Code for Accelerator Cavities

    SciTech Connect

    Lee, Lie-Quan; Li, Zenghai; Ng, Cho; Ko, Kwok; /SLAC

    2009-03-04

    Omega3P is a parallel eigenmode calculation code for accelerator cavities in frequency domain analysis using finite-element methods. In this report, we will present detailed finite-element formulations and resulting eigenvalue problems for lossless cavities, cavities with lossy materials, cavities with imperfectly conducting surfaces, and cavities with waveguide coupling. We will discuss the parallel algorithms for solving those eigenvalue problems and demonstrate modeling of accelerator cavities through different examples.

  11. Data Parallel Line Relaxation (DPLR) Code User Manual: Acadia - Version 4.01.1

    NASA Technical Reports Server (NTRS)

    Wright, Michael J.; White, Todd; Mangini, Nancy

    2009-01-01

    Data-Parallel Line Relaxation (DPLR) code is a computational fluid dynamic (CFD) solver that was developed at NASA Ames Research Center to help mission support teams generate high-value predictive solutions for hypersonic flow field problems. The DPLR Code Package is an MPI-based, parallel, full three-dimensional Navier-Stokes CFD solver with generalized models for finite-rate reaction kinetics, thermal and chemical non-equilibrium, accurate high-temperature transport coefficients, and ionized flow physics incorporated into the code. DPLR also includes a large selection of generalized realistic surface boundary conditions and links to enable loose coupling with external thermal protection system (TPS) material response and shock layer radiation codes.

  12. Code Optimization and Parallelization on the Origins: Looking from Users' Perspective

    NASA Technical Reports Server (NTRS)

    Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)

    2002-01-01

    Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.

  13. Virtual detector of synchrotron radiation (VDSR) - A C++ parallel code for particle tracking and radiation calculation

    SciTech Connect

    Rykovanov, S. G.; Chen, M.; Geddes, C. G. R.; Schroeder, C. B.; Esarey, E.; Leemans, W. P.

    2012-12-21

    The Virtual Detector for Synchrotron Radiation (VDSR) is a parallel C++ code developed to calculate the incoherent radiation from a single charged particle or a beam moving in given external electro-magnetic fields. In this proceedings the code structure and features are introduced. An example of radiation generation from the betatron motion of a beam in the focusing fields of the wake in a laser-plasma accelerator is presented.

  14. Highly efficient numerical algorithm based on random trees for accelerating parallel Vlasov-Poisson simulations

    NASA Astrophysics Data System (ADS)

    Acebrón, Juan A.; Rodríguez-Rozas, Ángel

    2013-10-01

    An efficient numerical method based on a probabilistic representation for the Vlasov-Poisson system of equations in the Fourier space has been derived. This has been done theoretically for arbitrary dimensional problems, and particularized to unidimensional problems for numerical purposes. Such a representation has been validated theoretically in the linear regime comparing the solution obtained with the classical results of the linear Landau damping. The numerical strategy followed requires generating suitable random trees combined with a Padé approximant for approximating accurately a given divergent series. Such series are obtained by summing the partial contributions to the solution coming from trees with arbitrary number of branches. These contributions, coming in general from multi-dimensional definite integrals, are efficiently computed by a quasi-Monte Carlo method. It is shown how the accuracy of the method can be effectively increased by considering more terms of the series. The new representation was used successfully to develop a Probabilistic Domain Decomposition method suited for massively parallel computers, which improves the scalability found in classical methods. Finally, a few numerical examples based on classical phenomena such as the non-linear Landau damping, and the two streaming instability are given, illustrating the remarkable performance of the algorithm, when compared the results with those obtained using a classical method.

  15. A new parallel P3M code for very large-scale cosmological simulations

    NASA Astrophysics Data System (ADS)

    MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob

    1998-12-01

    We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995)[ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 109 particles with a 10243 mesh for the long-range force computation. These are the largest Cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered as well as for other non-uniform systems of astrophysical interest.

  16. On the error probability of general tree and trellis codes with applications to sequential decoding

    NASA Technical Reports Server (NTRS)

    Johannesson, R.

    1973-01-01

    An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random binary tree codes is derived and shown to be independent of the length of the tree. An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random L-branch binary trellis codes of rate R = 1/n is derived which separates the effects of the tail length T and the memory length M of the code. It is shown that the bound is independent of the length L of the information sequence. This implication is investigated by computer simulations of sequential decoding utilizing the stack algorithm. These simulations confirm the implication and further suggest an empirical formula for the true undetected decoding error probability with sequential decoding.

  17. 24.77 Pflops on a Gravitational Tree-Code to Simulate the Milky Way Galaxy with 18600 GPUs

    NASA Astrophysics Data System (ADS)

    Bédorf, Jeroen; Gaburov, Evghenii; Fujii, Michiko S.; Nitadori, Keigo; Ishiyama, Tomoaki; Portegies Zwart, Simon

    2014-11-01

    We have simulated, for the first time, the long term evolution of the Milky Way Galaxy using 51 billion particles on the Swiss Piz Daint supercomputer with our N-body gravitational tree-code Bonsai. Herein, we describe the scientific motivation and numerical algorithms. The Milky Way model was simulated for 6 billion years, during which the bar structure and spiral arms were fully formed. This improves upon previous simulations by using 1000 times more particles, and provides a wealth of new data that can be directly compared with observations. We also report the scalability on both the Swiss Piz Daint and the US ORNL Titan. On Piz Daint the parallel efficiency of Bonsai was above 95%. The highest performance was achieved with a 242 billion particle Milky Way model using 18600 GPUs on Titan, thereby reaching a sustained GPU and application performance of 33.49 Pflops and 24.77 Pflops respectively.

  18. Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop

    SciTech Connect

    Ghosh, K K

    2011-11-07

    Conclusions of this presentation are: (1) Open SpeedShop's (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) Large codes benefit from uninstrumented execution; (3) Many experiments can be run in a short time - might need multiple shots e.g. usertime for caller-callee, hwcsamp for HW counters; (4) Decent idea of code's performance is easily obtained; (5) Statistical sampling calls for decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.

  19. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  20. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    SciTech Connect

    Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-05-27

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used

  1. Five-bit parallel operation of optical quantization and coding for photonic analog-to-digital conversion

    NASA Astrophysics Data System (ADS)

    Konishi, Tsuyoshi; Takahashi, Koji; Matsui, Hideki; Satoh, Takema; Itoh, Kazuyoshi

    2011-08-01

    We report the attempt of optical quantization and coding in 5-bit parallel format for photonic A/D conversion. The proposed system is designed to realize generation of 32 different optical codes in proportion to the corresponding signal levels when fed a certain range of amplitude-varied input pulses to the setup. Optical coding in a bit-parallel format made it possible, that provides 5bit optical codes from 32 optical quantized pulses. The 5-bit parallel operation of an optical quantization and coding module with 5 multi-ports was tested in our experimental setup.

  2. A parallel finite-volume MHD code for plasma thrusters with an applied magnetic field

    NASA Astrophysics Data System (ADS)

    Norgaard, Peter; Choueiri, Edgar; Jardin, Stephen

    2006-10-01

    The Princeton Code for Advanced Plasma Propulsion Simulation (PCAPPS) is a recently developed parallel finite volume code that solves the resistive MHD equations in axisymmetric form. It is intended for simulating complex plasma flows, especially those in plasma thrusters. The code uses a flux function to represent the poloidal field. It allows for externally applied magnetic fields, necessary for efficient operation of magnetoplasmadynamic thrusters (MPDT) at low power. Separate electron and heavy species energy equations are employed, and model closure is achieved by a multi-level equilibrium ionization equation of state. We provide results from various validation tests, along with solver accuracy and parallel efficiency studies. Preliminary numerical studies of a lithium-fed MPDT are also presented.

  3. Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code

    NASA Technical Reports Server (NTRS)

    Yamakov, Vesselin I.

    2016-01-01

    This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.

  4. Advancements and performance of iterative methods in industrial applications codes on CRAY parallel/vector supercomputers

    SciTech Connect

    Poole, G.; Heroux, M.

    1994-12-31

    This paper will focus on recent work in two widely used industrial applications codes with iterative methods. The ANSYS program, a general purpose finite element code widely used in structural analysis applications, has now added an iterative solver option. Some results are given from real applications comparing performance with the tradition parallel/vector frontal solver used in ANSYS. Discussion of the applicability of iterative solvers as a general purpose solver will include the topics of robustness, as well as memory requirements and CPU performance. The FIDAP program is a widely used CFD code which uses iterative solvers routinely. A brief description of preconditioners used and some performance enhancements for CRAY parallel/vector systems is given. The solution of large-scale applications in structures and CFD includes examples from industry problems solved on CRAY systems.

  5. Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

    PubMed

    Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

    2004-09-01

    We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy. PMID:15487756

  6. An ecological and evolutionary perspective on the parallel invasion of two cross-compatible trees.

    PubMed

    Besnard, Guillaume; Cuneo, Peter

    2016-01-01

    Invasive trees are generally seen as ecosystem-transforming plants that can have significant impacts on native vegetation, and often require management and control. Understanding their history and biology is essential to guide actions of land managers. Here, we present a summary of recent research into the ecology, phylogeography and management of invasive olives, which are now established outside of their native range as high ecological impact invasive trees. The parallel invasion of European and African olive in different climatic zones of Australia provides an interesting case study of invasion, characterized by early genetic admixture between domesticated and wild taxa. Today, the impact of the invasive olives on native vegetation and ecosystem function is of conservation concern, with European olive a declared weed in areas of South Australia, and African olive a declared weed in New South Wales and Pacific islands. Population genetics was used to trace the origins and invasion of both subspecies in Australia, indicating that both olive subspecies have hybridized early after introduction. Research also indicates that African olive populations can establish from a low number of founder individuals even after successive bottlenecks. Modelling based on distributional data from the native and invasive range identified a shift of the realized ecological niche in the Australian invasive range for both olive subspecies, which was particularly marked for African olive. As highly successful and long-lived invaders, olives offer further opportunities to understand the genetic basis of invasion, and we propose that future research examines the history of introduction and admixture, the genetic basis of adaptability and the role of biotic interactions during invasion. Advances on these questions will ultimately improve predictions on the future olive expansion and provide a solid basis for better management of invasive populations. PMID:27519914

  7. An ecological and evolutionary perspective on the parallel invasion of two cross-compatible trees

    PubMed Central

    Besnard, Guillaume; Cuneo, Peter

    2016-01-01

    Invasive trees are generally seen as ecosystem-transforming plants that can have significant impacts on native vegetation, and often require management and control. Understanding their history and biology is essential to guide actions of land managers. Here, we present a summary of recent research into the ecology, phylogeography and management of invasive olives, which are now established outside of their native range as high ecological impact invasive trees. The parallel invasion of European and African olive in different climatic zones of Australia provides an interesting case study of invasion, characterized by early genetic admixture between domesticated and wild taxa. Today, the impact of the invasive olives on native vegetation and ecosystem function is of conservation concern, with European olive a declared weed in areas of South Australia, and African olive a declared weed in New South Wales and Pacific islands. Population genetics was used to trace the origins and invasion of both subspecies in Australia, indicating that both olive subspecies have hybridized early after introduction. Research also indicates that African olive populations can establish from a low number of founder individuals even after successive bottlenecks. Modelling based on distributional data from the native and invasive range identified a shift of the realized ecological niche in the Australian invasive range for both olive subspecies, which was particularly marked for African olive. As highly successful and long-lived invaders, olives offer further opportunities to understand the genetic basis of invasion, and we propose that future research examines the history of introduction and admixture, the genetic basis of adaptability and the role of biotic interactions during invasion. Advances on these questions will ultimately improve predictions on the future olive expansion and provide a solid basis for better management of invasive populations. PMID:27519914

  8. Parallel coding schemes of whisker velocity in the rat's somatosensory system.

    PubMed

    Lottem, Eran; Gugig, Erez; Azouz, Rony

    2015-03-15

    The function of rodents' whisker somatosensory system is to transform tactile cues, in the form of vibrissa vibrations, into neuronal responses. It is well established that rodents can detect numerous tactile stimuli and tell them apart. However, the transformation of tactile stimuli obtained through whisker movements to neuronal responses is not well-understood. Here we examine the role of whisker velocity in tactile information transmission and its coding mechanisms. We show that in anaesthetized rats, whisker velocity is related to the radial distance of the object contacted and its own velocity. Whisker velocity is accurately and reliably coded in first-order neurons in parallel, by both the relative time interval between velocity-independent first spike latency of rapidly adapting neurons and velocity-dependent first spike latency of slowly adapting neurons. At the same time, whisker velocity is also coded, although less robustly, by the firing rates of slowly adapting neurons. Comparing first- and second-order neurons, we find similar decoding efficiencies for whisker velocity using either temporal or rate-based methods. Both coding schemes are sufficiently robust and hardly affected by neuronal noise. Our results suggest that whisker kinematic variables are coded by two parallel coding schemes and are disseminated in a similar way through various brain stem nuclei to multiple brain areas. PMID:25552637

  9. Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Rizk, Yehia M.

    1999-01-01

    The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable on distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of the "Gauss-Sidel" fashion. The programming efforts for the _MPI code is more complicated than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones into a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each

  10. A visual parallel-BCI speller based on the time-frequency coding strategy

    NASA Astrophysics Data System (ADS)

    Xu, Minpeng; Chen, Long; Zhang, Lixin; Qi, Hongzhi; Ma, Lan; Tang, Jiabei; Wan, Baikun; Ming, Dong

    2014-04-01

    Objective. Spelling is one of the most important issues in brain-computer interface (BCI) research. This paper is to develop a visual parallel-BCI speller system based on the time-frequency coding strategy in which the sub-speller switching among four simultaneously presented sub-spellers and the character selection are identified in a parallel mode. Approach. The parallel-BCI speller was constituted by four independent P300+SSVEP-B (P300 plus SSVEP blocking) spellers with different flicker frequencies, thereby all characters had a specific time-frequency code. To verify its effectiveness, 11 subjects were involved in the offline and online spellings. A classification strategy was designed to recognize the target character through jointly using the canonical correlation analysis and stepwise linear discriminant analysis. Main results. Online spellings showed that the proposed parallel-BCI speller had a high performance, reaching the highest information transfer rate of 67.4 bit min-1, with an average of 54.0 bit min-1 and 43.0 bit min-1 in the three rounds and five rounds, respectively. Significance. The results indicated that the proposed parallel-BCI could be effectively controlled by users with attention shifting fluently among the sub-spellers, and highly improved the BCI spelling performance.

  11. Performance of a parallel code for the Euler equations on hypercube computers

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.

    1990-01-01

    The performance of hypercubes were evaluated on a computational fluid dynamics problem and the parallel environment issues were considered that must be addressed, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two dimensional steady Euler equations describing flow around the airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.

  12. Software tools for developing parallel applications. Part 1: Code development and debugging

    SciTech Connect

    Brown, J.; Geist, A.; Pancake, C.; Rover, D.

    1997-04-01

    Developing an application for parallel computers can be a lengthy and frustrating process making it a perfect candidate for software tool support. Yet application programmers are often the last to hear about new tools emerging from R and D efforts. This paper provides an overview of two focuses of tool support: code development and debugging. Each is discussed in terms of the programmer needs addressed, the extent to which representative current tools meet those needs, and what new levels of tool support are important if parallel computing is to become more widespread.

  13. Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices

    NASA Astrophysics Data System (ADS)

    Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan

    2010-07-01

    This paper introduces a self-developed, three-dimensional parallel fully electromagnetic particle simulation code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, numerical results agree well with theoretical ones. This code can be used to simulate the high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator, etc. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user's interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices, the numerical results computed from these two codes agree well with each other.

  14. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    NASA Astrophysics Data System (ADS)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  15. Parallel changes in mate-attracting calls and female preferences in autotriploid tree frogs

    PubMed Central

    Tucker, Mitch A.; Gerhardt, H. C.

    2012-01-01

    For polyploid species to persist, they must be reproductively isolated from their diploid parental species, which coexist at the same time and place at least initially. In a complex of biparentally reproducing tetraploid and diploid tree frogs in North America, selective phonotaxis—mediated by differences in the pulse-repetition (pulse rate) of their mate-attracting vocalizations—ensures assortative mating. We show that artificially produced autotriploid females of the diploid species (Hyla chrysoscelis) show a shift in pulse-rate preference in the direction of the pulse rate produced by males of the tetraploid species (Hyla versicolor). The estimated preference function is centred near the mean pulse rate of the calls of artificially produced male autotriploids. Such a parallel shift, which is caused by polyploidy per se and whose magnitude is expected to be greater in autotetraploids, may have facilitated sympatric speciation by promoting reproductive isolation of the initially formed polyploids from their diploid parental forms. This process also helps to explain why tetraploid lineages with different origins have similar advertisement calls and freely interbreed. PMID:22113033

  16. Cupid: Cluster-Based Exploration of Geometry Generators with Parallel Coordinates and Radial Trees.

    PubMed

    Beham, Michael; Herzner, Wolfgang; Gröller, M Eduard; Kehrer, Johannes

    2014-12-01

    Geometry generators are commonly used in video games and evaluation systems for computer vision to create geometric shapes such as terrains, vegetation or airplanes. The parameters of the generator are often sampled automatically which can lead to many similar or unwanted geometric shapes. In this paper, we propose a novel visual exploration approach that combines the abstract parameter space of the geometry generator with the resulting 3D shapes in a composite visualization. Similar geometric shapes are first grouped using hierarchical clustering and then nested within an illustrative parallel coordinates visualization. This helps the user to study the sensitivity of the generator with respect to its parameter space and to identify invalid parameter settings. Starting from a compact overview representation, the user can iteratively drill-down into local shape differences by clicking on the respective clusters. Additionally, a linked radial tree gives an overview of the cluster hierarchy and enables the user to manually split or merge clusters. We evaluate our approach by exploring the parameter space of a cup generator and provide feedback from domain experts. PMID:26356883

  17. Reading out a spatiotemporal population code by imaging neighbouring parallel fibre axons in vivo

    PubMed Central

    Wilms, Christian D.; Häusser, Michael

    2015-01-01

    The spatiotemporal pattern of synaptic inputs to the dendritic tree is crucial for synaptic integration and plasticity. However, it is not known if input patterns driven by sensory stimuli are structured or random. Here we investigate the spatial patterning of synaptic inputs by directly monitoring presynaptic activity in the intact mouse brain on the micron scale. Using in vivo calcium imaging of multiple neighbouring cerebellar parallel fibre axons, we find evidence for clustered patterns of axonal activity during sensory processing. The clustered parallel fibre input we observe is ideally suited for driving dendritic spikes, postsynaptic calcium signalling, and synaptic plasticity in downstream Purkinje cells, and is thus likely to be a major feature of cerebellar function during sensory processing. PMID:25751648

  18. Shared and Distributed Memory Parallel Security Analysis of Large-Scale Source Code and Binary Applications

    SciTech Connect

    Quinlan, D; Barany, G; Panas, T

    2007-08-30

    Many forms of security analysis on large scale applications can be substantially automated but the size and complexity can exceed the time and memory available on conventional desktop computers. Most commercial tools are understandably focused on such conventional desktop resources. This paper presents research work on the parallelization of security analysis of both source code and binaries within our Compass tool, which is implemented using the ROSE source-to-source open compiler infrastructure. We have focused on both shared and distributed memory parallelization of the evaluation of rules implemented as checkers for a wide range of secure programming rules, applicable to desktop machines, networks of workstations and dedicated clusters. While Compass as a tool focuses on source code analysis and reports violations of an extensible set of rules, the binary analysis work uses the exact same infrastructure but is less well developed into an equivalent final tool.

  19. Parallelization of the GKEM Electromagnetic PIC code using MPI and OpenMP

    NASA Astrophysics Data System (ADS)

    Benjamin, Mark; Ethier, Stephane; Lee, Wei-Li

    2009-11-01

    GKEM is a legacy gyrokinetic PIC code in slab geometry that calculates anomalous transport in fusion plasmas due to drift wave microturbulence. It is currently being used to develop new algorithms for high-beta electromagnetic PIC simulations. This work focuses on the modernization and performance improvement of GKEM through the use of FORTRAN 90 language features and parallelization. MPI-based particle parallelization was implemented as well as loop-level multi-threading using OpenMP directives. Performance improvements and speedup curves for the different stages of the code are discussed. Project supported by the DOE-PPPL High School Internship Program and DOE contract DE-AC02-09CH11466.

  20. Spatial parallelism of a 3D finite difference, velocity-stress elastic wave propagation code

    SciTech Connect

    Minkoff, S.E.

    1999-12-01

    Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately, finite difference simulations for 3D elastic wave propagation are expensive. The authors model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is most unique in its ability to model a number of different types of seismic sources. The parallel implementation uses the MPI library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions with 3D decompositions generally producing the best parallel speedup. Because I/O is handled largely outside of the time-step loop (the most expensive part of the simulation) the authors have opted for straight-forward broadcast and reduce operations to handle I/O. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as ghost cells. When this communication is balanced against computation by allocating subdomains of reasonable size, they observe excellent scaled speedup. Allocating subdomains of size 25 x 25 x 25 on each node, they achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.

  1. Spatial Parallelism of a 3D Finite Difference, Velocity-Stress Elastic Wave Propagation Code

    SciTech Connect

    MINKOFF,SUSAN E.

    1999-12-09

    Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately. finite difference simulations for 3D elastic wave propagation are expensive. We model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is most unique in its ability to model a number of different types of seismic sources. The parallel implementation uses the MP1 library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions with 3D decompositions generally producing the best parallel speed up. Because i/o is handled largely outside of the time-step loop (the most expensive part of the simulation) we have opted for straight-forward broadcast and reduce operations to handle i/o. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as ''ghost cells''. When this communication is balanced against computation by allocating subdomains of reasonable size, we observe excellent scaled speed up. Allocating subdomains of size 25 x 25 x 25 on each node, we achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.

  2. Parallel sparse and dense information coding streams in the electrosensory midbrain

    PubMed Central

    Sproule, Michael K.J.; Metzen, Michael G.; Chacron, Maurice J.

    2015-01-01

    Efficient processing of incoming sensory information is critical for an organism’s survival. It has been widely observed across systems and species that the representation of sensory information changes across successive brain areas. Indeed, peripheral sensory neurons tend to respond densely to a broad range of sensory stimuli while more central neurons tend to instead respond sparsely to a narrow range of stimuli. Such a transition might be advantageous as sparse neural codes are thought to be metabolically efficient and optimize coding efficiency. Here we investigated whether the neural code transitions from dense to sparse within the midbrain Torus semicircularis (TS) of weakly electric fish. Confirming previous results, we found both dense and sparse coding neurons. However, subsequent histological classification revealed that most dense neurons projected to higher brain areas. Our results thus provide strong evidence against the hypothesis that the neural code transitions from dense to sparse in the electrosensory system. Rather, they support the alternative hypothesis that higher brain areas receive parallel streams of dense and sparse coded information from the electrosensory midbrain. We discuss the implications and possible advantages of such a coding strategy and argue that it is a general feature of sensory processing. PMID:26375927

  3. Parallel sparse and dense information coding streams in the electrosensory midbrain.

    PubMed

    Sproule, Michael K J; Metzen, Michael G; Chacron, Maurice J

    2015-10-21

    Efficient processing of incoming sensory information is critical for an organism's survival. It has been widely observed across systems and species that the representation of sensory information changes across successive brain areas. Indeed, peripheral sensory neurons tend to respond densely to a broad range of sensory stimuli while more central neurons tend to instead respond sparsely to a narrow range of stimuli. Such a transition might be advantageous as sparse neural codes are thought to be metabolically efficient and optimize coding efficiency. Here we investigated whether the neural code transitions from dense to sparse within the midbrain Torus semicircularis (TS) of weakly electric fish. Confirming previous results, we found both dense and sparse coding neurons. However, subsequent histological classification revealed that most dense neurons projected to higher brain areas. Our results thus provide strong evidence against the hypothesis that the neural code transitions from dense to sparse in the electrosensory system. Rather, they support the alternative hypothesis that higher brain areas receive parallel streams of dense and sparse coded information from the electrosensory midbrain. We discuss the implications and possible advantages of such a coding strategy and argue that it is a general feature of sensory processing. PMID:26375927

  4. A PARALLEL MONTE CARLO CODE FOR SIMULATING COLLISIONAL N-BODY SYSTEMS

    SciTech Connect

    Pattabiraman, Bharath; Umbreit, Stefan; Liao, Wei-keng; Choudhary, Alok; Kalogera, Vassiliki; Memik, Gokhan; Rasio, Frederic A.

    2013-02-15

    We present a new parallel code for computing the dynamical evolution of collisional N-body systems with up to N {approx} 10{sup 7} particles. Our code is based on the Henon Monte Carlo method for solving the Fokker-Planck equation, and makes assumptions of spherical symmetry and dynamical equilibrium. The principal algorithmic developments involve optimizing data structures and the introduction of a parallel random number generation scheme as well as a parallel sorting algorithm required to find nearest neighbors for interactions and to compute the gravitational potential. The new algorithms we introduce along with our choice of decomposition scheme minimize communication costs and ensure optimal distribution of data and workload among the processing units. Our implementation uses the Message Passing Interface library for communication, which makes it portable to many different supercomputing architectures. We validate the code by calculating the evolution of clusters with initial Plummer distribution functions up to core collapse with the number of stars, N, spanning three orders of magnitude from 10{sup 5} to 10{sup 7}. We find that our results are in good agreement with self-similar core-collapse solutions, and the core-collapse times generally agree with expectations from the literature. Also, we observe good total energy conservation, within {approx}< 0.04% throughout all simulations. We analyze the performance of the code, and demonstrate near-linear scaling of the runtime with the number of processors up to 64 processors for N = 10{sup 5}, 128 for N = 10{sup 6} and 256 for N = 10{sup 7}. The runtime reaches saturation with the addition of processors beyond these limits, which is a characteristic of the parallel sorting algorithm. The resulting maximum speedups we achieve are approximately 60 Multiplication-Sign , 100 Multiplication-Sign , and 220 Multiplication-Sign , respectively.

  5. SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80

    NASA Astrophysics Data System (ADS)

    Kamat, Manohar P.; Watson, Brian C.

    1992-02-01

    The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern day computers with a single processing unit was estimated at 3 billions of floating point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization as on Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements.

  6. SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80

    NASA Technical Reports Server (NTRS)

    Kamat, Manohar P.; Watson, Brian C.

    1992-01-01

    The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern day computers with a single processing unit was estimated at 3 billions of floating point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization as on Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements.

  7. Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

    NASA Technical Reports Server (NTRS)

    Long, Lyle N.; Brentner, Kenneth S.

    2000-01-01

    This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.

  8. Recent Improvements to the IMPACT-T Parallel Particle TrackingCode

    SciTech Connect

    Qiang, J.; Pogorelov, I.V.; Ryne, R.

    2006-11-16

    The IMPACT-T code is a parallel three-dimensional quasi-static beam dynamics code for modeling high brightness beams in photoinjectors and RF linacs. Developed under the US DOE Scientific Discovery through Advanced Computing (SciDAC) program, it includes several key features including a self-consistent calculation of 3D space-charge forces using a shifted and integrated Green function method, multiple energy bins for beams with large energy spread, and models for treating RF standing wave and traveling wave structures. In this paper, we report on recent improvements to the IMPACT-T code including modeling traveling wave structures, short-range transverse and longitudinal wakefields, and longitudinal coherent synchrotron radiation through bending magnets.

  9. Performance analysis of parallel gravitational N-body codes on large GPU clusters

    NASA Astrophysics Data System (ADS)

    Huang, Si-Yi; Spurzem, Rainer; Berczik, Peter

    2016-01-01

    We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large Graphics Processing Unit (GPU) clusters, both of which are pioneers in their own fields as well as on certain mutual scales - NBODY6++ and Bonsai. We carry out benchmarks of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs as their performance has approached half of the maximum single precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200 - 300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of the two codes, finding that in the same cases Bonsai adopts larger time steps as well as larger relative energy errors than NBODY6++, typically ranging from 10 - 50 times larger, depending on the chosen parameters of the codes. Although the two codes are built for different astrophysical applications, in specified conditions they may overlap in performance at certain physical scales, thus allowing the user to choose either one by fine-tuning parameters accordingly.

  10. HOTB: High precision parallel code for calculation of four-particle harmonic oscillator transformation brackets

    NASA Astrophysics Data System (ADS)

    Stepšys, A.; Mickevicius, S.; Germanas, D.; Kalinauskas, R. K.

    2014-11-01

    This new version of the HOTB program for calculation of the three and four particle harmonic oscillator transformation brackets provides some enhancements and corrections to the earlier version (Germanas et al., 2010) [1]. In particular, new version allows calculations of harmonic oscillator transformation brackets be performed in parallel using MPI parallel communication standard. Moreover, higher precision of intermediate calculations using GNU Quadruple Precision and arbitrary precision library FMLib [2] is done. A package of Fortran code is presented. Calculation time of large matrices can be significantly reduced using effective parallel code. Use of Higher Precision methods in intermediate calculations increases the stability of algorithms and extends the validity of used algorithms for larger input values. Catalogue identifier: AEFQ_v4_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFQ_v4_0.html Program obtainable from: CPC Program Library, Queen’s University of Belfast, N. Ireland Licensing provisions: GNU General Public License, version 3 Number of lines in programs, including test data, etc.: 1711 Number of bytes in distributed programs, including test data, etc.: 11667 Distribution format: tar.gz Program language used: FORTRAN 90 with MPI extensions for parallelism Computer: Any computer with FORTRAN 90 compiler Operating system: Windows, Linux, FreeBSD, True64 Unix Has the code been vectorized of parallelized?: Yes, parallelism using MPI extensions. Number of CPUs used: up to 999 RAM(per CPU core): Depending on allocated binomial and trinomial matrices and use of precision; at least 500 MB Catalogue identifier of previous version: AEFQ_v1_0 Journal reference of previous version: Comput. Phys. Comm. 181, Issue 2, (2010) 420-425 Does the new version supersede the previous version? Yes Nature of problem: Calculation of matrices of three-particle harmonic oscillator brackets (3HOB) and four-particle harmonic oscillator brackets (4HOB) in a more

  11. A new parallel solver suited for arbitrary semilinear parabolic partial differential equations based on generalized random trees

    NASA Astrophysics Data System (ADS)

    Acebrón, Juan A.; Rodríguez-Rozas, Ángel

    2011-09-01

    A probabilistic representation for initial value semilinear parabolic problems based on generalized random trees has been derived. Two different strategies have been proposed, both requiring generating suitable random trees combined with a Pade approximant for approximating accurately a given divergent series. Such series are obtained by summing the partial contribution to the solution coming from trees with arbitrary number of branches. The new representation greatly expands the class of problems amenable to be solved probabilistically, and was used successfully to develop a generalized probabilistic domain decomposition method. Such a method has been shown to be suited for massively parallel computers, enjoying full scalability and fault tolerance. Finally, a few numerical examples are given to illustrate the remarkable performance of the algorithm, comparing the results with those obtained with a classical method.

  12. Optimization and Openmp Parallelization of a Discrete Element Code for Convex Polyhedra on Multi-Core Machines

    NASA Astrophysics Data System (ADS)

    Chen, Jian; Matuttis, Hans-Georg

    2013-02-01

    We report our experiences with the optimization and parallelization of a discrete element code for convex polyhedra on multi-core machines and introduce a novel variant of the sort-and-sweep neighborhood algorithm. While in theory the whole code in itself parallelizes ideally, in practice the results on different architectures with different compilers and performance measurement tools depend very much on the particle number and optimization of the code. After difficulties with the interpretation of the data for speedup and efficiency are overcome, respectable parallelization speedups could be obtained.

  13. Manchester code telemetry system for well logging using quasi-parallel inductive-capacitive resonance

    NASA Astrophysics Data System (ADS)

    Xu, Lijun; Chen, Jianjun; Cao, Zhang; Liu, Xingbin; Hu, Jinhai

    2014-07-01

    In this paper, a quasi-parallel inductive-capacitive (LC) resonance method is proposed to improve the recovery of MIL-STD-1553 Manchester code with several frequency components from attenuated, distorted, and drifted signal for data telemetry in well logging, and corresponding telemetry system is developed. Required resonant frequency and quality factor are derived, and the quasi-parallel LC resonant circuit is established at the receiving end of the logging cable to suppress the low-pass filtering effect caused by the distributed capacitance of the cable and provide balanced pass for all the three frequency components of the Manchester code. The performance of the method for various encoding frequencies and cable lengths at different bit energy to noise density ratios (Eb/No) have been evaluated in the simulation. A 5 km single-core cable used in on-site well logging and various encoding frequencies were employed to verify the proposed telemetry system in the experiment. Results obtained demonstrate that the telemetry system is feasible and effective to improve the code recovery in terms of anti-attenuation, anti-distortion, and anti-drift performances, decrease the bit error rate, and increase the reachable transmission rate and distance greatly.

  14. Portable implementation of implicit methods for the UEDGE and BOUT codes on parallel computers

    SciTech Connect

    Rognlien, T D; Xu, X Q

    1999-02-17

    A description is given of the parallelization algorithms and results for two codes used ex- tensively to model edge-plasmas in magnetic fusion energy devices. The codes are UEDGE, which calculates two-dimensional plasma and neutral gas profiles, and BOUT, which cal- culates three-dimensional plasma turbulence using experimental or UEDGE profiles. Both codes describe the plasma behavior using fluid equations. A domain decomposition model is used for parallelization by dividing the global spatial simulation region into a set of domains. This approach allows the used of two recently developed LLNL Newton-Krylov numerical solvers, PVODE and KINSOL. Results show an order of magnitude speed up in execution time for the plasma equations with UEDGE. A problem which is identified for UEDGE is the solution of the fluid gas equations on a highly anisotropic mesh. The speed up of BOUT is closer to two orders of magnitude, especially if one includes the initial improvement from switching to the fully implicit Newton-Krylov solver. The turbulent transport coefficients obtained from BOUT guide the use of anomalous transport models within UEDGE, with the eventual goal of a self-consistent coupling.

  15. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    DOE PAGESBeta

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Somemore » specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.« less

  16. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    SciTech Connect

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  17. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re(sub tau) = 180, has shown good agreement with existing data.

  18. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    NASA Astrophysics Data System (ADS)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2016-03-01

    This work discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package authored at Oak Ridge National Laboratory. Shift has been developed to scale well from laptops to small computing clusters to advanced supercomputers and includes features such as support for multiple geometry and physics engines, hybrid capabilities for variance reduction methods such as the Consistent Adjoint-Driven Importance Sampling methodology, advanced parallel decompositions, and tally methods optimized for scalability on supercomputing architectures. The scaling studies presented in this paper demonstrate good weak and strong scaling behavior for the implemented algorithms. Shift has also been validated and verified against various reactor physics benchmarks, including the Consortium for Advanced Simulation of Light Water Reactors' Virtual Environment for Reactor Analysis criticality test suite and several Westinghouse AP1000® problems presented in this paper. These benchmark results compare well to those from other contemporary Monte Carlo codes such as MCNP5 and KENO.

  19. CPIC: A Parallel Particle-In-Cell Code for Studying Spacecraft Charging

    NASA Astrophysics Data System (ADS)

    Meierbachtol, Collin; Delzanno, Gian Luca; Moulton, David; Vernon, Louis

    2015-11-01

    CPIC is a three-dimensional electrostatic particle-in-cell code designed for use with curvilinear meshes. One of its primary objectives is to aid in studying spacecraft charging in the magnetosphere. CPIC maintains near-optimal computational performance and scaling thanks to a mapped logical mesh field solver, and a hybrid physical-logical space particle mover (avoiding the need to track particles). CPIC is written for parallel execution, utilizing a combination of both OpenMP threading and MPI distributed memory. New capabilities are being actively developed and added to CPIC, including the ability to handle multi-block curvilinear mesh structures. Verification results comparing CPIC to analytic test problems will be provided. Particular emphasis will be placed on the charging and shielding of a sphere-in-plasma system. Simulated charging results of representative spacecraft geometries will also be presented. Finally, its performance capabilities will be demonstrated through parallel scaling data.

  20. Global Trees: A Framework for Linked Data Structures on Distributed Memory Parallel Systems

    SciTech Connect

    Larkins, D. B.; Dinan, James S.; Krishnamoorthy, Sriram; Parthasarathy, Srinivasan; Rountev, Atanas; Sadayappan, Ponnuswamy

    2008-11-17

    This paper describes the Global Trees (GT) system that provides a multi-layered interface to a global address space view of distributed tree data structures, while providing scalable performance on distributed memory systems. The Global Trees system utilizes coarse-grained data movement to enhance locality and communication efficiency. We describe the design and implementation of GT, illustrate its use in the context of a gravitational simulation application, and provide experimental results that demonstrate the effectiveness of the approach. The key benefits of using this system include efficient sharedmemory style programming of distributed trees, tree-specific optimizations for data access and computation, and the ability to customize many aspects of GT to optimize application performance.

  1. Dynamical evolution of massive black holes in galactic-scale N-body simulations - introducing the regularized tree code `rVINE'

    NASA Astrophysics Data System (ADS)

    Karl, Simon J.; Aarseth, Sverre J.; Naab, Thorsten; Haehnelt, Martin G.; Spurzem, Rainer

    2015-09-01

    We present a hybrid code combining the OpenMP-parallel tree code VINE with an algorithmic chain regularization scheme. The new code, called `rVINE', aims to significantly improve the accuracy of close encounters of massive bodies with supermassive black holes (SMBHs) in galaxy-scale numerical simulations. We demonstrate the capabilities of the code by studying two test problems, the sinking of a single massive black hole to the centre of a gas-free galaxy due to dynamical friction and the hardening of an SMBH binary due to close stellar encounters. We show that results obtained with rVINE compare well with NBODY7 for problems with particle numbers that can be simulated with NBODY7. In particular, in both NBODY7 and rVINE we find a clear N-dependence of the binary hardening rate, a low binary eccentricity and moderate eccentricity evolution, as well as the conversion of the galaxy's inner density profile from a cusp to a core via the ejection of stars at high velocity. The much larger number of particles that can be handled by rVINE will open up exciting opportunities to model stellar dynamics close to SMBHs much more accurately in a realistic galactic context. This will help to remedy the inherent limitations of commonly used tree solvers to follow the correct dynamical evolution of black holes in galaxy-scale simulations.

  2. A Parallel Two-fluid Code for Global Magnetic Reconnection Studies

    SciTech Connect

    J.A. Breslau; S.C. Jardin

    2001-08-09

    This paper describes a new algorithm for the computation of two-dimensional resistive magnetohydrodynamic (MHD) and two-fluid studies of magnetic reconnection in plasmas. It has been implemented on several parallel platforms and shows good scalability up to 32 CPUs for reasonable problem sizes. A fixed, nonuniform rectangular mesh is used to resolve the different spatial scales in the reconnection problem. The resistive MHD version of the code uses an implicit/explicit hybrid method, while the two-fluid version uses an alternating-direction implicit (ADI) method. The technique has proven useful for comparing several different theories of collisional and collisionless reconnection.

  3. Multi-Zone Liquid Thrust Chamber Performance Code with Domain Decomposition for Parallel Processing

    NASA Technical Reports Server (NTRS)

    Navaz, Homayun K.

    2002-01-01

    -equation turbulence model, and two-phase flow. To overcome these limitations, the LTCP code is rewritten to include the multi-zone capability with domain decomposition that makes it suitable for parallel processing, i.e., enabling the code to run every zone or sub-domain on a separate processor. This can reduce the run time by a factor of 6 to 8, depending on the problem.

  4. Grid-based Parallel Data Streaming Implemented for the Gyrokinetic Toroidal Code

    SciTech Connect

    S. Klasky; S. Ethier; Z. Lin; K. Martins; D. McCune; R. Samtaney

    2003-09-15

    We have developed a threaded parallel data streaming approach using Globus to transfer multi-terabyte simulation data from a remote supercomputer to the scientist's home analysis/visualization cluster, as the simulation executes, with negligible overhead. Data transfer experiments show that this concurrent data transfer approach is more favorable compared with writing to local disk and then transferring this data to be post-processed. The present approach is conducive to using the grid to pipeline the simulation with post-processing and visualization. We have applied this method to the Gyrokinetic Toroidal Code (GTC), a 3-dimensional particle-in-cell code used to study microturbulence in magnetic confinement fusion from first principles plasma theory.

  5. Coding for Parallel Links to Maximize the Expected Value of Decodable Messages

    NASA Technical Reports Server (NTRS)

    Klimesh, Matthew A.; Chang, Christopher S.

    2011-01-01

    When multiple parallel communication links are available, it is useful to consider link-utilization strategies that provide tradeoffs between reliability and throughput. Interesting cases arise when there are three or more available links. Under the model considered, the links have known probabilities of being in working order, and each link has a known capacity. The sender has a number of messages to send to the receiver. Each message has a size and a value (i.e., a worth or priority). Messages may be divided into pieces arbitrarily, and the value of each piece is proportional to its size. The goal is to choose combinations of messages to send on the links so that the expected value of the messages decodable by the receiver is maximized. There are three parts to the innovation: (1) Applying coding to parallel links under the model; (2) Linear programming formulation for finding the optimal combinations of messages to send on the links; and (3) Algorithms for assisting in finding feasible combinations of messages, as support for the linear programming formulation. There are similarities between this innovation and methods developed in the field of network coding. However, network coding has generally been concerned with either maximizing throughput in a fixed network, or robust communication of a fixed volume of data. In contrast, under this model, the throughput is expected to vary depending on the state of the network. Examples of error-correcting codes that are useful under this model but which are not needed under previous models have been found. This model can represent either a one-shot communication attempt, or a stream of communications. Under the one-shot model, message sizes and link capacities are quantities of information (e.g., measured in bits), while under the communications stream model, message sizes and link capacities are information rates (e.g., measured in bits/second). This work has the potential to increase the value of data returned from

  6. ALEGRA -- A massively parallel h-adaptive code for solid dynamics

    SciTech Connect

    Summers, R.M.; Wong, M.K.; Boucheron, E.A.; Weatherby, J.R.

    1997-12-31

    ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.

  7. Hybrid threshold adaptable quantum secret sharing scheme with reverse Huffman-Fibonacci-tree coding

    PubMed Central

    Lai, Hong; Zhang, Jun; Luo, Ming-Xing; Pan, Lei; Pieprzyk, Josef; Xiao, Fuyuan; Orgun, Mehmet A.

    2016-01-01

    With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for the real world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantages of the proposed scheme is that it can detect eavesdropping without joint quantum operations, and permits secret sharing for an arbitrary but no less than threshold-value number of classical participants with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications. PMID:27515908

  8. Hybrid threshold adaptable quantum secret sharing scheme with reverse Huffman-Fibonacci-tree coding.

    PubMed

    Lai, Hong; Zhang, Jun; Luo, Ming-Xing; Pan, Lei; Pieprzyk, Josef; Xiao, Fuyuan; Orgun, Mehmet A

    2016-01-01

    With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for the real world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantages of the proposed scheme is that it can detect eavesdropping without joint quantum operations, and permits secret sharing for an arbitrary but no less than threshold-value number of classical participants with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications. PMID:27515908

  9. HOTB: High precision parallel code for calculation of four-particle harmonic oscillator transformation brackets

    NASA Astrophysics Data System (ADS)

    Stepšys, A.; Mickevicius, S.; Germanas, D.; Kalinauskas, R. K.

    2014-11-01

    This new version of the HOTB program for calculation of the three and four particle harmonic oscillator transformation brackets provides some enhancements and corrections to the earlier version (Germanas et al., 2010) [1]. In particular, new version allows calculations of harmonic oscillator transformation brackets be performed in parallel using MPI parallel communication standard. Moreover, higher precision of intermediate calculations using GNU Quadruple Precision and arbitrary precision library FMLib [2] is done. A package of Fortran code is presented. Calculation time of large matrices can be significantly reduced using effective parallel code. Use of Higher Precision methods in intermediate calculations increases the stability of algorithms and extends the validity of used algorithms for larger input values. Catalogue identifier: AEFQ_v4_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFQ_v4_0.html Program obtainable from: CPC Program Library, Queen’s University of Belfast, N. Ireland Licensing provisions: GNU General Public License, version 3 Number of lines in programs, including test data, etc.: 1711 Number of bytes in distributed programs, including test data, etc.: 11667 Distribution format: tar.gz Program language used: FORTRAN 90 with MPI extensions for parallelism Computer: Any computer with FORTRAN 90 compiler Operating system: Windows, Linux, FreeBSD, True64 Unix Has the code been vectorized of parallelized?: Yes, parallelism using MPI extensions. Number of CPUs used: up to 999 RAM(per CPU core): Depending on allocated binomial and trinomial matrices and use of precision; at least 500 MB Catalogue identifier of previous version: AEFQ_v1_0 Journal reference of previous version: Comput. Phys. Comm. 181, Issue 2, (2010) 420-425 Does the new version supersede the previous version? Yes Nature of problem: Calculation of matrices of three-particle harmonic oscillator brackets (3HOB) and four-particle harmonic oscillator brackets (4HOB) in a more

  10. OpenGeoSys-GEMS: Hybrid parallelization of a reactive transport code with MPI and threads

    NASA Astrophysics Data System (ADS)

    Kosakowski, G.; Kulik, D. A.; Shao, H.

    2012-04-01

    OpenGeoSys-GEMS is a generic purpose reactive transport code based on the operator splitting approach. The code couples the Finite-Element groundwater flow and multi-species transport modules of the OpenGeoSys (OGS) project (http://www.ufz.de/index.php?en=18345) with the GEM-Selektor research package to model thermodynamic equilibrium of aquatic (geo)chemical systems utilizing the Gibbs Energy Minimization approach (http://gems.web.psi.ch/). The combination of OGS and the GEM-Selektor kernel (GEMS3K) is highly flexible due to the object-oriented modular code structures and the well defined (memory based) data exchange modules. Like other reactive transport codes, the practical applicability of OGS-GEMS is often hampered by the long calculation time and large memory requirements. • For realistic geochemical systems which might include dozens of mineral phases and several (non-ideal) solid solutions the time needed to solve the chemical system with GEMS3K may increase exceptionally. • The codes are coupled in a sequential non-iterative loop. In order to keep the accuracy, the time step size is restricted. In combination with a fine spatial discretization the time step size may become very small which increases calculation times drastically even for small 1D problems. • The current version of OGS is not optimized for memory use and the MPI version of OGS does not distribute data between nodes. Even for moderately small 2D problems the number of MPI processes that fit into memory of up-to-date workstations or HPC hardware is limited. One strategy to overcome the above mentioned restrictions of OGS-GEMS is to parallelize the coupled code. For OGS a parallelized version already exists. It is based on a domain decomposition method implemented with MPI and provides a parallel solver for fluid and mass transport processes. In the coupled code, after solving fluid flow and solute transport, geochemical calculations are done in form of a central loop over all finite

  11. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  12. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-08-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 107) to 300 ms (N = 109). These are currently limited by the time for the calculation of the domain decomposition and communication

  13. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-06-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 107) to 300 ms (N = 109). These are currently limited by the time for the calculation of the domain decomposition and communication

  14. SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80

    NASA Technical Reports Server (NTRS)

    Kamat, Manohar P.; Watson, Brian C.

    1992-01-01

    The finite element method has proven to be an invaluable tool for analysis and design of complex, high performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increase, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements. SAPNEW provides a stand alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format to make SAPNEW more readily used by researchers at NASA Lewis Research Center.

  15. FISH: A THREE-DIMENSIONAL PARALLEL MAGNETOHYDRODYNAMICS CODE FOR ASTROPHYSICAL APPLICATIONS

    SciTech Connect

    Kaeppeli, R.; Whitehouse, S. C.; Scheidegger, S.; Liebendoerfer, M.; Pen, U.-L.

    2011-08-01

    FISH is a fast and simple ideal magnetohydrodynamics code that scales to {approx}10,000 processes for a Cartesian computational domain of {approx}1000{sup 3} cells. The simplicity of FISH has been achieved by the rigorous application of the operator splitting technique, while second-order accuracy is maintained by the symmetric ordering of the operators. Between directional sweeps, the three-dimensional data are rotated in memory so that the sweep is always performed in a cache-efficient way along the direction of contiguous memory. Hence, the code only requires a one-dimensional description of the conservation equations to be solved. This approach also enables an elegant novel parallelization of the code that is based on persistent communications with MPI for cubic domain decomposition on machines with distributed memory. This scheme is then combined with an additional OpenMP parallelization of different sweeps that can take advantage of clusters of shared memory. We document the detailed implementation of a second-order total variation diminishing advection scheme based on flux reconstruction. The magnetic fields are evolved by a constrained transport scheme. We show that the subtraction of a simple estimate of the hydrostatic gradient from the total gradients can significantly reduce the dissipation of the advection scheme in simulations of gravitationally bound hydrostatic objects. Through its simplicity and efficiency, FISH is as well suited for hydrodynamics classes as for large-scale astrophysical simulations on high-performance computer clusters. In preparation for the release of a public version, we demonstrate the performance of FISH in a suite of astrophysically orientated test cases.

  16. A 3D Parallel Beam Dynamics Code for Modeling High Brightness Beams in Photoinjectors

    SciTech Connect

    Qiang, Ji; Lidia, S.; Ryne, R.D.; Limborg, C.; /SLAC

    2006-02-13

    In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.

  17. A 3d Parallel Beam Dynamics Code for Modeling High BrightnessBeams in Photoinjectors

    SciTech Connect

    Qiang, J.; Lidia, S.; Ryne, R.; Limborg, C.

    2005-05-16

    In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.

  18. Hybrid parallel code acceleration methods in full-core reactor physics calculations

    SciTech Connect

    Courau, T.; Plagne, L.; Ponicot, A.; Sjoden, G.

    2012-07-01

    When dealing with nuclear reactor calculation schemes, the need for three dimensional (3D) transport-based reference solutions is essential for both validation and optimization purposes. Considering a benchmark problem, this work investigates the potential of discrete ordinates (Sn) transport methods applied to 3D pressurized water reactor (PWR) full-core calculations. First, the benchmark problem is described. It involves a pin-by-pin description of a 3D PWR first core, and uses a 8-group cross-section library prepared with the DRAGON cell code. Then, a convergence analysis is performed using the PENTRAN parallel Sn Cartesian code. It discusses the spatial refinement and the associated angular quadrature required to properly describe the problem physics. It also shows that initializing the Sn solution with the EDF SPN solver COCAGNE reduces the number of iterations required to converge by nearly a factor of 6. Using a best estimate model, PENTRAN results are then compared to multigroup Monte Carlo results obtained with the MCNP5 code. Good consistency is observed between the two methods (Sn and Monte Carlo), with discrepancies that are less than 25 pcm for the k{sub eff}, and less than 2.1% and 1.6% for the flux at the pin-cell level and for the pin-power distribution, respectively. (authors)

  19. L-PICOLA: A parallel code for fast dark matter simulation

    NASA Astrophysics Data System (ADS)

    Howlett, C.; Manera, M.; Percival, W. J.

    2015-09-01

    Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-parallel code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-Body simulation. Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the simulation and simulate the past lightcone at run-time, with optional replication of the simulation volume. Through comparisons to fully non-linear N-Body simulations we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5% respectively on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys. L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.

  20. Enhancements, Parallelization and Future Directions of the V3FIT 3-D Equilibrium Reconstruction Code

    NASA Astrophysics Data System (ADS)

    Cianciosa, M. R.; Hanson, J. D.; Maurer, D. A.; Hartwell, G. J.; Archmiller, M. C.; Ma, X.; Herfindal, J.

    2014-10-01

    Three-dimensional equilibrium reconstruction is spreading beyond its original application to stellarators. Three-dimensional effects in nominally axisymmetric systems, including quasi-helical states in reversed field pinches and error fields in tokamaks, are becoming increasingly important. V3FIT is a fully three dimensional equilibrium reconstruction code in widespread use throughout the fusion community. The code has recently undergone extensive revision to prepare for the next generation of equilibrium reconstruction problems. The most notable changes are the abstraction of the equilibrium model, the propagation of experimental errors to the reconstructed results, support for multicolor soft x-ray emissivity cameras, and recent efforts to add parallelization for efficient computation on multi-processor system. Work presented will contain discussions on these new capabilities. We will compare probability distributions of reconstructed parameters with results from whole shot reconstructions. We will show benchmarking and profiling results of initial performance improvements through the addition of OpenMP and MPI support. We will discuss future directions of the V3FIT code including steps taken for support of the W-7X stellarator. Work supported by US. Department of Energy Grant No. DEFG-0203-ER-54692B.

  1. QuickPIC: a highly efficient fully parallelized PIC code for plasma-based acceleration

    NASA Astrophysics Data System (ADS)

    Huang, C.; Decyk, V. K.; Zhou, M.; Lu, W.; Mori, W. B.; Cooley, J. H.; Antonsen, T. M., Jr.; Feng, B.; Katsouleas, T.; Vieira, J.; Silva, L. O.

    2006-09-01

    A highly efficient, fully parallelized, fully relativistic, three-dimensional particle-incell model for simulating plasma and laser wakefield acceleration is described. The model is based on the quasi-static approximation, which reduces a fully three-dimensional electromagnetic field solve and particle push to a two-dimensional field solve and particle push. This is done by calculating the plasma wake assuming that the drive beam and/or laser does not evolve during the time it takes for it to pass a plasma particle. The complete electromagnetic fields of the plasma wake and its associated index of refraction are then used to evolve the drive beam and/or laser using very large time steps. This algorithm reduces the computation time by 2 to 3 orders of magnitude without loss of accuracy for highly nonlinear problems of interest. The code is fully parallelizable with different domain decompositions for the 2D and 3D pieces of the code. The code also has dynamic load balancing. We present the basic algorithms and design of QuickPIC, as well as comparison between the new algorithm and conventional fully explicit models (OSIRIS). Direction for future work is also presented including a software pipeline technique to further scale QuickPIC to 10,000+ processors.

  2. Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming

    NASA Astrophysics Data System (ADS)

    Peredo, Oscar; Ortiz, Julián M.; Herrero, José R.

    2015-12-01

    The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners and researchers. Despite its widespread use, few attempts have been reported in order to bring this package to the multi-core era. Using all CPU resources, GSLIB algorithms can handle large datasets and grids, where tasks are compute- and memory-intensive applications. In this work, a methodology is presented to accelerate GSLIB applications using code optimization and hybrid parallel processing, specifically for compute-intensive applications. Minimal code modifications are added decreasing as much as possible the elapsed time of execution of the studied routines. If multi-core processing is available, the user can activate OpenMP directives to speed up the execution using all resources of the CPU. If multi-node processing is available, the execution is enhanced using MPI messages between the compute nodes.Four case studies are presented: experimental variogram calculation, kriging estimation, sequential gaussian and indicator simulation. For each application, three scenarios (small, large and extra large) are tested using a desktop environment with 4 CPU-cores and a multi-node server with 128 CPU-nodes. Elapsed times, speedup and efficiency results are shown.

  3. Delta: An object-oriented finite element code architecture for massively parallel computers

    SciTech Connect

    Weatherby, J.R.; Schutt, J.A.; Peery, J.S.; Hogan, R.E.

    1996-02-01

    Delta is an object-oriented code architecture based on the finite element method which enables simulation of a wide range of engineering mechanics problems in a parallel processing environment. Written in C{sup ++}, Delta is a natural framework for algorithm development and for research involving coupling of mechanics from different Engineering Science disciplines. To enhance flexibility and encourage code reuse, the architecture provides a clean separation of the major aspects of finite element programming. Spatial discretization, temporal discretization, and the solution of linear and nonlinear systems of equations are each implemented separately, independent from the governing field equations. Other attractive features of the Delta architecture include support for constitutive models with internal variables, reusable ``matrix-free`` equation solvers, and support for region-to-region variations in the governing equations and the active degrees of freedom. A demonstration code built from the Delta architecture has been used in two-dimensional and three-dimensional simulations involving dynamic and quasi-static solid mechanics, transient and steady heat transport, and flow in porous media.

  4. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.

  5. Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization

    NASA Astrophysics Data System (ADS)

    Berg, Bernd A.; Wu, Hao

    2012-10-01

    We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors. Program summary Program title: STMC2LSU3MPI Catalogue identifier: AEMJ_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 26666 No. of bytes in distributed program, including test data, etc.: 233126 Distribution format: tar.gz Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions. Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions. Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, Centos 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0. Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions. Number of processors used: 2 to 11664 RAM: 200 Mega bytes per process. Classification: 11

  6. LCODE: A parallel quasistatic code for computationally heavy problems of plasma wakefield acceleration

    NASA Astrophysics Data System (ADS)

    Sosedkin, A. P.; Lotov, K. V.

    2016-09-01

    LCODE is a freely distributed quasistatic 2D3V code for simulating plasma wakefield acceleration, mainly specialized at resource-efficient studies of long-term propagation of ultrarelativistic particle beams in plasmas. The beam is modeled with fully relativistic macro-particles in a simulation window copropagating with the light velocity; the plasma can be simulated with either kinetic or fluid model. Several techniques are used to obtain exceptional numerical stability and precision while maintaining high resource efficiency, enabling LCODE to simulate the evolution of long particle beams over long propagation distances even on a laptop. A recent upgrade enabled LCODE to perform the calculations in parallel. A pipeline of several LCODE processes communicating via MPI (Message-Passing Interface) is capable of executing multiple consecutive time steps of the simulation in a single pass. This approach can speed up the calculations by hundreds of times.

  7. An object-oriented implementation of a parallel Monte Carlo code for radiation transport

    NASA Astrophysics Data System (ADS)

    Santos, Pedro Duarte; Lani, Andrea

    2016-05-01

    This paper describes the main features of a state-of-the-art Monte Carlo solver for radiation transport which has been implemented within COOLFluiD, a world-class open source object-oriented platform for scientific simulations. The Monte Carlo code makes use of efficient ray tracing algorithms (for 2D, axisymmetric and 3D arbitrary unstructured meshes) which are described in detail. The solver accuracy is first verified in testcases for which analytical solutions are available, then validated for a space re-entry flight experiment (i.e. FIRE II) for which comparisons against both experiments and reference numerical solutions are provided. Through the flexible design of the physical models, ray tracing and parallelization strategy (fully reusing the mesh decomposition inherited by the fluid simulator), the implementation was made efficient and reusable.

  8. Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

    SciTech Connect

    Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji

    2013-09-25

    A parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran77, Fortran 90 or C. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.

  9. Spin wave based parallel logic operations for binary data coded with domain walls

    SciTech Connect

    Urazuka, Y.; Oyabu, S.; Chen, H.; Peng, B.; Otsuki, H.; Tanaka, T. Matsuyama, K.

    2014-05-07

    We numerically investigate the feasibility of spin wave (SW) based parallel logic operations, where the phase of SW packet (SWP) is exploited as a state variable and the phase shift caused by the interaction with domain wall (DW) is utilized as a logic inversion functionality. A designed functional element consists of parallel ferromagnetic nanowires (6 nm-thick, 36 nm-width, 5120 nm-length, and 200 nm separation) with the perpendicular magnetization and sub-μm scale overlaid conductors. The logic outputs for binary data, coded with the existence (“1”) or absence (“0”) of the DW, are inductively read out from interferometric aspect of the superposed SWPs, one of them propagating through the stored data area. A practical exclusive-or operation, based on 2π periodicity in the phase logic, is demonstrated for the individual nanowire with an order of different output voltage V{sub out}, depending on the logic output for the stored data. The inductive output from the two nanowires exhibits well defined three different signal levels, corresponding to the information distance (Hamming distance) between 2-bit data stored in the multiple nanowires.

  10. On distributed memory MPI-based parallelization of SPH codes in massive HPC context

    NASA Astrophysics Data System (ADS)

    Oger, G.; Le Touzé, D.; Guibert, D.; de Leffe, M.; Biddiscombe, J.; Soumagne, J.; Piccinali, J.-G.

    2016-03-01

    Most of particle methods share the problem of high computational cost and in order to satisfy the demands of solvers, currently available hardware technologies must be fully exploited. Two complementary technologies are now accessible. On the one hand, CPUs which can be structured into a multi-node framework, allowing massive data exchanges through a high speed network. In this case, each node is usually comprised of several cores available to perform multithreaded computations. On the other hand, GPUs which are derived from the graphics computing technologies, able to perform highly multi-threaded calculations with hundreds of independent threads connected together through a common shared memory. This paper is primarily dedicated to the distributed memory parallelization of particle methods, targeting several thousands of CPU cores. The experience gained clearly shows that parallelizing a particle-based code on moderate numbers of cores can easily lead to an acceptable scalability, whilst a scalable speedup on thousands of cores is much more difficult to obtain. The discussion revolves around speeding up particle methods as a whole, in a massive HPC context by making use of the MPI library. We focus on one particular particle method which is Smoothed Particle Hydrodynamics (SPH), one of the most widespread today in the literature as well as in engineering.

  11. A massively parallel method of characteristic neutral particle transport code for GPUs

    SciTech Connect

    Boyd, W. R.; Smith, K.; Forget, B.

    2013-07-01

    Over the past 20 years, parallel computing has enabled computers to grow ever larger and more powerful while scientific applications have advanced in sophistication and resolution. This trend is being challenged, however, as the power consumption for conventional parallel computing architectures has risen to unsustainable levels and memory limitations have come to dominate compute performance. Heterogeneous computing platforms, such as Graphics Processing Units (GPUs), are an increasingly popular paradigm for solving these issues. This paper explores the applicability of GPUs for deterministic neutron transport. A 2D method of characteristics (MOC) code - OpenMOC - has been developed with solvers for both shared memory multi-core platforms as well as GPUs. The multi-threading and memory locality methodologies for the GPU solver are presented. Performance results for the 2D C5G7 benchmark demonstrate 25-35 x speedup for MOC on the GPU. The lessons learned from this case study will provide the basis for further exploration of MOC on GPUs as well as design decisions for hardware vendors exploring technologies for the next generation of machines for scientific computing. (authors)

  12. MPI parallelization of Vlasov codes for the simulation of nonlinear laser-plasma interactions

    NASA Astrophysics Data System (ADS)

    Savchenko, V.; Won, K.; Afeyan, B.; Decyk, V.; Albrecht-Marc, M.; Ghizzo, A.; Bertrand, P.

    2003-10-01

    The simulation of optical mixing driven KEEN waves [1] and electron plasma waves [1] in laser-produced plasmas require nonlinear kinetic models and massive parallelization. We use Massage Passing Interface (MPI) libraries and Appleseed [2] to solve the Vlasov Poisson system of equations on an 8 node dual processor MAC G4 cluster. We use the semi-Lagrangian time splitting method [3]. It requires only row-column exchanges in the global data redistribution, minimizing the total number of communications between processors. Recurrent communication patterns for 2D FFTs involves global transposition. In the Vlasov-Maxwell case, we use splitting into two 1D spatial advections and a 2D momentum advection [4]. Discretized momentum advection equations have a double loop structure with the outer index being assigned to different processors. We adhere to a code structure with separate routines for calculations and data management for parallel computations. [1] B. Afeyan et al., IFSA 2003 Conference Proceedings, Monterey, CA [2] V. K. Decyk, Computers in Physics, 7, 418 (1993) [3] Sonnendrucker et al., JCP 149, 201 (1998) [4] Begue et al., JCP 151, 458 (1999)

  13. Optimization of Parallel Legendre Transform using Graphics Processing Unit (GPU) for a Geodynamo Code

    NASA Astrophysics Data System (ADS)

    Lokavarapu, H. V.; Matsui, H.

    2015-12-01

    Convection and magnetic field of the Earth's outer core are expected to have vast length scales. To resolve these flows, high performance computing is required for geodynamo simulations using spherical harmonics transform (SHT), a significant portion of the execution time is spent on the Legendre transform. Calypso is a geodynamo code designed to model magnetohydrodynamics of a Boussinesq fluid in a rotating spherical shell, such as the outer core of the Earth. The code has been shown to scale well on computer clusters capable of computing at the order of 10⁵ cores using Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) parallelization for CPUs. To further optimize, we investigate three different algorithms of the SHT using GPUs. One is to preemptively compute the Legendre polynomials on the CPU before executing SHT on the GPU within the time integration loop. In the second approach, both the Legendre polynomials and the SHT are computed on the GPU simultaneously. In the third approach , we initially partition the radial grid for the forward transform and the harmonic order for the backward transform between the CPU and GPU. There after, the partitioned works are simultaneously computed in the time integration loop. We examine the trade-offs between space and time, memory bandwidth and GPU computations on Maverick, a Texas Advanced Computing Center (TACC) supercomputer. We have observed improved performance using a GPU enabled Legendre transform. Furthermore, we will compare and contrast the different algorithms in the context of GPUs.

  14. Overview of development and design of MPACT: Michigan parallel characteristics transport code

    SciTech Connect

    Kochunas, B.; Collins, B.; Jabaay, D.; Downar, T. J.; Martin, W. R.

    2013-07-01

    MPACT (Michigan Parallel Characteristics Transport Code) is a new reactor analysis tool. It is being developed by students and research staff at the University of Michigan to be used for an advanced pin-resolved transport capability within VERA (Virtual Environment for Reactor Analysis). VERA is the end-user reactor simulation tool being produced by the Consortium for the Advanced Simulation of Light Water Reactors (CASL). The MPACT development project is itself unique for the way it is changing how students do research to achieve the instructional and research goals of an academic institution, while providing immediate value to industry. The MPACT code makes use of modern lean/agile software processes and extensive testing to maintain a level of productivity and quality required by CASL. MPACT's design relies heavily on object-oriented programming concepts and design patterns and is programmed in Fortran 2003. These designs are explained and illustrated as to how they can be readily extended to incorporate new capabilities and research ideas in support of academic research objectives. The transport methods currently implemented in MPACT include the 2-D and 3-D method of characteristics (MOC) and 2-D and 3-D method of collision direction probabilities (CDP). For the cross section resonance treatment, presently the subgroup method and the new embedded self-shielding method (ESSM) are implemented within MPACT. (authors)

  15. The Hymenopteran Tree of Life: Evidence from Protein-Coding Genes and Objectively Aligned Ribosomal Data

    PubMed Central

    Klopfstein, Seraina; Vilhelmsen, Lars; Heraty, John M.; Sharkey, Michael; Ronquist, Fredrik

    2013-01-01

    Previous molecular analyses of higher hymenopteran relationships have largely been based on subjectively aligned ribosomal sequences (18S and 28S). Here, we reanalyze the 18S and 28S data (unaligned about 4.4 kb) using an objective and a semi-objective alignment approach, based on MAFFT and BAli-Phy, respectively. Furthermore, we present the first analyses of a substantial protein-coding data set (4.6 kb from one mitochondrial and four nuclear genes). Our results indicate that previous studies may have suffered from inflated support values due to subjective alignment of the ribosomal sequences, but apparently not from significant biases. The protein data provide independent confirmation of several earlier results, including the monophyly of non-xyelid hymenopterans, Pamphilioidea + Unicalcarida, Unicalcarida, Vespina, Apocrita, Proctotrupomorpha and core Proctotrupomorpha. The protein data confirm that Aculeata are nested within a paraphyletic Evaniomorpha, but cast doubt on the monophyly of Evanioidea. Combining the available morphological, ribosomal and protein-coding data, we examine the total-evidence signal as well as congruence and conflict among the three data sources. Despite an emerging consensus on many higher-level hymenopteran relationships, several problems remain unresolved or contentious, including rooting of the hymenopteran tree, relationships of the woodwasps, placement of Stephanoidea and Ceraphronoidea, and the sister group of Aculeata. PMID:23936325

  16. Parallel coding of first- and second-order stimulus attributes by midbrain electrosensory neurons.

    PubMed

    McGillivray, Patrick; Vonderschen, Katrin; Fortune, Eric S; Chacron, Maurice J

    2012-04-18

    Natural stimuli often have time-varying first-order (i.e., mean) and second-order (i.e., variance) attributes that each carry critical information for perception and can vary independently over orders of magnitude. Experiments have shown that sensory systems continuously adapt their responses based on changes in each of these attributes. This adaptation creates ambiguity in the neural code as multiple stimuli may elicit the same neural response. While parallel processing of first- and second-order attributes by separate neural pathways is sufficient to remove this ambiguity, the existence of such pathways and the neural circuits that mediate their emergence have not been uncovered to date. We recorded the responses of midbrain electrosensory neurons in the weakly electric fish Apteronotus leptorhynchus to stimuli with first- and second-order attributes that varied independently in time. We found three distinct groups of midbrain neurons: the first group responded to both first- and second-order attributes, the second group responded selectively to first-order attributes, and the last group responded selectively to second-order attributes. In contrast, all afferent hindbrain neurons responded to both first- and second-order attributes. Using computational analyses, we show how inputs from a heterogeneous population of ON- and OFF-type afferent neurons are combined to give rise to response selectivity to either first- or second-order stimulus attributes in midbrain neurons. Our study thus uncovers, for the first time, generic and widely applicable mechanisms by which parallel processing of first- and second-order stimulus attributes emerges in the brain. PMID:22514313

  17. Parallelization of a multiregion flow and transport code using software emulated global shared memory and high performance FORTRAN

    SciTech Connect

    D`Azevedo, E.F.; Gwo, Jin-Ping

    1997-02-01

    The objectives of this research are (1) to parallelize a suite of multiregion groundwater flow and solute transport codes that use Galerkin and Lagrangian- Eulerian finite element methods, (2) to test the compatibility of a global shared memory emulation software with a High Performance FORTRAN (HPF) compiler, and (3) to obtain performance characteristics and scalability of the parallel codes. The suite of multiregion flow and transport codes, 3DMURF and 3DMURT, were parallelized using the DOLIB shared memory emulation, in conjunction with the PGI HPF compiler, to run on the Intel Paragons at the Oak Ridge National Laboratory (ORNL) and a network of workstations. The novelty of this effort is first in the use of HPF and global shared memory emulation concurrently to facilitate the conversion of a serial code to a parallel code, and secondly the shared memory library enables efficient implementation of Lagrangian particle tracking along flow characteristics. The latter allows long-time-step-size simulation with particle tracking and dynamic particle redistribution for load balancing, thereby reducing the number of time steps needed for most transient problems. The parallel codes were applied to a pumping well problem to test the efficiency of the domain decomposition and particle tracking algorithms. The full problem domain consists of over 200,000 degrees of freedom with highly nonlinear soil property functions. Relatively good scalability was obtained for a preliminary test run on the Intel Paragons at the Center for Computational Sciences (CCS), ORNL. However, due to the difficulties we encountered in the PGI HPF compiler, as of the writing of this manuscript we are able to report results from 3DMURF only.

  18. A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

    SciTech Connect

    Stankovski, Z.

    1995-12-31

    The collision probability method in neutron transport, as applied to 2D geometries, consume a great amount of computer time, for a typical 2D assembly calculation about 90% of the computing time is consumed in the collision probability evaluations. Consequently RZ or 3D calculations became prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message passing host/node programmation model. Parallelization was applied to the energy group treatment. Such approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for a industrial code. Sequential performances are also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SPI and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SPI. Because of heterogeneity of the workstation network, the author did not ask high performances for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors.

  19. 2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

    DOE PAGESBeta

    Warren, Michael S.

    2014-01-01

    We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2 18 ) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (4096 3 ) particle cosmological simulations, accounting for 4×10 20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracymore » and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.« less

  20. Development of a discrete ordinates code system for unstructured meshes of tetrahedral cells, with serial and parallel implementations

    SciTech Connect

    Miller, R.L.

    1998-11-01

    A numerically stable, accurate, and robust form of the exponential characteristic (EC) method, used to solve the time-independent linearized Boltzmann Transport Equation, is derived using direct affine coordinate transformations on unstructured meshes of tetrahedra. This quadrature, as well as the linear characteristic (LC) spatial quadrature, is implemented in the transport code, called TETRAN. This code solves multi-group neutral particle transport problems with anisotropic scattering and was parallelized using High Performance Fortran and angular domain decomposition. A new, parallel algorithm for updating the scattering source is introduced. The EC source and inflow flux coefficients are efficiently evaluated using Broyden`s rootsolver, started with special approximations developed here. TETRAN showed robustness, stability and accuracy on a variety of challenging test problems. Parallel speed-up was observed as the number of processors was increased using an IBM SP computer system.

  1. GRay: A MASSIVELY PARALLEL GPU-BASED CODE FOR RAY TRACING IN RELATIVISTIC SPACETIMES

    SciTech Connect

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.

  2. GRay: A Massively Parallel GPU-based Code for Ray Tracing in Relativistic Spacetimes

    NASA Astrophysics Data System (ADS)

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.

  3. Implementation of the Turn Function Method in a three-dimensional, parallelized hydrodynamics code

    NASA Astrophysics Data System (ADS)

    Orourke, P. J.; Fairfield, M. S.

    1992-08-01

    The implementation of the Turn Function Method in KIVA-F90, a version of the KIVA computer program written in the FORTRAN 90 programming language that is used on some massively parallel computers is described. The Turn Function Method solves both linear momentum and vorticity equations in numerical calculations of compressible fluid flow. Solving a vorticity equation allows vorticity to be both conserved and transported more accurately than in traditional methods for computing compressible flow. This first implementation of the Turn Function Method in a three-dimensional hydrodynamics code involved some modification of the original method and some numerical difference approximations. In particular, a penalty method is used to keep the divergence of the computed vorticity field close to zero. Difference operators are also defined in such a way that the finite difference analog of del(del x u) = 0 is exactly satisfied. Three example problems show the increased computational cost and the accuracy to be gained by using the Turn Function Method in calculations of flows with rotational motion. Use of the Method can increase by 60 percent the computational times of the Euler equation solver in KIVA-F90, but it is concluded that this increased cost is justified by the increased accuracy.

  4. Assessment of the 2D MOC solver in MPACT: Michigan parallel characteristics transport code

    SciTech Connect

    Collins, B.; Kochunas, B.; Downar, T.

    2013-07-01

    MPACT (Michigan Parallel Characteristics Transport Code) is a new reactor analysis tool being developed by researchers at the University of Michigan as an advanced pin-resolved transport capability within VERA (Virtual Environment for Reactor Analysis). VERA is the end-user reactor simulation tool being developed by the Consortium for the Advanced Simulation of Light Water Reactors (CASL). The MPACT development project is itself unique for the way it is changing how students perform research to achieve the instructional and research goals of an academic institution, while providing immediate value to the industry. One of the major computational pieces in MPACT is the 2D MOC solver. It is critical that the 2D MOC solver provide an efficient, accurate, and robust solution over a broad range of reactor operating conditions. The C5G7 benchmark is first used to test the accuracy of the method with a fixed set of cross-sections. The VERA Core Physics Progression Problems are then used to compare the accuracy of both the 2D transport solver and also the cross-section treatments. (authors)

  5. A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in 131I SPECT

    PubMed Central

    Dewaraja, Yuni K.; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F.

    2009-01-01

    This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For 131I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction. PMID:11809318

  6. Performance Analysis and Optimization on a Parallel Atmospheric General Circulation Model Code

    NASA Technical Reports Server (NTRS)

    Lou, J. Z.; Farrara, J. D.

    1997-01-01

    An analysis is presented of the primary factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on distributed-memory, massively parallel computer systems.

  7. Parallel Monte Carlo transport modeling in the context of a time-dependent, three-dimensional multi-physics code

    SciTech Connect

    Procassini, R.J.

    1997-12-31

    The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution of particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.

  8. Seedling establishment in a masting desert shrub parallels the pattern for forest trees

    NASA Astrophysics Data System (ADS)

    Meyer, Susan E.; Pendleton, Burton K.

    2015-05-01

    The masting phenomenon along with its accompanying suite of seedling adaptive traits has been well studied in forest trees but has rarely been examined in desert shrubs. Blackbrush (Coleogyne ramosissima) is a regionally dominant North American desert shrub whose seeds are produced in mast events and scatter-hoarded by rodents. We followed the fate of seedlings in intact stands vs. small-scale disturbances at four contrasting sites for nine growing seasons following emergence after a mast year. The primary cause of first-year mortality was post-emergence cache excavation and seedling predation, with contrasting impacts at sites with different heteromyid rodent seed predators. Long-term establishment patterns were strongly affected by rodent activity in the weeks following emergence. Survivorship curves generally showed decreased mortality risk with age but differed among sites even after the first year. There were no detectable effects of inter-annual precipitation variability or site climatic differences on survival. Intraspecific competition from conspecific adults had strong impacts on survival and growth, both of which were higher on small-scale disturbances, but similar in openings and under shrub crowns in intact stands. This suggests that adult plants preempted soil resources in the interspaces. Aside from effects on seedling predation, there was little evidence for facilitation or interference beneath adult plant crowns. Plants in intact stands were still small and clearly juvenile after nine years, showing that blackbrush forms cohorts of suppressed plants similar to the seedling banks of closed forests. Seedling banks function in the absence of a persistent seed bank in replacement after adult plant death (gap formation), which is temporally uncoupled from masting and associated recruitment events. This study demonstrates that the seedling establishment syndrome associated with masting has evolved in desert shrublands as well as in forests.

  9. CMAD: A Self-consistent Parallel Code to Simulate the Electron Cloud Build-up and Instabilities

    SciTech Connect

    Pivi, M.T.F.; /SLAC

    2007-11-07

    We present the features of CMAD, a newly developed self-consistent code which simulates both the electron cloud build-up and related beam instabilities. By means of parallel (Message Passing Interface - MPI) computation, the code tracks the beam in an existing (MAD-type) lattice and continuously resolves the interaction between the beam and the cloud at each element location, with different cloud distributions at each magnet location. The goal of CMAD is to simulate single- and coupled-bunch instability, allowing tune shift, dynamic aperture and frequency map analysis and the determination of the secondary electron yield instability threshold. The code is in its phase of development and benchmarking with existing codes. Preliminary results on benchmarking are presented in this paper.

  10. High-speed and parallel approach for decoding of binary BCH codes with application to Flash memory devices

    NASA Astrophysics Data System (ADS)

    Prashantha Kumar, H.; Sripati, U.; Shetty, K. Rajesh

    2012-05-01

    In this article, we propose a high-speed decoding algorithm for binary BCH codes that can correct up to 7 bits in error. Evaluation of the error-locator polynomial is the most complicated and time-consuming step in the decoding of a BCH code. We have derived equations for specifying the coefficients of the error-locator polynomial, which can form the basis for the development of a parallel architecture for the decoder. This approach has the advantage that all the coefficients of the error locator polynomial are computed in parallel (in one step). The roots of error-locator polynomial can be obtained by Chien's search and inverting these roots gives the error locations. This algorithm can be employed in any application where high-speed decoding of data encoded by a binary BCH code is required. One important application is in Flash memories where data integrity is preserved using a long, high-rate binary BCH code. We have synthesized generator polynomials for binary BCH codes (error-correcting capability, s ? ) that can be employed in Flash memory devices to improve the integrity of information storage. The proposed decoding algorithm can be used as an efficient, high-speed decoder in this important application.

  11. Efficient random access high resolution region-of-interest (ROI) image retrieval using backward coding of wavelet trees (BCWT)

    NASA Astrophysics Data System (ADS)

    Corona, Enrique; Nutter, Brian; Mitra, Sunanda; Guo, Jiangling; Karp, Tanja

    2008-03-01

    Efficient retrieval of high quality Regions-Of-Interest (ROI) from high resolution medical images is essential for reliable interpretation and accurate diagnosis. Random access to high quality ROI from codestreams is becoming an essential feature in many still image compression applications, particularly in viewing diseased areas from large medical images. This feature is easier to implement in block based codecs because of the inherent spatial independency of the code blocks. This independency implies that the decoding order of the blocks is unimportant as long as the position for each is properly identified. In contrast, wavelet-tree based codecs naturally use some interdependency that exploits the decaying spectrum model of the wavelet coefficients. Thus one must keep track of the decoding order from level to level with such codecs. We have developed an innovative multi-rate image subband coding scheme using "Backward Coding of Wavelet Trees (BCWT)" which is fast, memory efficient, and resolution scalable. It offers far less complexity than many other existing codecs including both, wavelet-tree, and block based algorithms. The ROI feature in BCWT is implemented through a transcoder stage that generates a new BCWT codestream containing only the information associated with the user-defined ROI. This paper presents an efficient technique that locates a particular ROI within the BCWT coded domain, and decodes it back to the spatial domain. This technique allows better access and proper identification of pathologies in high resolution images since only a small fraction of the codestream is required to be transmitted and analyzed.

  12. Parallel Subspace Subcodes of Reed-Solomon Codes for Magnetic Recording Channels

    ERIC Educational Resources Information Center

    Wang, Han

    2010-01-01

    Read channel architectures based on a single low-density parity-check (LDPC) code are being considered for the next generation of hard disk drives. However, LDPC-only solutions suffer from the error floor problem, which may compromise reliability, if not handled properly. Concatenated architectures using an LDPC code plus a Reed-Solomon (RS) code…

  13. A Multiple Sphere T-Matrix Fortran Code for Use on Parallel Computer Clusters

    NASA Technical Reports Server (NTRS)

    Mackowski, D. W.; Mishchenko, M. I.

    2011-01-01

    A general-purpose Fortran-90 code for calculation of the electromagnetic scattering and absorption properties of multiple sphere clusters is described. The code can calculate the efficiency factors and scattering matrix elements of the cluster for either fixed or random orientation with respect to the incident beam and for plane wave or localized- approximation Gaussian incident fields. In addition, the code can calculate maps of the electric field both interior and exterior to the spheres.The code is written with message passing interface instructions to enable the use on distributed memory compute clusters, and for such platforms the code can make feasible the calculation of absorption, scattering, and general EM characteristics of systems containing several thousand spheres.

  14. Reactor Dosimetry Applications Using RAPTOR-M3G:. a New Parallel 3-D Radiation Transport Code

    NASA Astrophysics Data System (ADS)

    Longoni, Gianluca; Anderson, Stanwood L.

    2009-08-01

    The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.

  15. Photonic generation of widely tunable phase-coded microwave signals based on a dual-parallel polarization modulator.

    PubMed

    Liu, Shifeng; Zhu, Dan; Wei, Zhengwu; Pan, Shilong

    2014-07-01

    A photonic approach for the generation of a widely tunable arbitrarily phase-coded microwave signal based on a dual-parallel polarization modulator (DP-PolM) is proposed and demonstrated without using any optical or electrical filter. Two orthogonally polarized ± first-order optical sidebands with suppressed carrier are generated based on the DP-PolM, and their polarization directions are aligned with the two principal axes of the following PolM. Phase coding is implemented at a following PolM driven by an electrical coding signal. The inherent frequency-doubling operation can make the system work at a frequency beyond the operation bandwidth of the DP-PolM and the 90° hybrid. Because no optical or electrical filter is applied, good frequency tunability is realized. An experiment is performed. The generation of phase-coded signals tuning from 10 to 40 GHz with up to 10  Gbit/s coding rates is verified. PMID:24978781

  16. Parallelization of the SIR code for the investigation of small-scale features in the solar photosphere

    NASA Astrophysics Data System (ADS)

    Thonhofer, Stefan; Rubio, Luis R. Bellot; Utz, Dominik; Hanslmeier, Arnold; Jurçák, Jan

    2015-10-01

    Magnetic fields are one of the most important drivers of the highly dynamic processes that occur in the lower solar atmosphere. They span a broad range of sizes, from large- and intermediate-scale structures such as sunspots, pores and magnetic knots, down to the smallest magnetic elements observable with current telescopes. On small scales, magnetic flux tubes are often visible as Magnetic Bright Points (MBPs). Apart from simple V/I magnetograms, the most common method to deduce their magnetic properties is the inversion of spectropolarimetric data. Here we employ the SIR code for that purpose. SIR is a well-established tool that can derive not only the magnetic field vector and other atmospheric parameters (e.g., temperature, line-of-sight velocity), but also their stratifications with height, effectively producing 3-dimensional models of the lower solar atmosphere. In order to enhance the runtime performance and the usability of SIR we parallelized the existing code and standardized the input and output formats. This and other improvements make it feasible to invert extensive high-resolution data sets within a reasonable amount of computing time. An evaluation of the speedup of the parallel SIR code shows a substantial improvement in runtime.

  17. A parallel code to calculate rate-state seismicity evolution induced by time dependent, heterogeneous Coulomb stress changes

    NASA Astrophysics Data System (ADS)

    Cattania, C.; Khalid, F.

    2016-09-01

    The estimation of space and time-dependent earthquake probabilities, including aftershock sequences, has received increased attention in recent years, and Operational Earthquake Forecasting systems are currently being implemented in various countries. Physics based earthquake forecasting models compute time dependent earthquake rates based on Coulomb stress changes, coupled with seismicity evolution laws derived from rate-state friction. While early implementations of such models typically performed poorly compared to statistical models, recent studies indicate that significant performance improvements can be achieved by considering the spatial heterogeneity of the stress field and secondary sources of stress. However, the major drawback of these methods is a rapid increase in computational costs. Here we present a code to calculate seismicity induced by time dependent stress changes. An important feature of the code is the possibility to include aleatoric uncertainties due to the existence of multiple receiver faults and to the finite grid size, as well as epistemic uncertainties due to the choice of input slip model. To compensate for the growth in computational requirements, we have parallelized the code for shared memory systems (using OpenMP) and distributed memory systems (using MPI). Performance tests indicate that these parallelization strategies lead to a significant speedup for problems with different degrees of complexity, ranging from those which can be solved on standard multicore desktop computers, to those requiring a small cluster, to a large simulation that can be run using up to 1500 cores.

  18. Application of a parallel 3-dimensional hydrogeochemistry HPF code to a proposed waste disposal site at the Oak Ridge National Laboratory

    SciTech Connect

    Gwo, Jin-Ping; Yeh, Gour-Tsyh

    1997-02-01

    The objectives of this study are (1) to parallelize a 3-dimensional hydrogeochemistry code and (2) to apply the parallel code to a proposed waste disposal site at the Oak Ridge National Laboratory (ORNL). The 2-dimensional hydrogeochemistry code HYDROGEOCHEM, developed at the Pennsylvania State University for coupled subsurface solute transport and chemical equilibrium processes, was first modified to accommodate 3-dimensional problem domains. A bi-conjugate gradient stabilized linear matrix solver was then incorporated to solve the matrix equation. We chose to parallelize the 3-dimensional code on the Intel Paragons at ORNL by using an HPF (high performance FORTRAN) compiler developed at PGI. The data- and task-parallel algorithms available in the HPF compiler proved to be highly efficient for the geochemistry calculation. This calculation can be easily implemented in HPF formats and is perfectly parallel because the chemical speciation on one finite-element node is virtually independent of those on the others. The parallel code was applied to a subwatershed of the Melton Branch at ORNL. Chemical heterogeneity, in addition to physical heterogeneities of the geological formations, has been identified as one of the major factors that affect the fate and transport of contaminants at ORNL. This study demonstrated an application of the 3-dimensional hydrogeochemistry code on the Melton Branch site. A uranium tailing problem that involved in aqueous complexation and precipitation-dissolution was tested. Performance statistics was collected on the Intel Paragons at ORNL. Implications of these results on the further optimization of the code were discussed.

  19. A fast tree-based method for estimating column densities in adaptive mesh refinement codes. Influence of UV radiation field on the structure of molecular clouds

    NASA Astrophysics Data System (ADS)

    Valdivia, Valeska; Hennebelle, Patrick

    2014-11-01

    Context. Ultraviolet radiation plays a crucial role in molecular clouds. Radiation and matter are tightly coupled and their interplay influences the physical and chemical properties of gas. In particular, modeling the radiation propagation requires calculating column densities, which can be numerically expensive in high-resolution multidimensional simulations. Aims: Developing fast methods for estimating column densities is mandatory if we are interested in the dynamical influence of the radiative transfer. In particular, we focus on the effect of the UV screening on the dynamics and on the statistical properties of molecular clouds. Methods: We have developed a tree-based method for a fast estimate of column densities, implemented in the adaptive mesh refinement code RAMSES. We performed numerical simulations using this method in order to analyze the influence of the screening on the clump formation. Results: We find that the accuracy for the extinction of the tree-based method is better than 10%, while the relative error for the column density can be much more. We describe the implementation of a method based on precalculating the geometrical terms that noticeably reduces the calculation time. To study the influence of the screening on the statistical properties of molecular clouds we present the probability distribution function of gas and the associated temperature per density bin and the mass spectra for different density thresholds. Conclusions: The tree-based method is fast and accurate enough to be used during numerical simulations since no communication is needed between CPUs when using a fully threaded tree. It is then suitable to parallel computing. We show that the screening for far UV radiation mainly affects the dense gas, thereby favoring low temperatures and affecting the fragmentation. We show that when we include the screening, more structures are formed with higher densities in comparison to the case that does not include this effect. We

  20. Simulations of implosions with a 3D, parallel, unstructured-grid, radiation-hydrodynamics code

    SciTech Connect

    Kaiser, T B; Milovich, J L; Prasad, M K; Rathkopf, J; Shestakov, A I

    1998-12-28

    An unstructured-grid, radiation-hydrodynamics code is used to simulate implosions. Although most of the problems are spherically symmetric, they are run on 3D, unstructured grids in order to test the code's ability to maintain spherical symmetry of the converging waves. Three problems, of increasing complexity, are presented. In the first, a cold, spherical, ideal gas bubble is imploded by an enclosing high pressure source. For the second, we add non-linear heat conduction and drive the implosion with twelve laser beams centered on the vertices of an icosahedron. In the third problem, a NIF capsule is driven with a Planckian radiation source.

  1. Full Wave Parallel Code for Modeling RF Fields in Hot Plasmas

    NASA Astrophysics Data System (ADS)

    Spencer, Joseph; Svidzinski, Vladimir; Evstatiev, Evstati; Galkin, Sergei; Kim, Jin-Soo

    2015-11-01

    FAR-TECH, Inc. is developing a suite of full wave RF codes in hot plasmas. It is based on a formulation in configuration space with grid adaptation capability. The conductivity kernel (which includes a nonlocal dielectric response) is calculated by integrating the linearized Vlasov equation along unperturbed test particle orbits. For Tokamak applications a 2-D version of the code is being developed. Progress of this work will be reported. This suite of codes has the following advantages over existing spectral codes: 1) It utilizes the localized nature of plasma dielectric response to the RF field and calculates this response numerically without approximations. 2) It uses an adaptive grid to better resolve resonances in plasma and antenna structures. 3) It uses an efficient sparse matrix solver to solve the formulated linear equations. The linear wave equation is formulated using two approaches: for cold plasmas the local cold plasma dielectric tensor is used (resolving resonances by particle collisions), while for hot plasmas the conductivity kernel is calculated. Work is supported by the U.S. DOE SBIR program.

  2. Parallelization of GeoClaw code for modeling geophysical flows with adaptive mesh refinement on many-core systems

    USGS Publications Warehouse

    Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.

    2011-01-01

    We parallelized the GeoClaw code on one-level grid using OpenMP in March, 2011 to meet the urgent need of simulating tsunami waves at near-shore from Tohoku 2011 and achieved over 75% of the potential speed-up on an eight core Dell Precision T7500 workstation [1]. After submitting that work to SC11 - the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his PH.D thesis. In this paper, we will show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speed-up obtained by combining the advantage of each of the two individual approaches with adaptive mesh refinement (AMR), demonstrating the capabilities of running GeoClaw efficiently on many-core systems. We will also show a novel simulation of the Tohoku 2011 Tsunami waves inundating the Sendai airport and Fukushima Nuclear Power Plants, over which the finest grid distance of 20 meters is achieved through a 4-level AMR. This simulation yields quite good predictions about the wave-heights and travel time of the tsunami waves. ?? 2011 IEEE.

  3. F100(3) parallel compressor computer code and user's manual

    NASA Technical Reports Server (NTRS)

    Mazzawy, R. S.; Fulkerson, D. A.; Haddad, D. E.; Clark, T. A.

    1978-01-01

    The Pratt & Whitney Aircraft multiple segment parallel compressor model has been modified to include the influence of variable compressor vane geometry on the sensitivity to circumferential flow distortion. Further, performance characteristics of the F100 (3) compression system have been incorporated into the model on a blade row basis. In this modified form, the distortion's circumferential location is referenced relative to the variable vane controlling sensors of the F100 (3) engine so that the proper solution can be obtained regardless of distortion orientation. This feature is particularly important for the analysis of inlet temperature distortion. Compatibility with fixed geometry compressor applications has been maintained in the model.

  4. PORTA: A Massively Parallel Code for 3D Non-LTE Polarized Radiative Transfer

    NASA Astrophysics Data System (ADS)

    Štěpán, J.

    2014-10-01

    The interpretation of the Stokes profiles of the solar (stellar) spectral line radiation requires solving a non-LTE radiative transfer problem that can be very complex, especially when the main interest lies in modeling the linear polarization signals produced by scattering processes and their modification by the Hanle effect. One of the main difficulties is due to the fact that the plasma of a stellar atmosphere can be highly inhomogeneous and dynamic, which implies the need to solve the non-equilibrium problem of generation and transfer of polarized radiation in realistic three-dimensional stellar atmospheric models. Here we present PORTA, a computer program we have developed for solving, in three-dimensional (3D) models of stellar atmospheres, the problem of the generation and transfer of spectral line polarization taking into account anisotropic radiation pumping and the Hanle and Zeeman effects in multilevel atoms. The numerical method of solution is based on a highly convergent iterative algorithm, whose convergence rate is insensitive to the grid size, and on an accurate short-characteristics formal solver of the Stokes-vector transfer equation which uses monotonic Bezier interpolation. In addition to the iterative method and the 3D formal solver, another important feature of PORTA is a novel parallelization strategy suitable for taking advantage of massively parallel computers. Linear scaling of the solution with the number of processors allows to reduce the solution time by several orders of magnitude. We present useful benchmarks and a few illustrations of applications using a 3D model of the solar chromosphere resulting from MHD simulations. Finally, we present our conclusions with a view to future research. For more details see Štěpán & Trujillo Bueno (2013).

  5. A Fast Parallel Simulation Code for Interaction between Proto-Planetary Disk and Embedded Proto-Planets: Implementation for 3D Code

    SciTech Connect

    Li, Shengtai; Li, Hui

    2012-06-14

    the position of the planet, we adopt the corotating frame that allows the planet moving only in radial direction if only one planet is present. This code has been extensively tested on a number of problems. For the earthmass planet with constant aspect ratio h = 0.05, the torque calculated using our code matches quite well with the the 3D linear theory results by Tanaka et al. (2002). The code is fully parallelized via message-passing interface (MPI) and has very high parallel efficiency. Several numerical examples for both fixed planet and moving planet are provided to demonstrate the efficacy of the numerical method and code.

  6. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.

  7. Real-time photoacoustic and ultrasound dual-modality imaging system facilitated with graphics processing unit and code parallel optimization

    NASA Astrophysics Data System (ADS)

    Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L.; Wang, Xueding; Liu, Xiaojun

    2013-08-01

    Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state of the art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.

  8. A real-time photoacoustic and ultrasound dual-modality imaging system facilitated with GPU and code parallel optimization

    NASA Astrophysics Data System (ADS)

    Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L.; Wang, Xueding; Liu, Xiaojun

    2014-03-01

    Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The backprojection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state of the art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel was conducted to verify the performance of this system for imaging fast biological events. The GPU based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/pat realtime .

  9. Random-iteration algorithm-based optical parallel architecture for fractal-image decoding by use of iterated-function system codes.

    PubMed

    Chang, H T; Kuo, C J

    1998-03-10

    An optical parallel architecture for the random-iteration algorithm to decode a fractal image by use of iterated-function system (IFS) codes is proposed. The code value is first converted into transmittance in film or a spatial light modulator in the optical part of the system. With an optical-to-electrical converter, electrical-to-optical converter, and some electronic circuits for addition and delay, we can perform the contractive affine transformation (CAT) denoted in IFS codes. In the proposed decoding architecture all CAT's generate points (image pixels) in parallel, and these points then are joined for display purposes. Therefore the decoding speed is improved greatly compared with existing serial-decoding architectures. In addition, an error and stability analysis that considers nonperfect elements is presented for the proposed optical system. Finally, simulation results are given to validate the proposed architecture. PMID:18268718

  10. Four-Channel, 8 x 8 Bit, Two-Dimensional Parallel Transmission by use of Space-Code-Division Multiple-Access Encoder and Decoder Modules.

    PubMed

    Nakamura, M; Kitayama, K; Igasaki, Y; Kaneda, K

    1998-07-10

    We experimentally demonstrate four-channel multiplexing of 64-bit (8 x 8) two-dimensional (2-D) parallel data links on the basis of optical space-code-division multiple access (CDMA) by using new modules of optical spatial encoders and a decoder with a new high-contrast 9-m-long image fiber with 3 x 10(4) cores. Each 8 x 8 bit plane (64-bit parallel data) is optically encoded with an 8 x 8, 2-D optical orthogonal signature pattern. The encoded bit planes are spatially multiplexed and transmitted through an image fiber. A receiver can recover the intended input bit plane by means of an optical decoding process. This result should encourage the application of optical space-CDMA to future high-throughput 2-D parallel data links connecting massively parallel processors. PMID:18285889

  11. PARAMESH V4.1: Parallel Adaptive Mesh Refinement

    NASA Astrophysics Data System (ADS)

    MacNeice, Peter; Olson, Kevin M.; Mobarry, Clark; de Fainchtein, Rosalinda; Packer, Charles

    2011-06-01

    PARAMESH is a package of Fortran 90 subroutines designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain, with spatial resolution varying to satisfy the demands of the application. These sub-grid blocks form the nodes of a tree data-structure (quad-tree in 2D or oct-tree in 3D). Each grid block has a logically cartesian mesh. The package supports 1, 2 and 3D models. PARAMESH is released under the NASA-wide Open-Source software license.

  12. An open-source, massively parallel code for non-LTE synthesis and inversion of spectral lines and Zeeman-induced Stokes profiles

    NASA Astrophysics Data System (ADS)

    Socas-Navarro, H.; de la Cruz Rodríguez, J.; Asensio Ramos, A.; Trujillo Bueno, J.; Ruiz Cobo, B.

    2015-05-01

    With the advent of a new generation of solar telescopes and instrumentation, interpreting chromospheric observations (in particular, spectropolarimetry) requires new, suitable diagnostic tools. This paper describes a new code, NICOLE, that has been designed for Stokes non-LTE radiative transfer, for synthesis and inversion of spectral lines and Zeeman-induced polarization profiles, spanning a wide range of atmospheric heights from the photosphere to the chromosphere. The code features a number of unique features and capabilities and has been built from scratch with a powerful parallelization scheme that makes it suitable for application on massive datasets using large supercomputers. The source code is written entirely in Fortran 90/2003 and complies strictly with the ANSI standards to ensure maximum compatibility and portability. It is being publicly released, with the idea of facilitating future branching by other groups to augment its capabilities. The source code is currently hosted at the following repository: http://https://github.com/hsocasnavarro/NICOLE

  13. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    SciTech Connect

    Kirk, B.L.; Sartori, E.

    1997-06-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee`s Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community`s computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.

  14. The role of orthographic and phonological codes in the word and the pseudoword superiority effect: an analysis by means of multinomial processing tree models.

    PubMed

    Maris, Eric

    2002-12-01

    Central to the current accounts of the word and the pseudoword superiority effect (WSE and PWSE, respectively) is the concept of a unitized code that is less susceptible to masking than single-letter codes. Current explanations of the WSE and PWSE assume that this unitized code is orthographic, explaining these phenomena by the assumption of dual read-out from unitized and single-letter codes. In this article, orthographic dual read-out models are compared with a phonological dual read-out model (which is based on the assumption that the 1st unitized code is phonological). From this phonological code, an orthographic code is derived, through either lexical access or assembly. Comparison of the orthographic and phonological dual read-out models was performed by formulating both models as multinomial processing tree models. From an application of these models to the data of 2 letter identification experiments, it was clear that the orthographic dual read-out models are insufficient as an explanation of the PWSE, whereas the phonological dual read-out model is sufficient. PMID:12542135

  15. Optimization and Parallelization of the Thermal-Hydraulic Sub-channel Code CTF for High-Fidelity Multi-physics Applications

    SciTech Connect

    Salko, Robert K; Schmidt, Rodney; Avramova, Maria N

    2014-01-01

    This paper describes major improvements to the computational infrastructure of the CTF sub-channel code so that full-core sub-channel-resolved simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department Of Energy (DOE) Consortium for Advanced Simulations of Light Water (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis. A set of serial code optimizations--including fixing computational inefficiencies, optimizing the numerical approach, and making smarter data storage choices--are first described and shown to reduce both execution time and memory usage by about a factor of ten. Next, a Single Program Multiple Data (SPMD) parallelization strategy targeting distributed memory Multiple Instruction Multiple Data (MIMD) platforms and utilizing domain-decomposition is presented. In this approach, data communication between processors is accomplished by inserting standard MPI calls at strategic points in the code. The domain decomposition approach implemented assigns one MPI process to each fuel assembly, with each domain being represented by its own CTF input file. The creation of CTF input files, both for serial and parallel runs, is also fully automated through use of a pre-processor utility that takes a greatly reduced set of user input over the traditional CTF input file. To run CTF in parallel, two additional libraries are currently needed; MPI, for inter-processor message passing, and the Parallel Extensible Toolkit for Scientific Computation (PETSc), which is leveraged to solve the global pressure matrix in parallel. Results presented include a set of testing and verification calculations and performance tests assessing parallel scaling characteristics up to a full core, sub-channel-resolved model of Watts Bar Unit 1 under hot full-power conditions (193 17x17

  16. [Cloning of full-length coding sequence of tree shrew CD4 and prediction of its molecular characteristics].

    PubMed

    Tian, Wei-Wei; Gao, Yue-Dong; Guo, Yan; Huang, Jing-Fei; Xiao, Chang; Li, Zuo-Sheng; Zhang, Hua-Tang

    2012-02-01

    The tree shrews, as an ideal animal model receiving extensive attentions to human disease research, demands essential research tools, in particular cellular markers and monoclonal antibodies for immunological studies. In this paper, a 1 365 bp of the full-length CD4 cDNA encoding sequence was cloned from total RNA in peripheral blood of tree shrews, the sequence completes two unknown fragment gaps of tree shrews predicted CD4 cDNA in the GenBank database, and its molecular characteristics were analyzed compared with other mammals by using biology software such as Clustal W2.0 and so forth. The results showed that the extracellular and intracellular domains of tree shrews CD4 amino acid sequence are conserved. The tree shrews CD4 amino acid sequence showed a close genetic relationship with Homo sapiens and Macaca mulatta. Most regions of the tree shrews CD4 molecule surface showed positive charges as humans. However, compared with CD4 extracellular domain D1 of human, CD4 D1 surface of tree shrews showed more negative charges, and more two N-glycosylation sites, which may affect antibody binding. This study provides a theoretical basis for the preparation and functional studies of CD4 monoclonal antibody. PMID:22345010

  17. Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

    PubMed

    Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

    2014-11-11

    We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here. PMID:26584365

  18. Fast Coding Unit Encoding Mechanism for Low Complexity Video Coding

    PubMed Central

    Wu, Yueying; Jia, Kebin; Gao, Guandong

    2016-01-01

    In high efficiency video coding (HEVC), coding tree contributes to excellent compression performance. However, coding tree brings extremely high computational complexity. Innovative works for improving coding tree to further reduce encoding time are stated in this paper. A novel low complexity coding tree mechanism is proposed for HEVC fast coding unit (CU) encoding. Firstly, this paper makes an in-depth study of the relationship among CU distribution, quantization parameter (QP) and content change (CC). Secondly, a CU coding tree probability model is proposed for modeling and predicting CU distribution. Eventually, a CU coding tree probability update is proposed, aiming to address probabilistic model distortion problems caused by CC. Experimental results show that the proposed low complexity CU coding tree mechanism significantly reduces encoding time by 27% for lossy coding and 42% for visually lossless coding and lossless coding. The proposed low complexity CU coding tree mechanism devotes to improving coding performance under various application conditions. PMID:26999741

  19. Paradyn a parallel nonlinear, explicit, three-dimensional finite-element code for solid and structural mechanics user manual

    SciTech Connect

    Hoover, C G; DeGroot, A J; Sherwood, R J

    2000-06-01

    ParaDyn is a parallel version of the DYNA3D computer program, a three-dimensional explicit finite-element program for analyzing the dynamic response of solids and structures. The ParaDyn program has been used as a production tool for over three years for analyzing problems which range in size from a few tens of thousands of elements to between one-million and ten-million elements. ParaDyn runs on parallel computers provided by the Department of Energy Accelerated Strategic Computing Initiative (ASCI) and the Department of Defense High Performance Computing and Modernization Program. Preprocessing and post-processing software utilities and tools are designed to facilitate the generation of partitioned domains for processors on a massively parallel computer and the visualization of both resultant data and boundary data generated in a parallel simulation. This manual provides a brief overview of the parallel implementation; describes techniques for running the ParaDyn program, tools and utilities; and provides examples of parallel simulations.

  20. Distributed Contour Trees

    SciTech Connect

    Morozov, Dmitriy; Weber, Gunther H.

    2014-03-31

    Topological techniques provide robust tools for data analysis. They are used, for example, for feature extraction, for data de-noising, and for comparison of data sets. This chapter concerns contour trees, a topological descriptor that records the connectivity of the isosurfaces of scalar functions. These trees are fundamental to analysis and visualization of physical phenomena modeled by real-valued measurements. We study the parallel analysis of contour trees. After describing a particular representation of a contour tree, called local{global representation, we illustrate how di erent problems that rely on contour trees can be solved in parallel with minimal communication.

  1. Implementation of a flexible and scalable particle-in-cell method for massively parallel computations in the mantle convection code ASPECT

    NASA Astrophysics Data System (ADS)

    Gassmöller, Rene; Bangerth, Wolfgang

    2016-04-01

    Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a

  2. IM3D: A parallel Monte Carlo code for efficient simulations of primary radiation displacements and damage in 3D geometry

    PubMed Central

    Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju

    2015-01-01

    SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼102 times faster in serial execution and > 104 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, the limitation of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying degree of chemical mixing), are discussed. PMID:26658477

  3. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    SciTech Connect

    Qiang, J.; Leitner, D.; Todd, D.S.; Ryne, R.D.

    2005-03-15

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV.For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.

  4. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    NASA Astrophysics Data System (ADS)

    Qiang, J.; Leitner, D.; Todd, D. S.; Ryne, R. D.

    2005-03-01

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.

  5. IM3D: A parallel Monte Carlo code for efficient simulations of primary radiation displacements and damage in 3D geometry

    NASA Astrophysics Data System (ADS)

    Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju

    2015-12-01

    SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼102 times faster in serial execution and > 104 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, the limitation of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying degree of chemical mixing), are discussed.

  6. Scaling and performance of a 3-D radiation hydrodynamics code on message-passing parallel computers: final report

    SciTech Connect

    Hayes, J C; Norman, M

    1999-10-28

    This report details an investigation into the efficacy of two approaches to solving the radiation diffusion equation within a radiation hydrodynamic simulation. Because leading-edge scientific computing platforms have evolved from large single-node vector processors to parallel aggregates containing tens to thousands of individual CPU's, the ability of an algorithm to maintain high compute efficiency when distributed over a large array of nodes is critically important. The viability of an algorithm thus hinges upon the tripartite question of numerical accuracy, total time to solution, and parallel efficiency.

  7. Numerical approach to the parallel gradient operator in tokamak scrape-off layer turbulence simulations and application to the GBS code

    NASA Astrophysics Data System (ADS)

    Jolliet, S.; Halpern, F. D.; Loizu, J.; Mosetto, A.; Riva, F.; Ricci, P.

    2015-03-01

    This paper presents two discretisation schemes for the parallel gradient operator used in scrape-off layer plasma turbulence simulations. First, a simple model describing the propagation of electrostatic shear-Alfvén waves, and retaining the key elements of the parallel dynamics, is used to test the accuracy of the different schemes against analytical predictions. The most promising scheme is then tested in simulations of limited scrape-off layer turbulence with the flux-driven 3D fluid code GBS (Ricci et al., 2012): the new approach is successfully benchmarked against the original parallel gradient discretisation implemented in GBS. Finally, GBS simulations using a radially varying safety profile, which were inapplicable with the original scheme are carried out for the first time: the well-known stabilisation of resistive ballooning modes at negative magnetic shear is recovered. The main conclusion of this paper is that a simple approach to the parallel gradient, namely centred finite differences in the poloidal and toroidal direction, is able to simulate scrape-off layer turbulence provided that a higher resolution and higher convergence order are used.

  8. Dynamic analysis of the parallel-plate EMP (Electromagnetic Pulse) simulator using a wire-mesh approximation and the numerical electromagnetics code. Final report

    SciTech Connect

    Gedney, S.D.

    1987-09-01

    The electromagnetic pulse (EMP) produced by a high-altitude nuclear blast presents a severe threat to electronic systems due to its extreme characteristics. To test the vulnerability of large systems, such as airplanes, missiles, or satellites, they must be subjected to a simulated EMP environment. One type of simulator that has been used to approximate the EMP environment is the Large Parallel-Plate Bounded-Wave Simulator. It is a guided-wave simulator which has properties of a transmission line and supports a single TEM model at sufficiently low frequencies. This type of simulator consists of finite-width parallel-plate waveguides, which are excited by a wave launcher and terminated by a wave receptor. This study addresses the field distribution within a finite-width parallel-plate waveguide that is matched to a conical tapered waveguide at either end. Characteristics of a parallel-plate bounded-wave EMP simulator were developed using scattering theory, thin-wire mesh approximation of the conducting surfaces, and the Numerical Electronics Code (NEC). Background is provided for readers to use the NEC as a tool in solving thin-wire scattering problems.

  9. A parallelized binary search tree

    Technology Transfer Automated Retrieval System (TEKTRAN)

    PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex mathematical computations and its processing time increases when i...

  10. The influence of viral coding sequences on pestivirus IRES activity reveals further parallels with translation initiation in prokaryotes.

    PubMed Central

    Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J

    2002-01-01

    Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388