Science.gov

Sample records for parallel tree code

  1. A parallel TreeSPH code for galaxy formation

    NASA Astrophysics Data System (ADS)

    Lia, Cesario; Carraro, Giovanni

    2000-05-01

    We describe a new implementation of a parallel TreeSPH code with the aim of simulating galaxy formation and evolution. The code has been parallelized using shmem, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Super-computing Center (Bologna, Italy). The code combines the smoothed particle hydrodynamics (SPH) method for solving hydrodynamical equations with the popular Barnes & Hut tree-code to perform gravity calculation with an N×logN scaling, and it is based on the scalar TreeSPH code developed by Carraro et al. Parallelization is achieved by distributing particles along processors according to a workload criterion. Benchmarks of the code, in terms of load balance and scalability, are analysed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 2×10^4 particles on 8 processors. The code turns out to be balanced at more than the 95 per cent level. As the number of processors increases, the load balance worsens slightly. The deviation from perfect scalability for an increasing number of processors is almost negligible up to 32 processors. Finally, we present a simulation of the formation of an X-ray galaxy cluster in a flat cold dark matter cosmology, using 2×10^5 particles and 32 processors, and compare our results with Evrard's P3M-SPH simulations. Additionally we have incorporated radiative cooling, star formation, feedback from SNe of types II and Ia, stellar winds and UV flux from massive stars, and an algorithm to follow the chemical enrichment of the interstellar medium. Simulations with some of these ingredients are also presented.
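    The Barnes & Hut tree method that several of these records rely on can be illustrated with a minimal serial sketch. This is not the code of any of the listed papers: the names `Node`, `build_tree`, `accel` and `direct_accel`, the opening angle `theta`, the softening `eps`, and the choice G = 1 are all assumptions made for illustration, and the sketch assumes distinct particle positions and positive masses.

```python
import math
import random


class Node:
    """Cubic oct-tree cell storing its total mass and centre of mass (monopole)."""

    def __init__(self, center, half):
        self.center, self.half = center, half
        self.mass = 0.0
        self.com = [0.0, 0.0, 0.0]
        self.body = None       # (position, mass) while the cell is a one-particle leaf
        self.children = None   # list of 8 sub-cells once the cell has been split

    def _child_for(self, pos):
        # Octant index: bit k is set when pos lies in the upper half along axis k.
        idx = sum(1 << k for k in range(3) if pos[k] >= self.center[k])
        if self.children[idx] is None:
            h = self.half / 2
            c = [self.center[k] + (h if pos[k] >= self.center[k] else -h)
                 for k in range(3)]
            self.children[idx] = Node(c, h)
        return self.children[idx]

    def insert(self, pos, m):
        if self.mass == 0.0 and self.children is None:
            self.body = (pos, m)                 # empty cell becomes a leaf
        else:
            if self.children is None:            # occupied leaf: split it first
                self.children = [None] * 8
                old_pos, old_m = self.body
                self.body = None
                self._child_for(old_pos).insert(old_pos, old_m)
            self._child_for(pos).insert(pos, m)
        total = self.mass + m                    # update the monopole moments
        self.com = [(self.com[k] * self.mass + pos[k] * m) / total for k in range(3)]
        self.mass = total


def build_tree(bodies):
    """bodies: list of (position, mass); returns the root cell enclosing all of them."""
    lo = [min(p[k] for p, _ in bodies) for k in range(3)]
    hi = [max(p[k] for p, _ in bodies) for k in range(3)]
    center = [(lo[k] + hi[k]) / 2 for k in range(3)]
    half = max(hi[k] - lo[k] for k in range(3)) / 2 + 1e-12
    root = Node(center, half)
    for p, m in bodies:
        root.insert(p, m)
    return root


def accel(node, pos, theta=0.5, eps=1e-3):
    """Softened acceleration at pos, opening cells whose size/distance >= theta."""
    if node is None or node.mass == 0.0:
        return [0.0, 0.0, 0.0]
    src = node.body[0] if node.children is None else node.com
    d = [src[k] - pos[k] for k in range(3)]
    r2 = sum(x * x for x in d)
    size = 2 * node.half
    if node.children is None or size * size < theta * theta * r2:
        w = node.mass / (r2 + eps * eps) ** 1.5   # treat the cell as a point mass
        return [w * x for x in d]
    acc = [0.0, 0.0, 0.0]
    for child in node.children:
        a = accel(child, pos, theta, eps)
        acc = [acc[k] + a[k] for k in range(3)]
    return acc


def direct_accel(bodies, pos, eps=1e-3):
    """Reference O(N) direct sum for one target position, same softening."""
    acc = [0.0, 0.0, 0.0]
    for p, m in bodies:
        d = [p[k] - pos[k] for k in range(3)]
        w = m / (sum(x * x for x in d) + eps * eps) ** 1.5
        acc = [acc[k] + w * d[k] for k in range(3)]
    return acc


if __name__ == "__main__":
    random.seed(0)
    bodies = [([random.random() for _ in range(3)], 1.0) for _ in range(200)]
    root = build_tree(bodies)
    print(accel(root, bodies[0][0]), direct_accel(bodies, bodies[0][0]))
```

    With `theta=0.0` every cell is opened and the tree walk reduces to the direct sum; larger values trade accuracy for fewer interactions, which is where the N×logN scaling quoted above comes from.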

  2. FLY MPI-2: a parallel tree code for LSS

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Comparato, M.; Antonuccio-Delogu, V.

    2006-04-01

    New version program summary
    Program title: FLY 3.1
    Catalogue identifier: ADSC_v2_0
    Licensing provisions: yes
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    No. of lines in distributed program, including test data, etc.: 158 172
    No. of bytes in distributed program, including test data, etc.: 4 719 953
    Distribution format: tar.gz
    Programming language: Fortran 90, C
    Computer: Beowulf cluster, PC, MPP systems
    Operating system: Linux, Aix
    RAM: 100M words
    Catalogue identifier of previous version: ADSC_v1_0
    Journal reference of previous version: Comput. Phys. Comm. 155 (2003) 159
    Does the new version supersede the previous version?: yes
    Nature of problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force
    Solution method: FLY is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986)
    Reasons for the new version: The new version of FLY is implemented by using the MPI-2 standard: the distributed version 3.1 was developed by using the MPICH2 library on a PC Linux cluster. Today the FLY performance allows us to consider the FLY code among the most powerful parallel codes for tree N-body simulations. Another important new feature regards the availability of an interface with hydrodynamical Paramesh based codes. Simulations must follow a box large enough to accurately represent the power spectrum of fluctuations on very large scales so that we may hope to compare them meaningfully with real data. The number of particles then sets the mass resolution of the simulation, which we would like to make as fine as possible. The idea to build an interface between two codes, that have different and complementary cosmological tasks, allows us to execute complex cosmological simulations with FLY, specialized for DM evolution, and a code specialized for hydrodynamical components that uses a Paramesh block

  3. An implementation of a tree code on a SIMD, parallel computer

    NASA Technical Reports Server (NTRS)

    Olson, Kevin M.; Dorband, John E.

    1994-01-01

    We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
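    The sort-and-divide tree construction described above can be sketched serially as follows. This is only an illustration of the idea (sort along one coordinate, split the list in half, cycle through x, y, z), not the SIMD implementation from the report; the function names, the dictionary node layout and the per-level re-sort are assumptions.

```python
def build_balanced_tree(particles, depth=0):
    """Build a balanced binary tree from a list of (x, y, z, mass) tuples.

    At each level the list is sorted along one axis (cycling x, y, z) and cut
    in half, so a power-of-two particle count gives a completely balanced tree.
    Every node stores its monopole: total mass and centre of mass.
    """
    if len(particles) == 1:
        x, y, z, m = particles[0]
        return {"mass": m, "com": (x, y, z), "left": None, "right": None}
    axis = depth % 3
    ordered = sorted(particles, key=lambda p: p[axis])
    mid = len(ordered) // 2
    left = build_balanced_tree(ordered[:mid], depth + 1)
    right = build_balanced_tree(ordered[mid:], depth + 1)
    mass = left["mass"] + right["mass"]
    com = tuple((left["mass"] * a + right["mass"] * b) / mass
                for a, b in zip(left["com"], right["com"]))
    return {"mass": mass, "com": com, "left": left, "right": right}


def height(node):
    """Number of levels below this node (0 for a single-particle leaf)."""
    if node["left"] is None:
        return 0
    return 1 + max(height(node["left"]), height(node["right"]))
```

    For a power-of-two number of particles every leaf sits at the same depth, which matches the pairing property the abstract mentions; the spatial sort is what keeps neighbouring particles in neighbouring leaves.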

  4. EvoL: the new Padova Tree-SPH parallel code for cosmological simulations. I. Basic code: gravity and hydrodynamics

    NASA Astrophysics Data System (ADS)

    Merlin, E.; Buonomo, U.; Grassi, T.; Piovan, L.; Chiosi, C.

    2010-04-01

    Context: We present the new release of the Padova N-body code for cosmological simulations of galaxy formation and evolution, EvoL. The basic Tree + SPH code is presented and analysed, together with an overview of the software architectures. Aims: EvoL is a flexible parallel Fortran95 code, specifically designed for simulations of cosmological structure formation on cluster, galactic and sub-galactic scales. Methods: EvoL is a fully Lagrangian self-adaptive code, based on the classical oct-tree by Barnes & Hut (1986, Nature, 324, 446) and on the smoothed particle hydrodynamics algorithm (SPH, Lucy 1977, AJ, 82, 1013). It includes special features like adaptive softening lengths with correcting extra-terms, and modern formulations of SPH and artificial viscosity. It is designed to be run in parallel on multiple CPUs to optimise the performance and save computational time. Results: We describe the code in detail, and present the results of a number of standard hydrodynamical tests.

  5. FLY. A parallel tree N-body code for cosmological simulations

    NASA Astrophysics Data System (ADS)

    Antonuccio-Delogu, V.; Becciani, U.; Ferro, D.

    2003-10-01

    FLY is a parallel treecode which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. In its public version the code implements the equations for cosmological evolution, and can be run for different cosmological models. This reference guide describes the actual implementation of the algorithms of the public version of FLY, and suggests how to modify them to implement other types of equations (for instance, the Newtonian ones).
    Program summary
    Title of program: FLY
    Catalogue identifier: ADSC
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADSC
    Program obtainable from: CPC Program Library, Queen's University of Belfast, N. Ireland
    Computer for which the program is designed and others on which it has been tested: Cray T3E, Sgi Origin 3000, IBM SP
    Operating systems or monitors under which the program has been tested: Unicos 2.0.5.40, Irix 6.5.14, Aix 4.3.3
    Programming language used: Fortran 90, C
    Memory required to execute with typical data: about 100 Mwords with 2 million-particles
    Number of bits in a word: 32
    Number of processors used: parallel program. The user can select the number of processors >=1
    Has the code been vectorized or parallelized?: parallelized
    Number of bytes in distributed program, including test data, etc.: 4615604
    Distribution format: tar gzip file
    Keywords: Parallel tree N-body code for cosmological simulations
    Nature of physical problem: FLY is a parallel collisionless N-body code for the calculation of the gravitational force.
    Method of solution: It is based on the hierarchical oct-tree domain decomposition introduced by Barnes and Hut (1986).
    Restrictions on the complexity of the program: The program uses the leapfrog integrator schema, but could be changed by the user.
    Typical running time: 50 seconds for each time-step, running a 2-million-particles simulation on an Sgi Origin 3800 system with 8 processors having 512 Mbytes RAM for each processor.
    Unusual features of the program: FLY
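    The leapfrog integrator named in the program summary can be sketched in its common kick-drift-kick form. This is a generic Newtonian illustration in one dimension, not FLY's cosmological implementation; the function name `leapfrog`, its signature and the harmonic-oscillator test force are assumptions.

```python
import math


def leapfrog(x, v, accel, dt, n_steps):
    """Advance position x and velocity v with kick-drift-kick leapfrog steps.

    accel is any function returning the acceleration at a given position.
    """
    a = accel(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * a      # half kick
        x = x + dt * v_half            # full drift
        a = accel(x)                   # force at the new position
        v = v_half + 0.5 * dt * a      # second half kick
    return x, v


if __name__ == "__main__":
    # Unit harmonic oscillator: the exact solution is x(t) = cos(t).
    x, v = leapfrog(1.0, 0.0, lambda q: -q, dt=0.01, n_steps=1000)
    print(x, math.cos(10.0), 0.5 * (x * x + v * v))
```

    The scheme needs only one force evaluation per step and keeps the energy error bounded over long runs, which is a common reason for choosing it in N-body codes.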

  6. Are you ready to FLY in the universe? A multi-platform N-body tree code for parallel supercomputers

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Antonuccio-Delogu, V.

    2001-05-01

    In the last few years, cosmological simulations of structure and galaxy formation have assumed a fundamental role in the study of the origin, formation and evolution of the universe. These studies improved enormously with the use of supercomputers and parallel systems, which allow more accurate simulations than traditional serial systems. The code we describe, called FLY, is a newly written code (using the tree N-body method) for the evolution of three-dimensional self-gravitating collisionless systems. FLY is a fully parallel code based on the Barnes-Hut tree algorithm, and periodic boundary conditions are implemented by means of the Ewald summation technique. We use FLY to run simulations of the large-scale structure of the universe and of clusters of galaxies, but it could be usefully adopted to run evolutions of other systems based on a tree N-body algorithm. FLY is based on the one-sided communication paradigm to share data among the processors, which access remote private data without any kind of synchronization. The code was originally developed on a CRAY T3E system using the logically SHared MEMory access routines (SHMEM), but it also runs on SGI ORIGIN systems and on IBM SP by using the Low-Level Application Programming Interface routines (LAPI). This new code is the evolution of preliminary codes (WDSH-PT and WD99) for cosmological simulations that we implemented in recent years, and it reaches very high performance on all systems where it has been tested. This performance allows us today to consider FLY among the most powerful parallel codes for tree N-body simulations. The performance that FLY reaches is discussed and reported, and a preliminary comparison with other similar codes is presented. FLY version 1.1 is freely available at http://www.ct.astro.it/fly/ and it will be maintained and upgraded with new releases.

  7. Giant impacts during planet formation: Parallel tree code simulations using smooth particle hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, Randi L.

    There is both theoretical and observational evidence that giant planets collided with objects ≥ Mearth during their evolution. These impacts may play a key role in giant planet formation. This paper describes impacts of a ~Earth-mass object onto a suite of proto-giant-planets, as simulated using an SPH parallel tree code. We run 6 simulations, varying the impact angle and evolutionary stage of the proto-Jupiter. We find that it is possible for an impactor to free some mass from the core of the proto-planet it impacts through direct collision, as well as to make physical contact with the core yet escape partially, or even completely, intact. None of the 6 cases we consider produced a solid disk or resulted in a net decrease in the core mass of the proto-planet (since the mass decrease due to disruption was outweighed by the increase due to the addition of the impactor's mass to the core). However, we suggest parameters which may have these effects, and thus decrease core mass and formation time in protoplanetary models and/or create satellite systems. We find that giant impacts can remove significant envelope mass from forming giant planets, leaving only 2 MEarth of gas, similar to Uranus and Neptune. They can also create compositional inhomogeneities in planetary cores, which create differences in planetary thermal emission characteristics.

  8. Giant Impacts During Planet Formation: Parallel Tree Code Simulations Using Smooth Particle Hydrodynamics

    NASA Astrophysics Data System (ADS)

    Cohen, R.; Bodenheimer, P.; Asphaug, E.

    2000-12-01

    There is both theoretical and observational evidence that giant planets collided with objects with mass >= Mearth during their evolution. These impacts may help shorten planetary formation timescales by changing the opacity of the planetary atmosphere to allow quicker cooling. They may also redistribute heavy metals within giant planets, affect the core/envelope mass ratio, and help determine the ratio of emitted to absorbed energy within giant planets. Thus, the researchers propose to simulate the impact of a ~ Earth-mass object onto a proto-giant-planet with SPH. Results of the SPH collision models will be input into a steady-state planetary evolution code and the effect of impacts on formation timescales, core/envelope mass ratios, density profiles, and thermal emissions of giant planets will be quantified. The collision will be modelled using a modified version of an SPH routine which simulates the collision of two polytropes. The Saumon-Chabrier and Tillotson equations of state will replace the polytropic equation of state. The parallel tree algorithm of Olson & Packer will be used for the domain decomposition and neighbor search necessary to calculate pressure and self-gravity efficiently. This work is funded by the NASA Graduate Student Researchers Program.

  9. PKDGRAV3: Parallel gravity code

    NASA Astrophysics Data System (ADS)

    Potter, Douglas; Stadel, Joachim

    2016-09-01

    Pkdgrav3 is an 𝒪(N) gravity calculation method; it uses a binary tree algorithm with fifth order fast multipole expansion of the gravitational potential, using cell-cell interactions. Periodic boundary conditions require very little data movement and allow a high degree of parallelism; the code includes GPU acceleration for all force calculations, leading to a significant speed-up with respect to previous versions (ascl:1305.005). Pkdgrav3 also has a sophisticated time-stepping criterion based on an estimation of the local dynamical time.

  10. TPM: Tree-Particle-Mesh code

    NASA Astrophysics Data System (ADS)

    Bode, Paul

    2013-05-01

    TPM carries out collisionless (dark matter) cosmological N-body simulations, evolving a system of N particles as they move under their mutual gravitational interaction. It combines aspects of both Tree and Particle-Mesh algorithms. After the global PM forces are calculated, spatially distinct regions above a given density contrast are located; the tree code calculates the gravitational interactions inside these denser objects at higher spatial and temporal resolution. The code is parallel and uses MPI for message passing.

  11. PARAVT: Parallel Voronoi tessellation code

    NASA Astrophysics Data System (ADS)

    González, R. E.

    2016-10-01

    In this study, we present a new open source code for massively parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is aimed at astrophysical applications, where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes; however, no open source parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented with MPI, and the VT is computed using the Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes the neighbor list, Voronoi density, Voronoi cell volume and density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.

  12. iVINE - Ionization in the parallel TREE/SPH code VINE: first results on the observed age-spread around O-stars

    NASA Astrophysics Data System (ADS)

    Gritschneder, M.; Naab, T.; Burkert, A.; Walch, S.; Heitsch, F.; Wetzstein, M.

    2009-02-01

    We present a three-dimensional, fully parallelized, efficient implementation of ionizing ultraviolet (UV) radiation for smoothed particle hydrodynamics (SPH) including self-gravity. Our method is based on the SPH/TREE code VINE. We therefore call it iVINE (for Ionization + VINE). This approach allows detailed high-resolution studies of the effects of ionizing radiation from, for example, young massive stars on their turbulent parental molecular clouds. In this paper, we describe the concept and the numerical implementation of the radiative transfer for a plane-parallel geometry and we discuss several test cases demonstrating the efficiency and accuracy of the new method. As a first application, we study the radiatively driven implosion of marginally stable molecular clouds at various distances from a strong UV source and show that they are driven into gravitational collapse. The resulting cores are very compact and dense, exactly as observed in clustered environments. Our simulations indicate that the time of triggered collapse depends on the distance of the core from the UV source. Clouds closer to the source collapse several 10^5 yr earlier than more distant clouds. This effect can explain the observed age spread in OB associations where stars closer to the source are found to be younger. We discuss possible uncertainties in the observational derivation of shock front velocities due to early stripping of protostellar envelopes by ionizing radiation.

  13. Parallel Tree-SPH: A Tool for Galaxy Formation

    NASA Astrophysics Data System (ADS)

    Lia, C.; Carraro, G.

    We describe a new implementation of a parallel Tree-SPH code with the aim of simulating galaxy formation and evolution. The code has been parallelized using SHMEM, a Cray proprietary library to handle communications between the 256 processors of the Silicon Graphics T3E massively parallel supercomputer hosted by the Cineca Super-computing Center (Bologna, Italy). The code combines the smoothed particle hydrodynamics (SPH) method to solve hydrodynamical equations with the popular Barnes and Hut (1986) tree-code to perform gravity calculation with an N × log N scaling, and it is based on the scalar Tree-SPH code developed by Carraro et al. (1998). Parallelization is achieved by distributing particles along processors according to a workload criterion. Benchmarks of the code, in terms of load balance and scalability, are analysed and critically discussed against the adiabatic collapse of an isothermal gas sphere test using 2 × 10^4 particles on eight processors. The code turns out to be balanced at more than the 95% level. If the number of processors is increased, the load balance worsens slightly. The deviation from perfect scalability with an increasing number of processors is negligible up to 64 processors. Additionally we have incorporated radiative cooling, star formation, feedback and an algorithm to follow the chemical enrichment of the interstellar medium.

  14. Parallelizing the XSTAR Photoionization Code

    NASA Astrophysics Data System (ADS)

    Noble, M. S.; Ji, L.; Young, A.; Lee, J. C.

    2009-09-01

    We describe two means by which XSTAR, a code which computes physical conditions and emission spectra of photoionized gases, has been parallelized. The first is pvmxstar, a wrapper which can be used in place of the serial xstar2xspec script to foster concurrent execution of the XSTAR command line application on independent sets of parameters. The second is pmodel, a plugin for the Interactive Spectral Interpretation System (ISIS) which allows arbitrary components of a broad range of astrophysical models to be distributed across processors during fitting and confidence limits calculations, by scientists with little training in parallel programming. Plugging the XSTAR family of analytic models into pmodel enables multiple ionization states (e.g., of a complex absorber/emitter) to be computed simultaneously, alleviating the often prohibitive expense of the traditional serial approach. Initial performance results indicate that these methods substantially enlarge the problem space to which XSTAR may be applied within practical timeframes.

  15. Parallel tree code for large N-body simulation: Dynamic load balance and data distribution on a CRAY T3D system

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Ansaloni, R.; Antonuccio-Delogu, V.; Erbacci, G.; Gambera, M.; Pagliaro, A.

    1997-10-01

    N-body algorithms for long-range unscreened interactions like gravity belong to a class of highly irregular problems whose optimal solution is a challenging task for present-day massively parallel computers. In this paper we describe a strategy for optimal memory and work distribution which we have applied to our parallel implementation of the Barnes & Hut (1986) recursive tree scheme on a Cray T3D using the CRAFT programming environment. We have performed a series of tests to find an optimal data distribution in the T3D memory, and to identify a strategy for the Dynamic Load Balance in order to obtain good performances when running large simulations (more than 10 million particles). The results of tests show that the step duration depends on two main factors: the data locality and the T3D network contention. Increasing data locality we are able to minimize the step duration if the closest bodies (direct interaction) tend to be located in the same PE local memory (contiguous block subdivision, high granularity), whereas the tree properties have a fine grain distribution. In a very large simulation, due to network contention, an unbalanced load arises. To remedy this we have devised an automatic work redistribution mechanism which provided a good Dynamic Load Balance at the price of an insignificant overhead.
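    A workload-based distribution of particles over processors, as mentioned in this and the TreeSPH records, can be sketched as a contiguous block split by cumulative cost. This is a simplified stand-in, not the redistribution mechanism of the paper: the per-particle `costs` (for example, interaction counts from the previous step), the function names and the balance metric are assumptions.

```python
def partition_by_work(costs, n_procs):
    """Split particle indices into n_procs contiguous blocks of similar total cost.

    costs[i] is an estimated work for particle i, taken in their stored
    (ideally spatially sorted) order. Returns a list of (start, end) ranges.
    """
    total = sum(costs)
    bounds = [0]
    running = 0.0
    for i, c in enumerate(costs):
        running += c
        # Close a block each time the cumulative cost passes the next share.
        while len(bounds) < n_procs and running >= total * len(bounds) / n_procs:
            bounds.append(i + 1)
    while len(bounds) < n_procs:       # only reached for empty input
        bounds.append(len(costs))
    bounds.append(len(costs))
    return [(bounds[p], bounds[p + 1]) for p in range(n_procs)]


def load_balance(costs, blocks):
    """Mean block load divided by maximum block load (1.0 is perfect balance)."""
    loads = [sum(costs[a:b]) for a, b in blocks]
    return (sum(loads) / len(loads)) / max(loads)
```

    Keeping the blocks contiguous preserves data locality when the particles are stored in spatial order, while re-running the split with updated costs gives a simple form of dynamic rebalancing.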

  16. Parallel search of strongly ordered game trees

    SciTech Connect

    Marsland, T.A.; Campbell, M.

    1982-12-01

    The alpha-beta algorithm forms the basis of many programs that search game trees. A number of methods have been designed to improve the utility of the sequential version of this algorithm, especially for use in game-playing programs. These enhancements are based on the observation that alpha-beta is most effective when the best move in each position is considered early in the search. Trees that have this so-called strong ordering property are not only of practical importance but possess characteristics that can be exploited in both sequential and parallel environments. This paper draws upon experiences gained during the development of programs which search chess game trees. Over the past decade major enhancements of the alpha-beta algorithm have been developed by people building game-playing programs, and many of these methods will be surveyed and compared here. The balance of the paper contains a study of contemporary methods for searching chess game trees in parallel, using an arbitrary number of independent processors. To make efficient use of these processors, one must have a clear understanding of the basic properties of the trees actually traversed when alpha-beta cutoffs occur. This paper provides such insights and concludes with a brief description of a refinement to a standard parallel search algorithm for this problem. 33 references.
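    The effect of move ordering on alpha-beta cutoffs can be seen in a small sequential sketch. This is the textbook algorithm on a toy tree, not the parallel method of the paper; the nested-list tree encoding, the `counter` argument and the example values are assumptions.

```python
import math


def alphabeta(node, alpha, beta, maximizing, counter):
    """Minimax value of a nested-list game tree with alpha-beta pruning.

    Leaves are numbers; inner nodes are lists of children. counter[0] is
    incremented for every leaf that is actually evaluated.
    """
    if not isinstance(node, list):
        counter[0] += 1
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, counter))
            alpha = max(alpha, value)
            if alpha >= beta:          # cutoff: the opponent avoids this line
                break
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, counter))
        beta = min(beta, value)
        if alpha >= beta:              # cutoff: we already have a better option
            break
    return value


if __name__ == "__main__":
    well_ordered = [[3, 5], [2, 9], [0, 1]]      # best move searched first
    badly_ordered = [[1, 0], [9, 2], [5, 3]]     # best move searched last
    for tree in (well_ordered, badly_ordered):
        visited = [0]
        print(alphabeta(tree, -math.inf, math.inf, True, visited), visited[0])
```

    Both trees have the same minimax value, but the well-ordered one is resolved after evaluating fewer leaves, which is the strong-ordering property the abstract refers to.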

  17. Code Parallelization with CAPO: A User Manual

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Frumkin, Michael; Yan, Jerry; Biegel, Bryan (Technical Monitor)

    2001-01-01

    A software tool has been developed to assist the parallelization of scientific codes. This tool, CAPO, extends an existing parallelization toolkit, CAPTools developed at the University of Greenwich, to generate OpenMP parallel codes for shared memory architectures. This is an interactive toolkit to transform a serial Fortran application code to an equivalent parallel version of the software - in a small fraction of the time normally required for a manual parallelization. We first discuss the way in which loop types are categorized and how efficient OpenMP directives can be defined and inserted into the existing code using the in-depth interprocedural analysis. The use of the toolkit on a number of application codes ranging from benchmark to real-world application codes is presented. This will demonstrate the great potential of using the toolkit to quickly parallelize serial programs as well as the good performance achievable on a large number of processors. The second part of the document gives references to the parameters and the graphic user interface implemented in the toolkit. Finally a set of tutorials is included for hands-on experiences with this toolkit.

  18. National Combustion Code: Parallel Implementation and Performance

    NASA Technical Reports Server (NTRS)

    Quealy, A.; Ryder, R.; Norris, A.; Liu, N.-S.

    2000-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. CORSAIR-CCD is the current baseline reacting flow solver for NCC. This is a parallel, unstructured grid code which uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC flow solver to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This paper describes the parallel implementation of the NCC flow solver and summarizes its current parallel performance on an SGI Origin 2000. Earlier parallel performance results on an IBM SP-2 are also included. The performance improvements which have enabled a turnaround of less than 15 hours for a 1.3 million element fully reacting combustion simulation are described.

  19. National Combustion Code Parallel Performance Enhancements

    NASA Technical Reports Server (NTRS)

    Quealy, Angela; Benyo, Theresa (Technical Monitor)

    2002-01-01

    The National Combustion Code (NCC) is being developed by an industry-government team for the design and analysis of combustion systems. The unstructured grid, reacting flow code uses a distributed memory, message passing model for its parallel implementation. The focus of the present effort has been to improve the performance of the NCC code to meet combustor designer requirements for model accuracy and analysis turnaround time. Improving the performance of this code contributes significantly to the overall reduction in time and cost of the combustor design cycle. This report describes recent parallel processing modifications to NCC that have improved the parallel scalability of the code, enabling a two hour turnaround for a 1.3 million element fully reacting combustion simulation on an SGI Origin 2000.

  20. FLY: a Tree Code for Adaptive Mesh Refinement

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Antonuccio-Delogu, V.; Costa, A.; Ferro, D.

    FLY is a public domain parallel treecode, which makes heavy use of the one-sided communication paradigm to handle the management of the tree structure. It implements the equations for cosmological evolution and can be run for different cosmological models. This paper shows an example of the integration of a tree N-body code with an adaptive mesh, following the PARAMESH scheme. This new implementation will allow the FLY output, and more generally any binary output, to be used with any hydrodynamics code that adopts the PARAMESH data structure, to study compressible flow problems.

  1. SLAC Parallel Tracking Code Development and Applications

    SciTech Connect

    McCandless, Brian C.

    2001-01-19

    The increase in single processor speed based on Moore's law alone will not be able to deliver the dramatic speedup needed in many beam tracking simulations to uncover very slowly evolving effects in a reasonable time. SLAC has embarked on an effort to bring the power of parallel computing to bear on such computations with the goal to reduce the turnaround time by orders of magnitude so that the results may impact present facilities and future machine designs. This poster will describe the approaches adopted for parallelizing the LIAR code and the ION_MAD code. The scalability of these tracking codes and their further improvement will be discussed.

  2. Bitplane Image Coding With Parallel Coefficient Processing.

    PubMed

    Auli-Llinas, Francesc; Enfedaque, Pablo; Moure, Juan C; Sanchez, Victor

    2016-01-01

    Image coding systems have traditionally been tailored for multiple instruction, multiple data (MIMD) computing. In general, they partition the (transformed) image into codeblocks that can be coded in the cores of MIMD-based processors. Each core executes a sequential flow of instructions to process the coefficients in the codeblock, independently and asynchronously from the other cores. Bitplane coding is a common strategy to code such data. Most of its mechanisms require sequential processing of the coefficients. Recent years have seen the rise of processing accelerators with enhanced computational performance and power efficiency whose architecture is mainly based on the single instruction, multiple data (SIMD) principle. SIMD computing refers to the execution of the same instruction on multiple data in a lockstep, synchronous way. Unfortunately, current bitplane coding strategies cannot fully profit from such processors because their coding tasks are inherently sequential. This paper presents bitplane image coding with parallel coefficient (BPC-PaCo) processing, a coding method that can process many coefficients within a codeblock in parallel and synchronously. To this end, the scanning order, the context formation, the probability model, and the arithmetic coder of the coding engine have been re-formulated. The experimental results suggest that the penalization in coding performance of BPC-PaCo with respect to the traditional strategies is almost negligible.

  3. GRADSPMHD: A parallel MHD code based on the SPH formalism

    NASA Astrophysics Data System (ADS)

    Vanaverbeke, S.; Keppens, R.; Poedts, S.

    2014-03-01

    We present GRADSPMHD, a completely Lagrangian parallel magnetohydrodynamics code based on the SPH formalism. The implementation of the equations of SPMHD in the “GRAD-h” formalism assembles known results, including the derivation of the discretized MHD equations from a variational principle, the inclusion of time-dependent artificial viscosity, resistivity and conductivity terms, as well as the inclusion of a mixed hyperbolic/parabolic correction scheme for satisfying the ∇·B constraint on the magnetic field. The code uses a tree-based formalism for neighbor finding and can optionally use the tree code for computing the self-gravity of the plasma. The structure of the code closely follows the framework of our parallel GRADSPH FORTRAN 90 code which we added previously to the CPC program library. We demonstrate the capabilities of GRADSPMHD by running 1, 2, and 3 dimensional standard benchmark tests and we find good agreement with previous work done by other researchers. The code is also applied to the problem of simulating the magnetorotational instability in 2.5D shearing box tests as well as in global simulations of magnetized accretion disks. We find good agreement with available results on this subject in the literature. Finally, we discuss the performance of the code on a parallel supercomputer with distributed memory architecture.
    Catalogue identifier: AERP_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AERP_v1_0.html
    Program obtainable from: CPC Program Library, Queen’s University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 620503
    No. of bytes in distributed program, including test data, etc.: 19837671
    Distribution format: tar.gz
    Programming language: FORTRAN 90/MPI.
    Computer: HPC cluster.
    Operating system: Unix.
    Has the code been vectorized or parallelized?: Yes, parallelized using MPI.
    RAM: ~30 MB for a

  4. Parallel CARLOS-3D code development

    SciTech Connect

    Putnam, J.M.; Kotulski, J.D.

    1996-02-01

    CARLOS-3D is a three-dimensional scattering code which was developed under the sponsorship of the Electromagnetic Code Consortium, and is currently used by over 80 aerospace companies and government agencies. The code has been extensively validated and runs on both serial workstations and parallel supercomputers such as the Intel Paragon. CARLOS-3D is a three-dimensional surface integral equation scattering code based on a Galerkin method of moments formulation employing Rao-Wilton-Glisson roof-top basis functions for triangular faceted surfaces. Fully arbitrary 3D geometries composed of multiple conducting and homogeneous bulk dielectric materials can be modeled. This presentation describes some of the extensions to the CARLOS-3D code, and how the operator structure of the code facilitated these improvements. Body of revolution (BOR) and two-dimensional geometries were incorporated by simply including new input routines and the appropriate Galerkin matrix operator routines. Some additional modifications were required in the combined field integral equation matrix generation routine due to the symmetric nature of the BOR and 2D operators. Quadrilateral patched surfaces with linear roof-top basis functions were also implemented in the same manner. Quadrilateral facets and triangular facets can be used in combination to more efficiently model geometries with both large smooth surfaces and surfaces with fine detail such as gaps and cracks. Since the parallel implementation in CARLOS-3D is at a high level, these changes were independent of the computer platform being used. This approach minimizes code maintenance, while providing capabilities with little additional effort. Results are presented showing the performance and accuracy of the code for some large scattering problems. Comparisons between triangular faceted and quadrilateral faceted geometry representations will be shown for some complex scatterers.

  5. A parallel and modular deformable cell Car-Parrinello code

    NASA Astrophysics Data System (ADS)

    Cavazzoni, Carlo; Chiarotti, Guido L.

    1999-12-01

    We have developed a modular parallel code implementing the Car-Parrinello [Phys. Rev. Lett. 55 (1985) 2471] algorithm including the variable cell dynamics [Europhys. Lett. 36 (1994) 345; J. Phys. Chem. Solids 56 (1995) 510]. Our code is written in Fortran 90, and makes use of some new programming concepts like encapsulation, data abstraction and data hiding. The code has a multi-layer hierarchical structure with tree-like dependencies among modules. The modules include not only the variables but also the methods acting on them, in an object-oriented fashion. The modular structure allows easier code maintenance, development and debugging procedures, and is suitable for a developer team. The layer structure permits high portability. The code displays an almost linear speed-up over a wide range of processor counts, independently of the architecture. Super-linear speed-up is obtained with a "smart" Fast Fourier Transform (FFT) that uses the available memory on the single node (increasing for a fixed problem with the number of processing elements) as a temporary buffer to store wave function transforms. This code has been used to simulate water and ammonia at giant planet conditions for systems as large as 64 molecules for ~50 ps.

  6. Parallel solid mechanics codes at Sandia National Laboratories

    SciTech Connect

    McGlaun, M.

    1994-08-01

    Computational physicists at Sandia National Laboratories have moved their production codes to distributed memory parallel computers. The codes include the multi-material CTH Eulerian code, structural mechanics code. This presentation discusses our experiences moving the codes to parallel computers and experiences running the codes. Moving large production codes onto parallel computers requires developing parallel algorithms, parallel data bases and parallel support tools. We rewrote the Eulerian CTH code for parallel computers. We were able to move both ALEGRA and PRONTO to parallel computers with only a modest number of modifications. We restructured the restart and graphics data bases to make them parallel and minimize the I/O to the parallel computer. We developed mesh decomposition tools to divide a rectangular or arbitrary connectivity mesh into sub-meshes. The sub-meshes map to processors and minimize the communication between processors. We developed new visualization tools to process the very large, parallel data bases. This presentation also discusses our experiences running these codes on Sandia's 1840 compute node Intel Paragon, 1024 processor nCUBE and networked workstations. The parallel version of CTH uses the Paragon and nCUBE for production calculations. The ALEGRA and PRONTO codes are moving off networked workstations onto the Paragon and nCUBE massively parallel computers.

  7. Parallel object-oriented decision tree system

    DOEpatents

    Kamath, Chandrika; Cantu-Paz, Erick

    2006-02-28

    A data mining decision tree system that uncovers patterns, associations, anomalies, and other statistically significant structures in data by reading and displaying data files, extracting relevant features for each of the objects, and using a method of recognizing patterns among the objects based upon object features through a decision tree that reads the data, sorts the data if necessary, determines the best manner to split the data into subsets according to some criterion, and splits the data.
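The step of determining the best way to split the data according to some criterion can be illustrated with a short sketch. The record above does not name the criterion, so Gini impurity is assumed here purely for illustration, and the names `gini` and `best_split` are hypothetical rather than taken from the patent.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 means a pure subset)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(values, labels):
    """Best binary split on one numeric feature.

    Returns (threshold, weighted_impurity). The threshold is None when no
    candidate split improves on the impurity of the unsplit data.
    Assumption: Gini impurity is the split criterion.
    """
    pairs = sorted(zip(values, labels))  # sort the data by feature value
    n = len(pairs)
    best = (None, gini(labels))
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold fits between equal feature values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        score = (len(left) * gini(left) + len(right) * gini(right)) / n
        if score < best[1]:
            best = (threshold, score)
    return best


values = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]
labels = ["a", "a", "a", "b", "b", "b"]
print(best_split(values, labels))
```

A full tree would apply `best_split` to every feature, keep the best one, partition the data at that threshold, and recurse on each subset.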

  8. Adaptive Dynamic Event Tree in RAVEN code

    SciTech Connect

    Alfonsi, Andrea; Rabiti, Cristian; Mandelli, Diego; Cogliati, Joshua Joseph; Kinoshita, Robert Arthur

    2014-11-01

    RAVEN is a software tool that is focused on performing statistical analysis of stochastic dynamic systems. RAVEN has been designed in a highly modular and pluggable way in order to enable easy integration of different programming languages (e.g., C++, Python) and coupling with other applications (system codes). Among the several capabilities currently present in RAVEN, there are five different sampling strategies: Monte Carlo, Latin Hyper Cube, Grid, Adaptive and Dynamic Event Tree (DET) sampling methodologies. The scope of this paper is to present a new sampling approach, currently under definition and implementation: an evolution of the DET me

  9. Portable, parallel, reusable Krylov space codes

    SciTech Connect

    Smith, B.; Gropp, W.

    1994-12-31

    Krylov space accelerators are an important component of many algorithms for the iterative solution of linear systems. Each Krylov space method has its own particular advantages and disadvantages; it is therefore desirable to have a variety of them available, all with an identical, easy-to-use interface. A common complaint application programmers have with available software libraries for the iterative solution of linear systems is that they require the programmer to use the data structures provided by the library. The library is not able to work with the data structures of the application code. Hence, application programmers find themselves constantly recoding the Krylov space algorithms. The Krylov space package (KSP) is a data-structure-neutral implementation of a variety of Krylov space methods including preconditioned conjugate gradient, GMRES, BiCG-Stab, transpose-free QMR and CGS. Unlike all other software libraries for linear systems that the authors are aware of, KSP will work with any application code's data structures, in Fortran or C. Due to its data-structure-neutral design, KSP runs unchanged on both sequential and parallel machines. KSP has been tested on workstations, the Intel i860 and Paragon, Thinking Machines CM-5 and the IBM SP1.

  10. Parafrase restructuring of FORTRAN code for parallel processing

    NASA Technical Reports Server (NTRS)

    Wadhwa, Atul

    1988-01-01

    Parafrase transforms a FORTRAN code, subroutine by subroutine, into a parallel code for a vector and/or shared-memory multiprocessor system. Parafrase is not a compiler; it transforms a code and provides information for a vector or concurrent process. Parafrase uses a data dependency to reveal parallelism among instructions. The data dependency test distinguishes between recurrences and statements that can be directly vectorized or parallelized. A number of transformations are required to build a data dependency graph.

  11. Petascale Parallelization of the Gyrokinetic Toroidal Code

    SciTech Connect

    Ethier, Stephane; Adams, Mark; Carter, Jonathan; Oliker, Leonid

    2010-05-01

    The Gyrokinetic Toroidal Code (GTC) is a global, three-dimensional particle-in-cell application developed to study microturbulence in tokamak fusion devices. The global capability of GTC is unique, allowing researchers to systematically analyze important dynamics such as turbulence spreading. In this work we examine a new radial domain decomposition approach to allow scalability onto the latest generation of petascale systems. Extensive performance evaluation is conducted on three high performance computing systems: the IBM BG/P, the Cray XT4, and an Intel Xeon Cluster. Overall results show that the radial decomposition approach dramatically increases scalability, while reducing the memory footprint - allowing for fusion device simulations at an unprecedented scale. After a decade where high-end computing (HEC) was dominated by the rapid pace of improvements to processor frequencies, the performance of next-generation supercomputers is increasingly differentiated by varying interconnect designs and levels of integration. Understanding the tradeoffs of these system designs is a key step towards making effective petascale computing a reality. In this work, we examine a new parallelization scheme for the Gyrokinetic Toroidal Code (GTC) micro-turbulence fusion application. Extensive scalability results and analysis are presented on three HEC systems: the IBM BlueGene/P (BG/P) at Argonne National Laboratory, the Cray XT4 at Lawrence Berkeley National Laboratory, and an Intel Xeon cluster at Lawrence Livermore National Laboratory. Overall results indicate that the new radial decomposition approach successfully attains unprecedented scalability to 131,072 BG/P cores by overcoming the memory limitations of the previous approach. The new version is well suited to utilize emerging petascale resources to access new regimes of physical phenomena.

  12. Identifying failure in a tree network of a parallel computer

    DOEpatents

    Archer, Charles J.; Pinnow, Kurt W.; Wallenfelt, Brian P.

    2010-08-24

    Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets including an I/O node and a plurality of compute nodes. For each processing set embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting from the test compute nodes one or more potential problem nodes and testing individually potential problem nodes and links to potential problem nodes.

  13. Parallel Spectral Transform Shallow Water Model: A runtime-tunable parallel benchmark code

    SciTech Connect

    Worley, P.H.; Foster, I.T.

    1994-05-01

    Fairness is an important issue when benchmarking parallel computers using application codes. The best parallel algorithm on one platform may not be the best on another. While it is not feasible to reevaluate parallel algorithms and reimplement large codes whenever new machines become available, it is possible to embed algorithmic options into codes that allow them to be "tuned" for a particular machine without requiring code modifications. In this paper, we describe a code in which such an approach was taken. PSTSWM was developed for evaluating parallel algorithms for the spectral transform method in atmospheric circulation models. Many levels of runtime-selectable algorithmic options are supported. We discuss these options and our evaluation methodology. We also provide empirical results from a number of parallel machines, indicating the importance of tuning for each platform before making a comparison.

  14. Parallel Algorithms for Graph Optimization using Tree Decompositions

    SciTech Connect

    Weerapurage, Dinesh P; Sullivan, Blair D; Groer, Christopher S

    2013-01-01

    Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of required dynamic programming tables and excessive running times of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree-decomposition based approach to solve maximum weighted independent set. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.

  15. Parallel Algorithms for Graph Optimization using Tree Decompositions

    SciTech Connect

    Sullivan, Blair D; Weerapurage, Dinesh P; Groer, Christopher S

    2012-06-01

    Although many NP-hard graph optimization problems can be solved in polynomial time on graphs of bounded tree-width, the adoption of these techniques into mainstream scientific computation has been limited due to the high memory requirements of the necessary dynamic programming tables and excessive runtimes of sequential implementations. This work addresses both challenges by proposing a set of new parallel algorithms for all steps of a tree decomposition-based approach to solve the maximum weighted independent set problem. A hybrid OpenMP/MPI implementation includes a highly scalable parallel dynamic programming algorithm leveraging the MADNESS task-based runtime, and computational results demonstrate scaling. This work enables a significant expansion of the scale of graphs on which exact solutions to maximum weighted independent set can be obtained, and forms a framework for solving additional graph optimization problems with similar techniques.

  16. Memory Scalability and Efficiency Analysis of Parallel Codes

    SciTech Connect

    Janjusic, Tommy; Kartsaklis, Christos

    2015-01-01

    Memory scalability is an enduring problem and bottleneck that plagues many parallel codes. Parallel codes designed for High Performance Systems are typically designed over the span of several, and in some instances 10+, years. As a result, optimization practices which were appropriate for earlier systems may no longer be valid and thus require careful optimization consideration. Specifically, parallel codes whose memory footprint is a function of their scalability must be carefully considered for future exa-scale systems. In this paper we present a methodology and tool to study the memory scalability of parallel codes. Using our methodology we evaluate an application's memory footprint as a function of scalability, which we term memory efficiency, and describe our results. In particular, using our in-house tools we can pinpoint the specific application components which contribute to the application's overall memory footprint (application data structures, libraries, etc.).

  17. Shot level parallelization of a seismic inversion code using PVM

    SciTech Connect

    Versteeg, R.J.; Gockenback, M.; Symes, W.W.; Kern, M.

    1994-12-31

    This paper presents experience with parallelization using PVM of DSO, a seismic inversion code developed in The Rice Inversion Project. It focuses on one aspect: trying to run efficiently on a cluster of 4 workstations. The authors use a coarse-grain parallelism in which they dynamically distribute the shots over the available machines in the cluster. The modeling and migration of their code is parallelized very effectively by this strategy; they have reached an overall performance of 104 Mflops using a configuration of one manager with 3 workers, a speedup of 2.4 versus the serial version, which according to Amdahl's law is optimal given the current design of their code. Further speedup is currently limited by the non-parallelized parts of their code: optimization, linear algebra and I/O.
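The reported speedup of 2.4 with 3 workers can be related to Amdahl's law by a short calculation. The sketch below assumes that only the 3 workers do parallel work and that the code follows the ideal Amdahl model exactly; the function names are illustrative and do not come from the paper.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Ideal Amdahl speedup when the parallel part scales perfectly."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def parallel_fraction_from_speedup(speedup: float, workers: int) -> float:
    """Invert Amdahl's law to get the parallel fraction implied by a speedup."""
    if workers < 2:
        raise ValueError("need at least 2 workers to infer a parallel fraction")
    # S = 1 / ((1 - p) + p / n)  =>  p = (1 - 1/S) / (1 - 1/n)
    return (1.0 - 1.0 / speedup) / (1.0 - 1.0 / workers)


# Figures quoted in the abstract: speedup 2.4 with 3 workers.
p = parallel_fraction_from_speedup(2.4, 3)
limit = 1.0 / (1.0 - p)  # asymptotic speedup as the worker count grows
print(f"implied parallel fraction: {p:.3f}")
print(f"speedup limit for many workers: {limit:.1f}")
```

Under these assumptions, roughly one eighth of the serial runtime is not parallelized, which would cap the achievable speedup at about 8 however many workstations are added.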

  18. Parallel continuous flow: a parallel suffix tree construction tool for whole genomes.

    PubMed

    Comin, Matteo; Farreras, Montse

    2014-04-01

    The construction of suffix trees for very long sequences is essential for many applications, and it plays a central role in the bioinformatic domain. With the advent of modern sequencing technologies, biological sequence databases have grown dramatically. The methodologies required to analyze these data have also become more complex every day, requiring fast queries to multiple genomes. In this article, we present parallel continuous flow (PCF), a parallel suffix tree construction method that is suitable for very long genomes. We tested our method for the suffix tree construction of the entire human genome, about 3 GB. We showed that PCF can scale gracefully as the size of the input genome grows. Our method can work with an efficiency of 90% with 36 processors and 55% with 172 processors. We can index the human genome in 7 minutes using 172 processes.
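Parallel efficiency is conventionally defined as speedup divided by processor count, so the quoted efficiencies can be converted back into speedups. This is a minimal sketch that assumes the conventional definition applies to the figures above; the abstract does not state the definition it uses.

```python
def speedup_from_efficiency(efficiency: float, processors: int) -> float:
    """Speedup implied by the definition efficiency = speedup / processors."""
    return efficiency * processors


# (efficiency, processors) pairs quoted in the abstract.
reported = [(0.90, 36), (0.55, 172)]
speedups = [speedup_from_efficiency(eff, procs) for eff, procs in reported]
for (eff, procs), s in zip(reported, speedups):
    print(f"{procs} processors at {eff:.0%} efficiency: speedup of about {s:.1f}")
```

On this reading, going from 36 to 172 processors (about 4.8 times more hardware) raises the speedup from roughly 32 to roughly 95, or about 2.9 times.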

  19. Bonsai: N-body GPU tree-code

    NASA Astrophysics Data System (ADS)

    Bédorf, Jeroen; Gaburov, Evghenii; Portegies Zwart, Simon

    2012-12-01

    Bonsai is a gravitational N-body tree-code that runs completely on the GPU. This reduces the amount of time spent on communication with the CPU. The code runs on NVIDIA GPUs and on a GTX480 it is able to integrate ~2.8M particles per second. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages.

  20. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation have culminated in a new neutronics code, CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementations of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  1. Parallel Scaling Characteristics of Selected NERSC User Project Codes

    SciTech Connect

    Skinner, David; Verdier, Francesca; Anand, Harsh; Carter, Jonathan; Durst, Mark; Gerber, Richard

    2005-03-05

    This report documents parallel scaling characteristics of NERSC user project codes between Fiscal Year 2003 and the first half of Fiscal Year 2004 (Oct 2002-March 2004). The codes analyzed cover 60% of all the CPU hours delivered during that time frame on seaborg, a 6080 CPU IBM SP and the largest parallel computer at NERSC. The scale in terms of concurrency and problem size of the workload is analyzed. Drawing on batch queue logs, performance data and feedback from researchers we detail the motivations, benefits, and challenges of implementing highly parallel scientific codes on current NERSC High Performance Computing systems. An evaluation and outlook of the NERSC workload for Allocation Year 2005 is presented.

  2. The Forest Method as a New Parallel Tree Method with the Sectional Voronoi Tessellation

    NASA Astrophysics Data System (ADS)

    Yahagi, Hideki; Mori, Masao; Yoshii, Yuzuru

    1999-09-01

    We have developed a new parallel tree method which will be called the forest method hereafter. This new method uses the sectional Voronoi tessellation (SVT) for the domain decomposition. The SVT decomposes a whole space into polyhedra and allows their flat borders to move by assigning different weights. The forest method determines these weights based on the load balancing among processors by means of the overload diffusion (OLD). Moreover, since all the borders are flat, before receiving the data from other processors, each processor can collect enough data to calculate the gravity force with precision. Both the SVT and the OLD are coded in a highly vectorizable manner to suit vector parallel processors. The parallel code based on the forest method with the Message Passing Interface is run on various platforms so that a wide portability is guaranteed. Extensive calculations with 15 processors of Fujitsu VPP300/16R indicate that the code can calculate the gravity force exerted on 10^5 particles in each second for some ideal dark halo. This code is found to enable an N-body simulation with 10^7 or more particles for a wide dynamic range and is therefore a very powerful tool for the study of galaxy formation and large-scale structure in the universe.

  3. A Data Parallel Multizone Navier-Stokes Code

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

    1995-01-01

    We have developed a data parallel multizone compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the "chimera" approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. The design choices can be summarized as: 1. finite differences on structured grids; 2. implicit time-stepping with either distributed solves or data motion and local solves; 3. sequential stepping through multiple zones with interzone data transfer via a distributed data structure. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran (HPF). One interesting feature is the issue of turbulence modeling, where the architecture of a parallel machine makes the use of an algebraic turbulence model awkward, whereas models based on transport equations are more natural. We will present some performance figures for the code on the CM-5, and consider the issues involved in transitioning the code to HPF for portability to other parallel platforms.

  4. An Expert System for the Development of Efficient Parallel Code

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Chun, Robert; Jin, Hao-Qiang; Labarta, Jesus; Gimenez, Judit

    2004-01-01

    We have built the prototype of an expert system to assist the user in the development of efficient parallel code. The system was integrated into the parallel programming environment that is currently being developed at NASA Ames. The expert system interfaces to tools for automatic parallelization and performance analysis. It uses static program structure information and performance data in order to automatically determine causes of poor performance and to make suggestions for improvements. In this paper we give an overview of our programming environment, describe the prototype implementation of our expert system, and demonstrate its usefulness with several case studies.

  5. Advances in Parallelization for Large Scale Oct-Tree Mesh Generation

    NASA Technical Reports Server (NTRS)

    O'Connell, Matthew; Karman, Steve L.

    2015-01-01

    Despite great advancements in the parallelization of numerical simulation codes over the last 20 years, it is still common to perform grid generation in serial. Generating large scale grids in serial often requires using special "grid generation" compute machines that can have more than ten times the memory of average machines. While some parallel mesh generation techniques have been proposed, generating very large meshes for LES or aeroacoustic simulations is still a challenging problem. An automated method for the parallel generation of very large scale off-body hierarchical meshes is presented here. This work enables large scale parallel generation of off-body meshes by using a novel combination of parallel grid generation techniques and a hybrid "top down" and "bottom up" oct-tree method. Meshes are generated using hardware commonly found in parallel compute clusters. The capability to generate very large meshes is demonstrated by the generation of off-body meshes surrounding complex aerospace geometries. Results are shown including a one billion cell mesh generated around a Predator Unmanned Aerial Vehicle geometry, which was generated on 64 processors in under 45 minutes.

  6. New Parallel computing framework for radiation transport codes

    SciTech Connect

    Kostin, M.A.; Mokhov, N.V.; Niita, K.; /JAERI, Tokai

    2010-09-01

    A new parallel computing framework has been developed to use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. The module is significantly independent of radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was integrated with the MARS15 code, and an effort is under way to deploy it in PHITS. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations with a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. Several checkpoint files can be merged into one thus combining results of several calculations. The framework also corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and networks of workstations, where the interference from the other users is possible.

  7. LUDWIG: A parallel Lattice-Boltzmann code for complex fluids

    NASA Astrophysics Data System (ADS)

    Desplat, Jean-Christophe; Pagonabarraga, Ignacio; Bladon, Peter

    2001-03-01

    This paper describes Ludwig, a versatile code for the simulation of Lattice-Boltzmann (LB) models in 3D on cubic lattices. In fact, Ludwig is not a single code, but a set of codes that share certain common routines, such as I/O and communications. If Ludwig is used as intended, a variety of complex fluid models with different equilibrium free energies are simple to code, so that the user may concentrate on the physics of the problem, rather than on parallel computing issues. Thus far, Ludwig's main application has been to symmetric binary fluid mixtures. We first explain the philosophy and structure of Ludwig which is argued to be a very effective way of developing large codes for academic consortia. Next we elaborate on some parallel implementation issues such as parallel I/O, and the use of MPI to achieve full portability and good efficiency on both MPP and SMP systems. Finally, we describe how to implement generic solid boundaries, and look in detail at the particular case of a symmetric binary fluid mixture near a solid wall. We present a novel scheme for the thermodynamically consistent simulation of wetting phenomena, in the presence of static and moving solid boundaries, and check its performance.

  8. Parallel 3-D Electromagnetic Particle Code Using High Performance FORTRAN: Parallel TRISTAN

    NASA Astrophysics Data System (ADS)

    Cai, D.; Li, Y.; Nishikawa, K.-I.; et al.

    A three-dimensional full electromagnetic particle-in-cell (PIC) code, TRISTAN (Tridimensional Stanford) code, has been parallelized using High Performance Fortran (HPF) as a RPM (Real Parallel Machine). In the parallelized HPF code, the simulation domain is decomposed in one dimension, and both the particle and field data located in each domain that we call the sub-domain are distributed on each processor. Both the particle and field data on a sub-domain are needed by the neighbor sub-domains and thus communications between the sub-domains are inevitable. Our simulation results using HPF exhibit the promising applicability of the HPF communications to large-scale scientific computing such as solar wind-magnetosphere interactions.

  9. Compressed data organization for high throughput parallel entropy coding

    NASA Astrophysics Data System (ADS)

    Said, Amir; Mahfoodh, Abo-Talib; Yea, Sehoon

    2015-09-01

    The difficulty of parallelizing entropy coding is increasingly limiting the data throughputs achievable in media compression. In this work we analyze the fundamental limitations, using finite-state-machine models to identify the best manner of separating tasks that can be processed independently, while minimizing compression losses. This analysis confirms previous works showing that effective parallelization is feasible only if the compressed data is organized in a proper way, which is quite different from conventional formats. The proposed new formats exploit the fact that optimal compression is not affected by the arrangement of coded bits, and go further in exploiting the decreasing cost of data processing and memory. Additional advantages include the ability to use, within this framework, increasingly more complex data modeling techniques, and the freedom to mix different types of coding. We confirm the parallelization effectiveness using coding simulations that run on multi-core processors, show how throughput scales with the number of cores, and analyze the additional bit-rate overhead.

  10. Advances in Parallel Electromagnetic Codes for Accelerator Science and Development

    SciTech Connect

    Ko, Kwok; Candel, Arno; Ge, Lixin; Kabel, Andreas; Lee, Rich; Li, Zenghai; Ng, Cho; Rawat, Vineet; Schussman, Greg; Xiao, Liling; /SLAC

    2011-02-07

    Over a decade of concerted effort in code development for accelerator applications has resulted in a new set of electromagnetic codes which are based on higher-order finite elements for superior geometry fidelity and better solution accuracy. SLAC's ACE3P code suite is designed to harness the power of massively parallel computers to tackle large complex problems with the increased memory and solve them at greater speed. The US DOE supports the computational science R&D under the SciDAC project to improve the scalability of ACE3P, and provides the high performance computing resources needed for the applications. This paper summarizes the advances in the ACE3P set of codes, explains the capabilities of the modules, and presents results from selected applications covering a range of problems in accelerator science and development important to the Office of Science.

  11. Boltzmann Transport Code Update: Parallelization and Integrated Design Updates

    NASA Technical Reports Server (NTRS)

    Heinbockel, J. H.; Nealy, J. E.; DeAngelis, G.; Feldman, G. A.; Chokshi, S.

    2003-01-01

    The ongoing effort to develop a web site for radiation analysis is expected to result in increased usage of the High Charge and Energy Transport Code HZETRN. It would be nice to be able to do the requested calculations quickly and efficiently. Therefore the question arose, "Could the implementation of parallel processing speed up the calculations required?" To answer this question two modifications of the HZETRN computer code were created. The first modification selected the shield material of Al(2219), then polyethylene and then Al(2219). The modified Fortran code was labeled 1SSTRN.F. The second modification considered the shield material of CO2 and Martian regolith. This modified Fortran code was labeled MARSTRN.F.

  12. Parallelization of Finite Element Analysis Codes Using Heterogeneous Distributed Computing

    NASA Technical Reports Server (NTRS)

    Ozguner, Fusun

    1996-01-01

    Performance gains in computer design are quickly consumed as users seek to analyze larger problems to a higher degree of accuracy. Innovative computational methods, such as parallel and distributed computing, seek to multiply the power of existing hardware technology to satisfy the computational demands of large applications. In the early stages of this project, experiments were performed using two large, coarse-grained applications, CSTEM and METCAN. These applications were parallelized on an Intel iPSC/860 hypercube. It was found that the overall speedup was very low, due to large, inherently sequential code segments present in the applications. The overall execution time, T_par, of the application is dependent on these sequential segments. If these segments make up a significant fraction of the overall code, the application will have a poor speedup measure.
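
    The dependence of speedup on inherently sequential segments described in this abstract is commonly quantified by Amdahl's law. The following is a minimal Python sketch of that formula, not code from the project; the function name and the example fraction are illustrative assumptions, and no measured values for CSTEM or METCAN are implied.

```python
def amdahl_speedup(serial_fraction, n_procs):
    """Upper bound on speedup when a fixed share of the run time is sequential.

    serial_fraction: share of the single-processor run time that cannot be
                     parallelized (0.0 to 1.0); a hypothetical input here.
    n_procs:         number of processors.
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must lie in [0, 1]")
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    # Sequential part runs at full cost; parallel part is divided among processors.
    return 1.0 / (serial_fraction + parallel_fraction / n_procs)


if __name__ == "__main__":
    # Illustrative only: half of the run time assumed sequential.
    for p in (1, 4, 16, 64):
        print(p, round(amdahl_speedup(0.5, p), 3))
```

    With half of the run time sequential, the bound stays below 2 no matter how many processors are added, which is the kind of poor speedup the abstract attributes to large sequential segments.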

  13. Parallelization of KENO-Va Monte Carlo code

    NASA Astrophysics Data System (ADS)

    Ramón, Javier; Peña, Jorge

    1995-07-01

    KENO-Va is a code integrated within the SCALE system developed by Oak Ridge that solves the transport equation through the Monte Carlo method. It is being used at the Consejo de Seguridad Nuclear (CSN) to perform criticality calculations for fuel storage pools and shipping casks. Two parallel versions of the code have been generated: one for shared-memory machines and the other for distributed-memory systems using the message-passing interface PVM. In both versions the neutrons of each generation are tracked in parallel. In order to preserve the reproducibility of the results in both versions, advanced seeds for random numbers were used. The CONVEX C3440 with four processors and shared memory at CSN was used to implement the shared-memory version. An FDDI network of 6 HP9000/735 was employed to implement the message-passing version using proprietary PVM. The speedup obtained was 3.6 in both cases.
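
    One way to see why seed handling matters for reproducibility is the toy Python sketch below. It is not the KENO-Va scheme: instead of advancing a single generator's seed, it gives every history its own generator seeded from a base seed and the history index (a swapped-in technique). The function names, the 30% absorption probability, and the seed formula are all hypothetical, and the "workers" are simulated serially.

```python
import random


def track_history(base_seed, history_id):
    """Toy 'neutron history': count collisions before absorption."""
    # One independent stream per history, so the result does not depend on
    # which worker handles it (assumes history_id < 1_000_003).
    rng = random.Random(base_seed * 1_000_003 + history_id)
    collisions = 0
    while rng.random() > 0.3:  # made-up absorption probability per collision
        collisions += 1
    return collisions


def run_generation(base_seed, n_histories, n_workers):
    """Distribute histories round-robin over workers and sum their tallies."""
    partial = [0] * n_workers
    for h in range(n_histories):
        partial[h % n_workers] += track_history(base_seed, h)
    return sum(partial)


if __name__ == "__main__":
    # The tally is identical for 1, 4 and 6 workers.
    print([run_generation(42, 1000, w) for w in (1, 4, 6)])
```

    Because each history draws from a stream fixed by its index, the generation tally is the same however the histories are split among workers.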

  14. Parallel processing a real code: A case history

    SciTech Connect

    Mandell, D.A.; Trease, H.E.

    1988-01-01

    A three-dimensional, time-dependent Free-Lagrange hydrodynamics code has been multitasked and autotasked on a Cray X-MP/416. The multitasking was done by using the Los Alamos Multitasking Control Library, which is a superset of the Cray multitasking library. Autotasking is done by using constructs which are only comment cards if the source code is not run through a preprocessor. The 3-D algorithm has presented a number of problems that simpler algorithms, such as 1-D hydrodynamics, did not exhibit. Problems in converting the serial code, originally written for a Cray 1, to a multitasking code are discussed. Autotasking of a rewritten version of the code is discussed. Timing results for subroutines and hot spots in the serial code are presented and suggestions for additional tools and debugging aids are given. Theoretical speedup results obtained from Amdahl's law and actual speedup results obtained on a dedicated machine are presented. Suggestions for designing large parallel codes are given. 8 refs., 13 figs.

  15. Parallel processing a three-dimensional free-lagrange code

    SciTech Connect

    Mandell, D.A.; Trease, H.E.

    1989-01-01

    A three-dimensional, time-dependent free-Lagrange hydrodynamics code has been multitasked and autotasked on a CRAY X-MP/416. The multitasking was done by using the Los Alamos Multitasking Control Library, which is a superset of the CRAY multitasking library. Autotasking is done by using constructs which are only comment cards if the source code is not run through a preprocessor. The three-dimensional algorithm has presented a number of problems that simpler algorithms, such as those for one-dimensional hydrodynamics, did not exhibit. Problems in converting the serial code, originally written for a CRAY-1, to a multitasking code are discussed. Autotasking of a rewritten version of the code is discussed. Timing results for subroutines and hot spots in the serial code are presented and suggestions for additional tools and debugging aids are given. Theoretical speedup results obtained from Amdahl's law and actual speedup results obtained on a dedicated machine are presented. Suggestions for designing large parallel codes are given.

  16. Development of a parallelization strategy for the VARIANT code

    SciTech Connect

    Hanebutte, U.R.; Khalil, H.S.; Palmiotti, G.; Tatsumi, M.

    1996-12-31

    The VARIANT code solves the multigroup steady-state neutron diffusion and transport equation in three-dimensional Cartesian and hexagonal geometries using the variational nodal method. VARIANT consists of four major parts that must be executed sequentially: input handling, calculation of response matrices, solution algorithm (i.e. inner-outer iteration), and output of results. The objective of the parallelization effort was to reduce the overall computing time by distributing the work of the two computationally intensive (sequential) tasks, the coupling coefficient calculation and the iterative solver, equally among a group of processors. This report describes the code's calculations and gives performance results on one of the benchmark problems used to test the code. The performance analysis on the IBM SPx system shows good efficiency for well-load-balanced programs. Even for relatively small problem sizes, respectable efficiencies are seen for the SPx. An extension to achieve a higher degree of parallelism will be addressed in future work. 7 refs., 1 tab.

  17. Development of Parallel Code for the Alaska Tsunami Forecast Model

    NASA Astrophysics Data System (ADS)

    Bahng, B.; Knight, W. R.; Whitmore, P.

    2014-12-01

    The Alaska Tsunami Forecast Model (ATFM) is a numerical model used to forecast propagation and inundation of tsunamis generated by earthquakes and other means in both the Pacific and Atlantic Oceans. At the U.S. National Tsunami Warning Center (NTWC), the model is mainly used in a pre-computed fashion. That is, results for hundreds of hypothetical events are computed before alerts, and are accessed and calibrated with observations during tsunamis to immediately produce forecasts. ATFM uses the non-linear, depth-averaged, shallow-water equations of motion with multiply nested grids in two-way communications between domains of each parent-child pair as waves get closer to coastal waters. Even with the pre-computation the task becomes non-trivial as sub-grid resolution gets finer. Currently, the finest resolution Digital Elevation Models (DEM) used by ATFM are 1/3 arc-seconds. With a serial code, large or multiple areas of very high resolution can produce run-times that are unrealistic even in a pre-computed approach. One way to increase the model performance is code parallelization used in conjunction with a multi-processor computing environment. NTWC developers have undertaken an ATFM code-parallelization effort to streamline the creation of the pre-computed database of results with the long term aim of tsunami forecasts from source to high resolution shoreline grids in real time. Parallelization will also permit timely regeneration of the forecast model database with new DEMs; and, will make possible future inclusion of new physics such as the non-hydrostatic treatment of tsunami propagation. The purpose of our presentation is to elaborate on the parallelization approach and to show the compute speed increase on various multi-processor systems.

  18. Composing Data Parallel Code for a SPARQL Graph Engine

    SciTech Connect

    Castellana, Vito G.; Tumeo, Antonino; Villa, Oreste; Haglin, David J.; Feo, John

    2013-09-08

    Big data analytics process large amounts of data to extract knowledge from them. Semantic databases are big data applications that adopt the Resource Description Framework (RDF) to structure metadata through a graph-based representation. The graph-based representation provides several benefits, such as the possibility to perform in-memory processing with large amounts of parallelism. SPARQL is a language used to perform queries on RDF-structured data through graph matching. In this paper we present a tool that automatically translates SPARQL queries to parallel graph crawling and graph matching operations. The tool also supports complex SPARQL constructs, which require more than basic graph matching for their implementation. The tool generates parallel code annotated with OpenMP pragmas for x86 Shared-memory Multiprocessors (SMPs). With respect to commercial database systems such as Virtuoso, our approach reduces memory occupation due to join operations and provides higher performance. We show the scaling of the automatically generated graph-matching code on a 48-core SMP.

  19. A Parallel Numerical Micromagnetic Code Using FEniCS

    NASA Astrophysics Data System (ADS)

    Nagy, L.; Williams, W.; Mitchell, L.

    2013-12-01

    Many problems in the geosciences depend on understanding the ability of magnetic minerals to provide stable paleomagnetic recordings. Numerical micromagnetic modelling allows us to calculate the domain structures found in naturally occurring magnetic materials. However the computational cost rises exceedingly quickly with respect to the size and complexity of the geometries that we wish to model. This problem is compounded by the fact that the modern processor design no longer focuses on the speed at which calculations are performed, but rather on the number of computational units amongst which we may distribute our calculations. Consequently to better exploit modern computational resources our micromagnetic simulations must "go parallel". We present a parallel and scalable micromagnetics code written using FEniCS. FEniCS is a multinational collaboration involving several institutions (University of Cambridge, University of Chicago, The Simula Research Laboratory, etc.) that aims to provide a set of tools for writing scientific software; in particular software that employs the finite element method. The advantages of this approach are the leveraging of pre-existing projects from the world of scientific computing (PETSc, Trilinos, Metis/Parmetis, etc.) and exposing these so that researchers may pose problems in a manner closer to the mathematical language of their domain. Our code provides a scriptable interface (in Python) that allows users to not only run micromagnetic models in parallel, but also to perform pre/post processing of data.

  20. Performance of a parallel thermal-hydraulics code TEMPEST

    SciTech Connect

    Fann, G.I.; Trent, D.S.

    1996-11-01

    The authors describe the parallelization of the TEMPEST thermal-hydraulics code. The serial version of this code is used for production quality 3-D thermal-hydraulics simulations. Good speedup was obtained with a parallel diagonally preconditioned BiCGStab non-symmetric linear solver, using a spatial domain decomposition approach for the semi-iterative pressure-based and mass-conserved algorithm. The test case used here to illustrate the performance of the BiCGStab solver is a 3-D natural convection problem modeled using finite volume discretization in cylindrical coordinates. The BiCGStab solver replaced the LSOR-ADI method for solving the pressure equation in TEMPEST. BiCGStab also solves the coupled thermal energy equation. Scaling performance for 3 problem sizes (221220 nodes, 358120 nodes, and 701220 nodes) is presented. These problems were run on 2 different parallel machines: IBM-SP and SGI PowerChallenge. The largest problem attains a speedup of 68 on a 128-processor IBM-SP. In real terms, this is over 34 times faster than the fastest serial production time using the LSOR-ADI solver.

  1. Scalability study of parallel spatial direct numerical simulation code on IBM SP1 parallel supercomputer

    NASA Technical Reports Server (NTRS)

    Hanebutte, Ulf R.; Joslin, Ronald D.; Zubair, Mohammad

    1994-01-01

    The implementation and the performance of a parallel spatial direct numerical simulation (PSDNS) code are reported for the IBM SP1 supercomputer. The spatially evolving disturbances that are associated with laminar-to-turbulent transition in three-dimensional boundary-layer flows are computed with the PSDNS code. By remapping the distributed data structure during the course of the calculation, optimized serial library routines can be utilized that substantially increase the computational performance. Although the remapping incurs a high communication penalty, the parallel efficiency of the code remains above 40% for all performed calculations. By using appropriate compile options and optimized library routines, the serial code achieves 52-56 Mflops on a single node of the SP1 (45% of theoretical peak performance). The actual performance of the PSDNS code on the SP1 is evaluated with a 'real world' simulation that consists of 1.7 million grid points. One time step of this simulation is calculated on eight nodes of the SP1 in the same time as required by a Cray Y/MP for the same simulation. The scalability information provides estimated computational costs that match the actual costs relative to changes in the number of grid points.

  2. CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION

    SciTech Connect

    Schneider, Evan E.; Robertson, Brant E.

    2015-04-15

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256^3) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  3. Dependent video coding using a tree representation of pixel dependencies

    NASA Astrophysics Data System (ADS)

    Amati, Luca; Valenzise, Giuseppe; Ortega, Antonio; Tubaro, Stefano

    2011-09-01

    Motion-compensated prediction induces a chain of coding dependencies between pixels in video. In principle, an optimal selection of encoding parameters (motion vectors, quantization parameters, coding modes) should take into account the whole temporal horizon of a GOP. However, in practical coding schemes, these choices are made on a frame-by-frame basis, thus with a possible loss of performance. In this paper we describe a tree-based model for pixelwise coding dependencies: each pixel in a frame is the child of a pixel in a previous reference frame. We show that some tree structures are more favorable than others from a rate-distortion perspective, e.g., because they entail a large descendance of pixels which are well predicted from a common ancestor. In those cases, a higher quality has to be assigned to pixels at the top of such trees. We promote the creation of these structures by adding a special discount term to the conventional Lagrangian cost adopted at the encoder. The proposed model can be implemented through a double-pass encoding procedure. Specifically, we devise heuristic cost functions to drive the selection of quantization parameters and of motion vectors, which can be readily implemented into a state-of-the-art H.264/AVC encoder. Our experiments demonstrate that coding efficiency is improved for video sequences with low motion, while there are no apparent gains for more complex motion. We argue that this is due to both the presence of complex encoder features not captured by the model, and to the complexity of the source to be encoded.

  4. Development of a massively parallel parachute performance prediction code

    SciTech Connect

    Peterson, C.W.; Strickland, J.H.; Wolfe, W.P.; Sundberg, W.D.; McBride, D.D.

    1997-04-01

    The Department of Energy has given Sandia full responsibility for the complete life cycle (cradle to grave) of all nuclear weapon parachutes. Sandia National Laboratories is initiating development of a complete numerical simulation of parachute performance, beginning with parachute deployment and continuing through inflation and steady state descent. The purpose of the parachute performance code is to predict the performance of stockpile weapon parachutes as these parachutes continue to age well beyond their intended service life. A new massively parallel computer will provide unprecedented speed and memory for solving this complex problem, and new software will be written to treat the coupled fluid, structure and trajectory calculations as part of a single code. Verification and validation experiments have been proposed to provide the necessary confidence in the computations.

  5. Development of parallel DEM for the open source code MFIX

    SciTech Connect

    Gopalakrishnan, Pradeep; Tafti, Danesh

    2013-02-01

    The paper presents the development of a parallel Discrete Element Method (DEM) solver for the open source code, Multiphase Flow with Interphase eXchange (MFIX) based on the domain decomposition method. The performance of the code was evaluated by simulating a bubbling fluidized bed with 2.5 million particles. The DEM solver shows strong scalability up to 256 processors with an efficiency of 81%. Further, to analyze weak scaling, the static height of the fluidized bed was increased to hold 5 and 10 million particles. The results show that global communication cost increases with problem size while the computational cost remains constant. Further, the effects of static bed height on the bubble hydrodynamics and mixing characteristics are analyzed.
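
    The strong-scaling figure quoted in this abstract can be related to speedup with the usual definition, efficiency = speedup / number of processors. The Python sketch below shows that arithmetic; the function names are illustrative, and it assumes the 81% is measured against a single-processor baseline, which the abstract does not state.

```python
def parallel_efficiency(t_serial, t_parallel, n_procs):
    """Strong-scaling efficiency: speedup divided by processor count."""
    speedup = t_serial / t_parallel
    return speedup / n_procs


def implied_speedup(efficiency, n_procs):
    """Speedup implied by a reported efficiency on n_procs processors."""
    return efficiency * n_procs


if __name__ == "__main__":
    # 81% efficiency on 256 processors, under the baseline assumption above.
    print(implied_speedup(0.81, 256))
```

    Under that assumption, 81% efficiency on 256 processors corresponds to a speedup of roughly 207 over the baseline run.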

  6. Parallel Monte Carlo Electron and Photon Transport Simulation Code (PMCEPT code)

    NASA Astrophysics Data System (ADS)

    Kum, Oyeon

    2004-11-01

    Simulations for customized cancer radiation treatment planning for each patient are very useful for both patient and doctor. These simulations can be used to find the most effective treatment with the least possible dose to the patient. This typical system, the so-called "Doctor by Information Technology", will be useful to provide high quality medical services everywhere. However, the large amount of computing time required by the well-known general purpose Monte Carlo (MC) codes has prevented their use for routine dose distribution calculations for customized radiation treatment planning. The optimal solution to provide an "accurate" dose distribution within an "acceptable" time limit is to develop a parallel simulation algorithm on a Beowulf PC cluster because it is the most accurate, efficient, and economic. I developed a parallel MC electron and photon transport simulation code based on the standard MPI message passing interface. This algorithm solved the main difficulty of parallel MC simulation (overlapped random number series in the different processors) using multiple random number seeds. The parallel results agreed well with the serial ones. The parallel efficiency approached 100% as was expected.

  7. GPU-based parallel clustered differential pulse code modulation

    NASA Astrophysics Data System (ADS)

    Wu, Jiaji; Li, Wenze; Kong, Wanqiu

    2015-10-01

    Hyperspectral remote sensing technology is widely used in marine remote sensing, geological exploration, and atmospheric and environmental remote sensing. Owing to the rapid development of hyperspectral remote sensing technology, the resolution of hyperspectral images has increased greatly, and so has their data size. In order to reduce their storage and transmission cost, lossless compression of hyperspectral images has become an important research topic. In recent years, a large number of algorithms have been proposed to reduce the redundancy between different spectra. Among them, the most classical and extensible algorithm is the Clustered Differential Pulse Code Modulation (C-DPCM) algorithm. This algorithm contains three parts: first, it clusters all spectral lines and trains linear predictors for each band; second, it uses these predictors to predict pixels and obtains the residual image by subtracting the predicted image from the original image; finally, it encodes the residual image. However, the process of calculating predictors is time-consuming. In order to improve the processing speed, we propose a parallel C-DPCM based on CUDA (Compute Unified Device Architecture) with GPU. Recently, general-purpose computing based on GPUs has been greatly developed. The capacity of GPUs improves rapidly by increasing the number of processing units and storage control units. CUDA is a parallel computing platform and programming model created by NVIDIA. It gives developers direct access to the virtual instruction set and memory of the parallel computational elements in GPUs. Our core idea is to calculate the predictors in parallel. By respectively adopting global memory, shared memory and register memory, we finally get a decent speedup.
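
    The predict-then-subtract step described in this abstract can be illustrated with a much reduced Python sketch. It is not the authors' C-DPCM: clustering is omitted, the predictor is a single least-squares gain from the previous band rather than a trained multi-band linear predictor, and it runs serially on the CPU. All function names and the sample values are hypothetical.

```python
def fit_gain(prev_band, cur_band):
    """Least-squares scalar a minimizing sum((cur - a * prev) ** 2)."""
    num = sum(p * c for p, c in zip(prev_band, cur_band))
    den = sum(p * p for p in prev_band)
    return num / den if den else 0.0


def encode_band(prev_band, cur_band):
    """Return the predictor gain and the integer residual to be entropy coded."""
    a = fit_gain(prev_band, cur_band)
    residual = [c - round(a * p) for p, c in zip(prev_band, cur_band)]
    return a, residual


def decode_band(prev_band, a, residual):
    """Rebuild the band exactly from the previous band, gain and residual."""
    return [round(a * p) + r for p, r in zip(prev_band, residual)]


if __name__ == "__main__":
    prev = [10, 20, 30, 40]   # made-up pixel values of the previous band
    cur = [21, 39, 61, 80]    # made-up pixel values of the current band
    a, res = encode_band(prev, cur)
    print(res, decode_band(prev, a, res) == cur)
```

    The residual values are much smaller than the pixel values, which is what makes them cheaper to encode, and the decoder recovers the band exactly because it repeats the same rounded prediction.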

  8. Recent developments in DYNSUB: New models, code optimization and parallelization

    SciTech Connect

    Daeubler, M.; Trost, N.; Jimenez, J.; Sanchez, V.

    2013-07-01

    DYNSUB is a high-fidelity coupled code system consisting of the reactor simulator DYN3D and the sub-channel code SUBCHANFLOW. It describes nuclear reactor core behavior with pin-by-pin resolution for both steady-state and transient scenarios. In the course of the coupled code system's active development, super-homogenization (SPH) and generalized equivalence theory (GET) discontinuity factors may be computed with and employed in DYNSUB to compensate pin level homogenization errors. Because of the largely increased numerical problem size for pin-by-pin simulations, DYNSUB has benefited from HPC techniques to improve its numerical performance. DYNSUB's coupling scheme has been structurally revised. Computational bottlenecks have been identified and parallelized for shared memory systems using OpenMP. Comparing the elapsed time for simulating a PWR core with one-eighth symmetry under hot zero power conditions applying the original and the optimized DYNSUB using 8 cores, overall speed up factors greater than 10 have been observed. The corresponding reduction in execution time enables a routine application of DYNSUB to study pin level safety parameters for engineering sized cases in a scientific environment. (authors)

  9. Fast parallel algorithms and enumeration techniques for partial k-trees

    SciTech Connect

    Narayanan, C.

    1989-01-01

    Recent research by several authors has resulted in a systematic way of developing linear-time sequential algorithms for a host of problems on a fairly general class of graphs variously known as bounded decomposable graphs, graphs of bounded treewidth, partial k-trees, etc. Partial k-trees arise in a variety of real-life applications such as network reliability, VLSI design and database systems and hence fast sequential algorithms on these graphs have been found to be desirable. The linear-time methodologies were independently developed by Bern, Lawler, and Wong ((10)), Arnborg and Proskurowski ((6)), Bodlaender ((14)), and Courcelle ((25)). Wimer ((89)) significantly extended the work of Bern, Lawler and Wong. All of these approaches share the common thread of using dynamic programming on a tree structure. In particular the methodology of Wimer uses a parse-tree as the data structure. The methodologies claim linear-time algorithms on partial k-trees for fixed k, for a number of combinatorial optimization problems given the tree structure as input. It is known that obtaining the tree structure is NP-hard. This dissertation investigates three important classes of problems: (1) Developing parallel algorithms for constructing a k-tree embedding, finding a tree decomposition and most notably obtaining a parse-tree for a partial k-tree. (2) Developing parallel algorithms for parse-tree computations, testing isomorphism of k-trees, and finding a 2-tree embedding of a cactus. (3) Obtaining techniques for counting vertex/edge subsets satisfying a certain property in some classes of partial k-trees. The parallel algorithms the author has developed are in class NC and are either new or improve upon the existing results of Bodlaender (13). The difference equations he has obtained for counting certain sub-graphs are not known in the literature so far.

  10. A GPU accelerated Barnes-Hut tree code for FLASH4

    NASA Astrophysics Data System (ADS)

    Lukat, Gunther; Banerjee, Robi

    2016-05-01

    We present a GPU accelerated CUDA-C implementation of the Barnes Hut (BH) tree code for calculating the gravitational potential on octree adaptive meshes. The tree code algorithm is implemented within the FLASH4 adaptive mesh refinement (AMR) code framework and therefore fully MPI parallel. We describe the algorithm and present test results that demonstrate its accuracy and performance in comparison to the algorithms available in the current FLASH4 version. We use a MacLaurin spheroid to test the accuracy of our new implementation and use spherical, collapsing cloud cores with effective AMR to carry out performance tests also in comparison with previous gravity solvers. Depending on the setup and the GPU/CPU ratio, we find a speedup for the gravity unit of at least a factor of 3 and up to 60 in comparison to the gravity solvers implemented in the FLASH4 code. We find an overall speedup factor for full simulations of at least factor 1.6 up to a factor of 10.

  11. Time-Dependent, Parallel Neutral Particle Transport Code System.

    2009-09-10

    Version 00 PARTISN (PARallel, TIme-Dependent SN) is the evolutionary successor to CCC-547/DANTSYS. The PARTISN code package is a modular computer program package designed to solve the time-independent or dependent multigroup discrete ordinates form of the Boltzmann transport equation in several different geometries. The modular construction of the package separates the input processing, the transport equation solving, and the post processing (or edit) functions into distinct code modules: the Input Module, the Solver Module, and the Edit Module, respectively. PARTISN is the evolutionary successor to the DANTSYS™ code system package. The Input and Edit Modules in PARTISN are very similar to those in DANTSYS. However, unlike DANTSYS, the Solver Module in PARTISN contains one, two, and three-dimensional solvers in a single module. In addition to the diamond-differencing method, the Solver Module also has Adaptive Weighted Diamond-Differencing (AWDD), Linear Discontinuous (LD), and Exponential Discontinuous (ED) spatial differencing methods. The spatial mesh may consist of either a standard orthogonal mesh or a block adaptive orthogonal mesh. The Solver Module may be run in parallel for two and three dimensional problems. One can now run 1-D problems in parallel using Energy Domain Decomposition (triggered by Block 5 input keyword npeg>0). EDD can also be used in 2-D/3-D with or without our standard Spatial Domain Decomposition. Both the static (fixed source or eigenvalue) and time-dependent forms of the transport equation are solved in forward or adjoint mode. In addition, PARTISN now has a probabilistic mode for Probability of Initiation (static) and Probability of Survival (dynamic) calculations. Vacuum, reflective, periodic, white, or inhomogeneous boundary conditions are solved. General anisotropic scattering and inhomogeneous sources are permitted. PARTISN solves the transport equation on orthogonal (single level or block-structured AMR) grids in 1-D

  12. A VoMR-Tree Based Parallel Range Query Method on Distributed Spatial Database

    NASA Astrophysics Data System (ADS)

    Fu, Z.; Liu, S.

    2012-07-01

    The spatial index seriously affects the efficiency of spatial queries in a distributed spatial database. In this paper, we introduce a parallel spatial range query algorithm, based on the VoMR-tree index, which incorporates Voronoi diagrams into the MR-tree, benefiting from the nearest neighbors. We first augment the MR-tree to store the nearest neighbors and construct the VoMR-tree index using the Voronoi diagram. We then propose a novel range query algorithm based on the VoMR-tree index. In processing a range query, we discuss the data partition method so that we can improve the efficiency by parallelization in a distributed database. A verification strategy is then proposed. We show the superiority of the proposed method by extensive experiments using data sets of various sizes. The experimental results reveal that the proposed method improves the performance of range query processing by up to three times in comparison with the widely used R-tree variants.

  13. Domain decomposition methods for a parallel Monte Carlo transport code

    SciTech Connect

    Alme, H J; Rodrigue, G H; Zimmerman, G B

    1999-01-27

    Achieving parallelism in simulations that use Monte Carlo transport methods presents interesting challenges. For problems that require domain decomposition, load balance can be harder to achieve. The Monte Carlo transport package may have to operate with other packages that have different optimal domain decompositions for a given problem. To examine some of these issues, we have developed a code that simulates the interaction of a laser with biological tissue; it uses a Monte Carlo method to simulate the laser and a finite element model to simulate the conduction of the temperature field in the tissue. We will present speedup and load balance results obtained for a suite of problems decomposed using a few domain decomposition algorithms we have developed.

  14. Improved video coding efficiency exploiting tree-based pixelwise coding dependencies

    NASA Astrophysics Data System (ADS)

    Valenzise, Giuseppe; Ortega, Antonio

    2010-01-01

    In a conventional hybrid video coding scheme, the choice of encoding parameters (motion vectors, quantization parameters, etc.) is carried out by optimizing frame by frame the output distortion for a given rate budget. While it is well known that motion estimation naturally induces a chain of dependencies among pixels, this is usually not explicitly exploited in the coding process in order to improve overall coding efficiency. Specifically, when considering a group of pictures with an IPPP... structure, each pixel of the first frame can be thought of as the root of a tree whose children are the pixels of the subsequent frames predicted by it. In this work, we demonstrate the advantages of such a representation by showing that, in some situations, the best motion vector is not the one that minimizes the energy of the prediction residual, but the one that produces a better tree structure, e.g., one that can be globally more favorable from a rate-distortion perspective. In this new structure, pixels with a larger number of descendants are allocated extra rate to produce higher quality predictors. As a proof of concept, we verify this assertion by assigning the quantization parameter in a video sequence in such a way that pixels with a larger number of descendants are coded with a higher quality. In this way we are able to improve RD performance by nearly 1 dB. Our preliminary results suggest that a deeper understanding of the temporal dependencies can potentially lead to substantial gains in coding performance.
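
    The dependency tree described in this abstract can be illustrated with a small sketch. It assumes a toy one-dimensional "frame" of four pixels and a hypothetical `parents` table in which `parents[t][i]` is the pixel of frame t that predicts pixel i of frame t+1. The quantization rule in `assign_qp` (with its `base_qp` and `max_delta` parameters) is an illustrative assumption, not the rule used by the authors.

```python
def descendant_counts(parents):
    """Count, for every pixel, how many later pixels are predicted from it.

    parents[t][i] is the index (in frame t) of the pixel that predicts
    pixel i of frame t + 1. This layout is a hypothetical simplification.
    """
    n_frames = len(parents) + 1
    n_pixels = len(parents[0])
    counts = [[0] * n_pixels for _ in range(n_frames)]
    # Walk backwards so each child's subtree size is known before its parent's.
    for t in range(n_frames - 2, -1, -1):
        for child, parent in enumerate(parents[t]):
            counts[t][parent] += 1 + counts[t + 1][child]
    return counts


def assign_qp(root_counts, base_qp=30, max_delta=4):
    """Hypothetical rule: more descendants -> lower QP (higher quality)."""
    largest = max(root_counts) or 1
    return [base_qp - round(max_delta * c / largest) for c in root_counts]


# Three frames (I P P) of four pixels each.
parents = [
    [0, 0, 1, 3],  # frame 1 pixels predicted from frame 0
    [0, 1, 1, 2],  # frame 2 pixels predicted from frame 1
]
counts = descendant_counts(parents)
root_qp = assign_qp(counts[0])
```

    In this toy case the first pixel of the I-frame roots the largest tree and therefore receives the lowest quantization parameter.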

  15. Exploration of Extreme Mass Ratio Inspirals with a Tree Code

    NASA Astrophysics Data System (ADS)

    Miller, Michael

    Extreme mass ratio inspirals (EMRIs), in which a stellar-mass object spirals into a supermassive black hole, are critical gravitational wave sources for the Laser Interferometer Space Antenna (LISA) because of their potential as precise probes of strong gravity. They are also thought to contribute to the flares observed in a few active galactic nuclei that have been attributed to tidal disruption of stars. There are, however, large uncertainties about the rates and properties of EMRIs. The reason is that their galactic nuclear environments contain millions of stars around a central massive object, and their paths must be integrated with great precision to include properly effects such as secular resonances, which accumulate over many orbits. Progress is being made on all fronts, but current numerical options are either profoundly computationally intensive (direct N-body integrators, which in addition do not currently have the needed long-term accuracy) or require special symmetry or other simplifications that may compromise the realism of the results (Monte Carlo and Fokker-Planck codes). We propose to undertake extensive simulations of EMRIs using tree codes that we have adapted to the problem. Tree codes are much faster than direct N-body simulations, yet they are powerful and flexible enough to include nonideal physics such as triaxiality, arbitrary mass spectra, post-Newtonian corrections, and secular evolutionary effects such as resonant relaxation and Kozai oscillations to the equations of motion. We propose to extend our codes to include these effects and to allow separate tracking of special ? that will represent binaries, thus allowing us to follow their interactions and evolution. In our development we will compare our results for a few tens of thousands of particles with a state of the art direct N-body integrator, to evaluate the accuracy of our code and discern systematic effects. This will allow detailed yet fast examinations of large-N systems

  16. Hierarchical segmentation-based image coding using hybrid quad-binary trees.

    PubMed

    Kassim, Ashraf A; Lee, Wei Siong; Zonoobi, Dornoosh

    2009-06-01

    A novel segmentation-based image approximation and coding technique is proposed. A hybrid quad-binary (QB) tree structure is utilized to efficiently model and code geometrical information within images. Compared to other tree-based representation such as wedgelets, the proposed QB-tree based method is more efficient for a wide range of contour features such as junctions, corners and ridges, especially at low bit rates.

  17. Use of Hilbert Curves in Parallelized CUDA code: Interaction of Interstellar Atoms with the Heliosphere

    NASA Astrophysics Data System (ADS)

    Destefano, Anthony; Heerikhuisen, Jacob

    2015-04-01

    Fully 3D particle simulations can be a computationally and memory expensive task, especially when high resolution grid cells are required. The problem becomes further complicated when parallelization is needed. In this work we focus on computational methods to solve these difficulties. Hilbert curves are used to map the 3D particle space to the 1D contiguous memory space. This method of organization allows for minimized cache misses on the GPU as well as a sorted structure that is equivalent to an octal tree data structure. This type of sorted structure is attractive for uses in adaptive mesh implementations due to the logarithm search time. Implementations using the Message Passing Interface (MPI) library and NVIDIA's parallel computing platform CUDA will be compared, as MPI is commonly used on server nodes with many CPU's. We will also compare static grid structures with those of adaptive mesh structures. The physical test bed will be simulating heavy interstellar atoms interacting with a background plasma, the heliosphere, simulated from fully consistent coupled MHD/kinetic particle code. It is known that charge exchange is an important factor in space plasmas, specifically it modifies the structure of the heliosphere itself. We would like to thank the Alabama Supercomputer Authority for the use of their computational resources.
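
    The idea of ordering particles along a Hilbert curve can be sketched as follows. For brevity this uses the standard two-dimensional index computation rather than the 3D mapping the abstract describes, and it runs on the CPU; the grid size and the `particles` list are hypothetical.

```python
def hilbert_index(n, x, y):
    """Map cell (x, y) of an n-by-n grid (n a power of two) to its
    position along the 2D Hilbert curve."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d


# Hypothetical particles given as (x, y) cell coordinates on an 8x8 grid.
particles = [(7, 7), (0, 0), (3, 4), (0, 1), (4, 3), (1, 1)]
# Sorting by Hilbert index places spatially close particles close in memory.
sorted_particles = sorted(particles, key=lambda p: hilbert_index(8, *p))

# Visiting every cell in index order traces the curve itself.
curve = sorted(((x, y) for x in range(8) for y in range(8)),
               key=lambda c: hilbert_index(8, *c))
```

    Consecutive cells along `curve` are always grid neighbors, which is the locality property that makes the sorted layout cache-friendly.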

  18. Breakdown of Spatial Parallel Coding in Children's Drawing

    ERIC Educational Resources Information Center

    De Bruyn, Bart; Davis, Alyson

    2005-01-01

    When drawing real scenes or copying simple geometric figures young children are highly sensitive to parallel cues and use them effectively. However, this sensitivity can break down in surprisingly simple tasks such as copying a single line where robust directional errors occur despite the presence of parallel cues. Before we can conclude that this…

  19. Implementation of a 3D mixing layer code on parallel computers

    NASA Technical Reports Server (NTRS)

    Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.

    1995-01-01

    This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. The code was then converted for use on parallel computers using the conventional message-passing technique, while we have not been able to compile the code with the present version of HPF compilers.

  20. Punctured Parallel and Serial Concatenated Convolutional Codes for BPSK/QPSK Channels

    NASA Technical Reports Server (NTRS)

    Acikel, Omer Fatih

    1999-01-01

    As available bandwidth for communication applications becomes scarce, bandwidth-efficient modulation and coding schemes become ever more important. Since their discovery in 1993, turbo codes (parallel concatenated convolutional codes) have been the center of attention in the coding community because of their bit error rate performance near the Shannon limit. Serial concatenated convolutional codes have also been shown to be as powerful as turbo codes. In this dissertation, we introduce algorithms for designing bandwidth-efficient rate r = k/(k + 1), k = 2, 3,..., 16, parallel and rate 3/4, 7/8, and 15/16 serial concatenated convolutional codes via puncturing for BPSK/QPSK (Binary Phase Shift Keying/Quadrature Phase Shift Keying) channels. Both parallel and serial concatenated convolutional codes initially have a steep bit error rate versus signal-to-noise ratio slope (called the "cliff region"). However, this steep slope changes to a moderate slope with increasing signal-to-noise ratio, where the slope is characterized by the weight spectrum of the code. The region after the cliff region is called the "error rate floor", which dominates the behavior of these codes at moderate to high signal-to-noise ratios. Our goal is to design high rate parallel and serial concatenated convolutional codes while minimizing the error rate floor effect. The design algorithm includes an interleaver enhancement procedure and finds the polynomial sets (only for parallel concatenated convolutional codes) and the puncturing schemes that achieve the lowest bit error rate performance around the floor for the code rates of interest.

  1. Performance analysis of large scale parallel CFD computing based on Code_Saturne

    NASA Astrophysics Data System (ADS)

    Shang, Zhi

    2013-02-01

    In order to run computational fluid dynamics (CFD) codes on large scales, parallel computing has to be employed. For instance, on Petascale computing, general parallel computing without any optimization is not enough, especially for complex industrial issues that employ a large number of mesh cells to capture the details of the geometry. How to distribute these mesh cells among the multi-processors for Terascale and Petascale systems to obtain a good performance on parallel computing is really a challenge. Some mesh partitioning software packages, such as Metis, ParMetis, PT-Scotch and Zoltan, were chosen as the candidates ported into Code_Saturne to test if they can lead Code_Saturne towards Petascale and Exascale parallel CFD computing. Through the studies, it was found that mesh partitioning optimization software packages based on the graph mesh partitioning method can help the CFD code obtain good mesh distributions for high performance computing (HPC).

  2. Adaptive Mesh Refinement Algorithms for Parallel Unstructured Finite Element Codes

    SciTech Connect

    Parsons, I D; Solberg, J M

    2006-02-03

    This project produced algorithms for and software implementations of adaptive mesh refinement (AMR) methods for solving practical solid and thermal mechanics problems on multiprocessor parallel computers using unstructured finite element meshes. The overall goal is to provide computational solutions that are accurate to some prescribed tolerance, and adaptivity is the correct path toward this goal. These new tools will enable analysts to conduct more reliable simulations at reduced cost, both in terms of analyst and computer time. Previous academic research in the field of adaptive mesh refinement has produced a voluminous literature focused on error estimators and demonstration problems; relatively little progress has been made on producing efficient implementations suitable for large-scale problem solving on state-of-the-art computer systems. Research issues that were considered include: effective error estimators for nonlinear structural mechanics; local meshing at irregular geometric boundaries; and constructing efficient software for parallel computing environments.

  3. Fully Parallel Electrical Impedance Tomography Using Code Division Multiplexing.

    PubMed

    Tšoeu, M S; Inggs, M R

    2016-06-01

    Electrical Impedance Tomography (EIT) has been dominated by the use of Time Division Multiplexing (TDM) and Frequency Division Multiplexing (FDM) as methods of achieving orthogonal injection of excitation signals. Code Division Multiplexing (CDM), presented in this paper, is an alternative that eliminates temporal data inconsistencies of TDM for fast changing systems. Furthermore, this approach eliminates data inconsistencies that arise in FDM when frequency bands of current injecting electrodes are chosen over frequencies that have large changes in the imaged object's impedance. To the authors' knowledge, no fully functional wideband system or simulation platform using simultaneous injection of Gold code currents has been reported. In this paper, we formulate, simulate and develop a fully functional pseudo-random (Gold) code driven EIT system with 15 excitation currents and 16 separate voltage measurement electrodes. In the work we verify the use of CDM as a multiplexing modality in simultaneous injection EIT, using a prototype system with an overall bandwidth of 15 kHz, and attainable speed of 462 frames/s using codes with a period of 31 chips. Simulations and experiments are performed using the Electrical Impedance and Diffuse Optics Reconstruction Software (EIDORS). We also propose the use of image processing on reconstructed images to establish their quality quantitatively without access to raw reconstruction data. The results of this study show that CDM can be successfully used in EIT, and gives results of similar visual quality to TDM and FDM. Achieved performance shows average position error of 3.5% and size error of 6.2%. PMID:26731774

  4. Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

    NASA Technical Reports Server (NTRS)

    OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

    1998-01-01

    This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).

  5. Load-balancing techniques for a parallel electromagnetic particle-in-cell code

    SciTech Connect

    PLIMPTON,STEVEN J.; SEIDEL,DAVID B.; PASIK,MICHAEL F.; COATS,REBECCA S.

    2000-01-01

    QUICKSILVER is a 3-d electromagnetic particle-in-cell simulation code developed and used at Sandia to model relativistic charged particle transport. It models the time-response of electromagnetic fields and low-density-plasmas in a self-consistent manner: the fields push the plasma particles and the plasma current modifies the fields. Through an LDRD project a new parallel version of QUICKSILVER was created to enable large-scale plasma simulations to be run on massively-parallel distributed-memory supercomputers with thousands of processors, such as the Intel Tflops and DEC CPlant machines at Sandia. The new parallel code implements nearly all the features of the original serial QUICKSILVER and can be run on any platform which supports the message-passing interface (MPI) standard as well as on single-processor workstations. This report describes basic strategies useful for parallelizing and load-balancing particle-in-cell codes, outlines the parallel algorithms used in this implementation, and provides a summary of the modifications made to QUICKSILVER. It also highlights a series of benchmark simulations which have been run with the new code that illustrate its performance and parallel efficiency. These calculations have up to a billion grid cells and particles and were run on thousands of processors. This report also serves as a user manual for people wishing to run parallel QUICKSILVER.

  6. Salinas - An implicit finite element structural dynamics code developed for massively parallel platforms

    SciTech Connect

    BHARDWAJ, MANLJ K.; REESE,GARTH M.; DRIESSEN,BRIAN; ALVIN,KENNETH F.; DAY,DAVID M.

    2000-04-06

    As computational needs for structural finite element analysis increase, a robust implicit structural dynamics code is needed which can handle millions of degrees of freedom in the model and produce results with quick turn around time. A parallel code is needed to avoid limitations of serial platforms. Salinas is an implicit structural dynamics code specifically designed for massively parallel platforms. It computes the structural response of very large complex structures and provides solutions faster than any existing serial machine. This paper gives a current status of Salinas and uses demonstration problems to show Salinas' performance.

  7. Codes for a priority queue on a parallel data bus. [Deep Space Network

    NASA Technical Reports Server (NTRS)

    Wallis, D. E.; Taylor, H.

    1979-01-01

    Some codes for arbitration of priorities among subsystem computers or peripheral device controllers connected to a parallel data bus are described. At arbitration time, several subsystems present wire-OR, parallel code words to the bus, and the central computer can identify the subsystem of highest priority and determine which of two or more transmission services the subsystem requires. A mathematical discussion of the optimality of the codes with regard to the number of subsystems that may participate in the scheme for a given number of wires is presented along with the number of services that each subsystem may request.

  8. The Study of Address Tree Coding Based on the Maximum Matching Algorithm in Courier Business

    NASA Astrophysics Data System (ADS)

    Zhou, Shumin; Tang, Bin; Li, Wen

    As an important component of the EMS monitoring system, the address differs from the user name in that it carries great uncertainty, because there are many ways to represent it. Address standardization is therefore a difficult task. Address tree coding has been trying to resolve this issue for many years. The zip code, its most widely used algorithm, can only subdivide the address down to a designated post office, not to the recipient's address, so manual identification is still needed for accurate delivery. This paper puts forward a new encoding algorithm for the address tree, the maximum matching algorithm, to solve the problem. The algorithm combines the characteristics of the address tree with best matching theory and brings in the associated layers of tree nodes to improve matching efficiency. To account for the variability of addresses, the thesaurus of the address tree should be updated in a timely manner by automatically adding new nodes through intelligent tools.
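
    A generic forward maximum matching pass over an address tree might look like the sketch below. The abstract does not give the authors' algorithm in detail, so this is only an assumption about its general shape: the tree is a hypothetical nested dictionary of place names, and at each level the longest child name that prefixes the remaining text is taken.

```python
def match_address(tree, text):
    """Greedy longest-prefix descent through a nested-dict address tree.

    Returns the matched path of node names and the unmatched remainder.
    """
    path = []
    node = tree
    pos = 0
    while True:
        # Children of the current node that prefix the unread part of the text.
        candidates = [name for name in node if text.startswith(name, pos)]
        if not candidates:
            break
        best = max(candidates, key=len)  # maximum matching: prefer the longest
        path.append(best)
        pos += len(best)
        node = node[best]
    return path, text[pos:]


# Hypothetical address tree: province -> city -> district.
address_tree = {
    "Jiangxi": {
        "Nan": {},
        "Nanchang": {"Donghu": {}, "Qingshanhu": {}},
    },
}
path, remainder = match_address(address_tree, "JiangxiNanchangQingshanhuRoad12")
```

    Here the longer name "Nanchang" wins over the shorter candidate "Nan", and the unmatched tail ("Road12") is what a thesaurus update step would have to handle.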

  9. ANNarchy: a code generation approach to neural simulations on parallel hardware.

    PubMed

    Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows users to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions.

  10. ANNarchy: a code generation approach to neural simulations on parallel hardware

    PubMed Central

    Vitay, Julien; Dinkelbach, Helge Ü.; Hamker, Fred H.

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows users to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  11. ANNarchy: a code generation approach to neural simulations on parallel hardware.

    PubMed

    Vitay, Julien; Dinkelbach, Helge Ü; Hamker, Fred H

    2015-01-01

    Many modern neural simulators focus on the simulation of networks of spiking neurons on parallel hardware. Another important framework in computational neuroscience, rate-coded neural networks, is mostly difficult or impossible to implement using these simulators. We present here the ANNarchy (Artificial Neural Networks architect) neural simulator, which allows users to easily define and simulate rate-coded and spiking networks, as well as combinations of both. The interface in Python has been designed to be close to the PyNN interface, while the definition of neuron and synapse models can be specified using an equation-oriented mathematical description similar to the Brian neural simulator. This information is used to generate C++ code that will efficiently perform the simulation on the chosen parallel hardware (multi-core system or graphical processing unit). Several numerical methods are available to transform ordinary differential equations into efficient C++ code. We compare the parallel performance of the simulator to existing solutions. PMID:26283957

  12. Nyx: A MASSIVELY PARALLEL AMR CODE FOR COMPUTATIONAL COSMOLOGY

    SciTech Connect

    Almgren, Ann S.; Bell, John B.; Lijewski, Mike J.; Lukic, Zarija; Van Andel, Ethan

    2013-03-01

    We present a new N-body and gas dynamics code, called Nyx, for large-scale cosmological simulations. Nyx follows the temporal evolution of a system of discrete dark matter particles gravitationally coupled to an inviscid ideal fluid in an expanding universe. The gas is advanced in an Eulerian framework with block-structured adaptive mesh refinement; a particle-mesh scheme using the same grid hierarchy is used to solve for self-gravity and advance the particles. Computational results demonstrating the validation of Nyx on standard cosmological test problems, and the scaling behavior of Nyx to 50,000 cores, are presented.

  13. Parallelizing serial code for a distributed processing environment with an application to high frequency electromagnetic scattering

    NASA Astrophysics Data System (ADS)

    Work, Paul R.

    1991-12-01

    This thesis investigates the parallelization of existing serial programs in computational electromagnetics for use in a parallel environment. Existing algorithms for calculating the radar cross section of an object are covered, and a ray-tracing code is chosen for implementation on a parallel machine. Current parallel architectures are introduced and a suitable parallel machine is selected for the implementation of the chosen ray-tracing algorithm. The standard techniques for the parallelization of serial codes are discussed, including load balancing and decomposition considerations, and appropriate methods for the parallelization effort are selected. A load balancing algorithm is modified to increase the efficiency of the application, and a high level design of the structure of the serial program is presented. A detailed design of the modifications for the parallel implementation is also included, with both the high level and the detailed design specified in a high level design language called UNITY. The correctness of the design is proven using UNITY and standard logic operations. The theoretical and empirical results show that it is possible to achieve an efficient parallel application for a serial computational electromagnetic program where the characteristics of the algorithm and the target architecture critically influence the development of such an implementation.

  14. Pelegant : a parallel accelerator simulation code for electron generation and tracking.

    SciTech Connect

    Wang, Y.; Borland, M. D.; Accelerator Systems Division

    2006-01-01

    elegant is a general-purpose code for electron accelerator simulation that has a worldwide user base. Recently, many of the time-intensive elements were parallelized using MPI. Development has used modest Linux clusters and the BlueGene/L supercomputer at Argonne National Laboratory. This has provided very good performance for some practical simulations, such as multiparticle tracking with synchrotron radiation and emittance blow-up in the vertical rf kick scheme. The effort began with development of a concept that allowed for gradual parallelization of the code, using the existing beamline-element classification table in elegant. This was crucial as it allowed parallelization without major changes in code structure and without major conflicts with the ongoing evolution of elegant. Because of rounding error and finite machine precision, validating a parallel program against a uniprocessor program with the requirement of bitwise identical results is notoriously difficult. We will report validating simulation results of parallel elegant against those of serial elegant by applying Kahan's algorithm to improve accuracy dramatically for both versions. The quality of random numbers in a parallel implementation is very important for some simulations. Some practical experience with generating parallel random numbers by offsetting the seed of each random sequence according to the processor ID will be reported.
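
    Kahan's compensated summation, mentioned in this abstract as the tool for validating parallel against serial results, can be sketched as follows. This is a generic illustration, not code from elegant: the `chunked_sum` helper is a hypothetical stand-in for a parallel reduction, and the data set is made up.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating-point accumulation."""
    total = 0.0
    for value in values:
        total += value
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for value in values:
        corrected = value - compensation
        new_total = total + corrected
        # (new_total - total) is what was actually added; subtracting
        # `corrected` recovers the rounding error of this step.
        compensation = (new_total - total) - corrected
        total = new_total
    return total


def chunked_sum(values, n_chunks, summer):
    """Sum partial chunks, then the partials, mimicking a parallel reduction."""
    size = math.ceil(len(values) / n_chunks)
    partials = [summer(values[i:i + size]) for i in range(0, len(values), size)]
    return summer(partials)


data = [0.1] * 1_000_000
reference = math.fsum(data)  # correctly rounded sum, used as ground truth
naive_error = abs(naive_sum(data) - reference)
kahan_serial_error = abs(kahan_sum(data) - reference)
kahan_chunked_error = abs(chunked_sum(data, 8, kahan_sum) - reference)
```

    With compensation, the serial and chunked sums both stay within a few units in the last place of the reference, so the summation order matters far less than it does for naive accumulation.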

  15. DPR-tree: a distributed parallel spatial index structure for high performance spatial databases

    NASA Astrophysics Data System (ADS)

    Zhou, Yan; Zhu, Qing; Liu, Qiang

    2008-12-01

    Parallelizing the spatial index can significantly improve the performance of spatial queries, especially for massive spatial databases, so research on parallel spatial indexes plays an important role in high performance spatial databases. Existing parallel spatial index methods have two main shortcomings: one is the access hotspot and bottleneck created by index items located on the main server; the other is the high cost and complicated operations needed to maintain index consistency. To address these, a distributed parallel spatial index structure called DPR-tree is proposed. It splits the whole index region into partition sub-regions using a Hilbert space-filling curve grid and organizes index sub-regions according to the locality of spatial objects, then maps index sub-regions to partition sub-regions and assigns these index sub-regions to different computer nodes through a specified map function. Each computer node manages a multi-level distributed sub-Rtree which is built from an index sub-region. Our experimental results indicate that the proposed parallel spatial index achieves good speedup and offers significant potential for reducing query response time.

  16. A portable, parallel, object-oriented Monte Carlo neutron transport code in C++

    SciTech Connect

    Lee, S.R.; Cummings, J.C.; Nolen, S.D.

    1997-05-01

    We have developed a multi-group Monte Carlo neutron transport code using C++ and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α-eigenvalues and is portable to and runs parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities of MC++ are discussed, along with physics and performance results on a variety of hardware, including all Accelerated Strategic Computing Initiative (ASCI) hardware. Current parallel performance indicates the ability to compute α-eigenvalues in seconds to minutes rather than hours to days. Future plans and the implementation of a general transport physics framework are also discussed.

  17. PARALLEL IMPLEMENTATION OF THE TOPAZ OPACITY CODE: ISSUES IN LOAD-BALANCING

    SciTech Connect

    Sonnad, V; Iglesias, C A

    2008-05-12

    The TOPAZ opacity code explicitly includes configuration term structure in the calculation of bound-bound radiative transitions. This approach involves myriad spectral lines and requires the large computational capabilities of parallel processing computers. It is important, however, to make use of these resources efficiently. For example, an increase in the number of processors should yield a comparable reduction in computational time. This proportional 'speedup' indicates that very large problems can be addressed with massively parallel computers. Opacity codes can readily take advantage of parallel architecture since many intermediate calculations are independent. On the other hand, since the different tasks entail significantly disparate computational effort, load-balancing issues emerge so that parallel efficiency does not occur naturally. Several schemes to distribute the labor among processors are discussed.
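
    The load-balancing problem described here, independent tasks with very different costs, can be illustrated with a short sketch. The abstract does not name the schemes the report discusses, so the greedy longest-processing-time-first rule below is an assumption chosen as one common approach, and the task costs are made up.

```python
import heapq


def block_schedule(task_costs, n_procs):
    """Baseline: split tasks into contiguous blocks of (nearly) equal count."""
    size = -(-len(task_costs) // n_procs)  # ceiling division
    return [sum(task_costs[i:i + size]) for i in range(0, len(task_costs), size)]


def lpt_schedule(task_costs, n_procs):
    """Greedy longest-processing-time-first: hand the most expensive
    remaining task to the currently least-loaded processor."""
    heap = [(0, proc) for proc in range(n_procs)]  # (load, processor id)
    heapq.heapify(heap)
    assignment = [[] for _ in range(n_procs)]
    for task, cost in sorted(enumerate(task_costs), key=lambda tc: -tc[1]):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(task)
        heapq.heappush(heap, (load + cost, proc))
    loads = [sum(task_costs[t] for t in tasks) for tasks in assignment]
    return assignment, loads


def efficiency(loads):
    """Parallel efficiency: total work / (processors * slowest processor)."""
    return sum(loads) / (len(loads) * max(loads))


# Hypothetical costs: a few heavy tasks and many light ones.
costs = [50, 40, 30, 5, 5, 5, 5, 5, 5, 5, 5, 5]
block_loads = block_schedule(costs, 3)
assignment, lpt_loads = lpt_schedule(costs, 3)
```

    For these costs the equal-count split leaves one processor with most of the work, while the greedy assignment evens the loads out, which is the kind of gap in parallel efficiency the abstract refers to.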

  18. Omega3P: A Parallel Finite-Element Eigenmode Analysis Code for Accelerator Cavities

    SciTech Connect

    Lee, Lie-Quan; Li, Zenghai; Ng, Cho; Ko, Kwok; /SLAC

    2009-03-04

    Omega3P is a parallel eigenmode calculation code for accelerator cavities in frequency domain analysis using finite-element methods. In this report, we will present detailed finite-element formulations and resulting eigenvalue problems for lossless cavities, cavities with lossy materials, cavities with imperfectly conducting surfaces, and cavities with waveguide coupling. We will discuss the parallel algorithms for solving those eigenvalue problems and demonstrate modeling of accelerator cavities through different examples.

  19. Data Parallel Line Relaxation (DPLR) Code User Manual: Acadia - Version 4.01.1

    NASA Technical Reports Server (NTRS)

    Wright, Michael J.; White, Todd; Mangini, Nancy

    2009-01-01

    Data-Parallel Line Relaxation (DPLR) code is a computational fluid dynamic (CFD) solver that was developed at NASA Ames Research Center to help mission support teams generate high-value predictive solutions for hypersonic flow field problems. The DPLR Code Package is an MPI-based, parallel, full three-dimensional Navier-Stokes CFD solver with generalized models for finite-rate reaction kinetics, thermal and chemical non-equilibrium, accurate high-temperature transport coefficients, and ionized flow physics incorporated into the code. DPLR also includes a large selection of generalized realistic surface boundary conditions and links to enable loose coupling with external thermal protection system (TPS) material response and shock layer radiation codes.

  20. Assessing the performance of a parallel MATLAB-based 3D convection code

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, G. J.; Hasenclever, J.; Phipps Morgan, J.; Shi, C.

    2008-12-01

    We are currently building 2D and 3D MATLAB-based parallel finite element codes for mantle convection and melting. The codes use the MATLAB implementation of core MPI commands (e.g., Send, Receive, Broadcast) for message passing between computational subdomains. We have found that code development and algorithm testing are much faster in MATLAB than in our previous work coding in C or FORTRAN; this code was built from scratch with only 12 man-months of effort. The one extra cost with respect to C coding on a Beowulf cluster is the cost of the parallel MATLAB license for a cluster with more than 4 cores. Here we present some preliminary results on the efficiency of MPI messaging in MATLAB on a small 4-machine, 16-core, 32 GB RAM Intel Q6600 processor-based cluster. Our code implements fully parallelized preconditioned conjugate gradients with a multigrid preconditioner. Our parallel viscous flow solver is currently 20% slower for a 1,000,000 DOF problem on a single core in 2D than the direct-solve MILAMIN MATLAB viscous flow solver. We have tested both continuous and discontinuous pressure formulations. We test with various configurations of network hardware, CPU speeds, and memory using our own and MATLAB's built-in cluster profiler. So far we have only explored relatively small (up to 1.6 GB RAM) test problems. We find that with our current code and Intel memory controller bandwidth limitations we can only get ~2.3 times the performance out of 4 cores that we get from 1 core per machine. Even for these small problems the code runs faster with message passing between 4 machines with one core each than 1 machine with 4 cores and internal messaging (1.29x slower), or 1 core (2.15x slower). It surprised us that for 2D ~1GB-sized problems with only 3 multigrid levels, the direct solve on the coarsest mesh consumes comparable time to the iterative solve on the finest mesh - a penalty that is greatly reduced either by using a 4th multigrid level or by using an iterative solve at the coarsest grid level. We plan to

  1. Code Optimization and Parallelization on the Origins: Looking from Users' Perspective

    NASA Technical Reports Server (NTRS)

    Chang, Yan-Tyng Sherry; Thigpen, William W. (Technical Monitor)

    2002-01-01

    Parallel machines are becoming the main compute engines for high performance computing. Despite their increasing popularity, it is still a challenge for most users to learn the basic techniques to optimize/parallelize their codes on such platforms. In this paper, we present some experiences on learning these techniques for the Origin systems at the NASA Advanced Supercomputing Division. Emphasis of this paper will be on a few essential issues (with examples) that general users should master when they work with the Origins as well as other parallel systems.

  2. SUPREM-DSMC: A New Scalable, Parallel, Reacting, Multidimensional Direct Simulation Monte Carlo Flow Code

    NASA Technical Reports Server (NTRS)

    Campbell, David; Wysong, Ingrid; Kaplan, Carolyn; Mott, David; Wadsworth, Dean; VanGilder, Douglas

    2000-01-01

    An AFRL/NRL team has recently been selected to develop a scalable, parallel, reacting, multidimensional (SUPREM) Direct Simulation Monte Carlo (DSMC) code for the DoD user community under the High Performance Computing Modernization Office (HPCMO) Common High Performance Computing Software Support Initiative (CHSSI). This paper will introduce the JANNAF Exhaust Plume community to this three-year development effort and present the overall goals, schedule, and current status of this new code.

  3. A parallel algorithm for motion estimation in video coding using the bilinear transformation.

    PubMed

    Konstantopoulos, Charalampos

    2015-01-01

    Accurate motion estimation between frames is important for drastically reducing data redundancy in video coding. However, advanced motion estimation methods are computationally intensive and their execution in real time usually requires a parallel implementation. In this paper, we investigate the parallel implementation of such a motion estimation technique. Specifically, we present a parallel algorithm for motion estimation based on the bilinear transformation on the well-known parallel model of the hypercube network and formally prove the time and the space complexity of the proposed algorithm. We also show that the parallel algorithm can also run on other hypercubic networks, such as butterfly, cube-connected-cycles, shuffle-exchange or de Bruijn network with only constant slowdown.

  4. Hierarchical Place Trees: A Portable Abstraction for Task Parallelism and Data Movement

    NASA Astrophysics Data System (ADS)

    Yan, Yonghong; Zhao, Jisheng; Guo, Yi; Sarkar, Vivek

    Modern computer systems feature multiple homogeneous or heterogeneous computing units with deep memory hierarchies, and expect a high degree of thread-level parallelism from the software. Exploitation of data locality is critical to achieving scalable parallelism, but adds a significant dimension of complexity to performance optimization of parallel programs. This is especially true for programming models where locality is implicit and opaque to programmers. In this paper, we introduce the hierarchical place tree (HPT) model as a portable abstraction for task parallelism and data movement. The HPT model supports co-allocation of data and computation at multiple levels of a memory hierarchy. It can be viewed as a generalization of concepts from the Sequoia and X10 programming models, resulting in capabilities that are not supported by either. Compared to Sequoia, HPT supports three kinds of data movement in a memory hierarchy rather than just explicit data transfer between adjacent levels, as well as dynamic task scheduling rather than static task assignment. Compared to X10, HPT provides a hierarchical notion of places for both computation and data mapping. We describe our work-in-progress on implementing the HPT model in the Habanero-Java (HJ) compiler and runtime system. Preliminary results on general-purpose multicore processors and GPU accelerators indicate that the HPT model can be a promising portable abstraction for future multicore processors.

  5. Parallel level-set methods on adaptive tree-based grids

    NASA Astrophysics Data System (ADS)

    Mirzadeh, Mohammad; Guittet, Arthur; Burstedde, Carsten; Gibou, Frederic

    2016-10-01

    We present scalable algorithms for the level-set method on dynamic, adaptive Quadtree and Octree Cartesian grids. The algorithms are fully parallelized and implemented using the MPI standard and the open-source p4est library. We solve the level set equation with a semi-Lagrangian method which, similar to its serial implementation, is free of any time-step restrictions. This is achieved by introducing a scalable global interpolation scheme on adaptive tree-based grids. Moreover, we present a simple parallel reinitialization scheme using the pseudo-time transient formulation. Both parallel algorithms scale on the Stampede supercomputer, where we are currently using up to 4096 CPU cores, the limit of our current account. Finally, a relevant application of the algorithms is presented in modeling a crystallization phenomenon by solving a Stefan problem, illustrating a level of detail that would be impossible to achieve without a parallel adaptive strategy. We believe that the algorithms presented in this article will be of interest and useful to researchers working with the level-set framework and modeling multi-scale physics in general.

  6. A new parallel P3M code for very large-scale cosmological simulations

    NASA Astrophysics Data System (ADS)

    MacFarland, Tom; Couchman, H. M. P.; Pearce, F. R.; Pichlmeier, Jakob

    1998-12-01

    We have developed a parallel Particle-Particle, Particle-Mesh (P3M) simulation code for the Cray T3E parallel supercomputer that is well suited to studying the time evolution of systems of particles interacting via gravity and gas forces in cosmological contexts. The parallel code is based upon the public-domain serial Adaptive P3M-SPH (http://coho.astro.uwo.ca/pub/hydra/hydra.html) code of Couchman et al. (1995) [ApJ, 452, 797]. The algorithm resolves gravitational forces into a long-range component computed by discretizing the mass distribution and solving Poisson's equation on a grid using an FFT convolution method, and a short-range component computed by direct force summation for sufficiently close particle pairs. The code consists primarily of a particle-particle computation parallelized by domain decomposition over blocks of neighbour-cells, a more regular mesh calculation distributed in planes along one dimension, and several transformations between the two distributions. The load balancing of the P3M code is static, since this greatly aids the ongoing implementation of parallel adaptive refinements of the particle and mesh systems. Great care was taken throughout to make optimal use of the available memory, so that a version of the current implementation has been used to simulate systems of up to 10^9 particles with a 1024^3 mesh for the long-range force computation. These are the largest cosmological N-body simulations of which we are aware. We discuss these memory optimizations as well as those motivated by computational performance. Performance results are very encouraging, and, even without refinements, the code has been used effectively for simulations in which the particle distribution becomes highly clustered as well as for other non-uniform systems of astrophysical interest.
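    The long-range step described above, solving Poisson's equation on a grid in Fourier space, can be illustrated in one dimension. This is only a sketch under stated assumptions: it uses a naive O(N^2) transform in place of an FFT, absorbs physical constants such as 4*pi*G into `rho`, and every name in it is hypothetical rather than taken from the P3M code.

```python
import cmath
import math


def dft(values, inverse=False):
    """Naive discrete Fourier transform; a production code would use an FFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        total = 0j
        for idx, v in enumerate(values):
            total += v * cmath.exp(sign * 2j * math.pi * idx * k / n)
        out.append(total / n if inverse else total)
    return out


def solve_periodic_poisson_1d(rho, box_length):
    """Solve d^2(phi)/dx^2 = rho on a periodic grid, returning zero-mean phi."""
    n = len(rho)
    rho_k = dft(rho)
    phi_k = [0j] * n  # the m = 0 (mean) mode is left at zero
    for m in range(1, n):
        # Map array index m to a signed frequency, then to a wavenumber.
        freq = m if m <= n // 2 else m - n
        k = 2.0 * math.pi * freq / box_length
        phi_k[m] = -rho_k[m] / (k * k)
    return [z.real for z in dft(phi_k, inverse=True)]


# Single cosine mode, for which the exact answer is -cos(2*pi*x) / (2*pi)^2.
n_cells, box = 16, 1.0
xs = [i * box / n_cells for i in range(n_cells)]
rho = [math.cos(2.0 * math.pi * x / box) for x in xs]
phi = solve_periodic_poisson_1d(rho, box)
```

    Dividing each Fourier mode by -k^2 is what replaces the grid-space convolution; in a full P3M code the resulting mesh force is then complemented by direct summation over close particle pairs.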

  7. Understanding Performance of Parallel Scientific Simulation Codes using Open|SpeedShop

    SciTech Connect

    Ghosh, K K

    2011-11-07

    Conclusions of this presentation are: (1) Open|SpeedShop (OSS) is convenient to use for large, parallel, scientific simulation codes; (2) Large codes benefit from uninstrumented execution; (3) Many experiments can be run in a short time - might need multiple shots e.g. usertime for caller-callee, hwcsamp for HW counters; (4) Decent idea of code's performance is easily obtained; (5) Statistical sampling calls for decent number of samples; and (6) HWC data is very useful for micro-analysis but can be tricky to analyze.

  8. The implementation of an aeronautical CFD flow code onto distributed memory parallel systems

    NASA Astrophysics Data System (ADS)

    Ierotheou, C. S.; Forsey, C. R.; Leatham, M.

    2000-04-01

    The parallelization of an industrially important in-house computational fluid dynamics (CFD) code for calculating the airflow over complex aircraft configurations using the Euler or Navier-Stokes equations is presented. The code discussed is the flow solver module of the SAUNA CFD suite. This suite uses a novel grid system that may include block-structured hexahedral or pyramidal grids, unstructured tetrahedral grids or a hybrid combination of both. To assist in the rapid convergence to a solution, a number of convergence acceleration techniques are employed including implicit residual smoothing and a multigrid full approximation storage scheme (FAS). Key features of the parallelization approach are the use of domain decomposition and encapsulated message passing to enable the execution in parallel using a single programme multiple data (SPMD) paradigm. In the case where a hybrid grid is used, a unified grid partitioning scheme is employed to define the decomposition of the mesh. The parallel code has been tested using both structured and hybrid grids on a number of different distributed memory parallel systems and is now routinely used to perform industrial scale aeronautical simulations.

  9. Recent development for the ITS code system: Parallel processing and visualization

    SciTech Connect

    Fan, W.C.; Turner, C.D.; Halbleib, J.A. Sr.; Kensek, R.P.

    1996-03-01

    A brief overview is given for two software developments related to the ITS code system. These developments provide parallel processing and visualization capabilities and thus allow users to perform ITS calculations more efficiently. Timing results and a graphical example are presented to demonstrate these capabilities.

  10. Parallelization of a Transient Method of Lines Navier-Stokes Code

    NASA Astrophysics Data System (ADS)

    Erşahin, Cem; Tarhan, Tanil; Tuncer, Ismail H.; Selçuk, Nevin

    2004-01-01

    Parallel implementation of a serial code, namely method of lines (MOL) solution for momentum equations (MOLS4ME), previously developed for the solution of transient Navier-Stokes equations for incompressible separated internal flows in regular and complex geometries, is described.

  11. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    SciTech Connect

    Earth Sciences Division; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-05-27

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. 
    This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the standard version of the TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of the parallel methodology, the code structure, and the mathematical and numerical methods used.

  12. Performance Modeling and Measurement of Parallelized Code for Distributed Shared Memory Multiprocessors

    NASA Technical Reports Server (NTRS)

    Waheed, Abdul; Yan, Jerry

    1998-01-01

    This paper presents a model to evaluate the performance and overhead of parallelizing sequential code using compiler directives for multiprocessing on distributed shared memory (DSM) systems. With increasing popularity of shared address space architectures, it is essential to understand their performance impact on programs that benefit from shared memory multiprocessing. We present a simple model to characterize the performance of programs that are parallelized using compiler directives for shared memory multiprocessing. We parallelized the sequential implementation of NAS benchmarks using native Fortran77 compiler directives for an Origin2000, which is a DSM system based on a cache-coherent Non Uniform Memory Access (ccNUMA) architecture. We report measurement based performance of these parallelized benchmarks from four perspectives: efficacy of parallelization process; scalability; parallelization overhead; and comparison with hand-parallelized and -optimized version of the same benchmarks. Our results indicate that sequential programs can conveniently be parallelized for DSM systems using compiler directives but realizing performance gains as predicted by the performance model depends primarily on minimizing architecture-specific data locality overhead.

  13. Five-bit parallel operation of optical quantization and coding for photonic analog-to-digital conversion.

    PubMed

    Konishi, Tsuyoshi; Takahashi, Koji; Matsui, Hideki; Satoh, Takema; Itoh, Kazuyoshi

    2011-08-15

    We report an attempt at optical quantization and coding in a 5-bit parallel format for photonic A/D conversion. The proposed system is designed to generate 32 different optical codes in proportion to the corresponding signal levels when amplitude-varied input pulses within a certain range are fed to the setup. This is made possible by optical coding in a bit-parallel format, which provides 5-bit optical codes from 32 optically quantized pulses. The 5-bit parallel operation of an optical quantization and coding module with 5 multi-ports was tested in our experimental setup.
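    As a purely numerical analogy for the mapping described above (32 quantization levels encoded as 5 parallel bits), the sketch below quantizes an amplitude and returns one bit per output port. It assumes a uniform quantizer and a plain binary code, neither of which is stated in the abstract, and the function name and parameters are hypothetical.

```python
def quantize_to_parallel_bits(amplitude, full_scale=1.0, n_bits=5):
    """Map an amplitude in [0, full_scale) to a level and its parallel bit pattern."""
    n_levels = 1 << n_bits  # 32 levels for 5 bits
    level = int(amplitude / full_scale * n_levels)
    level = max(0, min(n_levels - 1, level))  # clamp out-of-range inputs
    # One entry per output port, most significant bit first.
    bits = [(level >> shift) & 1 for shift in range(n_bits - 1, -1, -1)]
    return level, bits


mid_level, mid_bits = quantize_to_parallel_bits(0.5)
top_level, top_bits = quantize_to_parallel_bits(0.99)
```

    Each of the five list entries corresponds to one parallel output, so all bits of a sample are available at the same time rather than serially.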

  14. Parallel Grand Canonical Monte Carlo (ParaGrandMC) Simulation Code

    NASA Technical Reports Server (NTRS)

    Yamakov, Vesselin I.

    2016-01-01

    This report provides an overview of the Parallel Grand Canonical Monte Carlo (ParaGrandMC) simulation code. This is a highly scalable parallel FORTRAN code for simulating the thermodynamic evolution of metal alloy systems at the atomic level, and predicting the thermodynamic state, phase diagram, chemical composition and mechanical properties. The code is designed to simulate multi-component alloy systems, predict solid-state phase transformations such as austenite-martensite transformations, precipitate formation, recrystallization, capillary effects at interfaces, surface absorption, etc., which can aid the design of novel metallic alloys. While the software is mainly tailored for modeling metal alloys, it can also be used for other types of solid-state systems, and to some degree for liquid or gaseous systems, including multiphase systems forming solid-liquid-gas interfaces.

  15. Advancements and performance of iterative methods in industrial applications codes on CRAY parallel/vector supercomputers

    SciTech Connect

    Poole, G.; Heroux, M.

    1994-12-31

    This paper will focus on recent work in two widely used industrial applications codes with iterative methods. The ANSYS program, a general purpose finite element code widely used in structural analysis applications, has now added an iterative solver option. Some results are given from real applications comparing performance with the traditional parallel/vector frontal solver used in ANSYS. Discussion of the applicability of iterative solvers as a general purpose solver will include the topics of robustness, as well as memory requirements and CPU performance. The FIDAP program is a widely used CFD code which uses iterative solvers routinely. A brief description of preconditioners used and some performance enhancements for CRAY parallel/vector systems is given. The solution of large-scale applications in structures and CFD includes examples from industry problems solved on CRAY systems.

  16. Evaluating In-Clique and Topological Parallelism Strategies for Junction Tree-Based Bayesian Inference Algorithm on the Cray XMT

    SciTech Connect

    Chin, George; Choudhury, Sutanay; Kangas, Lars J.; McFarlane, Sally A.; Marquez, Andres

    2011-09-01

    Long viewed as a strong statistical inference technique, Bayesian networks have emerged as an important class of applications for high-performance computing. We have applied an architecture-conscious approach to parallelizing the Lauritzen-Spiegelhalter Junction Tree algorithm for exact inferencing in Bayesian networks. In optimizing the Junction Tree algorithm, we have implemented both in-clique and topological parallelism strategies to best leverage the fine-grained synchronization and massive-scale multithreading of the Cray XMT architecture. Two topological techniques were developed to parallelize the evidence propagation process through the Bayesian network. One technique involves performing intelligent scheduling of junction tree nodes based on its topology and relative size. The second technique involves decomposing the junction tree into a much finer tree-like representation to offer many more opportunities for parallelism. We evaluate these optimizations on five different Bayesian networks and report our findings and observations. Another important contribution of this paper is to demonstrate the application of massive-scale multithreading for load balancing and use of implicit parallelism-based compiler optimizations in designing scalable inferencing algorithms.

  17. [Series: Medical Applications of the PHITS Code (2): Acceleration by Parallel Computing].

    PubMed

    Furuta, Takuya; Sato, Tatsuhiko

    2015-01-01

    Time-consuming Monte Carlo dose calculation has become feasible owing to the development of computer technology. However, recent progress is due to the emergence of multi-core high performance computers. Therefore, parallel computing has become a key to achieving good performance in software programs. The Monte Carlo simulation code PHITS contains two parallel computing functions: distributed-memory parallelization using the message passing interface (MPI) protocol, and shared-memory parallelization using open multi-processing (OpenMP) directives. Users can choose between the two functions according to their needs. This paper explains the two functions along with their advantages and disadvantages. Some test applications are also provided to show their performance on a typical multi-core high performance workstation.

  18. Parallel coding schemes of whisker velocity in the rat's somatosensory system.

    PubMed

    Lottem, Eran; Gugig, Erez; Azouz, Rony

    2015-03-15

    The function of rodents' whisker somatosensory system is to transform tactile cues, in the form of vibrissa vibrations, into neuronal responses. It is well established that rodents can detect numerous tactile stimuli and tell them apart. However, the transformation of tactile stimuli obtained through whisker movements to neuronal responses is not well-understood. Here we examine the role of whisker velocity in tactile information transmission and its coding mechanisms. We show that in anaesthetized rats, whisker velocity is related to the radial distance of the object contacted and its own velocity. Whisker velocity is accurately and reliably coded in first-order neurons in parallel, by both the relative time interval between velocity-independent first spike latency of rapidly adapting neurons and velocity-dependent first spike latency of slowly adapting neurons. At the same time, whisker velocity is also coded, although less robustly, by the firing rates of slowly adapting neurons. Comparing first- and second-order neurons, we find similar decoding efficiencies for whisker velocity using either temporal or rate-based methods. Both coding schemes are sufficiently robust and hardly affected by neuronal noise. Our results suggest that whisker kinematic variables are coded by two parallel coding schemes and are disseminated in a similar way through various brain stem nuclei to multiple brain areas.

  19. Performance and Application of Parallel OVERFLOW Codes on Distributed and Shared Memory Platforms

    NASA Technical Reports Server (NTRS)

    Djomehri, M. Jahed; Rizk, Yehia M.

    1999-01-01

    The presentation discusses recent studies on the performance of the two parallel versions of the aerodynamics CFD code, OVERFLOW_MPI and _MLP. Developed at NASA Ames, the serial version, OVERFLOW, is a multidimensional Navier-Stokes flow solver based on overset (Chimera) grid technology. The code has recently been parallelized in two ways. One is based on the explicit message-passing interface (MPI) across processors and uses the _MPI communication package. This approach is primarily suited for distributed memory systems and workstation clusters. The second, termed the multi-level parallel (MLP) method, is simple and uses shared memory for all communications. The _MLP code is suitable for distributed-shared memory systems. For both methods, the message passing takes place across the processors or processes at the advancement of each time step. This procedure is, in effect, the Chimera boundary conditions update, which is done in an explicit "Jacobi" style. In contrast, the update in the serial code is done in more of a "Gauss-Seidel" fashion. The programming effort for the _MPI code is greater than for the _MLP code; the former requires modification of the outer and some inner shells of the serial code, whereas the latter focuses only on the outer shell of the code. The _MPI version offers a great deal of flexibility in distributing grid zones across a specified number of processors in order to achieve load balancing. The approach is capable of partitioning zones across multiple processors or sending each zone and/or cluster of several zones to a single processor. The message passing across the processors consists of Chimera boundary and/or an overlap of "halo" boundary points for each partitioned zone. The MLP version is a new coarse-grain parallel concept at the zonal and intra-zonal levels. A grouping strategy is used to distribute zones into several groups forming sub-processes which will run in parallel. The total volume of grid points in each

  20. On the error probability of general tree and trellis codes with applications to sequential decoding

    NASA Technical Reports Server (NTRS)

    Johannesson, R.

    1973-01-01

    An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random binary tree codes is derived and shown to be independent of the length of the tree. An upper bound on the average error probability for maximum-likelihood decoding of the ensemble of random L-branch binary trellis codes of rate R = 1/n is derived which separates the effects of the tail length T and the memory length M of the code. It is shown that the bound is independent of the length L of the information sequence. This implication is investigated by computer simulations of sequential decoding utilizing the stack algorithm. These simulations confirm the implication and further suggest an empirical formula for the true undetected decoding error probability with sequential decoding.

  1. Performance of a parallel code for the Euler equations on hypercube computers

    NASA Technical Reports Server (NTRS)

    Barszcz, Eric; Chan, Tony F.; Jesperson, Dennis C.; Tuminaro, Raymond S.

    1990-01-01

    The performance of hypercubes was evaluated on a computational fluid dynamics problem, and the parallel environment issues that must be addressed were considered, such as algorithm changes, implementation choices, programming effort, and programming environment. The evaluation focuses on a widely used fluid dynamics code, FLO52, which solves the two-dimensional steady Euler equations describing flow around an airfoil. The code development experience is described, including interacting with the operating system, utilizing the message-passing communication system, and code modifications necessary to increase parallel efficiency. Results from two hypercube parallel computers (a 16-node iPSC/2, and a 512-node NCUBE/ten) are discussed and compared. In addition, a mathematical model of the execution time was developed as a function of several machine and algorithm parameters. This model accurately predicts the actual run times obtained and is used to explore the performance of the code in interesting but yet physically realizable regions of the parameter space. Based on this model, predictions about future hypercubes are made.
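    For readers unfamiliar with the topology, a hypercube of dimension d connects 2^d nodes, and two nodes are directly linked when their binary labels differ in exactly one bit. The sketch below illustrates this for the two machine sizes quoted above; the helper names are hypothetical, and the labelling is the standard mathematical one rather than anything specific to the iPSC/2 or NCUBE/ten system software.

```python
def hypercube_dimension(n_nodes):
    """Return d such that n_nodes == 2**d, or raise if no such d exists."""
    if n_nodes < 1 or n_nodes & (n_nodes - 1):
        raise ValueError("node count must be a power of two")
    return n_nodes.bit_length() - 1


def hypercube_neighbors(node, dimension):
    """Nodes reachable in one hop: flip each of the `dimension` label bits."""
    return [node ^ (1 << bit) for bit in range(dimension)]


small_dim = hypercube_dimension(16)   # 16-node machine
large_dim = hypercube_dimension(512)  # 512-node machine
neighbors_of_5 = hypercube_neighbors(5, small_dim)
```

    Because each node has only d direct links, the number of neighbours grows logarithmically with machine size, which is one reason communication cost appears as a parameter in execution-time models for such machines.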

  2. Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices

    SciTech Connect

    Wang Jianguo; Chen Zaigao; Wang Yue; Zhang Dianhui; Qiao Hailiang; Fu Meiyan; Yuan Yuan; Liu Chunliang; Li Yongdong; Wang Hongguang

    2010-07-15

    This paper introduces a self-developed, three-dimensional, parallel, fully electromagnetic particle simulation code, UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code; the numerical results agree well with the theoretical ones. This code can be used to simulate high-power microwave (HPM) devices such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by the UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed using the two-and-a-half-dimensional UNIPIC code are also provided for the same HPM device parameters; the numerical results computed from these two codes agree well with each other.
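    The abstract does not state which integrator UNIPIC-3D uses for the relativistic Newton-Lorentz equation. As an illustration only, the sketch below uses the Boris scheme, a widely used choice in particle-in-cell codes, to advance u = gamma * v by one time step in SI units. The function names, argument order, and sample values are assumptions made for this example.

```python
import math


def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def boris_push(u, e_field, b_field, charge, mass, dt, c=299792458.0):
    """Advance u = gamma * v by one step of the relativistic Boris scheme."""
    half = charge * dt / (2.0 * mass)
    # First half of the electric impulse.
    u_minus = tuple(ui + half * ei for ui, ei in zip(u, e_field))
    gamma = math.sqrt(1.0 + sum(x * x for x in u_minus) / (c * c))
    # Magnetic rotation, built from two cross products.
    t = tuple(half * bi / gamma for bi in b_field)
    t_sq = sum(x * x for x in t)
    s = tuple(2.0 * ti / (1.0 + t_sq) for ti in t)
    u_prime = tuple(a + b for a, b in zip(u_minus, cross(u_minus, t)))
    u_plus = tuple(a + b for a, b in zip(u_minus, cross(u_prime, s)))
    # Second half of the electric impulse.
    return tuple(ui + half * ei for ui, ei in zip(u_plus, e_field))


# Electron in a uniform magnetic field and no electric field (sample values).
q_e, m_e = -1.602176634e-19, 9.1093837015e-31
u0 = (1.0e6, 0.0, 0.0)
u1 = boris_push(u0, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), q_e, m_e, dt=1.0e-12)
```

    With no electric field the step is a pure rotation, so the magnitude of u is unchanged apart from rounding; this property is one reason the scheme is popular for long-running particle simulations.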

  3. Software tools for developing parallel applications. Part 1: Code development and debugging

    SciTech Connect

    Brown, J.; Geist, A.; Pancake, C.; Rover, D.

    1997-04-01

    Developing an application for parallel computers can be a lengthy and frustrating process, making it a perfect candidate for software tool support. Yet application programmers are often the last to hear about new tools emerging from R&D efforts. This paper provides an overview of two focuses of tool support: code development and debugging. Each is discussed in terms of the programmer needs addressed, the extent to which representative current tools meet those needs, and what new levels of tool support are important if parallel computing is to become more widespread.

  4. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    NASA Astrophysics Data System (ADS)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  5. Parallel evolution of dwarf ecotypes in the forest tree Eucalyptus globulus.

    PubMed

    Foster, Susan A; McKinnon, Gay E; Steane, Dorothy A; Potts, Brad M; Vaillancourt, René E

    2007-01-01

    Three small populations of a dwarf ecotype of the forest tree Eucalyptus globulus are found on exposed granite headlands in south-eastern Australia. These populations are separated by at least 100 km. Here, we used 12 nuclear microsatellites and a chloroplast DNA marker to investigate the genetic affinities of the dwarf populations to one another and to their nearest populations of tall E. globulus. Cape Tourville was studied in greater detail to assess the processes enabling the maintenance of distinct ecotypes in close geographical proximity. The three dwarf populations were not related to one another and were more closely related to adjacent tall trees than to one another. At Cape Tourville the dwarf and tall ecotypes were significantly differentiated in microsatellites and in chloroplast DNA. The dwarf and tall populations differed in flowering time and no evidence of pollen dispersal from the more extensive tall to the dwarf population was found. The three dwarf populations have evolved in parallel from the local tall ecotypes. This study shows that small marginal populations of eucalypts are capable of developing reproductive isolation from nearby larger populations through differences in flowering time and/or minor spatial separation, making parapatric speciation possible.

  6. Shared and Distributed Memory Parallel Security Analysis of Large-Scale Source Code and Binary Applications

    SciTech Connect

    Quinlan, D; Barany, G; Panas, T

    2007-08-30

    Many forms of security analysis on large scale applications can be substantially automated but the size and complexity can exceed the time and memory available on conventional desktop computers. Most commercial tools are understandably focused on such conventional desktop resources. This paper presents research work on the parallelization of security analysis of both source code and binaries within our Compass tool, which is implemented using the ROSE source-to-source open compiler infrastructure. We have focused on both shared and distributed memory parallelization of the evaluation of rules implemented as checkers for a wide range of secure programming rules, applicable to desktop machines, networks of workstations and dedicated clusters. While Compass as a tool focuses on source code analysis and reports violations of an extensible set of rules, the binary analysis work uses the exact same infrastructure but is less well developed into an equivalent final tool.

  7. Parallelization of the GKEM Electromagnetic PIC code using MPI and OpenMP

    NASA Astrophysics Data System (ADS)

    Benjamin, Mark; Ethier, Stephane; Lee, Wei-Li

    2009-11-01

    GKEM is a legacy gyrokinetic PIC code in slab geometry that calculates anomalous transport in fusion plasmas due to drift wave microturbulence. It is currently being used to develop new algorithms for high-beta electromagnetic PIC simulations. This work focuses on the modernization and performance improvement of GKEM through the use of FORTRAN 90 language features and parallelization. MPI-based particle parallelization was implemented as well as loop-level multi-threading using OpenMP directives. Performance improvements and speedup curves for the different stages of the code are discussed. Project supported by the DOE-PPPL High School Internship Program and DOE contract DE-AC02-09CH11466.

  8. Parallel sparse and dense information coding streams in the electrosensory midbrain.

    PubMed

    Sproule, Michael K J; Metzen, Michael G; Chacron, Maurice J

    2015-10-21

    Efficient processing of incoming sensory information is critical for an organism's survival. It has been widely observed across systems and species that the representation of sensory information changes across successive brain areas. Indeed, peripheral sensory neurons tend to respond densely to a broad range of sensory stimuli while more central neurons tend to instead respond sparsely to a narrow range of stimuli. Such a transition might be advantageous as sparse neural codes are thought to be metabolically efficient and optimize coding efficiency. Here we investigated whether the neural code transitions from dense to sparse within the midbrain Torus semicircularis (TS) of weakly electric fish. Confirming previous results, we found both dense and sparse coding neurons. However, subsequent histological classification revealed that most dense neurons projected to higher brain areas. Our results thus provide strong evidence against the hypothesis that the neural code transitions from dense to sparse in the electrosensory system. Rather, they support the alternative hypothesis that higher brain areas receive parallel streams of dense and sparse coded information from the electrosensory midbrain. We discuss the implications and possible advantages of such a coding strategy and argue that it is a general feature of sensory processing.

  9. SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80

    NASA Technical Reports Server (NTRS)

    Kamat, Manohar P.; Watson, Brian C.

    1992-01-01

    The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern-day computers with a single processing unit was estimated at 3 billion floating-point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization, as on the Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special-purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements.

  10. SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80

    NASA Astrophysics Data System (ADS)

    Kamat, Manohar P.; Watson, Brian C.

    1992-02-01

    The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern-day computers with a single processing unit was estimated at 3 billion floating-point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization, as on the Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special-purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements.

  11. MT3DMSP - A parallelized version of the MT3DMS code

    NASA Astrophysics Data System (ADS)

    Abdelaziz, Ramadan; Le, Hai Ha

    2014-12-01

    A parallelized version of the 3-D multi-species transport model MT3DMS was developed and tested. Specifically, Open Multi-Processing (OpenMP) was used for communication between the processors. MT3DMS simulates solute transport by dividing the calculation into flow and transport steps. In this article, a new preconditioner derived from Symmetric Successive Over-Relaxation (SSOR) was added to the generalized conjugate gradient solver. This preconditioner is well suited to the parallel architecture. A case study at the test field of TU Bergakademie Freiberg was used to produce the results and analyze the code performance. It was observed that most of the running time is spent on the advection and dispersion calculations. As a result, the parallel version significantly decreases the running time of solute transport modeling. In addition, this work provides a first attempt to demonstrate the capability and versatility of MT3DMS5P to simulate solute transport in fractured gneiss rock.
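
    The abstract says only that the preconditioner is "derived from" SSOR, so the following is a minimal sketch of textbook SSOR preconditioning, not the MT3DMSP implementation. It assumes a small symmetric matrix stored as nested Python lists; the function name `ssor_apply` and the relaxation parameter `omega` are illustrative choices.

```python
def ssor_apply(A, r, omega=1.0):
    """Return z = M^{-1} r for the textbook SSOR preconditioner of a symmetric A.

    M = (D + omega*L) D^{-1} (D + omega*U) / (omega * (2 - omega)),
    where D, L, U are the diagonal, strictly lower and strictly upper parts of A.
    """
    n = len(A)
    scale = omega * (2.0 - omega)
    # Forward sweep: solve (D + omega*L) y = scale * r.
    y = [0.0] * n
    for i in range(n):
        s = sum(A[i][j] * y[j] for j in range(i))
        y[i] = (scale * r[i] - omega * s) / A[i][i]
    # Backward sweep: solve (D + omega*U) z = D y.
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (A[i][i] * y[i] - omega * s) / A[i][i]
    return z
```

    Inside a preconditioned conjugate gradient loop, a routine of this kind would be called once per iteration on the current residual. With `omega=1.0` it reduces to symmetric Gauss-Seidel.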

  12. Denovo--A New Three-Dimensional Parallel Discrete Ordinates Code in SCALE

    SciTech Connect

    Evans, Thomas M; Stafford, Alissa; Clarno, Kevin T

    2010-01-01

    Denovo is a new, three-dimensional, discrete ordinates (SN) transport code that uses state-of-the-art solution methods to obtain accurate solutions to the Boltzmann transport equation. Denovo uses the Koch-Baker-Alcouffe parallel sweep algorithm to obtain high parallel efficiency on O(100) processors on XYZ orthogonal meshes. As opposed to traditional SN codes that use source iteration, Denovo uses nonstationary Krylov methods to solve the within-group equations. Krylov methods are far more efficient than stationary schemes. Additionally, classic acceleration schemes (diffusion synthetic acceleration) do not suffer stability problems when used as a preconditioner to a Krylov solver. Denovo's generic programming framework allows multiple spatial discretization schemes and solution methodologies. Denovo currently provides diamond-difference, theta-weighted diamond-difference, linear-discontinuous finite element, trilinear-discontinuous finite element, and step characteristics spatial differencing schemes. Also, users have the option of running traditional source iteration instead of Krylov iteration. Multigroup upscatter problems can be solved using Gauss-Seidel iteration with transport, two-grid acceleration. A parallel first-collision source is also available. Denovo solutions to the Kobayashi benchmarks are in excellent agreement with published results. Parallel performance shows excellent weak scaling up to 20000 cores and good scaling up to 40000 cores.

  13. Reading out a spatiotemporal population code by imaging neighbouring parallel fibre axons in vivo

    PubMed Central

    Wilms, Christian D.; Häusser, Michael

    2015-01-01

    The spatiotemporal pattern of synaptic inputs to the dendritic tree is crucial for synaptic integration and plasticity. However, it is not known if input patterns driven by sensory stimuli are structured or random. Here we investigate the spatial patterning of synaptic inputs by directly monitoring presynaptic activity in the intact mouse brain on the micron scale. Using in vivo calcium imaging of multiple neighbouring cerebellar parallel fibre axons, we find evidence for clustered patterns of axonal activity during sensory processing. The clustered parallel fibre input we observe is ideally suited for driving dendritic spikes, postsynaptic calcium signalling, and synaptic plasticity in downstream Purkinje cells, and is thus likely to be a major feature of cerebellar function during sensory processing. PMID:25751648

  14. Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

    NASA Technical Reports Server (NTRS)

    Long, Lyle N.; Brentner, Kenneth S.

    2000-01-01

    This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
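
    The idea of handing many independent serial jobs to whichever worker is free can be sketched as follows. This is an illustrative example under stated assumptions, not the code from the paper's appendix: it uses a Python thread pool in place of a parallel machine, and the names `solve_linear_system`, `_solve_job` and `run_self_scheduled` are hypothetical. Each job is one small linear system, standing in for one angle of attack in a panel code.

```python
from concurrent.futures import ThreadPoolExecutor


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]  # augmented copy
    for col in range(n):
        # Pick the row with the largest pivot to limit round-off growth.
        pivot = max(range(col, n), key=lambda k: abs(M[k][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for row in range(col + 1, n):
            f = M[row][col] / M[col][col]
            for c in range(col, n + 1):
                M[row][c] -= f * M[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x


def _solve_job(job):
    A, b = job
    return solve_linear_system(A, b)


def run_self_scheduled(jobs, workers=4):
    """Run independent serial jobs; each idle worker picks up the next pending job."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in input order even if jobs finish out of order.
        return list(pool.map(_solve_job, jobs))
```

    Threads keep the sketch short; for CPU-bound work in CPython a process pool or a message-passing master/worker layout would be the more realistic choice, because threads share one interpreter lock.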

  15. A symbol-map wavelet zero-tree image coding algorithm

    NASA Astrophysics Data System (ADS)

    Wang, Xiaodong; Liu, Wenyao; Peng, Xiang; Liu, Xiaoli

    2008-03-01

    An improved SPIHT image compression algorithm, called the symbol-map zero-tree coding algorithm (SMZTC), is proposed in this paper based on the wavelet transform. The SPIHT algorithm is a highly efficient method for coding wavelet coefficients and compresses images well, but it is complex and needs too much memory. The algorithm presented in this paper uses two small symbol maps, Mark and FC, to store the status of coefficients and zero-tree sets during the coding procedure so as to reduce the memory requirement. With this strategy, the memory cost is reduced markedly and the scanning speed of coefficients is improved. Comparison experiments on 512 by 512 images were carried out against other zero-tree coding algorithms, such as SPIHT and the NLS method. In the experiments, the biorthogonal 9/7 lifting wavelet transform was used to transform the images. The coding results show that the codec speed of this algorithm is improved significantly and that its compression ratio is almost the same as that of the SPIHT algorithm.

  16. Recent Improvements to the IMPACT-T Parallel Particle Tracking Code

    SciTech Connect

    Qiang, J.; Pogorelov, I.V.; Ryne, R.

    2006-11-16

    The IMPACT-T code is a parallel three-dimensional quasi-static beam dynamics code for modeling high brightness beams in photoinjectors and RF linacs. Developed under the US DOE Scientific Discovery through Advanced Computing (SciDAC) program, it includes several key features including a self-consistent calculation of 3D space-charge forces using a shifted and integrated Green function method, multiple energy bins for beams with large energy spread, and models for treating RF standing wave and traveling wave structures. In this paper, we report on recent improvements to the IMPACT-T code including modeling traveling wave structures, short-range transverse and longitudinal wakefields, and longitudinal coherent synchrotron radiation through bending magnets.

  17. Performance analysis of parallel gravitational N-body codes on large GPU clusters

    NASA Astrophysics Data System (ADS)

    Huang, Si-Yi; Spurzem, Rainer; Berczik, Peter

    2016-01-01

    We compare the performance of two very different parallel gravitational N-body codes for astrophysical simulations on large Graphics Processing Unit (GPU) clusters, both of which are pioneers in their own fields as well as on certain mutual scales - NBODY6++ and Bonsai. We carry out benchmarks of the two codes by analyzing their performance, accuracy and efficiency through the modeling of structure decomposition and timing measurements. We find that both codes are heavily optimized to leverage the computational potential of GPUs as their performance has approached half of the maximum single precision performance of the underlying GPU cards. With such performance we predict that a speed-up of 200 – 300 can be achieved when up to 1k processors and GPUs are employed simultaneously. We discuss the quantitative information about comparisons of the two codes, finding that in the same cases Bonsai adopts larger time steps as well as larger relative energy errors than NBODY6++, typically ranging from 10 – 50 times larger, depending on the chosen parameters of the codes. Although the two codes are built for different astrophysical applications, in specified conditions they may overlap in performance at certain physical scales, thus allowing the user to choose either one by fine-tuning parameters accordingly.

  19. HOTB: High precision parallel code for calculation of four-particle harmonic oscillator transformation brackets

    NASA Astrophysics Data System (ADS)

    Stepšys, A.; Mickevicius, S.; Germanas, D.; Kalinauskas, R. K.

    2014-11-01

    This new version of the HOTB program for calculation of the three- and four-particle harmonic oscillator transformation brackets provides some enhancements and corrections to the earlier version (Germanas et al., 2010) [1]. In particular, the new version allows calculations of harmonic oscillator transformation brackets to be performed in parallel using the MPI parallel communication standard. Moreover, intermediate calculations are carried out at higher precision using GNU Quadruple Precision and the arbitrary-precision library FMLib [2]. A package of Fortran code is presented. The calculation time for large matrices can be significantly reduced using effective parallel code. The use of higher-precision methods in intermediate calculations increases the stability of the algorithms and extends their validity to larger input values.

    Catalogue identifier: AEFQ_v4_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFQ_v4_0.html
    Program obtainable from: CPC Program Library, Queen’s University of Belfast, N. Ireland
    Licensing provisions: GNU General Public License, version 3
    Number of lines in programs, including test data, etc.: 1711
    Number of bytes in distributed programs, including test data, etc.: 11667
    Distribution format: tar.gz
    Program language used: FORTRAN 90 with MPI extensions for parallelism
    Computer: Any computer with FORTRAN 90 compiler
    Operating system: Windows, Linux, FreeBSD, Tru64 Unix
    Has the code been vectorized or parallelized?: Yes, parallelism using MPI extensions.
    Number of CPUs used: up to 999
    RAM (per CPU core): Depending on allocated binomial and trinomial matrices and use of precision; at least 500 MB
    Catalogue identifier of previous version: AEFQ_v1_0
    Journal reference of previous version: Comput. Phys. Comm. 181, Issue 2, (2010) 420-425
    Does the new version supersede the previous version?: Yes
    Nature of problem: Calculation of matrices of three-particle harmonic oscillator brackets (3HOB) and four-particle harmonic oscillator brackets (4HOB) in a more

  20. An ecological and evolutionary perspective on the parallel invasion of two cross-compatible trees

    PubMed Central

    Besnard, Guillaume; Cuneo, Peter

    2016-01-01

    Invasive trees are generally seen as ecosystem-transforming plants that can have significant impacts on native vegetation, and often require management and control. Understanding their history and biology is essential to guide actions of land managers. Here, we present a summary of recent research into the ecology, phylogeography and management of invasive olives, which are now established outside of their native range as high ecological impact invasive trees. The parallel invasion of European and African olive in different climatic zones of Australia provides an interesting case study of invasion, characterized by early genetic admixture between domesticated and wild taxa. Today, the impact of the invasive olives on native vegetation and ecosystem function is of conservation concern, with European olive a declared weed in areas of South Australia, and African olive a declared weed in New South Wales and Pacific islands. Population genetics was used to trace the origins and invasion of both subspecies in Australia, indicating that both olive subspecies have hybridized early after introduction. Research also indicates that African olive populations can establish from a low number of founder individuals even after successive bottlenecks. Modelling based on distributional data from the native and invasive range identified a shift of the realized ecological niche in the Australian invasive range for both olive subspecies, which was particularly marked for African olive. As highly successful and long-lived invaders, olives offer further opportunities to understand the genetic basis of invasion, and we propose that future research examines the history of introduction and admixture, the genetic basis of adaptability and the role of biotic interactions during invasion. Advances on these questions will ultimately improve predictions on the future olive expansion and provide a solid basis for better management of invasive populations. PMID:27519914

  1. An ecological and evolutionary perspective on the parallel invasion of two cross-compatible trees.

    PubMed

    Besnard, Guillaume; Cuneo, Peter

    2016-01-01

    Invasive trees are generally seen as ecosystem-transforming plants that can have significant impacts on native vegetation, and often require management and control. Understanding their history and biology is essential to guide actions of land managers. Here, we present a summary of recent research into the ecology, phylogeography and management of invasive olives, which are now established outside of their native range as high ecological impact invasive trees. The parallel invasion of European and African olive in different climatic zones of Australia provides an interesting case study of invasion, characterized by early genetic admixture between domesticated and wild taxa. Today, the impact of the invasive olives on native vegetation and ecosystem function is of conservation concern, with European olive a declared weed in areas of South Australia, and African olive a declared weed in New South Wales and Pacific islands. Population genetics was used to trace the origins and invasion of both subspecies in Australia, indicating that both olive subspecies have hybridized early after introduction. Research also indicates that African olive populations can establish from a low number of founder individuals even after successive bottlenecks. Modelling based on distributional data from the native and invasive range identified a shift of the realized ecological niche in the Australian invasive range for both olive subspecies, which was particularly marked for African olive. As highly successful and long-lived invaders, olives offer further opportunities to understand the genetic basis of invasion, and we propose that future research examines the history of introduction and admixture, the genetic basis of adaptability and the role of biotic interactions during invasion. Advances on these questions will ultimately improve predictions on the future olive expansion and provide a solid basis for better management of invasive populations.

  3. Manchester code telemetry system for well logging using quasi-parallel inductive-capacitive resonance.

    PubMed

    Xu, Lijun; Chen, Jianjun; Cao, Zhang; Liu, Xingbin; Hu, Jinhai

    2014-07-01

    In this paper, a quasi-parallel inductive-capacitive (LC) resonance method is proposed to improve the recovery of MIL-STD-1553 Manchester code with several frequency components from attenuated, distorted, and drifted signals for data telemetry in well logging, and a corresponding telemetry system is developed. The required resonant frequency and quality factor are derived, and the quasi-parallel LC resonant circuit is established at the receiving end of the logging cable to suppress the low-pass filtering effect caused by the distributed capacitance of the cable and provide a balanced pass for all three frequency components of the Manchester code. The performance of the method for various encoding frequencies and cable lengths at different bit energy to noise density ratios (Eb/No) has been evaluated in simulation. A 5 km single-core cable used in on-site well logging and various encoding frequencies were employed to verify the proposed telemetry system experimentally. The results demonstrate that the telemetry system is feasible and effective: it improves code recovery in terms of anti-attenuation, anti-distortion, and anti-drift performance, decreases the bit error rate, and greatly increases the reachable transmission rate and distance.
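
    For readers unfamiliar with Manchester coding, the sketch below shows the bit-level idea only: every bit is sent as two half-bit levels with a transition in the middle, which is what the receiver must recover from the degraded cable signal. It assumes the convention in which a 1 is high-then-low and a 0 is low-then-high, and it does not model the analogue LC circuit described in the abstract. The function names are illustrative.

```python
def manchester_encode(bits):
    """Encode bits as half-bit levels: 1 -> (1, 0), 0 -> (0, 1)."""
    out = []
    for b in bits:
        out.extend((1, 0) if b else (0, 1))
    return out


def manchester_decode(levels):
    """Decode half-bit levels back to bits.

    Raises ValueError when a bit cell has no mid-bit transition,
    which is how a corrupted cell shows up at this level of abstraction.
    """
    if len(levels) % 2:
        raise ValueError("expected an even number of half-bit levels")
    bits = []
    for i in range(0, len(levels), 2):
        pair = (levels[i], levels[i + 1])
        if pair == (1, 0):
            bits.append(1)
        elif pair == (0, 1):
            bits.append(0)
        else:
            raise ValueError(f"no mid-bit transition in bit cell {i // 2}")
    return bits
```

    Because each bit cell contains both levels, the encoded stream has no DC component, at the cost of doubling the signalling rate relative to the bit rate.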

  4. Manchester code telemetry system for well logging using quasi-parallel inductive-capacitive resonance

    NASA Astrophysics Data System (ADS)

    Xu, Lijun; Chen, Jianjun; Cao, Zhang; Liu, Xingbin; Hu, Jinhai

    2014-07-01

    In this paper, a quasi-parallel inductive-capacitive (LC) resonance method is proposed to improve the recovery of MIL-STD-1553 Manchester code with several frequency components from attenuated, distorted, and drifted signals for data telemetry in well logging, and a corresponding telemetry system is developed. The required resonant frequency and quality factor are derived, and the quasi-parallel LC resonant circuit is established at the receiving end of the logging cable to suppress the low-pass filtering effect caused by the distributed capacitance of the cable and provide a balanced pass for all three frequency components of the Manchester code. The performance of the method for various encoding frequencies and cable lengths at different bit energy to noise density ratios (Eb/No) has been evaluated in simulation. A 5 km single-core cable used in on-site well logging and various encoding frequencies were employed to verify the proposed telemetry system experimentally. The results demonstrate that the telemetry system is feasible and effective: it improves code recovery in terms of anti-attenuation, anti-distortion, and anti-drift performance, decreases the bit error rate, and greatly increases the reachable transmission rate and distance.

  5. Portable implementation of implicit methods for the UEDGE and BOUT codes on parallel computers

    SciTech Connect

    Rognlien, T D; Xu, X Q

    1999-02-17

    A description is given of the parallelization algorithms and results for two codes used extensively to model edge plasmas in magnetic fusion energy devices. The codes are UEDGE, which calculates two-dimensional plasma and neutral gas profiles, and BOUT, which calculates three-dimensional plasma turbulence using experimental or UEDGE profiles. Both codes describe the plasma behavior using fluid equations. A domain decomposition model is used for parallelization by dividing the global spatial simulation region into a set of domains. This approach allows the use of two recently developed LLNL Newton-Krylov numerical solvers, PVODE and KINSOL. Results show an order of magnitude speedup in execution time for the plasma equations with UEDGE. A problem identified for UEDGE is the solution of the fluid gas equations on a highly anisotropic mesh. The speedup of BOUT is closer to two orders of magnitude, especially if one includes the initial improvement from switching to the fully implicit Newton-Krylov solver. The turbulent transport coefficients obtained from BOUT guide the use of anomalous transport models within UEDGE, with the eventual goal of a self-consistent coupling.

  6. Coding trees and boundaries of attracting basins for some entire maps

    NASA Astrophysics Data System (ADS)

    Baranski, Krzysztof; Karpinska, Boguslawa

    2007-02-01

    Let $f$ be an entire transcendental map such that all the singularities of $f^{-1}$ are contained in a compact subset of the immediate basin $B(z_0)$ of an attracting fixed point $z_0$. We study the structure of the Julia set of $f$, which is equal to the boundary of $B(z_0)$, and the behaviour of the Riemann mapping $\varphi$ onto $B(z_0)$ using the technique of geometric coding trees of preimages of points from $B(z_0)$. We show that for a given symbolic itinerary, if codes of the tracts of $f$ are bounded and codes of the fundamental domains grow no faster than the iterates of an exponential function, then there exists a point in the Julia set with this itinerary. Moreover, we determine cluster sets for $\varphi$ and show that $\varphi$ has an unrestricted limit equal to $\infty$ at points of a dense uncountable set in the unit circle.

  7. Parallel changes in mate-attracting calls and female preferences in autotriploid tree frogs

    PubMed Central

    Tucker, Mitch A.; Gerhardt, H. C.

    2012-01-01

    For polyploid species to persist, they must be reproductively isolated from their diploid parental species, which coexist at the same time and place at least initially. In a complex of biparentally reproducing tetraploid and diploid tree frogs in North America, selective phonotaxis—mediated by differences in the pulse-repetition (pulse rate) of their mate-attracting vocalizations—ensures assortative mating. We show that artificially produced autotriploid females of the diploid species (Hyla chrysoscelis) show a shift in pulse-rate preference in the direction of the pulse rate produced by males of the tetraploid species (Hyla versicolor). The estimated preference function is centred near the mean pulse rate of the calls of artificially produced male autotriploids. Such a parallel shift, which is caused by polyploidy per se and whose magnitude is expected to be greater in autotetraploids, may have facilitated sympatric speciation by promoting reproductive isolation of the initially formed polyploids from their diploid parental forms. This process also helps to explain why tetraploid lineages with different origins have similar advertisement calls and freely interbreed. PMID:22113033

  8. Cupid: Cluster-Based Exploration of Geometry Generators with Parallel Coordinates and Radial Trees.

    PubMed

    Beham, Michael; Herzner, Wolfgang; Gröller, M Eduard; Kehrer, Johannes

    2014-12-01

    Geometry generators are commonly used in video games and evaluation systems for computer vision to create geometric shapes such as terrains, vegetation or airplanes. The parameters of the generator are often sampled automatically which can lead to many similar or unwanted geometric shapes. In this paper, we propose a novel visual exploration approach that combines the abstract parameter space of the geometry generator with the resulting 3D shapes in a composite visualization. Similar geometric shapes are first grouped using hierarchical clustering and then nested within an illustrative parallel coordinates visualization. This helps the user to study the sensitivity of the generator with respect to its parameter space and to identify invalid parameter settings. Starting from a compact overview representation, the user can iteratively drill-down into local shape differences by clicking on the respective clusters. Additionally, a linked radial tree gives an overview of the cluster hierarchy and enables the user to manually split or merge clusters. We evaluate our approach by exploring the parameter space of a cup generator and provide feedback from domain experts. PMID:26356883

  9. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    DOE PAGES Beta

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  10. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    SciTech Connect

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  11. Shared Memory Parallelization of an Implicit ADI-type CFD Code

    NASA Technical Reports Server (NTRS)

    Hauser, Th.; Huang, P. G.

    1999-01-01

    A parallelization study designed for ADI-type algorithms is presented using the OpenMP specification for shared-memory multiprocessor programming. Details of optimizations specifically addressed to cache-based computer architectures are described and performance measurements for the single and multiprocessor implementation are summarized. The paper demonstrates that optimization of memory access on a cache-based computer architecture controls the performance of the computational algorithm. A hybrid MPI/OpenMP approach is proposed for clusters of shared memory machines to further enhance the parallel performance. The method is applied to develop a new LES/DNS code, named LESTool. A preliminary DNS calculation of a fully developed channel flow at a Reynolds number of 180, Re_τ = 180, has shown good agreement with existing data.

  12. MC++: A parallel, portable, Monte Carlo neutron transport code in C++

    SciTech Connect

    Lee, S.R.; Cummings, J.C.; Nolen, S.D.

    1997-03-01

    MC++ is an implicit multi-group Monte Carlo neutron transport code written in C++ and based on the Parallel Object-Oriented Methods and Applications (POOMA) class library. MC++ runs in parallel on and is portable to a wide variety of platforms, including MPPs, SMPs, and clusters of UNIX workstations. MC++ is being developed to provide transport capabilities to the Accelerated Strategic Computing Initiative (ASCI). It is also intended to form the basis of the first transport physics framework (TPF), which is a C++ class library containing appropriate abstractions, objects, and methods for the particle transport problem. The transport problem is briefly described, as well as the current status and algorithms in MC++ for solving the transport equation. The alpha version of the POOMA class library is also discussed, along with the implementation of the transport solution algorithms using POOMA. Finally, a simple test problem is defined and performance and physics results from this problem are discussed on a variety of platforms.

  13. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    NASA Astrophysics Data System (ADS)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2016-03-01

    This work discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package authored at Oak Ridge National Laboratory. Shift has been developed to scale well from laptops to small computing clusters to advanced supercomputers and includes features such as support for multiple geometry and physics engines, hybrid capabilities for variance reduction methods such as the Consistent Adjoint-Driven Importance Sampling methodology, advanced parallel decompositions, and tally methods optimized for scalability on supercomputing architectures. The scaling studies presented in this paper demonstrate good weak and strong scaling behavior for the implemented algorithms. Shift has also been validated and verified against various reactor physics benchmarks, including the Consortium for Advanced Simulation of Light Water Reactors' Virtual Environment for Reactor Analysis criticality test suite and several Westinghouse AP1000® problems presented in this paper. These benchmark results compare well to those from other contemporary Monte Carlo codes such as MCNP5 and KENO.

  14. Parallelism to solute transport code MT3DMS and case study in TU. Freiberg

    NASA Astrophysics Data System (ADS)

    Abdelaziz, Ramadan; Leb, Hai Ha

    2014-05-01

    A parallel version of the 3-D Multi-Species Transport Model MT3DMS was developed. Open Multiprocessing (OpenMP) was used for communication between processors. MT3DMS simulates solute transport by dividing the calculation into flow and transport steps. A new preconditioner, derived from Symmetric Successive Over-Relaxation (SSOR), was added to the generalized conjugate gradient solver. A case study at the test field of TU Bergakademie Freiberg was used to produce the results and analyze the code performance. The test-field demonstration indicated that the parallel mode of MT3DMS is usable within a given processor count and problem size. The resulting speedups shortened the run time of the field test of the solute transport model.

  15. Multi-Zone Liquid Thrust Chamber Performance Code with Domain Decomposition for Parallel Processing

    NASA Technical Reports Server (NTRS)

    Navaz, Homayun K.

    2002-01-01

    -equation turbulence model, and two-phase flow. To overcome these limitations, the LTCP code is rewritten to include the multi-zone capability with domain decomposition that makes it suitable for parallel processing, i.e., enabling the code to run every zone or sub-domain on a separate processor. This can reduce the run time by a factor of 6 to 8, depending on the problem.

  16. Grid-based Parallel Data Streaming Implemented for the Gyrokinetic Toroidal Code

    SciTech Connect

    S. Klasky; S. Ethier; Z. Lin; K. Martins; D. McCune; R. Samtaney

    2003-09-15

    We have developed a threaded parallel data streaming approach using Globus to transfer multi-terabyte simulation data from a remote supercomputer to the scientist's home analysis/visualization cluster, as the simulation executes, with negligible overhead. Data transfer experiments show that this concurrent data transfer approach is more favorable compared with writing to local disk and then transferring this data to be post-processed. The present approach is conducive to using the grid to pipeline the simulation with post-processing and visualization. We have applied this method to the Gyrokinetic Toroidal Code (GTC), a 3-dimensional particle-in-cell code used to study microturbulence in magnetic confinement fusion from first principles plasma theory.

  17. Coding for Parallel Links to Maximize the Expected Value of Decodable Messages

    NASA Technical Reports Server (NTRS)

    Klimesh, Matthew A.; Chang, Christopher S.

    2011-01-01

    When multiple parallel communication links are available, it is useful to consider link-utilization strategies that provide tradeoffs between reliability and throughput. Interesting cases arise when there are three or more available links. Under the model considered, the links have known probabilities of being in working order, and each link has a known capacity. The sender has a number of messages to send to the receiver. Each message has a size and a value (i.e., a worth or priority). Messages may be divided into pieces arbitrarily, and the value of each piece is proportional to its size. The goal is to choose combinations of messages to send on the links so that the expected value of the messages decodable by the receiver is maximized. There are three parts to the innovation: (1) Applying coding to parallel links under the model; (2) Linear programming formulation for finding the optimal combinations of messages to send on the links; and (3) Algorithms for assisting in finding feasible combinations of messages, as support for the linear programming formulation. There are similarities between this innovation and methods developed in the field of network coding. However, network coding has generally been concerned with either maximizing throughput in a fixed network, or robust communication of a fixed volume of data. In contrast, under this model, the throughput is expected to vary depending on the state of the network. Examples of error-correcting codes that are useful under this model but which are not needed under previous models have been found. This model can represent either a one-shot communication attempt, or a stream of communications. Under the one-shot model, message sizes and link capacities are quantities of information (e.g., measured in bits), while under the communications stream model, message sizes and link capacities are information rates (e.g., measured in bits/second). This work has the potential to increase the value of data returned from
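
    The objective described above, the expected value of the messages the receiver can decode, can be illustrated with a small sketch. This is not the authors' linear programming formulation: it assumes links fail independently, every piece has unit size and fits on one link, and a piece is protected only by plain replication (no coding). The function name `expected_value` and the example numbers are hypothetical.

```python
from itertools import product

def expected_value(link_probs, assignment):
    """Expected total value of decodable pieces.

    link_probs: probability that each link is in working order
                (links are assumed independent, which is an assumption here).
    assignment: list of (value, links) pairs; a piece counts as decodable
                if at least one link carrying a copy of it is working.
    """
    total = 0.0
    # Enumerate every up/down pattern of the links.
    for state in product([True, False], repeat=len(link_probs)):
        p = 1.0
        for up, q in zip(state, link_probs):
            p *= q if up else 1.0 - q
        received = sum(v for v, links in assignment
                       if any(state[i] for i in links))
        total += p * received
    return total

probs = [0.9, 0.9, 0.9]
# One distinct message per link: favours throughput.
spread = expected_value(probs, [(3, [0]), (2, [1]), (1, [2])])
# Most valuable message copied onto two links: favours reliability.
protect = expected_value(probs, [(3, [0, 1]), (2, [2])])
print(spread, protect)
```

    With these made-up numbers the spread strategy scores about 5.4 and the protected one about 4.77; a different choice of values or probabilities can reverse the ranking, which is the reliability-versus-throughput tradeoff the abstract refers to.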

  18. ALEGRA -- A massively parallel h-adaptive code for solid dynamics

    SciTech Connect

    Summers, R.M.; Wong, M.K.; Boucheron, E.A.; Weatherby, J.R.

    1997-12-31

    ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.
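
    Two ideas from this abstract lend themselves to a toy illustration: giving each processor a sub-mesh with approximately the same number of elements, and h-adaptivity as subdivision of elements. The sketch below is not ALEGRA code. It assumes a 1D mesh, contiguous index ranges instead of a real mesh partitioner, and hypothetical function names.

```python
def partition_elements(num_elements, num_procs):
    """Split element indices into contiguous sub-meshes whose sizes
    differ by at most one element (load balance only; a production
    partitioner would also try to minimise inter-processor boundaries)."""
    base, extra = divmod(num_elements, num_procs)
    parts, start = [], 0
    for rank in range(num_procs):
        size = base + (1 if rank < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts

def refine_1d(nodes, flagged):
    """h-refinement on a 1D mesh: split every flagged element
    (the interval between nodes[i] and nodes[i + 1]) at its midpoint."""
    new_nodes = [nodes[0]]
    for i in range(len(nodes) - 1):
        if i in flagged:
            new_nodes.append(0.5 * (nodes[i] + nodes[i + 1]))
        new_nodes.append(nodes[i + 1])
    return new_nodes

print([len(p) for p in partition_elements(10, 3)])  # sizes of the sub-meshes
print(refine_1d([0.0, 1.0, 2.0], {1}))              # element 1 is halved
```

    After refinement the element counts per processor change, so an adaptive code would generally need to rebalance; the sketch does not attempt that.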

  19. HOTB: High precision parallel code for calculation of four-particle harmonic oscillator transformation brackets

    NASA Astrophysics Data System (ADS)

    Stepšys, A.; Mickevicius, S.; Germanas, D.; Kalinauskas, R. K.

    2014-11-01

    This new version of the HOTB program for calculation of the three- and four-particle harmonic oscillator transformation brackets provides some enhancements and corrections to the earlier version (Germanas et al., 2010) [1]. In particular, the new version allows calculations of harmonic oscillator transformation brackets to be performed in parallel using the MPI parallel communication standard. Moreover, intermediate calculations are carried out in higher precision using GNU Quadruple Precision and the arbitrary-precision library FMLib [2]. A package of Fortran code is presented. The calculation time for large matrices can be significantly reduced using efficient parallel code. The use of higher-precision methods in intermediate calculations increases the stability of the algorithms and extends their validity to larger input values.
    Catalogue identifier: AEFQ_v4_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEFQ_v4_0.html
    Program obtainable from: CPC Program Library, Queen’s University of Belfast, N. Ireland
    Licensing provisions: GNU General Public License, version 3
    Number of lines in programs, including test data, etc.: 1711
    Number of bytes in distributed programs, including test data, etc.: 11667
    Distribution format: tar.gz
    Program language used: FORTRAN 90 with MPI extensions for parallelism
    Computer: Any computer with FORTRAN 90 compiler
    Operating system: Windows, Linux, FreeBSD, Tru64 Unix
    Has the code been vectorized or parallelized?: Yes, parallelism using MPI extensions.
    Number of CPUs used: up to 999
    RAM (per CPU core): Depending on allocated binomial and trinomial matrices and use of precision; at least 500 MB
    Catalogue identifier of previous version: AEFQ_v1_0
    Journal reference of previous version: Comput. Phys. Comm. 181, Issue 2, (2010) 420-425
    Does the new version supersede the previous version?: Yes
    Nature of problem: Calculation of matrices of three-particle harmonic oscillator brackets (3HOB) and four-particle harmonic oscillator brackets (4HOB) in a more

  20. OpenGeoSys-GEMS: Hybrid parallelization of a reactive transport code with MPI and threads

    NASA Astrophysics Data System (ADS)

    Kosakowski, G.; Kulik, D. A.; Shao, H.

    2012-04-01

    OpenGeoSys-GEMS is a generic purpose reactive transport code based on the operator splitting approach. The code couples the Finite-Element groundwater flow and multi-species transport modules of the OpenGeoSys (OGS) project (http://www.ufz.de/index.php?en=18345) with the GEM-Selektor research package to model thermodynamic equilibrium of aquatic (geo)chemical systems utilizing the Gibbs Energy Minimization approach (http://gems.web.psi.ch/). The combination of OGS and the GEM-Selektor kernel (GEMS3K) is highly flexible due to the object-oriented modular code structures and the well defined (memory based) data exchange modules. Like other reactive transport codes, the practical applicability of OGS-GEMS is often hampered by the long calculation time and large memory requirements.
    • For realistic geochemical systems which might include dozens of mineral phases and several (non-ideal) solid solutions the time needed to solve the chemical system with GEMS3K may increase exceptionally.
    • The codes are coupled in a sequential non-iterative loop. In order to keep the accuracy, the time step size is restricted. In combination with a fine spatial discretization the time step size may become very small which increases calculation times drastically even for small 1D problems.
    • The current version of OGS is not optimized for memory use and the MPI version of OGS does not distribute data between nodes. Even for moderately small 2D problems the number of MPI processes that fit into memory of up-to-date workstations or HPC hardware is limited.
    One strategy to overcome the above mentioned restrictions of OGS-GEMS is to parallelize the coupled code. For OGS a parallelized version already exists. It is based on a domain decomposition method implemented with MPI and provides a parallel solver for fluid and mass transport processes. In the coupled code, after solving fluid flow and solute transport, geochemical calculations are done in form of a central loop over all finite

  1. Dynamical evolution of massive black holes in galactic-scale N-body simulations - introducing the regularized tree code 'rVINE'

    NASA Astrophysics Data System (ADS)

    Karl, Simon J.; Aarseth, Sverre J.; Naab, Thorsten; Haehnelt, Martin G.; Spurzem, Rainer

    2015-09-01

    We present a hybrid code combining the OpenMP-parallel tree code VINE with an algorithmic chain regularization scheme. The new code, called 'rVINE', aims to significantly improve the accuracy of close encounters of massive bodies with supermassive black holes (SMBHs) in galaxy-scale numerical simulations. We demonstrate the capabilities of the code by studying two test problems, the sinking of a single massive black hole to the centre of a gas-free galaxy due to dynamical friction and the hardening of an SMBH binary due to close stellar encounters. We show that results obtained with rVINE compare well with NBODY7 for problems with particle numbers that can be simulated with NBODY7. In particular, in both NBODY7 and rVINE we find a clear N-dependence of the binary hardening rate, a low binary eccentricity and moderate eccentricity evolution, as well as the conversion of the galaxy's inner density profile from a cusp to a core via the ejection of stars at high velocity. The much larger number of particles that can be handled by rVINE will open up exciting opportunities to model stellar dynamics close to SMBHs much more accurately in a realistic galactic context. This will help to remedy the inherent limitations of commonly used tree solvers to follow the correct dynamical evolution of black holes in galaxy-scale simulations.

  2. Implementation and performance of FDPS: a framework for developing parallel particle simulation codes

    NASA Astrophysics Data System (ADS)

    Iwasawa, Masaki; Tanikawa, Ataru; Hosono, Natsuki; Nitadori, Keigo; Muranushi, Takayuki; Makino, Junichiro

    2016-08-01

    We present the basic idea, implementation, measured performance, and performance model of FDPS (Framework for Developing Particle Simulators). FDPS is an application-development framework which helps researchers to develop simulation programs using particle methods for large-scale distributed-memory parallel supercomputers. A particle-based simulation program for distributed-memory parallel computers needs to perform domain decomposition, exchange of particles which are not in the domain of each computing node, and gathering of the particle information in other nodes which are necessary for interaction calculation. Also, even if distributed-memory parallel computers are not used, in order to reduce the amount of computation, algorithms such as the Barnes-Hut tree algorithm or the Fast Multipole Method should be used in the case of long-range interactions. For short-range interactions, some methods to limit the calculation to neighbor particles are required. FDPS provides all of these functions which are necessary for efficient parallel execution of particle-based simulations as "templates," which are independent of the actual data structure of particles and the functional form of the particle-particle interaction. By using FDPS, researchers can write their programs with the amount of work necessary to write a simple, sequential and unoptimized program of O(N^2) calculation cost, and yet the program, once compiled with FDPS, will run efficiently on large-scale parallel supercomputers. A simple gravitational N-body program can be written in around 120 lines. We report the actual performance of these programs and the performance model. The weak scaling performance is very good, and almost linear speed-up was obtained for up to the full system of the K computer. The minimum calculation time per timestep is in the range of 30 ms (N = 10^7) to 300 ms (N = 10^9). These are currently limited by the time for the calculation of the domain decomposition and communication
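
    For orientation, the "simple, sequential and unoptimized program of O(N^2) calculation cost" that the abstract uses as its yardstick might look like the following direct-summation sketch. It is written in Python purely for illustration and does not use the FDPS interface (FDPS itself is a C++ framework); the function name, the softening parameter `eps` and the unit choice G = 1 are assumptions.

```python
import math

def direct_accelerations(pos, mass, eps=1e-3, G=1.0):
    """Gravitational acceleration on every particle by direct summation.

    pos:  list of [x, y, z] positions
    mass: list of particle masses
    eps:  Plummer softening length (an illustrative choice)
    The double loop visits all N*(N-1) ordered pairs, hence O(N^2) cost.
    """
    n = len(pos)
    acc = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            dx = [pos[j][k] - pos[i][k] for k in range(3)]
            r2 = dx[0] ** 2 + dx[1] ** 2 + dx[2] ** 2 + eps * eps
            inv_r3 = 1.0 / (r2 * math.sqrt(r2))
            for k in range(3):
                acc[i][k] += G * mass[j] * dx[k] * inv_r3
    return acc

# Two unit masses one length unit apart attract each other equally.
print(direct_accelerations([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
                           [1.0, 1.0], eps=0.0))
```

    A tree algorithm such as Barnes-Hut replaces the inner loop over all particles with a walk over a hierarchy of cells, which is where the reduction from O(N^2) comes from.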

  3. Hybrid parallel code acceleration methods in full-core reactor physics calculations

    SciTech Connect

    Courau, T.; Plagne, L.; Ponicot, A.; Sjoden, G.

    2012-07-01

    When dealing with nuclear reactor calculation schemes, the need for three dimensional (3D) transport-based reference solutions is essential for both validation and optimization purposes. Considering a benchmark problem, this work investigates the potential of discrete ordinates (Sn) transport methods applied to 3D pressurized water reactor (PWR) full-core calculations. First, the benchmark problem is described. It involves a pin-by-pin description of a 3D PWR first core, and uses an 8-group cross-section library prepared with the DRAGON cell code. Then, a convergence analysis is performed using the PENTRAN parallel Sn Cartesian code. It discusses the spatial refinement and the associated angular quadrature required to properly describe the problem physics. It also shows that initializing the Sn solution with the EDF SPN solver COCAGNE reduces the number of iterations required to converge by nearly a factor of 6. Using a best estimate model, PENTRAN results are then compared to multigroup Monte Carlo results obtained with the MCNP5 code. Good consistency is observed between the two methods (Sn and Monte Carlo), with discrepancies that are less than 25 pcm for the k_eff, and less than 2.1% and 1.6% for the flux at the pin-cell level and for the pin-power distribution, respectively. (authors)

  4. Acceleration of the Geostatistical Software Library (GSLIB) by code optimization and hybrid parallel programming

    NASA Astrophysics Data System (ADS)

    Peredo, Oscar; Ortiz, Julián M.; Herrero, José R.

    2015-12-01

    The Geostatistical Software Library (GSLIB) has been used in the geostatistical community for more than thirty years. It was designed as a bundle of sequential Fortran codes, and today it is still in use by many practitioners and researchers. Despite its widespread use, few attempts have been reported in order to bring this package to the multi-core era. Using all CPU resources, GSLIB algorithms can handle large datasets and grids, where tasks are compute- and memory-intensive applications. In this work, a methodology is presented to accelerate GSLIB applications using code optimization and hybrid parallel processing, specifically for compute-intensive applications. Minimal code modifications are added decreasing as much as possible the elapsed time of execution of the studied routines. If multi-core processing is available, the user can activate OpenMP directives to speed up the execution using all resources of the CPU. If multi-node processing is available, the execution is enhanced using MPI messages between the compute nodes. Four case studies are presented: experimental variogram calculation, kriging estimation, sequential Gaussian and indicator simulation. For each application, three scenarios (small, large and extra large) are tested using a desktop environment with 4 CPU-cores and a multi-node server with 128 CPU-nodes. Elapsed times, speedup and efficiency results are shown.
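
    As an illustration of the first case study, an experimental variogram for regularly spaced 1D data can be computed as half the mean squared difference between values a given lag apart. The sketch below is not a GSLIB routine and is written in Python rather than Fortran; the function name and the restriction to a regular 1D grid are assumptions made to keep it short.

```python
def experimental_variogram(values, max_lag):
    """Experimental semivariogram of regularly spaced 1D data.

    For each lag h in 1..max_lag (max_lag must be smaller than the
    number of samples):
        gamma(h) = sum((z[i] - z[i + h])**2) / (2 * number_of_pairs)
    """
    gamma = []
    for h in range(1, max_lag + 1):
        sq_diffs = [(values[i] - values[i + h]) ** 2
                    for i in range(len(values) - h)]
        gamma.append(sum(sq_diffs) / (2 * len(sq_diffs)))
    return gamma

print(experimental_variogram([1.0, 2.0, 3.0, 4.0], 2))
```

    Each lag, and each pair within a lag, is computed independently of the others, so loops of this shape are natural candidates for the kind of multi-core parallelism the abstract describes.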

  5. Delta: An object-oriented finite element code architecture for massively parallel computers

    SciTech Connect

    Weatherby, J.R.; Schutt, J.A.; Peery, J.S.; Hogan, R.E.

    1996-02-01

    Delta is an object-oriented code architecture based on the finite element method which enables simulation of a wide range of engineering mechanics problems in a parallel processing environment. Written in C++, Delta is a natural framework for algorithm development and for research involving coupling of mechanics from different Engineering Science disciplines. To enhance flexibility and encourage code reuse, the architecture provides a clean separation of the major aspects of finite element programming. Spatial discretization, temporal discretization, and the solution of linear and nonlinear systems of equations are each implemented separately, independent from the governing field equations. Other attractive features of the Delta architecture include support for constitutive models with internal variables, reusable "matrix-free" equation solvers, and support for region-to-region variations in the governing equations and the active degrees of freedom. A demonstration code built from the Delta architecture has been used in two-dimensional and three-dimensional simulations involving dynamic and quasi-static solid mechanics, transient and steady heat transport, and flow in porous media.

  6. A 3D Parallel Beam Dynamics Code for Modeling High Brightness Beams in Photoinjectors

    SciTech Connect

    Qiang, Ji; Lidia, S.; Ryne, R.D.; Limborg, C.; /SLAC

    2006-02-13

    In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.

  7. A 3D Parallel Beam Dynamics Code for Modeling High Brightness Beams in Photoinjectors

    SciTech Connect

    Qiang, J.; Lidia, S.; Ryne, R.; Limborg, C.

    2005-05-16

    In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.

  8. L-PICOLA: A parallel code for fast dark matter simulation

    NASA Astrophysics Data System (ADS)

    Howlett, C.; Manera, M.; Percival, W. J.

    2015-09-01

    Robust measurements based on current large-scale structure surveys require precise knowledge of statistical and systematic errors. This can be obtained from large numbers of realistic mock galaxy catalogues that mimic the observed distribution of galaxies within the survey volume. To this end we present a fast, distributed-memory, planar-parallel code, L-PICOLA, which can be used to generate and evolve a set of initial conditions into a dark matter field much faster than a full non-linear N-Body simulation. Additionally, L-PICOLA has the ability to include primordial non-Gaussianity in the simulation and simulate the past lightcone at run-time, with optional replication of the simulation volume. Through comparisons to fully non-linear N-Body simulations we find that our code can reproduce the z = 0 power spectrum and reduced bispectrum of dark matter to within 2% and 5% respectively on all scales of interest to measurements of Baryon Acoustic Oscillations and Redshift Space Distortions, but 3 orders of magnitude faster. The accuracy, speed and scalability of this code, alongside the additional features we have implemented, make it extremely useful for both current and next generation large-scale structure surveys. L-PICOLA is publicly available at https://cullanhowlett.github.io/l-picola.

  9. Hybrid threshold adaptable quantum secret sharing scheme with reverse Huffman-Fibonacci-tree coding.

    PubMed

    Lai, Hong; Zhang, Jun; Luo, Ming-Xing; Pan, Lei; Pieprzyk, Josef; Xiao, Fuyuan; Orgun, Mehmet A

    2016-01-01

    With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for real-world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantage of the proposed scheme is that it can detect eavesdropping without joint quantum operations, and permits secret sharing for an arbitrary but no less than threshold-value number of classical participants with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications. PMID:27515908
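
    The classical ingredient named in this abstract, threshold secret sharing by Lagrange interpolation, can be sketched as follows. This is the textbook Shamir-style construction over a prime field, shown only to make the role of the interpolation polynomial concrete; it does not implement the paper's m-bonacci, OAM or Huffman-Fibonacci-tree components. The function names and the choice of prime are assumptions, and the `random` module is used for brevity although it is not suitable for real cryptographic use.

```python
import random

PRIME = 2 ** 61 - 1  # a Mersenne prime; all arithmetic is modulo PRIME

def make_shares(secret, threshold, num_shares, rng=random):
    """Hide `secret` as the constant term of a random polynomial of
    degree threshold - 1 and hand out its values at x = 1..num_shares."""
    coeffs = [secret] + [rng.randrange(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, num_shares + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares

def recover_secret(shares):
    """Lagrange interpolation at x = 0 from at least `threshold` shares."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        # pow(den, -1, PRIME) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, PRIME)) % PRIME
    return secret

shares = make_shares(123456789, threshold=3, num_shares=5)
print(recover_secret(shares[:3]) == 123456789)
```

    Any three of the five shares reconstruct the secret, while fewer than three leave it undetermined, which is the "no less than threshold-value number of participants" property in its classical form.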

  10. Hybrid threshold adaptable quantum secret sharing scheme with reverse Huffman-Fibonacci-tree coding.

    PubMed

    Lai, Hong; Zhang, Jun; Luo, Ming-Xing; Pan, Lei; Pieprzyk, Josef; Xiao, Fuyuan; Orgun, Mehmet A

    2016-01-01

    With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for real-world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantages of the proposed scheme are that it can detect eavesdropping without joint quantum operations and that it permits secret sharing among an arbitrary number of classical participants, no fewer than the threshold value, with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications.

  11. Hybrid threshold adaptable quantum secret sharing scheme with reverse Huffman-Fibonacci-tree coding

    PubMed Central

    Lai, Hong; Zhang, Jun; Luo, Ming-Xing; Pan, Lei; Pieprzyk, Josef; Xiao, Fuyuan; Orgun, Mehmet A.

    2016-01-01

    With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for real-world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantages of the proposed scheme are that it can detect eavesdropping without joint quantum operations and that it permits secret sharing among an arbitrary number of classical participants, no fewer than the threshold value, with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications. PMID:27515908

  12. Hybrid threshold adaptable quantum secret sharing scheme with reverse Huffman-Fibonacci-tree coding

    NASA Astrophysics Data System (ADS)

    Lai, Hong; Zhang, Jun; Luo, Ming-Xing; Pan, Lei; Pieprzyk, Josef; Xiao, Fuyuan; Orgun, Mehmet A.

    2016-08-01

    With prevalent attacks in communication, sharing a secret between communicating parties is an ongoing challenge. Moreover, it is important to integrate quantum solutions with classical secret sharing schemes with low computational cost for real-world use. This paper proposes a novel hybrid threshold adaptable quantum secret sharing scheme, using an m-bonacci orbital angular momentum (OAM) pump, Lagrange interpolation polynomials, and reverse Huffman-Fibonacci-tree coding. To be exact, we employ entangled states prepared by m-bonacci sequences to detect eavesdropping. Meanwhile, we encode m-bonacci sequences in Lagrange interpolation polynomials to generate the shares of a secret with reverse Huffman-Fibonacci-tree coding. The advantages of the proposed scheme are that it can detect eavesdropping without joint quantum operations and that it permits secret sharing among an arbitrary number of classical participants, no fewer than the threshold value, with much lower bandwidth. Also, in comparison with existing quantum secret sharing schemes, it still works when there are dynamic changes, such as the unavailability of some quantum channel, the arrival of new participants and the departure of participants. Finally, we provide security analysis of the new hybrid quantum secret sharing scheme and discuss its useful features for modern applications.

  13. Fortran code for SU(3) lattice gauge theory with and without MPI checkerboard parallelization

    NASA Astrophysics Data System (ADS)

    Berg, Bernd A.; Wu, Hao

    2012-10-01

    We document plain Fortran and Fortran MPI checkerboard code for Markov chain Monte Carlo simulations of pure SU(3) lattice gauge theory with the Wilson action in D dimensions. The Fortran code uses periodic boundary conditions and is suitable for pedagogical purposes and small scale simulations. For the Fortran MPI code two geometries are covered: the usual torus with periodic boundary conditions and the double-layered torus as defined in the paper. Parallel computing is performed on checkerboards of sublattices, which partition the full lattice in one, two, and so on, up to D directions (depending on the parameters set). For updating, the Cabibbo-Marinari heatbath algorithm is used. We present validations and test runs of the code. Performance is reported for a number of currently used Fortran compilers and, when applicable, MPI versions. For the parallelized code, performance is studied as a function of the number of processors. Program summary Program title: STMC2LSU3MPI Catalogue identifier: AEMJ_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEMJ_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 26666 No. of bytes in distributed program, including test data, etc.: 233126 Distribution format: tar.gz Programming language: Fortran 77 compatible with the use of Fortran 90/95 compilers, in part with MPI extensions. Computer: Any capable of compiling and executing Fortran 77 or Fortran 90/95, when needed with MPI extensions. Operating system: Red Hat Enterprise Linux Server 6.1 with OpenMPI + pgf77 11.8-0, CentOS 5.3 with OpenMPI + gfortran 4.1.2, Cray XT4 with MPICH2 + pgf90 11.2-0. Has the code been vectorised or parallelized?: Yes, parallelized using MPI extensions. Number of processors used: 2 to 11664 RAM: 200 megabytes per process. Classification: 11
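
    The abstract above describes parallel updating on checkerboards of sublattices. The sketch below illustrates only the generic idea behind a checkerboard (red-black) ordering, applied to individual lattice sites rather than to the sublattices of the paper: on a periodic lattice with even extents, every nearest neighbour of an even-parity site has odd parity, so all sites of one colour can be updated concurrently. The function names and the 4^4 lattice are assumptions for illustration; no gauge-theory update is performed.

```python
from itertools import product


def parity(site):
    """Checkerboard colour of a site: 0 (even) or 1 (odd)."""
    return sum(site) % 2


def neighbors(site, shape):
    """Nearest neighbours of `site` on a periodic lattice of the given shape."""
    for d in range(len(shape)):
        for step in (-1, 1):
            nb = list(site)
            nb[d] = (nb[d] + step) % shape[d]  # periodic boundary conditions
            yield tuple(nb)


def checkerboard(shape):
    """Split all lattice sites into even and odd colour classes."""
    even, odd = [], []
    for site in product(*(range(n) for n in shape)):
        (even if parity(site) == 0 else odd).append(site)
    return even, odd
```

    The colouring argument holds only when every lattice extent is even; with an odd extent the periodic wrap-around connects two sites of the same colour.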

  14. An object-oriented implementation of a parallel Monte Carlo code for radiation transport

    NASA Astrophysics Data System (ADS)

    Santos, Pedro Duarte; Lani, Andrea

    2016-05-01

    This paper describes the main features of a state-of-the-art Monte Carlo solver for radiation transport which has been implemented within COOLFluiD, a world-class open source object-oriented platform for scientific simulations. The Monte Carlo code makes use of efficient ray tracing algorithms (for 2D, axisymmetric and 3D arbitrary unstructured meshes) which are described in detail. The solver accuracy is first verified in test cases for which analytical solutions are available, then validated for a space re-entry flight experiment (i.e. FIRE II) for which comparisons against both experiments and reference numerical solutions are provided. Through the flexible design of the physical models, ray tracing and parallelization strategy (fully reusing the mesh decomposition inherited from the fluid simulator), the implementation was made efficient and reusable.

  15. LCODE: A parallel quasistatic code for computationally heavy problems of plasma wakefield acceleration

    NASA Astrophysics Data System (ADS)

    Sosedkin, A. P.; Lotov, K. V.

    2016-09-01

    LCODE is a freely distributed quasistatic 2D3V code for simulating plasma wakefield acceleration, mainly specialized for resource-efficient studies of long-term propagation of ultrarelativistic particle beams in plasmas. The beam is modeled with fully relativistic macro-particles in a simulation window copropagating at the speed of light; the plasma can be simulated with either a kinetic or a fluid model. Several techniques are used to obtain exceptional numerical stability and precision while maintaining high resource efficiency, enabling LCODE to simulate the evolution of long particle beams over long propagation distances even on a laptop. A recent upgrade enabled LCODE to perform the calculations in parallel. A pipeline of several LCODE processes communicating via MPI (Message-Passing Interface) is capable of executing multiple consecutive time steps of the simulation in a single pass. This approach can speed up the calculations by hundreds of times.

  16. Global Trees: A Framework for Linked Data Structures on Distributed Memory Parallel Systems

    SciTech Connect

    Larkins, D. B.; Dinan, James S.; Krishnamoorthy, Sriram; Parthasarathy, Srinivasan; Rountev, Atanas; Sadayappan, Ponnuswamy

    2008-11-17

    This paper describes the Global Trees (GT) system that provides a multi-layered interface to a global address space view of distributed tree data structures, while providing scalable performance on distributed memory systems. The Global Trees system utilizes coarse-grained data movement to enhance locality and communication efficiency. We describe the design and implementation of GT, illustrate its use in the context of a gravitational simulation application, and provide experimental results that demonstrate the effectiveness of the approach. The key benefits of using this system include efficient shared-memory-style programming of distributed trees, tree-specific optimizations for data access and computation, and the ability to customize many aspects of GT to optimize application performance.

  17. Development of Parallel Computing Framework to Enhance Radiation Transport Code Capabilities for Rare Isotope Beam Facility Design

    SciTech Connect

    Kostin, Mikhail; Mokhov, Nikolai; Niita, Koji

    2013-09-25

    A parallel computing framework has been developed for use with general-purpose radiation transport codes. The framework was implemented as a C++ module that uses MPI for message passing. It is intended to be used with older radiation transport codes implemented in Fortran 77, Fortran 90 or C. The module is largely independent of the radiation transport codes it can be used with, and is connected to the codes by means of a number of interface functions. The framework was developed and tested in conjunction with the MARS15 code. It is possible to use it with other codes such as PHITS, FLUKA and MCNP after certain adjustments. Besides the parallel computing functionality, the framework offers a checkpoint facility that allows restarting calculations from a saved checkpoint file. The checkpoint facility can be used in single process calculations as well as in the parallel regime. The framework corrects some of the known problems with the scheduling and load balancing found in the original implementations of the parallel computing functionality in MARS15 and PHITS. The framework can be used efficiently on homogeneous systems and on networks of workstations, where interference from other users is possible.

  18. On distributed memory MPI-based parallelization of SPH codes in massive HPC context

    NASA Astrophysics Data System (ADS)

    Oger, G.; Le Touzé, D.; Guibert, D.; de Leffe, M.; Biddiscombe, J.; Soumagne, J.; Piccinali, J.-G.

    2016-03-01

    Most particle methods share the problem of high computational cost, and in order to satisfy the demands of solvers, currently available hardware technologies must be fully exploited. Two complementary technologies are now accessible. On the one hand, CPUs can be structured into a multi-node framework, allowing massive data exchanges through a high-speed network. In this case, each node usually comprises several cores available to perform multithreaded computations. On the other hand, GPUs, which are derived from graphics computing technologies, are able to perform highly multithreaded calculations with hundreds of independent threads connected together through a common shared memory. This paper is primarily dedicated to the distributed memory parallelization of particle methods, targeting several thousands of CPU cores. The experience gained clearly shows that parallelizing a particle-based code on moderate numbers of cores can easily lead to an acceptable scalability, whilst a scalable speedup on thousands of cores is much more difficult to obtain. The discussion revolves around speeding up particle methods as a whole, in a massive HPC context, by making use of the MPI library. We focus on one particular particle method, Smoothed Particle Hydrodynamics (SPH), one of the most widespread today in the literature as well as in engineering.

  19. A massively parallel method of characteristic neutral particle transport code for GPUs

    SciTech Connect

    Boyd, W. R.; Smith, K.; Forget, B.

    2013-07-01

    Over the past 20 years, parallel computing has enabled computers to grow ever larger and more powerful while scientific applications have advanced in sophistication and resolution. This trend is being challenged, however, as the power consumption for conventional parallel computing architectures has risen to unsustainable levels and memory limitations have come to dominate compute performance. Heterogeneous computing platforms, such as Graphics Processing Units (GPUs), are an increasingly popular paradigm for solving these issues. This paper explores the applicability of GPUs for deterministic neutron transport. A 2D method of characteristics (MOC) code - OpenMOC - has been developed with solvers for both shared memory multi-core platforms as well as GPUs. The multi-threading and memory locality methodologies for the GPU solver are presented. Performance results for the 2D C5G7 benchmark demonstrate 25-35 x speedup for MOC on the GPU. The lessons learned from this case study will provide the basis for further exploration of MOC on GPUs as well as design decisions for hardware vendors exploring technologies for the next generation of machines for scientific computing. (authors)

  20. Spin wave based parallel logic operations for binary data coded with domain walls

    SciTech Connect

    Urazuka, Y.; Oyabu, S.; Chen, H.; Peng, B.; Otsuki, H.; Tanaka, T.; Matsuyama, K.

    2014-05-07

    We numerically investigate the feasibility of spin wave (SW) based parallel logic operations, where the phase of SW packet (SWP) is exploited as a state variable and the phase shift caused by the interaction with domain wall (DW) is utilized as a logic inversion functionality. A designed functional element consists of parallel ferromagnetic nanowires (6 nm-thick, 36 nm-width, 5120 nm-length, and 200 nm separation) with the perpendicular magnetization and sub-μm scale overlaid conductors. The logic outputs for binary data, coded with the existence (“1”) or absence (“0”) of the DW, are inductively read out from interferometric aspect of the superposed SWPs, one of them propagating through the stored data area. A practical exclusive-or operation, based on 2π periodicity in the phase logic, is demonstrated for the individual nanowire with an order of different output voltage V_out, depending on the logic output for the stored data. The inductive output from the two nanowires exhibits three well-defined signal levels, corresponding to the information distance (Hamming distance) between 2-bit data stored in the multiple nanowires.

  1. Optimization of Parallel Legendre Transform using Graphics Processing Unit (GPU) for a Geodynamo Code

    NASA Astrophysics Data System (ADS)

    Lokavarapu, H. V.; Matsui, H.

    2015-12-01

    Convection and the magnetic field of the Earth's outer core are expected to have vast length scales. Resolving these flows requires high performance computing. In geodynamo simulations using the spherical harmonics transform (SHT), a significant portion of the execution time is spent on the Legendre transform. Calypso is a geodynamo code designed to model magnetohydrodynamics of a Boussinesq fluid in a rotating spherical shell, such as the outer core of the Earth. The code has been shown to scale well on computer clusters with on the order of 10⁵ cores using Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) parallelization for CPUs. To further optimize, we investigate three different algorithms of the SHT using GPUs. One is to preemptively compute the Legendre polynomials on the CPU before executing SHT on the GPU within the time integration loop. In the second approach, both the Legendre polynomials and the SHT are computed on the GPU simultaneously. In the third approach, we initially partition the radial grid for the forward transform and the harmonic order for the backward transform between the CPU and GPU. Thereafter, the partitioned work is computed simultaneously in the time integration loop. We examine the trade-offs between space and time, memory bandwidth and GPU computations on Maverick, a Texas Advanced Computing Center (TACC) supercomputer. We have observed improved performance using a GPU enabled Legendre transform. Furthermore, we will compare and contrast the different algorithms in the context of GPUs.
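
    The first approach in the abstract above precomputes the Legendre polynomials once, before the time integration loop. A minimal CPU-only sketch of such a precomputed table is given below, using Bonnet's recurrence (l+1) P_{l+1}(x) = (2l+1) x P_l(x) - l P_{l-1}(x). It covers only the ordinary Legendre polynomials, not the associated Legendre functions a full SHT needs, and it is not Calypso's implementation; the function name `legendre_table` is an assumption for this example.

```python
def legendre_table(lmax, xs):
    """Return table[i][l] = P_l(xs[i]) for l = 0..lmax via Bonnet's recurrence."""
    table = []
    for x in xs:
        # Seed values P_0(x) = 1 and P_1(x) = x, truncated when lmax < 1.
        p = [1.0, x][: lmax + 1]
        for l in range(1, lmax):
            # (l + 1) P_{l+1} = (2l + 1) x P_l - l P_{l-1}
            p.append(((2 * l + 1) * x * p[l] - l * p[l - 1]) / (l + 1))
        table.append(p)
    return table
```

    Storing the table trades memory for repeated arithmetic, which is the space/time trade-off the abstract mentions.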

  2. Overview of development and design of MPACT: Michigan parallel characteristics transport code

    SciTech Connect

    Kochunas, B.; Collins, B.; Jabaay, D.; Downar, T. J.; Martin, W. R.

    2013-07-01

    MPACT (Michigan Parallel Characteristics Transport Code) is a new reactor analysis tool. It is being developed by students and research staff at the University of Michigan to be used for an advanced pin-resolved transport capability within VERA (Virtual Environment for Reactor Analysis). VERA is the end-user reactor simulation tool being produced by the Consortium for the Advanced Simulation of Light Water Reactors (CASL). The MPACT development project is itself unique for the way it is changing how students do research to achieve the instructional and research goals of an academic institution, while providing immediate value to industry. The MPACT code makes use of modern lean/agile software processes and extensive testing to maintain a level of productivity and quality required by CASL. MPACT's design relies heavily on object-oriented programming concepts and design patterns and is programmed in Fortran 2003. These designs are explained and illustrated as to how they can be readily extended to incorporate new capabilities and research ideas in support of academic research objectives. The transport methods currently implemented in MPACT include the 2-D and 3-D method of characteristics (MOC) and 2-D and 3-D method of collision direction probabilities (CDP). For the cross section resonance treatment, presently the subgroup method and the new embedded self-shielding method (ESSM) are implemented within MPACT. (authors)

  3. Evaluation of a parallel FDTD code and application to modeling of light scattering by deformed red blood cells.

    PubMed

    Brock, R Scott; Hu, Xin-Hua; Yang, Ping; Lu, Jun

    2005-07-11

    A parallel Finite-Difference-Time-Domain (FDTD) code has been developed to numerically model the elastic light scattering by biological cells. Extensive validation and evaluation on various computing clusters demonstrated the high performance of the parallel code and its significant potential of reducing the computational cost of the FDTD method with low cost computer clusters. The parallel FDTD code has been used to study the problem of light scattering by a human red blood cell (RBC) of a deformed shape in terms of the angular distributions of the Mueller matrix elements. The dependence of the Mueller matrix elements on the shape and orientation of the deformed RBC has been investigated. Analysis of these data provides valuable insight on determination of the RBC shapes using the method of elastic light scattering measurements.

  4. LPIC++ a parallel one-dimensional relativistic electromagnetic Particle-In-Cell code for simulating laser-plasma-interaction

    NASA Astrophysics Data System (ADS)

    Pfund, R. E. W.; Lichters, R.; Meyer-ter-Vehn, J.

    1998-02-01

    We report on a recently developed electromagnetic relativistic 1D3V (one spatial, three velocity dimensions) Particle-In-Cell code for simulating laser-plasma interaction at normal and oblique incidence. The code is written in C++ and easy to extend. The data structure is characterized by the use of chained lists for the grid cells as well as particles belonging to one cell. The parallel version of the code is based on PVM. It splits the grid into several spatial domains each belonging to one processor. Since particles can cross boundaries of cells as well as domains, the processor loads will generally change in time. This is counteracted by adjusting the domain sizes dynamically, for which the use of chained lists has proven to be very convenient. Moreover, an option for restarting the simulation from intermediate stages of the time evolution has been implemented even in the parallel version. The code will be published and distributed freely.
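
    The abstract above counteracts changing processor loads by adjusting the domain sizes dynamically. The sketch below shows one simple way such a rebalancing step could work in one dimension: given the number of particles in each grid cell, it places domain boundaries so that each processor holds roughly the same number of particles. This is an illustrative assumption, not the algorithm used in LPIC++; the name `balance_domains` and the boundary convention (half-open cell ranges) are made up for this example.

```python
def balance_domains(cell_counts, nproc):
    """Return nproc+1 cell indices; domain k owns cells bounds[k]..bounds[k+1]-1."""
    total = sum(cell_counts)
    bounds = [0]
    running = 0
    next_domain = 1
    for i, count in enumerate(cell_counts):
        running += count
        # Close a domain each time the cumulative load passes the next equal-share mark.
        while next_domain < nproc and running >= total * next_domain / nproc:
            bounds.append(i + 1)
            next_domain += 1
    bounds.append(len(cell_counts))
    return bounds
```

    For example, with all particles concentrated in the first half of the grid, the boundary between two processors moves into that half instead of staying at the midpoint.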

  5. Parallel Adaptive Mesh Refinement Library

    NASA Technical Reports Server (NTRS)

    Mac-Neice, Peter; Olson, Kevin

    2005-01-01

    Parallel Adaptive Mesh Refinement Library (PARAMESH) is a package of Fortran 90 subroutines designed to provide a computer programmer with an easy route to extension of (1) a previously written serial code that uses a logically Cartesian structured mesh into (2) a parallel code with adaptive mesh refinement (AMR). Alternatively, in its simplest use, and with minimal effort, PARAMESH can operate as a domain-decomposition tool for users who want to parallelize their serial codes but who do not wish to utilize adaptivity. The package builds a hierarchy of sub-grids to cover the computational domain of a given application program, with spatial resolution varying to satisfy the demands of the application. The sub-grid blocks form the nodes of a tree data structure (a quad-tree in two or an oct-tree in three dimensions). Each grid block has a logically Cartesian mesh. The package supports one-, two- and three-dimensional models.
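
    The abstract above describes sub-grid blocks that form the nodes of a tree (a quad-tree in two dimensions). The sketch below illustrates that data structure in its simplest form: a square block that can be refined into four children, with the leaves covering the domain at varying resolution. It is a conceptual illustration in Python, not PARAMESH's Fortran 90 interface; the class `Block` and its methods are hypothetical names.

```python
class Block:
    """A square grid block acting as a quad-tree node."""

    def __init__(self, origin, size, level=0):
        self.origin = origin      # (x, y) of the lower-left corner
        self.size = size          # edge length of the block
        self.level = level        # refinement level, 0 for the root
        self.children = []        # empty for a leaf block

    def refine(self):
        """Split this block into four children of half the edge length."""
        half = self.size / 2
        x0, y0 = self.origin
        self.children = [
            Block((x0 + i * half, y0 + j * half), half, self.level + 1)
            for j in (0, 1)
            for i in (0, 1)
        ]
        return self.children

    def leaves(self):
        """Return the leaf blocks, which together tile the original domain."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]
```

    Refining the root and then one of its children yields seven leaves at two different resolutions that still tile the unit square exactly.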

  6. The Hymenopteran Tree of Life: Evidence from Protein-Coding Genes and Objectively Aligned Ribosomal Data

    PubMed Central

    Klopfstein, Seraina; Vilhelmsen, Lars; Heraty, John M.; Sharkey, Michael; Ronquist, Fredrik

    2013-01-01

    Previous molecular analyses of higher hymenopteran relationships have largely been based on subjectively aligned ribosomal sequences (18S and 28S). Here, we reanalyze the 18S and 28S data (unaligned about 4.4 kb) using an objective and a semi-objective alignment approach, based on MAFFT and BAli-Phy, respectively. Furthermore, we present the first analyses of a substantial protein-coding data set (4.6 kb from one mitochondrial and four nuclear genes). Our results indicate that previous studies may have suffered from inflated support values due to subjective alignment of the ribosomal sequences, but apparently not from significant biases. The protein data provide independent confirmation of several earlier results, including the monophyly of non-xyelid hymenopterans, Pamphilioidea + Unicalcarida, Unicalcarida, Vespina, Apocrita, Proctotrupomorpha and core Proctotrupomorpha. The protein data confirm that Aculeata are nested within a paraphyletic Evaniomorpha, but cast doubt on the monophyly of Evanioidea. Combining the available morphological, ribosomal and protein-coding data, we examine the total-evidence signal as well as congruence and conflict among the three data sources. Despite an emerging consensus on many higher-level hymenopteran relationships, several problems remain unresolved or contentious, including rooting of the hymenopteran tree, relationships of the woodwasps, placement of Stephanoidea and Ceraphronoidea, and the sister group of Aculeata. PMID:23936325

  7. A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

    SciTech Connect

    Stankovski, Z.

    1995-12-31

    The collision probability method in neutron transport, as applied to 2D geometries, consumes a great amount of computer time: for a typical 2D assembly calculation, about 90% of the computing time is consumed in the collision probability evaluations. Consequently, RZ or 3D calculations become prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message-passing host/node programming model. Parallelization was applied to the energy group treatment. Such an approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for an industrial code. Sequential performance is also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SPI and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SPI. Because of the heterogeneity of the workstation network, the author did not expect high performance from this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors.

  8. Trees

    ERIC Educational Resources Information Center

    Al-Khaja, Nawal

    2007-01-01

    This is a thematic lesson plan for young learners about palm trees and the importance of taking care of them. The two part lesson teaches listening, reading and speaking skills. The lesson includes parts of a tree; the modal auxiliary, can; dialogues and a role play activity.

  9. Development of a discrete ordinates code system for unstructured meshes of tetrahedral cells, with serial and parallel implementations

    SciTech Connect

    Miller, R.L.

    1998-11-01

    A numerically stable, accurate, and robust form of the exponential characteristic (EC) method, used to solve the time-independent linearized Boltzmann Transport Equation, is derived using direct affine coordinate transformations on unstructured meshes of tetrahedra. This quadrature, as well as the linear characteristic (LC) spatial quadrature, is implemented in the transport code, called TETRAN. This code solves multi-group neutral particle transport problems with anisotropic scattering and was parallelized using High Performance Fortran and angular domain decomposition. A new, parallel algorithm for updating the scattering source is introduced. The EC source and inflow flux coefficients are efficiently evaluated using Broyden's root solver, started with special approximations developed here. TETRAN showed robustness, stability and accuracy on a variety of challenging test problems. Parallel speed-up was observed as the number of processors was increased using an IBM SP computer system.

  10. GRay: A Massively Parallel GPU-based Code for Ray Tracing in Relativistic Spacetimes

    NASA Astrophysics Data System (ADS)

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.

  11. GRay: A MASSIVELY PARALLEL GPU-BASED CODE FOR RAY TRACING IN RELATIVISTIC SPACETIMES

    SciTech Connect

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.

  12. Parallel Monte Carlo transport modeling in the context of a time-dependent, three-dimensional multi-physics code

    SciTech Connect

    Procassini, R.J.

    1997-12-31

    The fine-scale, multi-space resolution that is envisioned for accurate simulations of complex weapons systems in three spatial dimensions implies flop-rate and memory-storage requirements that will only be obtained in the near future through the use of parallel computational techniques. Since the Monte Carlo transport models in these simulations usually stress both of these computational resources, they are prime candidates for parallelization. The MONACO Monte Carlo transport package, which is currently under development at LLNL, will utilize two types of parallelism within the context of a multi-physics design code: decomposition of the spatial domain across processors (spatial parallelism) and distribution of particles in a given spatial subdomain across additional processors (particle parallelism). This implementation of the package will utilize explicit data communication between domains (message passing). Such a parallel implementation of a Monte Carlo transport model will result in non-deterministic communication patterns. The communication of particles between subdomains during a Monte Carlo time step may require a significant level of effort to achieve a high parallel efficiency.

  13. SPILADY: A parallel CPU and GPU code for spin-lattice magnetic molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Ma, Pui-Wai; Dudarev, S. L.; Woo, C. H.

    2016-10-01

    Spin-lattice dynamics generalizes molecular dynamics to magnetic materials, where dynamic variables describing an evolving atomic system include not only coordinates and velocities of atoms but also directions and magnitudes of atomic magnetic moments (spins). Spin-lattice dynamics simulates the collective time evolution of spins and atoms, taking into account the effect of non-collinear magnetism on interatomic forces. Applications of the method include atomistic models for defects, dislocations and surfaces in magnetic materials, thermally activated diffusion of defects, magnetic phase transitions, and various magnetic and lattice relaxation phenomena. Spin-lattice dynamics retains all the capabilities of molecular dynamics, adding to them the treatment of non-collinear magnetic degrees of freedom. The spin-lattice dynamics time integration algorithm uses symplectic Suzuki-Trotter decomposition of atomic coordinate, velocity and spin evolution operators, and delivers highly accurate numerical solutions of dynamic evolution equations over extended intervals of time. The code is parallelized in coordinate and spin spaces, and is written in OpenMP C/C++ for CPU and in CUDA C/C++ for Nvidia GPU implementations. Temperatures of atoms and spins are controlled by Langevin thermostats. Conduction electrons are treated by coupling the discrete spin-lattice dynamics equations for atoms and spins to the heat transfer equation for the electrons. Worked examples include simulations of thermalization of ferromagnetic bcc iron, the dynamics of laser pulse demagnetization, and collision cascades. Catalogue identifier: AFAN_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AFAN_v1_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Apache License, Version 2.0; No. of lines in distributed program, including test data, etc.: 1611165; No. of bytes in distributed program, including test data, etc.: 367246683

  14. Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P

    SciTech Connect

    Candel, A; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Syratchev, I.; /CERN

    2009-06-19

    In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.

  15. CMAD: A Self-consistent Parallel Code to Simulate the Electron Cloud Build-up and Instabilities

    SciTech Connect

    Pivi, M.T.F.; /SLAC

    2007-11-07

    We present the features of CMAD, a newly developed self-consistent code which simulates both the electron cloud build-up and related beam instabilities. By means of parallel (Message Passing Interface - MPI) computation, the code tracks the beam in an existing (MAD-type) lattice and continuously resolves the interaction between the beam and the cloud at each element location, with different cloud distributions at each magnet location. The goal of CMAD is to simulate single- and coupled-bunch instability, allowing tune shift, dynamic aperture and frequency map analysis and the determination of the secondary electron yield instability threshold. The code is in its phase of development and benchmarking with existing codes. Preliminary results on benchmarking are presented in this paper.

  16. 2HOT: An Improved Parallel Hashed Oct-Tree N-Body Algorithm for Cosmological Simulation

    DOE PAGESBeta

    Warren, Michael S.

    2014-01-01

    We report on improvements made over the past two decades to our adaptive treecode N-body method (HOT). A mathematical and computational approach to the cosmological N-body problem is described, with performance and scalability measured up to 256k (2^18) processors. We present error analysis and scientific application results from a series of more than ten 69 billion (4096^3) particle cosmological simulations, accounting for 4×10^20 floating point operations. These results include the first simulations using the new constraints on the standard model of cosmology from the Planck satellite. Our simulations set a new standard for accuracy and scientific throughput, while meeting or exceeding the computational efficiency of the latest generation of hybrid TreePM N-body methods.
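
    The processor and particle counts quoted above are powers of two, which makes the rounded figures easy to check. The snippet below is only a sanity check of that arithmetic; the variable names are illustrative and nothing here comes from the 2HOT code itself.

```python
# "256k" processors, written in the abstract as 2 to the power 18.
processors = 2 ** 18
# Particles per simulation, written in the abstract as 4096 cubed.
particles = 4096 ** 3

print(processors)                  # 262144
print(processors == 256 * 1024)    # True
print(particles)                   # 68719476736, i.e. roughly 69 billion
print(particles == 2 ** 36)        # True, since 4096 = 2**12
```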

  17. Parallel Subspace Subcodes of Reed-Solomon Codes for Magnetic Recording Channels

    ERIC Educational Resources Information Center

    Wang, Han

    2010-01-01

    Read channel architectures based on a single low-density parity-check (LDPC) code are being considered for the next generation of hard disk drives. However, LDPC-only solutions suffer from the error floor problem, which may compromise reliability, if not handled properly. Concatenated architectures using an LDPC code plus a Reed-Solomon (RS) code…

  18. A Multiple Sphere T-Matrix Fortran Code for Use on Parallel Computer Clusters

    NASA Technical Reports Server (NTRS)

    Mackowski, D. W.; Mishchenko, M. I.

    2011-01-01

    A general-purpose Fortran-90 code for calculation of the electromagnetic scattering and absorption properties of multiple sphere clusters is described. The code can calculate the efficiency factors and scattering matrix elements of the cluster for either fixed or random orientation with respect to the incident beam and for plane wave or localized-approximation Gaussian incident fields. In addition, the code can calculate maps of the electric field both interior and exterior to the spheres. The code is written with message passing interface instructions to enable the use on distributed memory compute clusters, and for such platforms the code can make feasible the calculation of absorption, scattering, and general EM characteristics of systems containing several thousand spheres.

  19. Seedling establishment in a masting desert shrub parallels the pattern for forest trees

    NASA Astrophysics Data System (ADS)

    Meyer, Susan E.; Pendleton, Burton K.

    2015-05-01

    The masting phenomenon along with its accompanying suite of seedling adaptive traits has been well studied in forest trees but has rarely been examined in desert shrubs. Blackbrush (Coleogyne ramosissima) is a regionally dominant North American desert shrub whose seeds are produced in mast events and scatter-hoarded by rodents. We followed the fate of seedlings in intact stands vs. small-scale disturbances at four contrasting sites for nine growing seasons following emergence after a mast year. The primary cause of first-year mortality was post-emergence cache excavation and seedling predation, with contrasting impacts at sites with different heteromyid rodent seed predators. Long-term establishment patterns were strongly affected by rodent activity in the weeks following emergence. Survivorship curves generally showed decreased mortality risk with age but differed among sites even after the first year. There were no detectable effects of inter-annual precipitation variability or site climatic differences on survival. Intraspecific competition from conspecific adults had strong impacts on survival and growth, both of which were higher on small-scale disturbances, but similar in openings and under shrub crowns in intact stands. This suggests that adult plants preempted soil resources in the interspaces. Aside from effects on seedling predation, there was little evidence for facilitation or interference beneath adult plant crowns. Plants in intact stands were still small and clearly juvenile after nine years, showing that blackbrush forms cohorts of suppressed plants similar to the seedling banks of closed forests. Seedling banks function in the absence of a persistent seed bank in replacement after adult plant death (gap formation), which is temporally uncoupled from masting and associated recruitment events. This study demonstrates that the seedling establishment syndrome associated with masting has evolved in desert shrublands as well as in forests.

  20. Reactor Dosimetry Applications Using RAPTOR-M3G: a New Parallel 3-D Radiation Transport Code

    NASA Astrophysics Data System (ADS)

    Longoni, Gianluca; Anderson, Stanwood L.

    2009-08-01

    The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development of RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based on domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.

  1. A parallel code to calculate rate-state seismicity evolution induced by time dependent, heterogeneous Coulomb stress changes

    NASA Astrophysics Data System (ADS)

    Cattania, C.; Khalid, F.

    2016-09-01

    The estimation of space and time-dependent earthquake probabilities, including aftershock sequences, has received increased attention in recent years, and Operational Earthquake Forecasting systems are currently being implemented in various countries. Physics based earthquake forecasting models compute time dependent earthquake rates based on Coulomb stress changes, coupled with seismicity evolution laws derived from rate-state friction. While early implementations of such models typically performed poorly compared to statistical models, recent studies indicate that significant performance improvements can be achieved by considering the spatial heterogeneity of the stress field and secondary sources of stress. However, the major drawback of these methods is a rapid increase in computational costs. Here we present a code to calculate seismicity induced by time dependent stress changes. An important feature of the code is the possibility to include aleatoric uncertainties due to the existence of multiple receiver faults and to the finite grid size, as well as epistemic uncertainties due to the choice of input slip model. To compensate for the growth in computational requirements, we have parallelized the code for shared memory systems (using OpenMP) and distributed memory systems (using MPI). Performance tests indicate that these parallelization strategies lead to a significant speedup for problems with different degrees of complexity, ranging from those which can be solved on standard multicore desktop computers, to those requiring a small cluster, to a large simulation that can be run using up to 1500 cores.

  2. Application of a parallel 3-dimensional hydrogeochemistry HPF code to a proposed waste disposal site at the Oak Ridge National Laboratory

    SciTech Connect

    Gwo, Jin-Ping; Yeh, Gour-Tsyh

    1997-02-01

    The objectives of this study are (1) to parallelize a 3-dimensional hydrogeochemistry code and (2) to apply the parallel code to a proposed waste disposal site at the Oak Ridge National Laboratory (ORNL). The 2-dimensional hydrogeochemistry code HYDROGEOCHEM, developed at the Pennsylvania State University for coupled subsurface solute transport and chemical equilibrium processes, was first modified to accommodate 3-dimensional problem domains. A bi-conjugate gradient stabilized linear matrix solver was then incorporated to solve the matrix equation. We chose to parallelize the 3-dimensional code on the Intel Paragons at ORNL by using an HPF (high performance FORTRAN) compiler developed at PGI. The data- and task-parallel algorithms available in the HPF compiler proved to be highly efficient for the geochemistry calculation. This calculation can be easily implemented in HPF formats and is perfectly parallel because the chemical speciation on one finite-element node is virtually independent of those on the others. The parallel code was applied to a subwatershed of the Melton Branch at ORNL. Chemical heterogeneity, in addition to physical heterogeneities of the geological formations, has been identified as one of the major factors that affect the fate and transport of contaminants at ORNL. This study demonstrated an application of the 3-dimensional hydrogeochemistry code on the Melton Branch site. A uranium tailing problem that involved aqueous complexation and precipitation-dissolution was tested. Performance statistics were collected on the Intel Paragons at ORNL. Implications of these results on the further optimization of the code were discussed.

  3. A Multi-core Shared Tree Algorithm Based on Network Coding for Multi-point Optical Multicast

    NASA Astrophysics Data System (ADS)

    Liu, Huanlin; Yang, Yuming; Li, Yuan; Chen, Yong; Huang, Sheng

    2015-03-01

    With the growth of multi-point to multi-point multicast applications, the optical network bandwidth resource consumption is increasing rapidly. This has drawn more and more researchers to improving the utilization of the limited wavelength bandwidth for multicast applications in wavelength division multiplexing (WDM) networks. In this paper, a multi-core shared multicast tree algorithm based on network coding is proposed to minimize the fiber link stress. The proposed algorithm includes three processes: searching the core node candidate set excluding core node loop path, selecting the core nodes from the convergence matrix based on heuristic algorithm, and constructing the multi-core nodes shared trees. The convergence matrix based on the heuristic method is constructed for selecting the core nodes from candidate core node set. To improve the limited wavelength utilization, we introduce network coding into the shared tree to compress the transmitting information. The simulation results show that the proposed algorithm's performance is better than the existing algorithms' performance in terms of link stress and balance degree.

  4. A parallel implementation of an MHD code for the simulation of mechanically driven, turbulent dynamos in spherical geometry

    NASA Astrophysics Data System (ADS)

    Reuter, K.; Jenko, F.; Forest, C. B.; Bayliss, R. A.

    2008-08-01

    A parallel implementation of a nonlinear pseudo-spectral MHD code for the simulation of turbulent dynamos in spherical geometry is reported. It employs a dual domain decomposition technique in both real and spectral space. It is shown that this method shows nearly ideal scaling going up to 128 CPUs on Beowulf-type clusters with fast interconnect. Furthermore, the potential of exploiting single precision arithmetic on standard x86 processors is examined. It is pointed out that the MHD code thereby achieves a maximum speedup of 1.7, whereas the validity of the computations is still granted. The combination of both measures will allow for the direct numerical simulation of highly turbulent cases ( 1500

  5. Simulations of implosions with a 3D, parallel, unstructured-grid, radiation-hydrodynamics code

    SciTech Connect

    Kaiser, T B; Milovich, J L; Prasad, M K; Rathkopf, J; Shestakov, A I

    1998-12-28

    An unstructured-grid, radiation-hydrodynamics code is used to simulate implosions. Although most of the problems are spherically symmetric, they are run on 3D, unstructured grids in order to test the code's ability to maintain spherical symmetry of the converging waves. Three problems, of increasing complexity, are presented. In the first, a cold, spherical, ideal gas bubble is imploded by an enclosing high pressure source. For the second, we add non-linear heat conduction and drive the implosion with twelve laser beams centered on the vertices of an icosahedron. In the third problem, a NIF capsule is driven with a Planckian radiation source.

  6. HLA-F coding and regulatory segments variability determined by massively parallel sequencing procedures in a Brazilian population sample.

    PubMed

    Lima, Thálitta Hetamaro Ayala; Buttura, Renato Vidal; Donadi, Eduardo Antônio; Veiga-Castelli, Luciana Caricati; Mendes-Junior, Celso Teixeira; Castelli, Erick C

    2016-10-01

    Human Leucocyte Antigen F (HLA-F) is a non-classical HLA class I gene distinguished from its classical counterparts by low allelic polymorphism and distinctive expression patterns. Its exact function remains unknown. It is believed that HLA-F has tolerogenic and immune modulatory properties. Currently, there is little information regarding the HLA-F allelic variation among human populations and the available studies have evaluated only a fraction of the HLA-F gene segment and/or have searched for known alleles only. Here we present a strategy to evaluate the complete HLA-F variability including its 5' upstream, coding and 3' downstream segments by using massively parallel sequencing procedures. HLA-F variability was surveyed on 196 individuals from the Brazilian Southeast. The results indicate that the HLA-F gene is indeed conserved at the protein level, where thirty coding haplotypes or coding alleles were detected, encoding only four different HLA-F full-length protein molecules. Moreover, a same protein molecule is encoded by 82.45% of all coding alleles detected in this Brazilian population sample. However, the HLA-F nucleotide and haplotype variability is much higher than our current knowledge both in Brazilians and considering the 1000 Genomes Project data. This protein conservation is probably a consequence of the key role of HLA-F in the immune system physiology.

  7. Parallelizing R Code for the Evaluation of Heterogeneity Statistics in Precipitation Frequency Analysis

    NASA Astrophysics Data System (ADS)

    Ferreira, C.; Wright, M.; Houck, M. H.

    2013-12-01

    The regional index-flood method of precipitation quantile estimation is supported by the package "lmomRFA" in the programming language R. Using "Rmpi", a parallelization package in R, Monte Carlo statistics representing the heterogeneity of candidate regions and their error of quantile estimation were calculated for all possible regionalizations of twelve daily precipitation gauges in Minnesota. The predictive power of various heterogeneity statistics with regard to error from daily to yearly timesteps is represented graphically and in terms of correlation. The "embarrassingly parallel" nature of Monte Carlo simulation is exploited using large-scale scientific computing to describe fully the relationship between estimators of heterogeneity and error across a small gauge network.
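
    The "embarrassingly parallel" structure mentioned above can be sketched in a few lines: every Monte Carlo replicate uses its own random stream and shares no state, so replicates can be handed to workers in any order. The sketch below is in Python rather than R, does not use "lmomRFA" or "Rmpi", and the statistic it computes (the spread of per-site coefficients of variation) is a hypothetical stand-in for the heterogeneity statistics of the study. The function names, the gamma-distributed synthetic data, and the 30-year record length are all assumptions; only the twelve sites come from the abstract.

```python
import random
import statistics
from concurrent.futures import ThreadPoolExecutor


def run_replicate(seed, n_sites=12, n_years=30):
    """One independent Monte Carlo replicate (hypothetical stand-in statistic).

    Draws synthetic data for each site and returns the spread of the
    per-site coefficients of variation.
    """
    rng = random.Random(seed)  # private generator: no state shared between replicates
    cvs = []
    for _ in range(n_sites):
        sample = [rng.gammavariate(2.0, 10.0) for _ in range(n_years)]
        cvs.append(statistics.stdev(sample) / statistics.mean(sample))
    return statistics.stdev(cvs)


def run_all(seeds, workers=4):
    """Map replicates onto a pool of workers; results come back in seed order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(run_replicate, seeds))


if __name__ == "__main__":
    results = run_all(range(100))
    print(len(results), statistics.mean(results))
```

    A thread pool is used here only to keep the sketch portable. For CPU-bound work in CPython, a process pool or an MPI layer (the role Rmpi plays in the study) would be the more realistic choice, and because the replicates are independent the calling code would look much the same.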

  8. A user's guide for BREAKUP: A computer code for parallelizing the overset grid approach

    SciTech Connect

    Barnette, D.W.

    1998-04-01

    In this user's guide, details for running BREAKUP are discussed. BREAKUP allows the widely used overset grid method to be run in a parallel computer environment to achieve faster run times for computational field simulations over complex geometries. The overset grid method permits complex geometries to be divided into separate components. Each component is then gridded independently. The grids are computationally rejoined in a solver via interpolation coefficients used for grid-to-grid communications of boundary data. Overset grids have been in widespread use for many years on serial computers, and several well-known Navier-Stokes flow solvers have been extensively developed and validated to support their use. One drawback of serial overset grid methods has been the extensive compute time required to update flow solutions one grid at a time. Parallelizing the overset grid method overcomes this limitation by updating each grid or subgrid simultaneously. BREAKUP prepares overset grids for parallel processing by subdividing each overset grid into statically load-balanced subgrids. Two-dimensional examples with sample solutions, and three-dimensional examples, are presented.
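
    One way to picture static load balancing of overset grids is to decide, before the run starts, how many subgrids each grid should be cut into so that every processor ends up with a similar number of grid points. The sketch below uses a simple greedy rule for that decision. It is not BREAKUP's actual algorithm, which the abstract does not describe; the function name, the greedy rule, and the example grid sizes are all hypothetical.

```python
def allocate_subgrids(grid_points, n_procs):
    """Choose how many subgrids to cut each grid into (one subgrid per processor).

    grid_points: number of points in each overset grid.
    n_procs: total processors available; must be at least the number of grids.
    """
    if n_procs < len(grid_points):
        raise ValueError("need at least one processor per grid")
    # Start with one subgrid per grid, then give each remaining processor to
    # the grid whose subgrids are currently the largest.
    counts = [1] * len(grid_points)
    for _ in range(n_procs - len(grid_points)):
        largest = max(range(len(grid_points)),
                      key=lambda k: grid_points[k] / counts[k])
        counts[largest] += 1
    return counts


# Hypothetical example: three grids of very different size, ten processors.
sizes = [120000, 60000, 20000]
counts = allocate_subgrids(sizes, 10)
print(counts)                                      # [6, 3, 1]
print([s // c for s, c in zip(sizes, counts)])     # [20000, 20000, 20000]
```

    Because the split is fixed before the solver runs, it costs nothing at run time, but it cannot react if the work per grid point turns out to be uneven.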

  9. Full Wave Parallel Code for Modeling RF Fields in Hot Plasmas

    NASA Astrophysics Data System (ADS)

    Spencer, Joseph; Svidzinski, Vladimir; Evstatiev, Evstati; Galkin, Sergei; Kim, Jin-Soo

    2015-11-01

    FAR-TECH, Inc. is developing a suite of full wave RF codes in hot plasmas. It is based on a formulation in configuration space with grid adaptation capability. The conductivity kernel (which includes a nonlocal dielectric response) is calculated by integrating the linearized Vlasov equation along unperturbed test particle orbits. For Tokamak applications a 2-D version of the code is being developed. Progress of this work will be reported. This suite of codes has the following advantages over existing spectral codes: 1) It utilizes the localized nature of plasma dielectric response to the RF field and calculates this response numerically without approximations. 2) It uses an adaptive grid to better resolve resonances in plasma and antenna structures. 3) It uses an efficient sparse matrix solver to solve the formulated linear equations. The linear wave equation is formulated using two approaches: for cold plasmas the local cold plasma dielectric tensor is used (resolving resonances by particle collisions), while for hot plasmas the conductivity kernel is calculated. Work is supported by the U.S. DOE SBIR program.

  10. Parallelization of GeoClaw code for modeling geophysical flows with adaptive mesh refinement on many-core systems

    USGS Publications Warehouse

    Zhang, S.; Yuen, D.A.; Zhu, A.; Song, S.; George, D.L.

    2011-01-01

    We parallelized the GeoClaw code on one-level grid using OpenMP in March, 2011 to meet the urgent need of simulating tsunami waves at near-shore from Tohoku 2011 and achieved over 75% of the potential speed-up on an eight core Dell Precision T7500 workstation [1]. After submitting that work to SC11 - the International Conference for High Performance Computing, we obtained an unreleased OpenMP version of GeoClaw from David George, who developed the GeoClaw code as part of his Ph.D. thesis. In this paper, we will show the complementary characteristics of the two approaches used in parallelizing GeoClaw and the speed-up obtained by combining the advantage of each of the two individual approaches with adaptive mesh refinement (AMR), demonstrating the capabilities of running GeoClaw efficiently on many-core systems. We will also show a novel simulation of the Tohoku 2011 Tsunami waves inundating the Sendai airport and Fukushima Nuclear Power Plants, over which the finest grid distance of 20 meters is achieved through a 4-level AMR. This simulation yields quite good predictions about the wave-heights and travel time of the tsunami waves. © 2011 IEEE.
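
    The "over 75% of the potential speed-up" figure can be turned into rough numbers. The sketch below assumes that "potential speed-up" means the ideal speed-up, equal to the core count; that reading is an interpretation, not something the abstract states. It also uses Amdahl's law to estimate what parallel fraction would be consistent with that speed-up. The function names are illustrative.

```python
def speedup_from_efficiency(efficiency, cores):
    """Speed-up implied by a parallel efficiency (fraction of the ideal speed-up)."""
    return efficiency * cores


def amdahl_speedup(parallel_fraction, cores):
    """Amdahl's law: speed-up when only part of the run time is parallelizable."""
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)


def parallel_fraction_for(speedup, cores):
    """Invert Amdahl's law: parallel fraction needed to reach a given speed-up."""
    return (1.0 - 1.0 / speedup) * cores / (cores - 1)


cores = 8
speedup = speedup_from_efficiency(0.75, cores)
print(speedup)                                    # 6.0

fraction = parallel_fraction_for(speedup, cores)
print(round(fraction, 3))                         # 0.952
```

    Under that reading, 75% efficiency on eight cores corresponds to a speed-up of about 6, which Amdahl's law attributes to roughly 95% of the serial run time being parallelized.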

  11. Parametric Study of CO2 Sequestration in Geologic Media Using the Massively Parallel Computer Code PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Lu, C.; Lichtner, P. C.; Tsimpanogiannis, I. N.

    2005-12-01

    Uncontrolled release of CO2 to the atmosphere has been identified as a major contributing source to the global warming problem. Significant research efforts from the international scientific community are targeted towards stabilization/reduction of CO2 concentrations in the atmosphere while attempting to satisfy our continuously increasing needs for energy. CO2 sequestration (capture, separation, and long term storage) in various media (e.g. geologic such as depleted oil reservoirs, saline aquifers, etc.; oceanic at different depths) has been considered as a possible solution to reduce greenhouse gas emissions. In this study we utilize the PFLOTRAN simulator to investigate geologic sequestration of CO2. PFLOTRAN is a massively parallel 3-D reservoir simulator for modeling supercritical CO2 sequestration in geologic formations based on continuum scale mass and energy conservations. The mass and energy equations are sequentially coupled to reactive transport equations describing multi-component chemical reactions within the formation including aqueous speciation, and precipitation and dissolution of minerals to describe aqueous and mineral CO2 sequestration. The effect of the injected CO2 on pH, CO2 concentration within the aqueous phase, mineral stability, and other factors can be evaluated with this model. Parallelization is carried out using the PETSc parallel library package based on MPI, providing a high parallel efficiency and allowing simulations with several tens of millions of degrees of freedom to be carried out, which is ideal for large-scale field applications involving multi-component chemistry. In this work, our main focus is a parametrical examination on the effects of reservoir and fluid properties on the sequestration process, such as permeability and capillary pressure functions (e.g. linear, van Genuchten, etc.), diffusion coefficients in a multiphase system, the sensitivity of component solubility on pressure, temperature and mole fractions etc. Several

  12. F100(3) parallel compressor computer code and user's manual

    NASA Technical Reports Server (NTRS)

    Mazzawy, R. S.; Fulkerson, D. A.; Haddad, D. E.; Clark, T. A.

    1978-01-01

    The Pratt & Whitney Aircraft multiple segment parallel compressor model has been modified to include the influence of variable compressor vane geometry on the sensitivity to circumferential flow distortion. Further, performance characteristics of the F100 (3) compression system have been incorporated into the model on a blade row basis. In this modified form, the distortion's circumferential location is referenced relative to the variable vane controlling sensors of the F100 (3) engine so that the proper solution can be obtained regardless of distortion orientation. This feature is particularly important for the analysis of inlet temperature distortion. Compatibility with fixed geometry compressor applications has been maintained in the model.

  13. A fast tree-based method for estimating column densities in adaptive mesh refinement codes. Influence of UV radiation field on the structure of molecular clouds

    NASA Astrophysics Data System (ADS)

    Valdivia, Valeska; Hennebelle, Patrick

    2014-11-01

    Context. Ultraviolet radiation plays a crucial role in molecular clouds. Radiation and matter are tightly coupled and their interplay influences the physical and chemical properties of gas. In particular, modeling the radiation propagation requires calculating column densities, which can be numerically expensive in high-resolution multidimensional simulations. Aims: Developing fast methods for estimating column densities is mandatory if we are interested in the dynamical influence of the radiative transfer. In particular, we focus on the effect of the UV screening on the dynamics and on the statistical properties of molecular clouds. Methods: We have developed a tree-based method for a fast estimate of column densities, implemented in the adaptive mesh refinement code RAMSES. We performed numerical simulations using this method in order to analyze the influence of the screening on the clump formation. Results: We find that the accuracy for the extinction of the tree-based method is better than 10%, while the relative error for the column density can be much more. We describe the implementation of a method based on precalculating the geometrical terms that noticeably reduces the calculation time. To study the influence of the screening on the statistical properties of molecular clouds we present the probability distribution function of gas and the associated temperature per density bin and the mass spectra for different density thresholds. Conclusions: The tree-based method is fast and accurate enough to be used during numerical simulations since no communication is needed between CPUs when using a fully threaded tree. It is then suitable to parallel computing. We show that the screening for far UV radiation mainly affects the dense gas, thereby favoring low temperatures and affecting the fragmentation. We show that when we include the screening, more structures are formed with higher densities in comparison to the case that does not include this effect. We

  14. A Fast Parallel Simulation Code for Interaction between Proto-Planetary Disk and Embedded Proto-Planets: Implementation for 3D Code

    SciTech Connect

    Li, Shengtai; Li, Hui

    2012-06-14

    the position of the planet, we adopt the corotating frame that allows the planet moving only in radial direction if only one planet is present. This code has been extensively tested on a number of problems. For the Earth-mass planet with constant aspect ratio h = 0.05, the torque calculated using our code matches quite well with the 3D linear theory results by Tanaka et al. (2002). The code is fully parallelized via message-passing interface (MPI) and has very high parallel efficiency. Several numerical examples for both fixed planet and moving planet are provided to demonstrate the efficacy of the numerical method and code.

  15. BMI optimization by using parallel UNDX real-coded genetic algorithm with Beowulf cluster

    NASA Astrophysics Data System (ADS)

    Handa, Masaya; Kawanishi, Michihiro; Kanki, Hiroshi

    2007-12-01

    This paper deals with the global optimization algorithm of the Bilinear Matrix Inequalities (BMIs) based on the Unimodal Normal Distribution Crossover (UNDX) GA. First, analyzing the structure of the BMIs, the existence of the typical difficult structures is confirmed. Then, in order to improve the performance of algorithm, based on results of the problem structures analysis and consideration of BMIs characteristic properties, we proposed the algorithm using primary search direction with relaxed Linear Matrix Inequality (LMI) convex estimation. Moreover, in these algorithms, we propose two types of evaluation methods for GA individuals based on LMI calculation considering BMI characteristic properties more. In addition, in order to reduce computational time, we proposed parallelization of RCGA algorithm, Master-Worker paradigm with cluster computing technique.

  16. Robust conjunctive item-place coding by hippocampal neurons parallels learning what happens where.

    PubMed

    Komorowski, Robert W; Manns, Joseph R; Eichenbaum, Howard

    2009-08-01

    Previous research indicates a critical role of the hippocampus in memory for events in the context in which they occur. However, studies to date have not provided compelling evidence that hippocampal neurons encode event-context conjunctions directly associated with this kind of learning. Here we report that, as animals learn different meanings for items in distinct contexts, individual hippocampal neurons develop responses to specific stimuli in the places where they have differential significance. Furthermore, this conjunctive coding evolves in the form of enhanced item-specific responses within a subset of the preexisting spatial representation. These findings support the view that conjunctive representations in the hippocampus underlie the acquisition of context-specific memories.

  17. Implementation of a tree algorithm in MCNP code for nuclear well logging applications.

    PubMed

    Li, Fusheng; Han, Xiaogang

    2012-07-01

    The goal of this paper is to develop some modeling capabilities that are missing in the current MCNP code. These capabilities can greatly help in the design of certain nuclear tools, such as a nuclear lithology/mineralogy spectroscopy tool. The new capabilities developed in this paper include the following: zone tally, neutron interaction tally, gamma rays index tally and enhanced pulse-height tally. The patched MCNP code can also be used to compute neutron slowing-down length and thermal neutron diffusion length.

  18. Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

    NASA Astrophysics Data System (ADS)

    Olson, Richard F.

    2013-05-01

    Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, high-level parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.

  19. Real-time photoacoustic and ultrasound dual-modality imaging system facilitated with graphics processing unit and code parallel optimization

    NASA Astrophysics Data System (ADS)

    Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L.; Wang, Xueding; Liu, Xiaojun

    2013-08-01

    Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction, and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The back-projection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state of the art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.

  20. A real-time photoacoustic and ultrasound dual-modality imaging system facilitated with GPU and code parallel optimization

    NASA Astrophysics Data System (ADS)

    Yuan, Jie; Xu, Guan; Yu, Yao; Zhou, Yu; Carson, Paul L.; Wang, Xueding; Liu, Xiaojun

    2014-03-01

    Photoacoustic tomography (PAT) offers structural and functional imaging of living biological tissue with highly sensitive optical absorption contrast and excellent spatial resolution comparable to medical ultrasound (US) imaging. We report the development of a fully integrated PAT and US dual-modality imaging system, which performs signal scanning, image reconstruction and display for both photoacoustic (PA) and US imaging all in a truly real-time manner. The backprojection (BP) algorithm for PA image reconstruction is optimized to reduce the computational cost and facilitate parallel computation on a state of the art graphics processing unit (GPU) card. For the first time, PAT and US imaging of the same object can be conducted simultaneously and continuously, at a real-time frame rate, presently limited by the laser repetition rate of 10 Hz. Noninvasive PAT and US imaging of human peripheral joints in vivo were achieved, demonstrating the satisfactory image quality realized with this system. Another experiment, simultaneous PAT and US imaging of contrast agent flowing through an artificial vessel, was conducted to verify the performance of this system for imaging fast biological events. The GPU-based image reconstruction software code for this dual-modality system is open source and available for download from http://sourceforge.net/projects/patrealtime.

  1. Four-Channel, 8 x 8 Bit, Two-Dimensional Parallel Transmission by use of Space-Code-Division Multiple-Access Encoder and Decoder Modules.

    PubMed

    Nakamura, M; Kitayama, K; Igasaki, Y; Kaneda, K

    1998-07-10

    We experimentally demonstrate four-channel multiplexing of 64-bit (8 x 8) two-dimensional (2-D) parallel data links on the basis of optical space-code-division multiple access (CDMA) by using new modules of optical spatial encoders and a decoder with a new high-contrast 9-m-long image fiber with 3 x 10^4 cores. Each 8 x 8 bit plane (64-bit parallel data) is optically encoded with an 8 x 8, 2-D optical orthogonal signature pattern. The encoded bit planes are spatially multiplexed and transmitted through an image fiber. A receiver can recover the intended input bit plane by means of an optical decoding process. This result should encourage the application of optical space-CDMA to future high-throughput 2-D parallel data links connecting massively parallel processors.

  2. An overview of the activities of the OECD/NEA Task Force on adapting computer codes in nuclear applications to parallel architectures

    SciTech Connect

    Kirk, B.L.; Sartori, E.

    1997-06-01

    Subsequent to the introduction of High Performance Computing in the developed countries, the Organization for Economic Cooperation and Development/Nuclear Energy Agency (OECD/NEA) created the Task Force on Adapting Computer Codes in Nuclear Applications to Parallel Architectures (under the guidance of the Nuclear Science Committee's Working Party on Advanced Computing) to study the growth area in supercomputing and its applicability to the nuclear community's computer codes. The result has been four years of investigation for the Task Force in different subject fields - deterministic and Monte Carlo radiation transport, computational mechanics and fluid dynamics, nuclear safety, atmospheric models and waste management.

  3. Optimization and Parallelization of the Thermal-Hydraulic Sub-channel Code CTF for High-Fidelity Multi-physics Applications

    SciTech Connect

    Salko, Robert K; Schmidt, Rodney; Avramova, Maria N

    2014-01-01

    This paper describes major improvements to the computational infrastructure of the CTF sub-channel code so that full-core sub-channel-resolved simulations can now be performed in much shorter run-times, either in stand-alone mode or as part of coupled-code multi-physics calculations. These improvements support the goals of the Department of Energy (DOE) Consortium for Advanced Simulations of Light Water (CASL) Energy Innovation Hub to develop high fidelity multi-physics simulation tools for nuclear energy design and analysis. A set of serial code optimizations--including fixing computational inefficiencies, optimizing the numerical approach, and making smarter data storage choices--are first described and shown to reduce both execution time and memory usage by about a factor of ten. Next, a Single Program Multiple Data (SPMD) parallelization strategy targeting distributed memory Multiple Instruction Multiple Data (MIMD) platforms and utilizing domain-decomposition is presented. In this approach, data communication between processors is accomplished by inserting standard MPI calls at strategic points in the code. The domain decomposition approach implemented assigns one MPI process to each fuel assembly, with each domain being represented by its own CTF input file. The creation of CTF input files, both for serial and parallel runs, is also fully automated through use of a pre-processor utility that takes a greatly reduced set of user input over the traditional CTF input file. To run CTF in parallel, two additional libraries are currently needed: MPI, for inter-processor message passing, and the Parallel Extensible Toolkit for Scientific Computation (PETSc), which is leveraged to solve the global pressure matrix in parallel. Results presented include a set of testing and verification calculations and performance tests assessing parallel scaling characteristics up to a full core, sub-channel-resolved model of Watts Bar Unit 1 under hot full-power conditions (193 17x17

  4. Implementation and Characterization of Three-Dimensional Particle-in-Cell Codes on Multiple-Instruction-Multiple-Data Massively Parallel Supercomputers

    NASA Technical Reports Server (NTRS)

    Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.

    1995-01-01

    A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech, a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64^3 particles per processing node (approximately 134 million particles on 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.
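
    The particle count quoted above, and the difference between slab, rod, and cube partitions, can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function names, the 512^3 grid, and the process layouts are hypothetical and not taken from the GCPIC code, and it assumes periodic boundaries and that communication cost grows with the number of grid cells on subdomain faces shared with neighbours.

```python
# Illustrative arithmetic only; names, grid size and layouts are assumptions,
# not values or routines from the code described in the abstract.

def total_particles(edge, nodes):
    """Total particles when each node holds edge**3 particles."""
    return edge ** 3 * nodes


def subdomain_surface(grid, procs):
    """Return (local subdomain shape, number of face cells shared with neighbours).

    grid  -- global grid points per axis, e.g. (512, 512, 512)
    procs -- processes per axis, e.g. (8, 8, 8); their product is the node count
    Assumes periodic boundaries, so every partitioned axis has two shared faces.
    """
    for n, p in zip(grid, procs):
        if n % p != 0:
            raise ValueError("grid size must be divisible by process count")
    local = [n // p for n, p in zip(grid, procs)]
    volume = local[0] * local[1] * local[2]
    surface = 0
    for axis, p in enumerate(procs):
        if p > 1:  # only partitioned axes have neighbouring subdomains
            surface += 2 * (volume // local[axis])
    return tuple(local), surface


if __name__ == "__main__":
    print(total_particles(64, 512))  # 134217728, i.e. about 134 million
    layouts = {"slabs": (512, 1, 1), "rods": (32, 16, 1), "cubes": (8, 8, 8)}
    for name, procs in layouts.items():
        print(name, subdomain_surface((512, 512, 512), procs))
```

    Under these assumptions, cubes expose the fewest shared face cells per node and slabs the most. This is a geometric observation about the partitions, not a result reported in the abstract.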

  5. Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

    PubMed

    Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

    2014-11-11

    We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here. PMID:26584365

  6. Parallel algorithm development

    SciTech Connect

    Adams, T.F.

    1996-06-01

    Rapid changes in parallel computing technology are causing significant changes in the strategies being used for parallel algorithm development. One approach is simply to write computer code in a standard language like FORTRAN 77 or with the expectation that the compiler will produce executable code that will run in parallel. The alternatives are: (1) to build explicit message passing directly into the source code; or (2) to write source code without explicit reference to message passing or parallelism, but use a general communications library to provide efficient parallel execution. Application of these strategies is illustrated with examples of codes currently under development.

  7. Fast Coding Unit Encoding Mechanism for Low Complexity Video Coding

    PubMed Central

    Wu, Yueying; Jia, Kebin; Gao, Guandong

    2016-01-01

    In high efficiency video coding (HEVC), the coding tree contributes to excellent compression performance. However, the coding tree brings extremely high computational complexity. This paper presents innovative work on improving the coding tree to further reduce encoding time. A novel low complexity coding tree mechanism is proposed for HEVC fast coding unit (CU) encoding. Firstly, this paper makes an in-depth study of the relationship among CU distribution, quantization parameter (QP) and content change (CC). Secondly, a CU coding tree probability model is proposed for modeling and predicting CU distribution. Finally, a CU coding tree probability update is proposed, aiming to address probabilistic model distortion problems caused by CC. Experimental results show that the proposed low complexity CU coding tree mechanism significantly reduces encoding time by 27% for lossy coding and 42% for visually lossless coding and lossless coding. The proposed low complexity CU coding tree mechanism is aimed at improving coding performance under various application conditions. PMID:26999741

  8. The role of orthographic and phonological codes in the word and the pseudoword superiority effect: an analysis by means of multinomial processing tree models.

    PubMed

    Maris, Eric

    2002-12-01

    Central to the current accounts of the word and the pseudoword superiority effect (WSE and PWSE, respectively) is the concept of a unitized code that is less susceptible to masking than single-letter codes. Current explanations of the WSE and PWSE assume that this unitized code is orthographic, explaining these phenomena by the assumption of dual read-out from unitized and single-letter codes. In this article, orthographic dual read-out models are compared with a phonological dual read-out model (which is based on the assumption that the 1st unitized code is phonological). From this phonological code, an orthographic code is derived, through either lexical access or assembly. Comparison of the orthographic and phonological dual read-out models was performed by formulating both models as multinomial processing tree models. From an application of these models to the data of 2 letter identification experiments, it was clear that the orthographic dual read-out models are insufficient as an explanation of the PWSE, whereas the phonological dual read-out model is sufficient. PMID:12542135

  9. Implementation of a flexible and scalable particle-in-cell method for massively parallel computations in the mantle convection code ASPECT

    NASA Astrophysics Data System (ADS)

    Gassmöller, Rene; Bangerth, Wolfgang

    2016-04-01

    Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a

  10. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    NASA Astrophysics Data System (ADS)

    Qiang, J.; Leitner, D.; Todd, D. S.; Ryne, R. D.

    2005-03-01

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.

  11. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    SciTech Connect

    Qiang, J.; Leitner, D.; Todd, D.S.; Ryne, R.D.

    2005-03-15

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.

  12. IM3D: A parallel Monte Carlo code for efficient simulations of primary radiation displacements and damage in 3D geometry

    PubMed Central

    Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju

    2015-01-01

    SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼10^2 times faster in serial execution and > 10^4 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, the limitation of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying degree of chemical mixing), are discussed. PMID:26658477

  13. Dynamic analysis of the parallel-plate EMP (Electromagnetic Pulse) simulator using a wire-mesh approximation and the numerical electromagnetics code. Final report

    SciTech Connect

    Gedney, S.D.

    1987-09-01

    The electromagnetic pulse (EMP) produced by a high-altitude nuclear blast presents a severe threat to electronic systems due to its extreme characteristics. To test the vulnerability of large systems, such as airplanes, missiles, or satellites, they must be subjected to a simulated EMP environment. One type of simulator that has been used to approximate the EMP environment is the Large Parallel-Plate Bounded-Wave Simulator. It is a guided-wave simulator which has properties of a transmission line and supports a single TEM mode at sufficiently low frequencies. This type of simulator consists of finite-width parallel-plate waveguides, which are excited by a wave launcher and terminated by a wave receptor. This study addresses the field distribution within a finite-width parallel-plate waveguide that is matched to a conical tapered waveguide at either end. Characteristics of a parallel-plate bounded-wave EMP simulator were developed using scattering theory, thin-wire mesh approximation of the conducting surfaces, and the Numerical Electromagnetics Code (NEC). Background is provided for readers to use the NEC as a tool in solving thin-wire scattering problems.

  14. The influence of viral coding sequences on pestivirus IRES activity reveals further parallels with translation initiation in prokaryotes.

    PubMed Central

    Fletcher, Simon P; Ali, Iraj K; Kaminski, Ann; Digard, Paul; Jackson, Richard J

    2002-01-01

    Classical swine fever virus (CSFV) is a member of the pestivirus family, which shares many features in common with hepatitis C virus (HCV). It is shown here that CSFV has an exceptionally efficient cis-acting internal ribosome entry segment (IRES), which, like that of HCV, is strongly influenced by the sequences immediately downstream of the initiation codon, and is optimal with viral coding sequences in this position. Constructs that retained 17 or more codons of viral coding sequence exhibited full IRES activity, but with only 12 codons, activity was approximately 66% of maximum in vitro (though close to maximum in transfected BHK cells), whereas with just 3 codons or fewer, the activity was only approximately 15% of maximum. The minimal coding region elements required for high activity were exchanged between HCV and CSFV. Although maximum activity was observed in each case with the homologous combination of coding region and 5' UTR, the heterologous combinations were sufficiently active to rule out a highly specific functional interplay between the 5' UTR and coding sequences. On the other hand, inversion of the coding sequences resulted in low IRES activity, particularly with the HCV coding sequences. RNA structure probing showed that the efficiency of internal initiation of these chimeric constructs correlated most closely with the degree of single-strandedness of the region around and immediately downstream of the initiation codon. The low activity IRESs could not be rescued by addition of supplementary eIF4A (the initiation factor with ATP-dependent RNA helicase activity). The extreme sensitivity to secondary structure around the initiation codon is likely to be due to the fact that the eIF4F complex (which has eIF4A as one of its subunits) is not required for and does not participate in initiation on these IRESs. PMID:12515388

  15. Combining node-centered parallel radiation transport and higher-order multi-material cell-centered hydrodynamics methods in three-temperature radiation hydrodynamics code TRHD

    NASA Astrophysics Data System (ADS)

    Sijoy, C. D.; Chaturvedi, S.

    2016-06-01

    Higher-order cell-centered multi-material hydrodynamics (HD) and parallel node-centered radiation transport (RT) schemes are combined self-consistently in the three-temperature (3T) radiation hydrodynamics (RHD) code TRHD (Sijoy and Chaturvedi, 2015), developed for the simulation of intense thermal radiation or high-power laser driven RHD. For RT, a node-centered gray model implemented in the popular RHD code MULTI2D (Ramis et al., 2009) is used. This scheme, in principle, can handle RT in both optically thick and thin materials. The RT module has been parallelized using message passing interface (MPI) for parallel computation. Presently, for multi-material HD, we have used a simple and robust closure model in which a common strain rate for all materials in a mixed cell is assumed. The closure model has been further generalized to allow different temperatures for the electrons and ions. In addition to this, electron and radiation temperatures are assumed to be in non-equilibrium. Therefore, the thermal relaxation between the electrons and ions and the coupling between the radiation and matter energies are required to be computed self-consistently. This has been achieved by using a node-centered symmetric-semi-implicit (SSI) integration scheme. The electron thermal conduction is calculated using a cell-centered, monotonic, non-linear finite volume scheme (NLFV) suitable for unstructured meshes. In this paper, we have described the details of the 2D, 3T, non-equilibrium, multi-material RHD code developed, with special attention to the coupling of various cell-centered and node-centered formulations, along with a suite of validation test problems to demonstrate the accuracy and performance of the algorithms. We also report the parallel performance of the RT module. Finally, in order to demonstrate the full capability of the code implementation, we have presented the simulation of laser driven shock propagation in a layered thin foil. The simulation results are found to be in good

  16. TOMO3D: 3-D joint refraction and reflection traveltime tomography parallel code for active-source seismic data—synthetic test

    NASA Astrophysics Data System (ADS)

    Meléndez, A.; Korenaga, J.; Sallarès, V.; Miniussi, A.; Ranero, C. R.

    2015-10-01

    We present a new 3-D traveltime tomography code (TOMO3D) for the modelling of active-source seismic data that uses the arrival times of both refracted and reflected seismic phases to derive the velocity distribution and the geometry of reflecting boundaries in the subsurface. This code is based on its popular 2-D version TOMO2D from which it inherited the methods to solve the forward and inverse problems. The traveltime calculations are done using a hybrid ray-tracing technique combining the graph and bending methods. The LSQR algorithm is used to perform the iterative regularized inversion to improve the initial velocity and depth models. In order to cope with an increased computational demand due to the incorporation of the third dimension, the forward problem solver, which takes most of the run time (~90 per cent in the test presented here), has been parallelized with a combination of multi-processing and message passing interface standards. This parallelization distributes the ray-tracing and traveltime calculations among available computational resources. The code's performance is illustrated with a realistic synthetic example, including a checkerboard anomaly and two reflectors, which simulates the geometry of a subduction zone. The code is designed to invert for a single reflector at a time. A data-driven layer-stripping strategy is proposed for cases involving multiple reflectors, and it is tested for the successive inversion of the two reflectors. Layers are bound by consecutive reflectors, and an initial velocity model for each inversion step incorporates the results from previous steps. This strategy poses simpler inversion problems at each step, allowing the recovery of strong velocity discontinuities that would otherwise be smoothened.

  17. Icarus: A 2D direct simulation Monte Carlo (DSMC) code for parallel computers. User's manual - V.3.0

    SciTech Connect

    Bartel, T.; Plimpton, S.; Johannes, J.; Payne, J.

    1996-10-01

    Icarus is a 2D Direct Simulation Monte Carlo (DSMC) code which has been optimized for the parallel computing environment. The code is based on the DSMC method of Bird and models from free-molecular to continuum flowfields in either cartesian (x, y) or axisymmetric (z, r) coordinates. Computational particles, representing a given number of molecules or atoms, are tracked as they have collisions with other particles or surfaces. Multiple species, internal energy modes (rotation and vibration), chemistry, and ion transport are modelled. A new trace species methodology for collisions and chemistry is used to obtain statistics for small species concentrations. Gas phase chemistry is modelled using steric factors derived from Arrhenius reaction rates. Surface chemistry is modelled with surface reaction probabilities. The electron number density is either a fixed external generated field or determined using a local charge neutrality assumption. Ion chemistry is modelled with electron impact chemistry rates and charge exchange reactions. Coulomb collision cross-sections are used instead of Variable Hard Sphere values for ion-ion interactions. The electrostatic fields can either be externally input or internally generated using a Langmuir-Tonks model. The Icarus software package includes the grid generation, parallel processor decomposition, postprocessing, and restart software. The commercial graphics package, Tecplot, is used for graphics display. The majority of the software packages are written in standard Fortran.

  18. A parallel PCG solver for MODFLOW.

    PubMed

    Dong, Yanhui; Li, Guomin

    2009-01-01

    In order to simulate large-scale ground water flow problems more efficiently with MODFLOW, the OpenMP programming paradigm was used to parallelize the preconditioned conjugate-gradient (PCG) solver in this study. Incremental parallelization, the significant advantage supported by OpenMP on a shared-memory computer, allowed the solver to be converted to a parallel program smoothly, one block of code at a time. The parallel PCG solver, suitable for both MODFLOW-2000 and MODFLOW-2005, is verified using an 8-processor computer. Both the impact of compilers and different model domain sizes were considered in the numerical experiments. Based on the timing results, execution times using the parallel PCG solver are typically about 1.40 to 5.31 times faster than those using the serial one. In addition, the simulation results are exactly the same as those of the original PCG solver, because the majority of the serial code was not changed. It is worth noting that this parallelizing approach reduces cost in terms of software maintenance because only a single source PCG solver code needs to be maintained in the MODFLOW source tree. PMID:19563427
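
For readers unfamiliar with the method, the structure of a preconditioned conjugate-gradient iteration can be sketched as follows. This is an illustrative sketch under stated assumptions, not the MODFLOW solver: a Jacobi (diagonal) preconditioner is swapped in for simplicity, the matrix is stored densely, and the function name `pcg` is hypothetical. The comments mark the loops (matrix-vector product, dot products, vector updates) that are the usual candidates for loop-level OpenMP parallelization in a compiled code; here they run serially.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A.

    Uses conjugate gradients with a Jacobi (diagonal) preconditioner,
    chosen here only to keep the sketch short.
    """
    n = len(b)
    x = [0.0] * n
    r = list(b)                                   # residual b - A x for x = 0
    inv_diag = [1.0 / A[i][i] for i in range(n)]  # Jacobi preconditioner
    z = [inv_diag[i] * r[i] for i in range(n)]
    p = list(z)
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(max_iter):
        if max(abs(v) for v in r) < tol:
            break
        # Matrix-vector product: independent rows, parallelizable loop.
        Ap = [sum(A[i][j] * p[j] for j in range(n)) for i in range(n)]
        # Dot product: a reduction in a parallel implementation.
        alpha = rz / sum(p[i] * Ap[i] for i in range(n))
        # Vector updates: element-wise, parallelizable loops.
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [inv_diag[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        beta = rz_new / rz
        p = [z[i] + beta * p[i] for i in range(n)]
        rz = rz_new
    return x
```

Parallelizing these loops one at a time, while leaving the rest of the solver serial, is the kind of incremental conversion the abstract attributes to OpenMP.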

  19. Large Scale Earth's Bow Shock with Northern IMF as Simulated by PIC Code in Parallel with MHD Model

    NASA Astrophysics Data System (ADS)

    Baraka, Suleiman

    2016-06-01

    In this paper, we propose a 3D kinetic model (particle-in-cell, PIC) for the description of the large scale Earth's bow shock. The proposed version is stable and does not require huge or extensive computer resources. Because PIC simulations work with scaled plasma and field parameters, we also propose to validate our code by comparing its results with the available MHD simulations under the same scaled solar wind (SW) and interplanetary magnetic field (IMF) conditions. We report new results from the two models. In both codes the Earth's bow shock position is found to be ≈14.8 R_E along the Sun-Earth line, and ≈29 R_E on the dusk side. Those findings are consistent with past in situ observations. Both simulations reproduce the theoretical jump conditions at the shock. However, the PIC code density and temperature distributions are inflated and slightly shifted sunward when compared to the MHD results. Kinetic electron motions and reflected ions upstream may cause this sunward shift. Species distributions in the foreshock region are depicted within the transition of the shock (measured ≈2 c/ω_pi for Θ_Bn = 90° and M_MS = 4.7) and in the downstream. The size of the foot jump in the magnetic field at the shock is measured to be 1.7 c/ω_pi. In the foreshocked region, the thermal velocity is found to be 213 km s⁻¹ at 15 R_E and 63 km s⁻¹ at 12 R_E (magnetosheath region). Despite the large cell size of the current version of the PIC code, it is powerful enough to retain the macrostructure of planetary magnetospheres in a very short time, and thus it can be used for pedagogical test purposes. It is also likely complementary with MHD to deepen our understanding of the large scale magnetosphere.

  20. OFF, Open source Finite volume Fluid dynamics code: A free, high-order solver based on parallel, modular, object-oriented Fortran API

    NASA Astrophysics Data System (ADS)

    Zaghi, S.

    2014-07-01

    OFF, an open source (free software) code for performing fluid dynamics simulations, is presented. The aim of OFF is to solve, numerically, the unsteady (and steady) compressible Navier-Stokes equations of fluid dynamics by means of finite volume techniques: the research background is mainly focused on high-order (WENO) schemes for multi-fluids, multi-phase flows over complex geometries. To this purpose a highly modular, object-oriented application program interface (API) has been developed. In particular, the concepts of data encapsulation and inheritance available within Fortran language (from standard 2003) have been stressed in order to represent each fluid dynamics “entity” (e.g. the conservative variables of a finite volume, its geometry, etc…) by a single object so that a large variety of computational libraries can be easily (and efficiently) developed upon these objects. The main features of OFF can be summarized as follows. Programming Language: OFF is written in standard (compliant) Fortran 2003; its design is highly modular in order to enhance simplicity of use and maintenance without compromising the efficiency. Parallel Frameworks Supported: the development of OFF has been also targeted to maximize the computational efficiency: the code is designed to run on shared-memory multi-cores workstations and distributed-memory clusters of shared-memory nodes (supercomputers); the code’s parallelization is based on Open Multiprocessing (OpenMP) and Message Passing Interface (MPI) paradigms. Usability, Maintenance and Enhancement: in order to improve the usability, maintenance and enhancement of the code also the documentation has been carefully taken into account; the documentation is built upon comprehensive comments placed directly into the source files (no external documentation files needed): these comments are parsed by means of doxygen free software producing high quality html and latex documentation pages; the distributed versioning system referred

  1. LaMEM: a Massively Parallel Staggered-Grid Finite-Difference Code for Thermo-Mechanical Modeling of Lithospheric Deformation with Visco-Elasto-Plastic Rheologies

    NASA Astrophysics Data System (ADS)

    Kaus, B.; Popov, A.

    2014-12-01

    The complexity of lithospheric rheology and the necessity to resolve the deformation patterns near the free surface (faults and folds) sufficiently well place a great demand on a stable and scalable modeling tool that is capable of efficiently handling nonlinearities. Our code LaMEM (Lithosphere and Mantle Evolution Model) is an attempt to satisfy this demand. The code utilizes a stable and numerically inexpensive finite difference discretization with the spatial staggering of velocity, pressure, and temperature unknowns (a so-called staggered grid). As a time discretization method the forward Euler, or a combination of the predictor-corrector and the fourth-order Runge-Kutta can be chosen. Elastic stresses are rotated on the markers, which are also used to track all relevant material properties and solution history fields. The Newtonian nonlinear iteration, however, is handled at the level of the grid points to avoid spurious averaging between markers and grid. Such an arrangement required us to develop a non-standard discretization of the effective strain-rate second invariant. An important feature of the code is its ability to handle stress-free and open-box boundary conditions, in which empty cells are simply eliminated from the discretization, which also solves the biggest problem of the sticky-air approach - namely large viscosity jumps near the free surface. We currently support an arbitrary combination of linear elastic, nonlinear viscous with multiple creep mechanisms, and plastic rheologies based on either a depth-dependent von Mises or pressure-dependent Drucker-Prager yield criteria. LaMEM is being developed as an inherently parallel code. Structurally all its parts are based on the building blocks provided by PETSc library. These include Jacobian-Free Newton-Krylov nonlinear solvers with convergence globalization techniques (line search), equipped with different linear preconditioners. We have also implemented the coupled velocity-pressure multigrid

  2. Distributed Contour Trees

    SciTech Connect

    Morozov, Dmitriy; Weber, Gunther H.

    2014-03-31

    Topological techniques provide robust tools for data analysis. They are used, for example, for feature extraction, for data de-noising, and for comparison of data sets. This chapter concerns contour trees, a topological descriptor that records the connectivity of the isosurfaces of scalar functions. These trees are fundamental to analysis and visualization of physical phenomena modeled by real-valued measurements. We study the parallel analysis of contour trees. After describing a particular representation of a contour tree, called local-global representation, we illustrate how different problems that rely on contour trees can be solved in parallel with minimal communication.

  3. Parallelization of TWOPORFLOW, a Cartesian Grid based Two-phase Porous Media Code for Transient Thermo-hydraulic Simulations

    NASA Astrophysics Data System (ADS)

    Trost, Nico; Jiménez, Javier; Imke, Uwe; Sanchez, Victor

    2014-06-01

    TWOPORFLOW is a thermo-hydraulic code based on a porous media approach to simulate single- and two-phase flow including boiling. It is under development at the Institute for Neutron Physics and Reactor Technology (INR) at KIT. The code features a 3D transient solution of the mass, momentum and energy conservation equations for two inter-penetrating fluids with a semi-implicit continuous Eulerian type solver. The application domain of TWOPORFLOW includes the flow in standard porous media and in structured porous media such as micro-channels and cores of nuclear power plants. In the latter case, the fluid domain is coupled to a fuel rod model, describing the heat flow inside the solid structure. In this work, detailed profiling tools have been utilized to determine the optimization potential of TWOPORFLOW. As a result, bottlenecks were identified and reduced in the most feasible way, leading for instance to an optimization of the water-steam property computation. Furthermore, an OpenMP implementation addressing the routines in charge of inter-phase momentum, energy and mass coupling delivered good performance together with high scalability on shared memory architectures. In contrast, the approach for distributed memory systems was to solve sub-problems resulting from the decomposition of the initial Cartesian geometry. Communication for the sub-problem boundary updates was accomplished by the Message Passing Interface (MPI) standard.

  4. A parallelized binary search tree

    Technology Transfer Automated Retrieval System (TEKTRAN)

    PTTRNFNDR is an unsupervised statistical learning algorithm that detects patterns in DNA sequences, protein sequences, or any natural language texts that can be decomposed into letters of a finite alphabet. PTTRNFNDR performs complex mathematical computations and its processing time increases when i...

  5. Identification and quantification of carbamate pesticides in dried lime tree flowers by means of excitation-emission molecular fluorescence and parallel factor analysis when quenching effect exists.

    PubMed

    Rubio, L; Ortiz, M C; Sarabia, L A

    2014-04-11

    A non-separative, fast and inexpensive spectrofluorimetric method based on the second order calibration of excitation-emission fluorescence matrices (EEMs) was proposed for the determination of carbaryl, carbendazim and 1-naphthol in dried lime tree flowers. The trilinearity property of three-way data was used to handle the intrinsic fluorescence of lime flowers and the difference in the fluorescence intensity of each analyte. It also made it possible to identify each analyte unequivocally. Trilinearity of the data tensor guarantees the uniqueness of the solution obtained through parallel factor analysis (PARAFAC), so the factors of the decomposition match up with the analytes. In addition, an experimental procedure was proposed to identify, with three-way data, the quenching effect produced by the fluorophores of the lime flowers. This procedure also enabled the selection of the adequate dilution of the lime flowers extract to minimize the quenching effect so the three analytes can be quantified. Finally, the analytes were determined using the standard addition method for a calibration whose standards were chosen with a D-optimal design. The three analytes were unequivocally identified by the correlation between the pure spectra and the PARAFAC excitation and emission spectral loadings. The trueness was established by the accuracy line "calculated concentration versus added concentration" in all cases. Better decision limit values (CCα), at x0 = 0 with the probability of false positive fixed at 0.05, were obtained for the calibration performed in pure solvent: 2.97 μg L⁻¹ for 1-naphthol, 3.74 μg L⁻¹ for carbaryl and 23.25 μg L⁻¹ for carbendazim. The CCα values for the second calibration carried out in matrix were 1.61, 4.34 and 51.75 μg L⁻¹, respectively, while the values obtained considering only the pure samples as calibration set were 2.65, 8.61 and 28.7 μg L⁻¹, respectively.

  6. Cultural Codes as Catalysts for Collective Conscientisation in Environmental Adult Education: Mr. Floatie, Tree Squatting and Save-Our-Surfers

    ERIC Educational Resources Information Center

    Walter, Pierre

    2012-01-01

    This study examines how cultural codes in environmental adult education can be used to "frame" collective identity, develop counterhegemonic ideologies, and catalyse "educative-activism" within social movements. Three diverse examples are discussed, spanning environmental movements in urban Victoria, British Columbia, Canada, the redwoods of…

  7. The Use of Coded PCR Primers Enables High-Throughput Sequencing of Multiple Homolog Amplification Products by 454 Parallel Sequencing

    PubMed Central

    Bollback, Jonathan P.; Panitz, Frank; Bendixen, Christian; Nielsen, Rasmus; Willerslev, Eske

    2007-01-01

    Background: The invention of the Genome Sequence 20™ DNA Sequencing System (454 parallel sequencing platform) has enabled the rapid and high-volume production of sequence data. Until now, however, individual emulsion PCR (emPCR) reactions and subsequent sequencing runs have been unable to combine template DNA from multiple individuals, as homologous sequences cannot be subsequently assigned to their original sources. Methodology: We use conventional PCR with 5′-nucleotide tagged primers to generate homologous DNA amplification products from multiple specimens, followed by sequencing through the high-throughput Genome Sequence 20™ DNA Sequencing System (GS20, Roche/454 Life Sciences). Each DNA sequence is subsequently traced back to its individual source through 5′ tag analysis. Conclusions: We demonstrate that this new approach enables the assignment of virtually all the generated DNA sequences to the correct source once sequencing anomalies are accounted for (misassignment rate < 0.4%). Therefore, the method enables accurate sequencing and assignment of homologous DNA sequences from multiple sources in a single high-throughput GS20 run. We observe a bias in the distribution of the differently tagged primers that is dependent on the 5′ nucleotide of the tag. In particular, primers 5′ labelled with a cytosine are heavily overrepresented among the final sequences, while those 5′ labelled with a thymine are strongly underrepresented. A weaker bias also exists with regards to the distribution of the sequences as sorted by the second nucleotide of the dinucleotide tags. As the results are based on a single GS20 run, the general applicability of the approach requires confirmation. However, our experiments demonstrate that 5′ primer tagging is a useful method in which the sequencing power of the GS20 can be applied to PCR-based assays of multiple homologous PCR products. The new approach will be of value to a broad range of research areas, such as those of
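
The idea of tracing each read back to its specimen through a 5′ tag can be sketched in a few lines. This is a simplified illustration, not the authors' pipeline: the function name `assign_reads`, the tag table and the specimen labels are hypothetical, tags are assumed to be exact dinucleotide prefixes, and the checks on primer sequence and sequencing anomalies that a real analysis would need are omitted.

```python
def assign_reads(reads, tag_to_source, tag_length=2):
    """Group reads by the tag found at their 5' end.

    reads:         iterable of read sequences (strings, 5' to 3').
    tag_to_source: dict mapping a tag (e.g. "AC") to a specimen label.
    Returns (assigned, unassigned): a dict of specimen -> list of reads
    with the tag stripped, and a list of reads whose tag is unknown.
    """
    assigned = {source: [] for source in tag_to_source.values()}
    unassigned = []
    for read in reads:
        tag, insert = read[:tag_length], read[tag_length:]
        source = tag_to_source.get(tag)
        if source is None:
            unassigned.append(read)   # unknown tag: keep the read for review
        else:
            assigned[source].append(insert)
    return assigned, unassigned
```

Counting the reads per tag in the returned dictionary is also a direct way to look for the kind of tag-dependent representation bias the abstract reports.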

  8. OCTGRAV: Sparse Octree Gravitational N-body Code on Graphics Processing Units

    NASA Astrophysics Data System (ADS)

    Gaburov, Evghenii; Bédorf, Jeroen; Portegies Zwart, Simon

    2010-10-01

    Octgrav is a new very fast tree-code which runs on massively parallel Graphical Processing Units (GPU) with NVIDIA CUDA architecture. The algorithms are based on parallel-scan and sort methods. The tree-construction and calculation of multipole moments is carried out on the host CPU, while the force calculation which consists of tree walks and evaluation of interaction list is carried out on the GPU. In this way, a sustained performance of about 100 GFLOP/s and data transfer rates of about 50 GB/s are achieved. It takes about a second to compute forces on a million particles with an opening angle of θ ≈ 0.5. To test the performance and feasibility, we implemented the algorithms in CUDA in the form of a gravitational tree-code which completely runs on the GPU. The tree construction and traverse algorithms are portable to many-core devices which have support for CUDA or OpenCL programming languages. The gravitational tree-code outperforms tuned CPU code during the tree-construction and shows a performance improvement of more than a factor 20 overall, resulting in a processing rate of more than 2.8 million particles per second. The code has a convenient user interface and is freely available for use.
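
The role of the opening angle in a tree walk can be shown with a small serial sketch. It is not the Octgrav GPU algorithm: the `Node` layout, the function name `acceleration` and the monopole-only approximation are assumptions made for illustration. A cell of size s at distance r is treated as a single point mass when s / r < θ; otherwise the walk descends into its children.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    mass: float                 # total mass in the cell
    com: tuple                  # centre of mass (x, y, z)
    size: float = 0.0           # cell edge length (unused for leaves)
    children: list = field(default_factory=list)


def acceleration(node, pos, theta=0.5, G=1.0):
    """Gravitational acceleration at pos from the tree rooted at node."""
    dx = [c - p for c, p in zip(node.com, pos)]
    r2 = sum(d * d for d in dx)
    r = math.sqrt(r2)
    if node.children:
        far_enough = r > 0.0 and node.size / r < theta
        if not far_enough:
            # Cell is too close: open it and sum over its children.
            acc = [0.0, 0.0, 0.0]
            for child in node.children:
                a = acceleration(child, pos, theta, G)
                acc = [s + t for s, t in zip(acc, a)]
            return tuple(acc)
    elif r2 == 0.0:
        return (0.0, 0.0, 0.0)  # a particle exerts no force on itself
    # Monopole approximation: treat the cell (or particle) as a point mass.
    f = G * node.mass / (r2 * r)
    return tuple(f * d for d in dx)
```

Setting `theta=0` forces every cell to be opened, which reduces the walk to a direct sum and gives a convenient reference for checking the approximation error at larger opening angles.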

  9. High frequency burst firing of granule cells ensures transmission at the parallel fiber to purkinje cell synapse at the cost of temporal coding.

    PubMed

    van Beugen, Boeke J; Gao, Zhenyu; Boele, Henk-Jan; Hoebeek, Freek; De Zeeuw, Chris I

    2013-01-01

    Cerebellar granule cells (GrCs) convey information from mossy fibers (MFs) to Purkinje cells (PCs) via their parallel fibers (PFs). MF to GrC signaling allows transmission of frequencies up to 1 kHz and GrCs themselves can also fire bursts of action potentials with instantaneous frequencies up to 1 kHz. So far, in the scientific literature no evidence has been shown that these high-frequency bursts also exist in awake, behaving animals. More so, it remains to be shown whether such high-frequency bursts can transmit temporally coded information from MFs to PCs and/or whether these patterns of activity contribute to the spatiotemporal filtering properties of the GrC layer. Here, we show that, upon sensory stimulation in both un-anesthetized rabbits and mice, GrCs can show bursts that consist of tens of spikes at instantaneous frequencies over 800 Hz. In vitro recordings from individual GrC-PC pairs following high-frequency stimulation revealed an overall low initial release probability of ~0.17. Nevertheless, high-frequency burst activity induced a short-lived facilitation to ensure signaling within the first few spikes, which was rapidly followed by a reduction in transmitter release. The facilitation rate among individual GrC-PC pairs was heterogeneously distributed and could be classified as either "reluctant" or "responsive" according to their release characteristics. Despite the variety of efficacy at individual connections, grouped activity in GrCs resulted in a linear relationship between PC response and PF burst duration at frequencies up to 300 Hz allowing rate coding to persist at the network level. Together, these findings support the hypothesis that the cerebellar granular layer acts as a spatiotemporal filter between MF input and PC output (D'Angelo and De Zeeuw, 2009). PMID:23734102

  11. Performance of the UCAN2 Gyrokinetic Particle In Cell (PIC) Code on Two Massively Parallel Mainframes with Intel "Sandy Bridge" Processors

    NASA Astrophysics Data System (ADS)

    Leboeuf, Jean-Noel; Decyk, Viktor; Newman, David; Sanchez, Raul

    2013-10-01

    The massively parallel, 2D domain-decomposed, nonlinear, 3D, toroidal, electrostatic, gyrokinetic, Particle in Cell (PIC), Cartesian geometry UCAN2 code, with particle ions and adiabatic electrons, has been ported to two emerging mainframes. These two computers, one at NERSC in the US built by Cray named Edison and the other at the Barcelona Supercomputer Center (BSC) in Spain built by IBM named MareNostrum III (MNIII), just happen to share the same Intel "Sandy Bridge" processors. The successful port of UCAN2 to MNIII, which came online first, has enabled us to be up and running efficiently in record time on Edison. Overall, the performance of UCAN2 on Edison is superior to that on MNIII, particularly at large numbers of processors (>1024) for the same Intel IFORT compiler. This appears to be due to different MPI modules (OpenMPI on MNIII and MPICH2 on Edison) and different interconnection networks (Infiniband on MNIII and Cray's Aries on Edison) on the two mainframes. Details of these ports and comparative benchmarks are presented. Work supported by OFES, USDOE, under contract no. DE-FG02-04ER54741 with the University of Alaska at Fairbanks.

  12. Modeling the Backscatter and Transmitted Light of High Power Smoothed Beams with pF3D, a Massively Parallel Laser Plasma Interaction Code

    SciTech Connect

    Berger, R.L.; Divol, L.; Glenzer, S.; Hinkel, D.E.; Kirkwood, R.K.; Langdon, A.B.; Moody, J.D.; Still, C.H.; Suter, L.; Williams, E.A.; Young, P.E.

    2000-06-01

    Using the three-dimensional wave propagation code F3D [Berger et al., Phys. Fluids B 5, 2243 (1993); Berger et al., Phys. Plasmas 5, 4337 (1998)] and the massively parallel version pF3D [Still et al., Phys. Plasmas 7 (2000)], we have computed the transmitted and reflected light for laser and plasma conditions in experiments that simulated ignition hohlraum conditions. The frequency spectrum and the wavenumber spectrum of the transmitted light are calculated and used to identify the relative contributions of stimulated forward Brillouin and self-focusing in hydrocarbon-filled balloons, commonly called gasbags. The effect of beam smoothing, smoothing by spectral dispersion (SSD) and polarization smoothing (PS), on the stimulated Brillouin backscatter (SBS) from Scale-1 NOVA hohlraums was simulated with the use of nonlinear saturation models that limit the amplitude of the driven acoustic waves. Other experiments on CO2 gasbags simultaneously measure, at a range of intensities, the SBS reflectivity and the Thomson scatter from the SBS-driven acoustic waves, which provide a more detailed test of the modeling. These calculations also predict that the backscattered light will be very nonuniform in the nearfield (the focusing system optics), which is important for specifying the backscatter intensities that can be tolerated by the National Ignition Facility laser system.

  13. Massively parallel multiple interacting continua formulation for modeling flow in fractured porous media using the subsurface reactive flow and transport code PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Kumar, J.; Mills, R. T.; Lichtner, P. C.; Hammond, G. E.

    2010-12-01

    Fracture dominated flows occur in numerous subsurface geochemical processes and at many different scales in rock pore structures, micro-fractures, fracture networks and faults. Fractured porous media can be modeled as multiple interacting continua which are connected to each other through transfer terms that capture the flow of mass and energy in response to pressure, temperature and concentration gradients. However, the analysis of large-scale transient problems using the multiple interacting continuum approach presents an algorithmic and computational challenge for problems with very large numbers of degrees of freedom. A generalized dual porosity model based on the Dual Continuum Disconnected Matrix approach has been implemented within a massively parallel multiphysics-multicomponent-multiphase subsurface reactive flow and transport code PFLOTRAN. Developed as part of the Department of Energy's SciDAC-2 program, PFLOTRAN provides subsurface simulation capabilities that can scale from laptops to ultrascale supercomputers, and utilizes the PETSc framework to solve the large, sparse algebraic systems that arise in complex subsurface reactive flow and transport problems. It has been successfully applied to the solution of problems composed of more than two billion degrees of freedom, utilizing up to 131,072 processor cores on Jaguar, the Cray XT5 system at Oak Ridge National Laboratory that is the world’s fastest supercomputer. Building upon the capabilities and computational efficiency of PFLOTRAN, we will present an implementation of the multiple interacting continua formulation for fractured porous media along with an application case study.

  14. A fast parallel code for calculating energies and oscillator strengths of many-electron atoms at neutron star magnetic field strengths in adiabatic approximation

    NASA Astrophysics Data System (ADS)

    Engel, D.; Klews, M.; Wunner, G.

    2009-02-01

    We have developed a new method for the fast computation of wavelengths and oscillator strengths for medium-Z atoms and ions, up to iron, at neutron star magnetic field strengths. The method is a parallelized Hartree-Fock approach in adiabatic approximation based on finite-element and B-spline techniques. It turns out that typically 15-20 finite elements are sufficient to calculate energies to within a relative accuracy of 10⁻⁵ in 4 or 5 iteration steps using B-splines of 6th order, with parallelization speed-ups of 20 on a 26-processor machine. Results have been obtained for the energies of the ground states and excited levels and for the transition strengths of astrophysically relevant atoms and ions in the range Z=2…26 in different ionization stages. Catalogue identifier: AECC_v1_0; Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECC_v1_0.html; Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland; Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html; No. of lines in distributed program, including test data, etc.: 3845; No. of bytes in distributed program, including test data, etc.: 27 989; Distribution format: tar.gz; Programming language: MPI/Fortran 95 and Python; Computer: Cluster of 1-26 HP Compaq dc5750; Operating system: Fedora 7; Has the code been vectorised or parallelized?: Yes; RAM: 1 GByte; Classification: 2.1; External routines: MPI/GFortran, LAPACK, PyLab/Matplotlib; Nature of problem: Calculations of synthetic spectra [1] of strongly magnetized neutron stars are bedevilled by the lack of data for atoms in intense magnetic fields. While the behaviour of hydrogen and helium has been investigated in detail (see, e.g., [2]), complete and reliable data for heavier elements, in particular iron, are still missing. Since neutron stars are formed by the collapse of the iron cores of massive stars, it may be assumed that their atmospheres contain an iron plasma. Our objective is to fill the gap

  15. Neural coding of image structure and contrast polarity of Cartesian, hyperbolic, and polar gratings in the primary and secondary visual cortex of the tree shrew.

    PubMed

    Poirot, Jordan; De Luna, Paolo; Rainer, Gregor

    2016-04-01

    We comprehensively characterize spiking and visual evoked potential (VEP) activity in tree shrew V1 and V2 using Cartesian, hyperbolic, and polar gratings. Neural selectivity to structure of Cartesian gratings was higher than other grating classes in both visual areas. From V1 to V2, structure selectivity of spiking activity increased, whereas corresponding VEP values tended to decrease, suggesting that single-neuron coding of Cartesian grating attributes improved while the cortical columnar organization of these neurons became less precise from V1 to V2. We observed that neurons in V2 generally exhibited similar selectivity for polar and Cartesian gratings, suggesting that structure of polar-like stimuli might be encoded as early as in V2. This hypothesis is supported by the preference shift from V1 to V2 toward polar gratings of higher spatial frequency, consistent with the notion that V2 neurons encode visual scene borders and contours. Neural sensitivity to modulations of polarity of hyperbolic gratings was highest among all grating classes and closely related to the visual receptive field (RF) organization of ON- and OFF-dominated subregions. We show that spatial RF reconstructions depend strongly on grating class, suggesting that intracortical contributions to RF structure are strongest for Cartesian and polar gratings. Hyperbolic gratings tend to recruit least cortical elaboration such that the RF maps are similar to those generated by sparse noise, which most closely approximate feedforward inputs. Our findings complement previous literature in primates, rodents, and carnivores and highlight novel aspects of shape representation and coding occurring in mammalian early visual cortex. PMID:26843607

  16. Transgenic hybrid aspen trees with increased gibberellin (GA) concentrations suggest that GA acts in parallel with FLOWERING LOCUS T2 to control shoot elongation.

    PubMed

    Eriksson, Maria E; Hoffman, Daniel; Kaduk, Mateusz; Mauriat, Mélanie; Moritz, Thomas

    2015-02-01

    Bioactive gibberellins (GAs) have been implicated in short day (SD)-induced growth cessation in Populus, because exogenous applications of bioactive GAs to hybrid aspens (Populus tremula × tremuloides) under SD conditions delay growth cessation. However, this effect diminishes with time, suggesting that plants may cease growth following exposure to SDs due to a reduction in sensitivity to GAs. In order to validate and further explore the role of GAs in growth cessation, we perturbed GA biosynthesis or signalling in hybrid aspen plants by overexpressing AtGA20ox1, AtGA2ox2 and PttGID1.3 (encoding GA biosynthesis enzymes and a GA receptor). We found trees with elevated concentrations of bioactive GA, due to overexpression of AtGA20ox1, continued to grow in SD conditions and were insensitive to the level of FLOWERING LOCUS T2 (FT2) expression. As transgenic plants overexpressing the PttGID1.3 GA receptor responded in a wild-type (WT) manner to SD conditions, this insensitivity did not result from limited receptor availability. As high concentrations of bioactive GA during SD conditions were sufficient to sustain shoot elongation growth in hybrid aspen trees, independent of FT2 expression levels, we conclude elongation growth in trees is regulated by both GA- and long day-responsive pathways, similar to the regulation of flowering in Arabidopsis thaliana.

  17. Scioto: A Framework for Global-View Task Parallelism

    SciTech Connect

    Dinan, James S.; Krishnamoorthy, Sriram; Larkins, D. B.; Nieplocha, Jaroslaw; Sadayappan, Ponnuswamy

    2008-09-09

    We introduce Scioto, Shared Collections of Task Objects, a framework for supporting task-parallelism in one-sided and global-view parallel programming models. Scioto provides lightweight, locality-aware dynamic load balancing and interoperates with existing parallel models including MPI, SHMEM, CAF, and Global Arrays. Through task parallelism, the Scioto framework provides a solution for overcoming load imbalance and heterogeneity as well as dynamic mapping of computation onto emerging multicore architectures. In this paper, we present the design and implementation of the Scioto framework and demonstrate its effectiveness on the Unbalanced Tree Search (UTS) benchmark and two quantum chemistry codes: the closed shell Self-Consistent Field (SCF) method and a sparse tensor contraction kernel extracted from a coupled cluster computation. We explore the efficiency and scalability of Scioto through these sample applications and demonstrate that it offers low overhead, achieves good performance on heterogeneous and multicore clusters, and scales to hundreds of processors.

  18. A fast parallel code for calculating energies and oscillator strengths of many-electron atoms at neutron star magnetic field strengths in adiabatic approximation

    NASA Astrophysics Data System (ADS)

    Engel, D.; Klews, M.; Wunner, G.

    2009-02-01

    We have developed a new method for the fast computation of wavelengths and oscillator strengths for medium-Z atoms and ions, up to iron, at neutron star magnetic field strengths. The method is a parallelized Hartree-Fock approach in adiabatic approximation based on finite-element and B-spline techniques. It turns out that typically 15-20 finite elements are sufficient to calculate energies to within a relative accuracy of 10⁻⁵ in 4 or 5 iteration steps using B-splines of 6th order, with parallelization speed-ups of 20 on a 26-processor machine. Results have been obtained for the energies of the ground states and excited levels and for the transition strengths of astrophysically relevant atoms and ions in the range Z=2…26 in different ionization stages. Catalogue identifier: AECC_v1_0 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AECC_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3845 No. of bytes in distributed program, including test data, etc.: 27 989 Distribution format: tar.gz Programming language: MPI/Fortran 95 and Python Computer: Cluster of 1-26 HP Compaq dc5750 Operating system: Fedora 7 Has the code been vectorised or parallelized?: Yes RAM: 1 GByte Classification: 2.1 External routines: MPI/GFortran, LAPACK, PyLab/Matplotlib Nature of problem: Calculations of synthetic spectra [1] of strongly magnetized neutron stars are bedevilled by the lack of data for atoms in intense magnetic fields. While the behaviour of hydrogen and helium has been investigated in detail (see, e.g., [2]), complete and reliable data for heavier elements, in particular iron, are still missing. Since neutron stars are formed by the collapse of the iron cores of massive stars, it may be assumed that their atmospheres contain an iron plasma. Our objective is to fill the gap

  19. Parallel Total Energy

    2004-10-21

    This is a total energy electronic structure code using the Local Density Approximation (LDA) of density functional theory. It uses plane waves as the wave function basis set. It can use both norm-conserving pseudopotentials and ultrasoft pseudopotentials. It can relax the atomic positions according to the total energy. It is a parallel code using MPI.

  20. Parallelized direct execution simulation of message-passing parallel programs

    NASA Technical Reports Server (NTRS)

    Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

    1994-01-01

    As massively parallel computers proliferate, there is growing interest in finding ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing compilers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization, specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, the Large Application Parallel Simulation Environment (LAPSE), which we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well, typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.

  1. Categorizing Ideas about Trees: A Tree of Trees

    PubMed Central

    Fisler, Marie; Lecointre, Guillaume

    2013-01-01

    The aim of this study is to explore whether matrices and MP trees used to produce systematic categories of organisms could be useful to produce categories of ideas in history of science. We study the history of the use of trees in systematics to represent the diversity of life from 1766 to 1991. We apply to those ideas a method inspired from coding homologous parts of organisms. We discretize conceptual parts of ideas, writings and drawings about trees contained in 41 main writings; we detect shared parts among authors and code them into a 91-characters matrix and use a tree representation to show who shares what with whom. In other words, we propose a hierarchical representation of the shared ideas about trees among authors: this produces a “tree of trees.” Then, we categorize schools of tree-representations. Classical schools like “cladists” and “pheneticists” are recovered but others are not: “gradists” are separated into two blocks, one of them being called here “grade theoreticians.” We propose new interesting categories like the “buffonian school,” the “metaphoricians,” and those using “strictly genealogical classifications.” We consider that networks are not useful to represent shared ideas at the present step of the study. A cladogram is made for showing who is sharing what with whom, but also heterobathmy and homoplasy of characters. The present cladogram is not modelling processes of transmission of ideas about trees, and here it is mostly used to test for proximity of ideas of the same age and for categorization. PMID:23950877

  2. A systolic array parallelizing compiler

    SciTech Connect

    Tseng, P.S.

    1990-01-01

    This book presents a completely new approach to the problem of building a systolic array parallelizing compiler. It describes the AL parallelizing compiler for the Warp systolic array, the first working systolic array parallelizing compiler that can generate efficient parallel code for complete LINPACK routines. The book begins by analyzing the architectural strengths of the Warp systolic array. It proposes a model for mapping programs onto the machine and introduces the notion of data relations for optimizing the program mapping. Also presented are successful applications of the AL compiler in matrix computation and image processing. Complete listings of the source program and the compiler-generated parallel code are given to clarify the overall picture of the compiler. The book concludes that a systolic array parallelizing compiler can produce efficient parallel code, almost identical to what the user would have written by hand.

  3. System performances of optical space code-division multiple-access-based fiber-optic two-dimensional parallel data link.

    PubMed

    Nakamura, M; Kitayama, K

    1998-05-10

    Optical space code-division multiple access is a scheme to multiplex and link data between two-dimensional processors such as smart pixels and spatial light modulators or arrays of optical sources like vertical-cavity surface-emitting lasers. We examine the multiplexing characteristics of optical space code-division multiple access by using optical orthogonal signature patterns. The probability density function of interference noise in interfering optical orthogonal signature patterns is calculated. The bit-error rate is derived from the result and plotted as a function of receiver threshold, code length, code weight, and number of users. Furthermore, we propose a prethresholding method to suppress the interference noise, and we experimentally verify that the method works effectively in improving system performance.

  4. The impact of collisionality, FLR, and parallel closure effects on instabilities in the tokamak pedestal: Numerical studies with the NIMROD code

    DOE PAGES

    King, J. R.; Pankin, A. Y.; Kruger, S. E.; Snyder, P. B.

    2016-06-24

    The extended-MHD NIMROD code [C. R. Sovinec and J. R. King, J. Comput. Phys. 229, 5803 (2010)] is verified against the ideal-MHD ELITE code [H. R. Wilson et al., Phys. Plasmas 9, 1277 (2002)] on a diverted tokamak discharge. When the NIMROD model complexity is increased incrementally, resistive and first-order finite-Larmor-radius effects are destabilizing and stabilizing, respectively. Lastly, the full result is compared to local analytic calculations, which are found to overpredict both the resistive destabilization and drift stabilization in comparison to the NIMROD computations.

  5. The impact of collisionality, FLR, and parallel closure effects on instabilities in the tokamak pedestal: Numerical studies with the NIMROD code

    NASA Astrophysics Data System (ADS)

    King, J. R.; Pankin, A. Y.; Kruger, S. E.; Snyder, P. B.

    2016-06-01

    The extended-MHD NIMROD code [C. R. Sovinec and J. R. King, J. Comput. Phys. 229, 5803 (2010)] is verified against the ideal-MHD ELITE code [H. R. Wilson et al., Phys. Plasmas 9, 1277 (2002)] on a diverted tokamak discharge. When the NIMROD model complexity is increased incrementally, resistive and first-order finite-Larmor-radius effects are destabilizing and stabilizing, respectively. The full result is compared to local analytic calculations, which are found to overpredict both the resistive destabilization and drift stabilization in comparison to the NIMROD computations.

  6. Parallel computers

    SciTech Connect

    Treleaven, P.

    1989-01-01

    This book presents an introduction to object-oriented, functional, and logic parallel computing on which the fifth generation of computer systems will be based. Coverage includes concepts for parallel computing languages, a parallel object-oriented system (DOOM) and its language (POOL), an object-oriented multilevel VLSI simulator using POOL, and implementation of lazy functional languages on parallel architectures.

  7. GSHR-Tree: a spatial index tree based on dynamic spatial slot and hash table in grid environments

    NASA Astrophysics Data System (ADS)

    Chen, Zhanlong; Wu, Xin-cai; Wu, Liang

    2008-12-01

    distributed operation, reduplication operation, transfer operation of spatial index in the grid environment. The design of GSHR-Tree ensures load balance in parallel computation. This tree structure is suited to parallel processing of spatial information in distributed network environments. Instead of the recursive comparison of spatial objects used in the original R-tree, the algorithm builds the spatial index by applying binary code operations, which computers execute more efficiently, and an extended dynamic hash code for bit comparison. In GSHR-Tree, a new server is assigned to the network whenever a split of a full node is required. We describe a more flexible allocation protocol that copes with a temporary shortage of storage resources. It uses a distributed balanced binary spatial tree that scales with insertions to potentially any number of storage servers through splits of the overloaded ones. The application manipulates the GSHR-Tree structure from a node in the grid environment. The node addresses the tree through its image, which the splits can make outdated. This may generate addressing errors, which are resolved by forwarding among the servers. In this paper, a spatial index data distribution algorithm that limits the number of servers is proposed. We improve storage utilization at the cost of additional messages. We believe that this grid spatial index scheme should fit the needs of new applications using ever larger sets of spatial data. Our proposal constitutes a flexible storage allocation method for a distributed spatial index. The insertion policy can be tuned dynamically to cope with periods of storage shortage. In such cases storage balancing should be favored for better space utilization, at the price of extra message exchanges between servers. This structure makes a compromise between the updating of the duplicated index and the transformation of the spatial index data.
Meeting the

  8. Bilingual parallel programming

    SciTech Connect

    Foster, I.; Overbeek, R.

    1990-01-01

    Numerous experiments have demonstrated that computationally intensive algorithms support adequate parallelism to exploit the potential of large parallel machines. Yet successful parallel implementations of serious applications are rare. The limiting factor is clearly programming technology. None of the approaches to parallel programming that have been proposed to date -- whether parallelizing compilers, language extensions, or new concurrent languages -- seems to adequately address the central problems of portability, expressiveness, efficiency, and compatibility with existing software. In this paper, we advocate an alternative approach to parallel programming based on what we call bilingual programming. We present evidence that this approach provides an effective solution to parallel programming problems. The key idea in bilingual programming is to construct the upper levels of applications in a high-level language while coding selected low-level components in low-level languages. This approach permits the advantages of a high-level notation (expressiveness, elegance, conciseness) to be obtained without the cost in performance normally associated with high-level approaches. In addition, it provides a natural framework for reusing existing code.

  9. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Hood, Robert; Jost, Gabriele

    2001-01-01

    This viewgraph presentation provides information on support sources available for the automatic parallelization of computer programs. CAPTools, a support tool developed at the University of Greenwich, transforms, with user guidance, existing sequential Fortran code into parallel message passing code. Comparison routines are then run for debugging purposes, in essence ensuring that the code transformation was accurate.

  10. Flood predictions using the parallel version of distributed numerical physical rainfall-runoff model TOPKAPI

    NASA Astrophysics Data System (ADS)

    Boyko, Oleksiy; Zheleznyak, Mark

    2015-04-01

    The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI (Todini et al., 1996-2014) is developed and implemented in Ukraine. The parallel version of the code has been developed recently to be used on multiprocessor systems: multicore/multiprocessor PCs and clusters. The algorithm is based on binary-tree decomposition of the watershed to balance the amount of computation across all processors/cores. The Message Passing Interface (MPI) protocol is used as the parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated in case studies of flood prediction for mountain watersheds of the Ukrainian Carpathian region. The modeling results are compared with predictions based on lumped-parameter models.
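
    The idea of balancing work by binary-tree decomposition can be illustrated with a minimal sketch. It is not the TOPKAPI-IMMS algorithm, whose details the abstract does not give: the representation of the watershed as an ordered list of per-cell costs and the function name `bisect_balanced` are assumptions made only for illustration. Each level of the tree splits a contiguous block of cells where the two halves carry the most nearly equal total cost.

```python
def bisect_balanced(costs, depth):
    """Recursively split an ordered list of per-cell costs into at most
    2**depth contiguous parts of roughly equal total cost.

    Hypothetical illustration only; the cost list stands in for a watershed.
    """
    if depth == 0 or len(costs) <= 1:
        return [costs]
    total = sum(costs)
    running = 0
    best_i, best_gap = 1, float("inf")
    # Try every cut position and keep the one with the smallest imbalance.
    for i in range(1, len(costs)):
        running += costs[i - 1]
        gap = abs(total - 2 * running)  # |right sum - left sum|
        if gap < best_gap:
            best_i, best_gap = i, gap
    left = bisect_balanced(costs[:best_i], depth - 1)
    right = bisect_balanced(costs[best_i:], depth - 1)
    return left + right


if __name__ == "__main__":
    parts = bisect_balanced([4, 1, 1, 1, 1, 4], depth=2)
    for rank, part in enumerate(parts):
        print(rank, part, sum(part))
```

    In an MPI setting each resulting part would be assigned to one rank; the sketch leaves communication out entirely.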

  11. Force user's manual: A portable, parallel FORTRAN

    NASA Technical Reports Server (NTRS)

    Jordan, Harry F.; Benten, Muhammad S.; Arenstorf, Norbert S.; Ramanan, Aruna V.

    1990-01-01

    The use of Force, a parallel, portable FORTRAN for shared memory parallel computers, is described. Force simplifies writing code for parallel computers and, once the parallel code is written, it is easily ported to computers on which Force is installed. Although Force is nearly the same for all computers, specific details are included for the Cray-2, Cray-YMP, Convex 220, Flex/32, Encore, Sequent, and Alliant computers on which it is installed.

  12. Parallel, grid-adaptive approaches for relativistic hydro and magnetohydrodynamics

    NASA Astrophysics Data System (ADS)

    Keppens, R.; Meliani, Z.; van Marle, A. J.; Delmont, P.; Vlasis, A.; van der Holst, B.

    2012-02-01

    Relativistic hydro and magnetohydrodynamics provide continuum fluid descriptions for gas and plasma dynamics throughout the visible universe. We present an overview of state-of-the-art modeling in special relativistic regimes, targeting strong shock-dominated flows with speeds approaching the speed of light. Significant progress in its numerical modeling emerged in the last two decades, and we highlight specifically the need for grid-adaptive, shock-capturing treatments found in several contemporary codes in active use and development. Our discussion highlights one such code, MPI-AMRVAC (Message-Passing Interface-Adaptive Mesh Refinement Versatile Advection Code), but includes generic strategies for allowing massively parallel, block-tree adaptive simulations in any dimensionality. We provide implementation details reflecting the underlying data structures as used in MPI-AMRVAC. Parallelization strategies and scaling efficiencies are discussed for representative applications, along with guidelines for data formats suitable for parallel I/O. Refinement strategies available in MPI-AMRVAC are presented, which cover error estimators in use in many modern AMR frameworks. A test suite for relativistic hydro and magnetohydrodynamics is provided, chosen to cover all aspects encountered in high-resolution, shock-governed astrophysical applications. This test suite provides ample examples highlighting the advantages of AMR in relativistic flow problems.

  13. Massively parallel sequencing of the entire control region and targeted coding region SNPs of degraded mtDNA using a simplified library preparation method.

    PubMed

    Lee, Eun Young; Lee, Hwan Young; Oh, Se Yoon; Jung, Sang-Eun; Yang, In Seok; Lee, Yang-Han; Yang, Woo Ick; Shin, Kyoung-Jin

    2016-05-01

    The application of next-generation sequencing (NGS) to forensic genetics is being explored by an increasing number of laboratories because of the potential of high-throughput sequencing for recovering genetic information from multiple markers and multiple individuals in a single run. A cumbersome and technically challenging library construction process is required for NGS. In this study, we propose a simplified library preparation method for mitochondrial DNA (mtDNA) analysis that involves two rounds of PCR amplification. In the first round of multiplex PCR, six fragments covering the entire mtDNA control region and 22 fragments covering interspersed single nucleotide polymorphisms (SNPs) in the coding region that can be used to determine global haplogroups and East Asian haplogroups were amplified using template-specific primers with read sequences. In the following step, indices and platform-specific sequences for the MiSeq® system (Illumina) were added by PCR. The barcoded library produced using this simplified workflow was successfully sequenced on the MiSeq system using the MiSeq Reagent Nano Kit v2. A total of 0.4 GB of sequences, 80.6% with base quality of >Q30, were obtained from 12 degraded DNA samples and mapped to the revised Cambridge Reference Sequence (rCRS). A relatively even read count was obtained for all amplicons, with an average coverage of 5200× and a less than three-fold read count difference between amplicons per sample. Control region sequences were successfully determined, and all samples were assigned to the relevant haplogroups. In addition, enhanced discrimination was observed by adding coding region SNPs to the control region in in silico analysis. Because the developed multiplex PCR system amplifies small-sized amplicons (<250 bp), NGS analysis using the library preparation method described here allows mtDNA analysis using highly degraded DNA samples. PMID:26844917

  14. Vine—A Numerical Code for Simulating Astrophysical Systems Using Particles. II. Implementation and Performance Characteristics

    NASA Astrophysics Data System (ADS)

    Nelson, Andrew F.; Wetzstein, M.; Naab, T.

    2009-10-01

    We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the code necessary to obtain forces efficiently from special purpose "GRAPE" hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE

  15. VINE-A NUMERICAL CODE FOR SIMULATING ASTROPHYSICAL SYSTEMS USING PARTICLES. II. IMPLEMENTATION AND PERFORMANCE CHARACTERISTICS

    SciTech Connect

    Nelson, Andrew F.; Wetzstein, M.; Naab, T.

    2009-10-01

    We continue our presentation of VINE. In this paper, we begin with a description of relevant architectural properties of the serial and shared memory parallel computers on which VINE is intended to run, and describe their influences on the design of the code itself. We continue with a detailed description of a number of optimizations made to the layout of the particle data in memory and to our implementation of a binary tree used to access that data for use in gravitational force calculations and searches for smoothed particle hydrodynamics (SPH) neighbor particles. We describe the modifications to the code necessary to obtain forces efficiently from special purpose 'GRAPE' hardware, the interfaces required to allow transparent substitution of those forces in the code instead of those obtained from the tree, and the modifications necessary to use both tree and GRAPE together as a fused GRAPE/tree combination. We conclude with an extensive series of performance tests, which demonstrate that the code can be run efficiently and without modification in serial on small workstations or in parallel using the OpenMP compiler directives on large-scale, shared memory parallel machines. We analyze the effects of the code optimizations and estimate that they improve its overall performance by more than an order of magnitude over that obtained by many other tree codes. Scaled parallel performance of the gravity and SPH calculations, together the most costly components of most simulations, is nearly linear up to at least 120 processors on moderate sized test problems using the Origin 3000 architecture, and to the maximum machine sizes available to us on several other architectures. At similar accuracy, performance of VINE, used in GRAPE-tree mode, is approximately a factor 2 slower than that of VINE, used in host-only mode. Further optimizations of the GRAPE/host communications could improve the speed by as much as a factor of 3, but have not yet been implemented in VINE

  16. Fast Inverse Distance Weighting-Based Spatiotemporal Interpolation: A Web-Based Application of Interpolating Daily Fine Particulate Matter PM2.5 in the Contiguous U.S. Using Parallel Programming and k-d Tree

    PubMed Central

    Li, Lixin; Losser, Travis; Yorke, Charles; Piltner, Reinhard

    2014-01-01

    Epidemiological studies have identified associations between mortality and changes in concentration of particulate matter. These studies have highlighted the public concerns about health effects of particulate air pollution. Modeling fine particulate matter PM2.5 exposure risk and monitoring day-to-day changes in PM2.5 concentration is a critical step for understanding the pollution problem and embarking on the necessary remedy. This research designs, implements and compares two inverse distance weighting (IDW)-based spatiotemporal interpolation methods, in order to assess the trend of daily PM2.5 concentration for the contiguous United States over the year of 2009, at both the census block group level and county level. Traditionally, when handling spatiotemporal interpolation, researchers tend to treat space and time separately and reduce the spatiotemporal interpolation problems to a sequence of snapshots of spatial interpolations. In this paper, PM2.5 data interpolation is conducted in the continuous space-time domain by integrating space and time simultaneously, using the so-called extension approach. Time values are calculated with the help of a factor under the assumption that spatial and temporal dimensions are equally important when interpolating a continuous changing phenomenon in the space-time domain. Various IDW-based spatiotemporal interpolation methods with different parameter configurations are evaluated by cross-validation. In addition, this study explores computational issues (computer processing speed) faced during implementation of spatiotemporal interpolation for huge data sets. Parallel programming techniques and an advanced data structure, named k-d tree, are adapted in this paper to address the computational challenges. Significant computational improvement has been achieved. Finally, a web-based spatiotemporal IDW-based interpolation application is designed and implemented where users can visualize and animate spatiotemporal interpolation
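
    As a rough illustration of the extension approach described in this abstract, the sketch below treats time as a third coordinate, scaled by a factor, and applies ordinary inverse distance weighting in the combined space-time domain. The function name `idw_spacetime`, the parameters `c`, `k` and `power`, and the sample values are assumptions for illustration, not taken from the paper. The nearest neighbours are found here by a brute-force sort, which stands in for the k-d tree search mentioned in the abstract.

```python
import math


def idw_spacetime(samples, query, c=1.0, k=4, power=2.0):
    """Inverse distance weighting in (x, y, c*t) space.

    samples: list of (x, y, t, value) measurements (hypothetical data).
    query:   (x, y, t) point to interpolate.
    c:       assumed factor that converts time into a spatial-like distance.
    k:       number of nearest neighbours used.
    """
    qx, qy, qt = query
    scored = []
    for x, y, t, value in samples:
        d = math.sqrt((x - qx) ** 2 + (y - qy) ** 2 + (c * (t - qt)) ** 2)
        if d == 0.0:
            return value  # query coincides with a measurement
        scored.append((d, value))
    # Brute-force neighbour search; a k-d tree would replace this sort.
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    weights = [1.0 / d ** power for d, _ in nearest]
    weighted = sum(w * v for w, (_, v) in zip(weights, nearest))
    return weighted / sum(weights)


if __name__ == "__main__":
    data = [(0.0, 0.0, 0.0, 10.0), (2.0, 0.0, 0.0, 20.0)]
    print(idw_spacetime(data, (1.0, 0.0, 0.0)))
```

    Choosing `c` decides how strongly a difference in time counts against a difference in position, so it would normally be selected by cross-validation, as the abstract describes for the parameter configurations.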

  17. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  18. Legacy Code Modernization

    NASA Technical Reports Server (NTRS)

    Hribar, Michelle R.; Frumkin, Michael; Jin, Haoqiang; Waheed, Abdul; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    Over the past decade, high performance computing has evolved rapidly; systems based on commodity microprocessors have been introduced in quick succession from at least seven vendors/families. Porting codes to every new architecture is a difficult problem; in particular, here at NASA, there are many large CFD applications that are very costly to port to new machines by hand. The LCM ("Legacy Code Modernization") Project is the development of an integrated parallelization environment (IPE) which performs the automated mapping of legacy CFD (Fortran) applications to state-of-the-art high performance computers. While most projects to port codes focus on the parallelization of the code, we consider porting to be an iterative process consisting of several steps: 1) code cleanup, 2) serial optimization, 3) parallelization, 4) performance monitoring and visualization, 5) intelligent tools for automated tuning using performance prediction, and 6) machine-specific optimization. The approach for building this parallelization environment is to build the components for each of the steps simultaneously and then integrate them together. The demonstration will exhibit our latest research in building this environment: 1) parallelizing tools and compiler evaluation; 2) code cleanup and serial optimization using automated scripts; 3) development of a code generator for performance prediction; 4) automated partitioning; 5) automated insertion of directives. These demonstrations will exhibit the effectiveness of an automated approach for all the steps involved with porting and tuning a legacy code application for a new architecture.

  19. Simple, parallel virtual machines for extreme computations

    NASA Astrophysics Data System (ADS)

    Chokoufe Nejad, Bijan; Ohl, Thorsten; Reuter, Jürgen

    2015-11-01

    We introduce a virtual machine (VM) written in a numerically fast language like Fortran or C for evaluating very large expressions. We discuss the general concept of how to perform computations in terms of a VM and present specifically a VM that is able to compute tree-level cross sections for any number of external legs, given the corresponding byte-code from the optimal matrix element generator, O'MEGA. Furthermore, this approach allows the parallel computation of a single phase space point to be formulated in a simple and obvious way. We analyze the scaling behavior with multiple threads as well as the benefits and drawbacks that are introduced with this method. Our implementation of a VM can run faster than the corresponding native, compiled code for certain processes and compilers, especially for very high multiplicities, and has in general runtimes of the same order of magnitude. By avoiding the tedious compile and link steps, which may fail for source code files of gigabyte sizes, new processes or complex higher order corrections that are currently out of reach could be evaluated with a VM given enough computing power.
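
    To make the idea of evaluating a large expression through a VM more concrete, the following is a minimal sketch in Python. It is not the authors' VM and does not use the O'MEGA byte-code format: the instruction layout `(op, dst, a, b)`, the opcode names, and the function `run` are all hypothetical, chosen only to show how a flat instruction list can replace compiled source code.

```python
import operator

# Hypothetical opcode table; the real byte-code described in the abstract is not shown here.
OPS = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def run(bytecode, inputs, n_regs):
    """Interpret a list of (op, dst, a, b) instructions over a register file."""
    regs = [0.0] * n_regs
    regs[:len(inputs)] = inputs          # inputs occupy the first registers
    for op, dst, a, b in bytecode:
        regs[dst] = OPS[op](regs[a], regs[b])
    return regs

# (x + y) * (x - y) with x in register 0 and y in register 1
program = [
    ("add", 2, 0, 1),
    ("sub", 3, 0, 1),
    ("mul", 4, 2, 3),
]
result = run(program, [3.0, 2.0], n_regs=5)[4]
```

    In this toy program the `add` and `sub` instructions do not depend on each other, which hints at why instructions on the same level of an expression can be distributed over threads.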

  20. FLY: MPI-2 High Resolution code for LSS Cosmological Simulations

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Antonuccio, V.; Comparato, M.

    2010-11-01

    Cosmological simulations of structures and galaxies formations have played a fundamental role in the study of the origin, formation and evolution of the Universe. These studies improved enormously with the use of supercomputers and parallel systems and, recently, grid based systems and Linux clusters. Now we present the new version of the tree N-body parallel code FLY that runs on a PC Linux Cluster using the one side communication paradigm MPI-2 and we show the performances obtained. FLY is included in the Computer Physics Communication Program Library. This new version was developed using the Linux Cluster of CINECA, an IBM Cluster with 1024 Intel Xeon Pentium IV 3.0 GHz. The results show that it is possible to run a 64 million particle simulation in less than 15 minutes for each timestep, and the code scalability with the number of processors is achieved. This leads us to propose FLY as a code to run very large N-body simulations with more than 10^9 particles with the higher resolution of a pure tree code.

  1. FLY: MPI-2 high resolution code for LSS cosmological simulations

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Antonuccio-Delogu, V.; Comparato, M.

    2007-02-01

    Cosmological simulations of structures and galaxies formations have played a fundamental role in the study of the origin, formation and evolution of the Universe. These studies improved enormously with the use of supercomputers and parallel systems and, recently, grid based systems and Linux clusters. Now we present the new version of the tree N-body parallel code FLY that runs on a PC Linux Cluster using the one side communication paradigm MPI-2 and we show the performances obtained. FLY is included in the Computer Physics Communication Program Library. This new version was developed using the Linux Cluster of CINECA, an IBM Cluster with 1024 Intel Xeon Pentium IV 3.0 GHz. The results show that it is possible to run a 64 million particle simulation in less than 15 minutes for each time-step, and the code scalability with the number of processors is achieved. This leads us to propose FLY as a code to run very large N-body simulations with more than 10^9 particles with the higher resolution of a pure tree code. The FLY new version is available at the CPC Program Library, http://cpc.cs.qub.ac.uk/summaries/ADSC_v2_0.html [U. Becciani, M. Comparato, V. Antonuccio-Delogu, Comput Phys. Comm. 174 (2006) 605].

  2. Parallel rendering

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas W.

    1995-01-01

    This article provides a broad introduction to the subject of parallel rendering, encompassing both hardware and software systems. The focus is on the underlying concepts and the issues which arise in the design of parallel rendering algorithms and systems. We examine the different types of parallelism and how they can be applied in rendering applications. Concepts from parallel computing, such as data decomposition, task granularity, scalability, and load balancing, are considered in relation to the rendering problem. We also explore concepts from computer graphics, such as coherence and projection, which have a significant impact on the structure of parallel rendering algorithms. Our survey covers a number of practical considerations as well, including the choice of architectural platform, communication and memory requirements, and the problem of image assembly and display. We illustrate the discussion with numerous examples from the parallel rendering literature, representing most of the principal rendering methods currently used in computer graphics.

  3. Tree Lifecycle.

    ERIC Educational Resources Information Center

    Nature Study, 1998

    1998-01-01

    Presents a Project Learning Tree (PLT) activity that has students investigate and compare the lifecycle of a tree to other living things and the tree's role in the ecosystem. Includes background material as well as step-by-step instructions, variation and enrichment ideas, assessment opportunities, and student worksheets. (SJR)

  4. Electrical Circuit Simulation Code

    SciTech Connect

    Wix, Steven D.; Waters, Arlon J.; Shirley, David

    2001-08-09

    Massively-Parallel Electrical Circuit Simulation Code. CHILESPICE is a massively-parallel distributed-memory electrical circuit simulation tool that contains many enhanced radiation, time-based, and thermal features and models. Large scale electronic circuit simulation. Shared memory, parallel processing, enhance convergence. Sandia specific device models.

  5. Parallel programming with Ada

    SciTech Connect

    Kok, J.

    1988-01-01

    To the human programmer the ease of coding distributed computing is highly dependent on the suitability of the employed programming language. But with a particular language it is also important whether the possibilities of one or more parallel architectures can efficiently be addressed by available language constructs. In this paper the possibilities are discussed of the high-level language Ada and in particular of its tasking concept as a descriptional tool for the design and implementation of numerical and other algorithms that allow execution of parts in parallel. Language tools are explained and their use for common applications is shown. Conclusions are drawn about the usefulness of several Ada concepts.

  6. Parallelizing OVERFLOW: Experiences, Lessons, Results

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.

    1999-01-01

    The computer code OVERFLOW is widely used in the aerodynamic community for the numerical solution of the Navier-Stokes equations. Current trends in computer systems and architectures are toward multiple processors and parallelism, including distributed memory. This report describes work that has been carried out by the author and others at Ames Research Center with the goal of parallelizing OVERFLOW using a variety of parallel architectures and parallelization strategies. This paper begins with a brief description of the OVERFLOW code. This description includes the basic numerical algorithm and some software engineering considerations. Next comes a description of a parallel version of OVERFLOW, OVERFLOW/PVM, using PVM (Parallel Virtual Machine). This parallel version of OVERFLOW uses the manager/worker style and is part of the standard OVERFLOW distribution. Then comes a description of a parallel version of OVERFLOW, OVERFLOW/MPI, using MPI (Message Passing Interface). This parallel version of OVERFLOW uses the SPMD (Single Program Multiple Data) style. Finally comes a discussion of alternatives to explicit message-passing in the context of parallelizing OVERFLOW.

  7. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  8. Parallel computation with the force

    NASA Technical Reports Server (NTRS)

    Jordan, H. F.

    1985-01-01

    A methodology, called the force, supports the construction of programs to be executed in parallel by a force of processes. The number of processes in the force is unspecified, but potentially very large. The force idea is embodied in a set of macros which produce multiprocessor FORTRAN code and has been studied on two shared memory multiprocessors of fairly different character. The method has simplified the writing of highly parallel programs within a limited class of parallel algorithms and is being extended to cover a broader class. The individual parallel constructs which comprise the force methodology are discussed. Of central concern are their semantics, implementation on different architectures and performance implications.

  9. On finding minimum-diameter clique trees

    SciTech Connect

    Blair, J.R.S. (Dept. of Computer Science); Peyton, B.W.

    1991-08-01

    It is well-known that any chordal graph can be represented as a clique tree (acyclic hypergraph, join tree). Since some chordal graphs have many distinct clique tree representations, it is interesting to consider which one is most desirable under various circumstances. A clique tree of minimum diameter (or height) is sometimes a natural candidate when choosing clique trees to be processed in a parallel computing environment. This paper introduces a linear time algorithm for computing a minimum-diameter clique tree. The new algorithm is an analogue of the natural greedy algorithm for rooting an ordinary tree in order to minimize its height. It has potential application in the development of parallel algorithms for both knowledge-based systems and the solution of sparse linear systems of equations. 31 refs., 7 figs.
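
    The abstract compares its algorithm to the greedy procedure for rooting an ordinary tree so that its height is minimal. A minimal sketch of one common version of that procedure (repeatedly stripping leaves until one or two center vertices remain) is shown below. It operates on ordinary trees only, not on clique trees, and the function name `min_height_roots` and the vertex numbering 0..n-1 are assumptions made for illustration.

```python
from collections import defaultdict

def min_height_roots(n, edges):
    """Return the vertex or pair of vertices that minimize the height when used as root."""
    if n == 1:
        return [0]
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    leaves = [v for v in range(n) if len(adj[v]) == 1]
    remaining = n
    # Peel off one layer of leaves per round; each edge is touched once overall.
    while remaining > 2:
        remaining -= len(leaves)
        next_leaves = []
        for leaf in leaves:
            neighbor = adj[leaf].pop()
            adj[neighbor].remove(leaf)
            if len(adj[neighbor]) == 1:
                next_leaves.append(neighbor)
        leaves = next_leaves
    return sorted(leaves)

path = [(0, 1), (1, 2), (2, 3), (3, 4)]
center = min_height_roots(5, path)
```

    Rooting the tree at a returned vertex gives the smallest possible height, which is the property that makes such a tree attractive for level-by-level parallel processing.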

  10. Massively parallel visualization: Parallel rendering

    SciTech Connect

    Hansen, C.D.; Krogh, M.; White, W.

    1995-12-01

    This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderer use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  11. The fault-tree compiler

    NASA Technical Reports Server (NTRS)

    Martensen, Anna L.; Butler, Ricky W.

    1987-01-01

    The Fault Tree Compiler Program is a new reliability tool used to predict the top event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N gates. The high level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer precise (within the limits of double precision floating point arithmetic) to the five digits in the answer. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Corporation VAX with the VMS operating system.
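
    As an illustration of what computing a top event probability from the five gate types involves, here is a minimal Python sketch. It is not the Fault Tree Compiler's solution technique or input language. It assumes that all gate inputs are statistically independent (which fails when a basic event feeds several branches), and the helper names are hypothetical.

```python
import math

def p_and(ps):
    """All inputs occur."""
    return math.prod(ps)

def p_or(ps):
    """At least one input occurs."""
    return 1.0 - math.prod(1.0 - p for p in ps)

def p_xor(p, q):
    """Exactly one of two inputs occurs."""
    return p * (1.0 - q) + q * (1.0 - p)

def p_invert(p):
    """The input does not occur."""
    return 1.0 - p

def p_m_of_n(m, ps):
    """At least m of the inputs occur."""
    dist = [1.0]  # dist[k] = probability that exactly k inputs so far occurred
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for k, d in enumerate(dist):
            nxt[k] += d * (1.0 - p)
            nxt[k + 1] += d * p
        dist = nxt
    return sum(dist[m:])

# Top event: (A AND B) OR (at least 2 of C, D, E), all basic events at 0.5
top = p_or([p_and([0.5, 0.5]), p_m_of_n(2, [0.5, 0.5, 0.5])])
```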

  12. Implementation and performance of parallelized elegant.

    SciTech Connect

    Wang, Y.; Borland, M.; Accelerator Systems Division

    2008-01-01

    The program elegant is widely used for design and modeling of linacs for free-electron lasers and energy recovery linacs, as well as storage rings and other applications. As part of a multi-year effort, we have parallelized many aspects of the code, including single-particle dynamics, wakefields, and coherent synchrotron radiation. We report on the approach used for gradual parallelization, which proved very beneficial in getting parallel features into the hands of users quickly. We also report details of parallelization of collective effects. Finally, we discuss performance of the parallelized code in various applications.

  13. Distributed Merge Trees

    SciTech Connect

    Morozov, Dmitriy; Weber, Gunther

    2013-01-08

    Improved simulations and sensors are producing datasets whose increasing complexity exhausts our ability to visualize and comprehend them directly. To cope with this problem, we can detect and extract significant features in the data and use them as the basis for subsequent analysis. Topological methods are valuable in this context because they provide robust and general feature definitions. As the growth of serial computational power has stalled, data analysis is becoming increasingly dependent on massively parallel machines. To satisfy the computational demand created by complex datasets, algorithms need to effectively utilize these computer architectures. The main strength of topological methods, their emphasis on global information, turns into an obstacle during parallelization. We present two approaches to alleviate this problem. We develop a distributed representation of the merge tree that avoids computing the global tree on a single processor and lets us parallelize subsequent queries. To account for the increasing number of cores per processor, we develop a new data structure that lets us take advantage of multiple shared-memory cores to parallelize the work on a single node. Finally, we present experiments that illustrate the strengths of our approach as well as help identify future challenges.

  14. Parallel Processing of a Groundwater Contaminant Code

    SciTech Connect

    Arnett, Ronald Chester; Greenwade, Lance Eric

    2000-05-01

    The U.S. Department of Energy’s Idaho National Engineering and Environmental Laboratory (INEEL) is conducting a field test of experimental enhanced bioremediation of trichloroethylene (TCE) contaminated groundwater. TCE is a chlorinated organic substance that was used as a solvent in the early years of the INEEL and disposed in some cases to the aquifer. There is an effort underway to enhance the natural bioremediation of TCE by adding a non-toxic substance that serves as a feed material for the bacteria that can biologically degrade the TCE.

  15. Tree Amigos.

    ERIC Educational Resources Information Center

    Center for Environmental Study, Grand Rapids, MI.

    Tree Amigos is a special cross-cultural program that uses trees as a common bond to bring the people of the Americas together in unique partnerships to preserve and protect the shared global environment. It is a tangible program that embodies the philosophy that individuals, acting together, can make a difference. This resource book contains…

  16. Talking Trees

    ERIC Educational Resources Information Center

    Tolman, Marvin

    2005-01-01

    Students love outdoor activities and will love them even more when they build confidence in their tree identification and measurement skills. Through these activities, students will learn to identify the major characteristics of trees and discover how the pace--a nonstandard measuring unit--can be used to estimate not only distances but also the…

  17. Parallel pipelining

    SciTech Connect

    Joseph, D.D.; Bai, R.; Liao, T.Y.; Huang, A.; Hu, H.H.

    1995-09-01

    In this paper the authors introduce the idea of parallel pipelining for water lubricated transportation of oil (or other viscous material). A parallel system can have major advantages over a single pipe with respect to the cost of maintenance and continuous operation of the system, to the pressure gradients required to restart a stopped system and to the reduction and even elimination of the fouling of pipe walls in continuous operation. The authors show that the action of capillarity in small pipes is more favorable for restart than in large pipes. In a parallel pipeline system, they estimate the number of small pipes needed to deliver the same oil flux as in one larger pipe as N = (R/r)^α, where r and R are the radii of the small and large pipes, respectively, and α = 4 or 19/7 when the lubricating water flow is laminar or turbulent.
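
    The pipe-count estimate N = (R/r)^α with α = 4 (laminar) or 19/7 (turbulent) can be evaluated directly. The short Python sketch below does so; the function name and the choice to round up to a whole number of pipes are assumptions added here, not part of the abstract.

```python
import math

def small_pipes_needed(R, r, laminar=True):
    """Estimate N = (R/r)**alpha, rounded up to a whole number of small pipes."""
    if R <= 0 or r <= 0:
        raise ValueError("radii must be positive")
    alpha = 4.0 if laminar else 19.0 / 7.0
    return math.ceil((R / r) ** alpha)

# Small pipes with half the radius of the large pipe
n_laminar = small_pipes_needed(2.0, 1.0, laminar=True)
n_turbulent = small_pipes_needed(2.0, 1.0, laminar=False)
```

    For a radius ratio of 2, the laminar exponent gives 16 pipes, while the turbulent exponent gives a value between 6 and 7, so 7 after rounding up.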

  18. Modified mesh-connected parallel computers

    SciTech Connect

    Carlson, D.A.

    1988-10-01

    The mesh-connected parallel computer is an important parallel processing organization that has been used in the past for the design of supercomputing systems. In this paper, the authors explore modifications of a mesh-connected parallel computer for the purpose of increasing the efficiency of executing important application programs. These modifications are made by adding one or more global mesh structures to the processing array. They show how these modifications allow asymptotic improvements in the efficiency of executing computations having low to medium interprocessor communication requirements (e.g., tree computations, prefix computations, finding the connected components of a graph). For computations with high interprocessor communication requirements such as sorting, they show that the modifications offer no speedup. They also compare the modified mesh-connected parallel computer to other similar organizations including the pyramid, the X-tree, and the mesh-of-trees.

  19. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO (CAPtools (Computer Aided Parallelization Toolkit) OpenMP) parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report some results for several benchmark codes and one full application that have been parallelized using our system.

  20. Parallel algorithms for message decomposition

    SciTech Connect

    Teng, S.H.; Wang, B.

    1987-06-01

    The authors consider the deterministic and random parallel complexity (time and processor) of message decoding: an essential problem in communications systems and translation systems. They present an optimal parallel algorithm to decompose prefix-coded messages and uniquely decipherable-coded messages in O(n/P) time, using O(P) processors (for all P: 1 ≤ P ≤ n/log n) deterministically as well as randomly on the weakest version of parallel random access machines in which concurrent read and concurrent write to a cell in the common memory are not allowed. This is done by reducing decoding to parallel finite-state automata simulation and the prefix sums.
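
    The reduction to finite-state automata simulation and prefix sums can be illustrated with a small sketch. Each input bit acts as a function on the decoder's states, function composition is associative, and so a prefix scan over these functions yields the decoder state after every bit. The Python code below runs that scan sequentially with `itertools.accumulate`; it is not the authors' algorithm, it assumes a complete prefix-free binary code, and all names are hypothetical.

```python
from itertools import accumulate

def build_automaton(code):
    """Build trie transitions; finishing a codeword returns to state 0 (the root)."""
    trie, emit = [{}], {}
    for symbol, word in code.items():
        node = 0
        for i, bit in enumerate(word):
            if i == len(word) - 1:
                trie[node][bit] = 0
                emit[(node, bit)] = symbol
            else:
                if bit not in trie[node]:
                    trie.append({})
                    trie[node][bit] = len(trie) - 1
                node = trie[node][bit]
    return trie, emit

def compose(f, g):
    """Apply f first, then g; both are tuples mapping state -> state."""
    return tuple(g[s] for s in f)

def decode(bits, code):
    trie, emit = build_automaton(code)
    states = range(len(trie))
    step = {b: tuple(trie[s][b] for s in states) for b in "01"}
    # Prefix scan over transition functions (the part a parallel machine would share out).
    prefixes = list(accumulate((step[b] for b in bits), compose))
    before = [0] + [p[0] for p in prefixes[:-1]]   # state just before each bit
    return "".join(emit[(s, b)] for s, b in zip(before, bits) if (s, b) in emit)

message = decode("010110", {"a": "0", "b": "10", "c": "11"})
```

    Because `compose` is associative, the scan could be evaluated in a tree-shaped order across processors rather than left to right as done here.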

  1. Multicast Reduction Network Source Code

    SciTech Connect

    Lee, G.

    2006-12-19

    MRNet is a software tree-based overlay network developed at the University of Wisconsin, Madison that provides a scalable communication mechanism for parallel tools. MRNet uses a tree topology of networked processes between a user tool and distributed tool daemons. This tree topology allows scalable multicast communication from the tool to the daemons. The internal nodes of the tree can be used to distribute computation and analysis on data sent from the tool daemons to the tool. This release covers minor implementation to port this software to the BlueGene/L architecture and for use with a new implementation of the Dynamic Probe Class Library.
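
    The role of internal tree nodes in aggregating data on its way from the daemons to the tool can be sketched as a recursive reduction. The Python code below is only a conceptual illustration run in a single process. It does not use the MRNet API, and the node names, the dictionary layout, and the function `reduce_up` are hypothetical.

```python
def reduce_up(tree, node, leaf_values, combine):
    """Combine the values of all leaves below `node`, one internal node at a time."""
    children = tree.get(node, [])
    if not children:
        return leaf_values[node]          # a daemon reporting its own value
    result = reduce_up(tree, children[0], leaf_values, combine)
    for child in children[1:]:
        result = combine(result, reduce_up(tree, child, leaf_values, combine))
    return result

# Tool at the root, two internal nodes, four daemons as leaves
tree = {"tool": ["n1", "n2"], "n1": ["d1", "d2"], "n2": ["d3", "d4"]}
values = {"d1": 1, "d2": 2, "d3": 3, "d4": 4}
total = reduce_up(tree, "tool", values, lambda a, b: a + b)
largest = reduce_up(tree, "tool", values, max)
```

    Each internal node only handles the results of its own children, so the root never has to receive one message per daemon.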

  2. Parallelized modelling and solution scheme for hierarchically scaled simulations

    NASA Technical Reports Server (NTRS)

    Padovan, Joe

    1995-01-01

    This two-part paper presents the results of a benchmarked analytical-numerical investigation into the operational characteristics of a unified parallel processing strategy for implicit fluid mechanics formulations. This hierarchical poly tree (HPT) strategy is based on multilevel substructural decomposition. The Tree morphology is chosen to minimize memory, communications and computational effort. The methodology is general enough to apply to existing finite difference (FD), finite element (FEM), finite volume (FV) or spectral element (SE) based computer programs without an extensive rewrite of code. In addition to finding large reductions in memory, communications, and computational effort associated with a parallel computing environment, substantial reductions are generated in the sequential mode of application. Such improvements grow with increasing problem size. Along with a theoretical development of general 2-D and 3-D HPT, several techniques for expanding the problem size that the current generation of computers are capable of solving, are presented and discussed. Among these techniques are several interpolative reduction methods. It was found that by combining several of these techniques that a relatively small interpolative reduction resulted in substantial performance gains. Several other unique features/benefits are discussed in this paper. Along with Part 1's theoretical development, Part 2 presents a numerical approach to the HPT along with four prototype CFD applications. These demonstrate the potential of the HPT strategy.

  3. Parallel Power Grid Simulation Toolkit

    SciTech Connect

    Smith, Steve; Kelley, Brian; Banks, Lawrence; Top, Philip; Woodward, Carol

    2015-09-14

    ParGrid is a 'wrapper' that integrates a coupled Power Grid Simulation toolkit consisting of a library to manage the synchronization and communication of independent simulations. The included library code in ParGrid, named FSKIT, is intended to support the coupling of multiple continuous and discrete event parallel simulations. The code is designed using modern object oriented C++ methods utilizing C++11 and current Boost libraries to ensure compatibility with multiple operating systems and environments.

  4. Parallel contingency statistics with Titan.

    SciTech Connect

    Thompson, David C.; Pebay, Philippe Pierre

    2009-09-01

    This report summarizes existing statistical engines in VTK/Titan and presents the recently parallelized contingency statistics engine. It is a sequel to [PT08] and [BPRT09] which studied the parallel descriptive, correlative, multi-correlative, and principal component analysis engines. The ease of use of this new parallel engine is illustrated by means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; however, the very nature of contingency tables prevents this new engine from exhibiting optimal parallel speed-up as the aforementioned engines do. This report therefore discusses the design trade-offs we made and studies performance with up to 200 processors.

  5. Parallel programming with PCN

    SciTech Connect

    Foster, I.; Tuecke, S.

    1993-01-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous ftp from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A). This version of this document describes PCN version 2.0, a major revision of the PCN programming system. It supersedes earlier versions of this report.

  6. Multitasking TORT under UNICOS: Parallel performance models and measurements

    SciTech Connect

    Barnett, A.; Azmy, Y.Y.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The results of the comparison of parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  7. Parallel adaptive wavelet collocation method for PDEs

    SciTech Connect

    Nejadmalayeri, Alireza; Vezolainen, Alexei; Brown-Dymkoski, Eric; Vasilyev, Oleg V.

    2015-10-01

    A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using a tree-like structure with tree roots starting at an a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allows fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048^3 using as many as 2048 CPU cores.

  8. Parallel Computing in SCALE

    SciTech Connect

    DeHart, Mark D; Williams, Mark L; Bowman, Stephen M

    2010-01-01

    The SCALE computational architecture has remained basically the same since its inception 30 years ago, although constituent modules and capabilities have changed significantly. This SCALE concept was intended to provide a framework whereby independent codes can be linked to provide a more comprehensive capability than possible with the individual programs - allowing flexibility to address a wide variety of applications. However, the current system was designed originally for mainframe computers with a single CPU and with significantly less memory than today's personal computers. It has been recognized that the present SCALE computation system could be restructured to take advantage of modern hardware and software capabilities, while retaining many of the modular features of the present system. Preliminary work is being done to define specifications and capabilities for a more advanced computational architecture. This paper describes the state of current SCALE development activities and plans for future development. With the release of SCALE 6.1 in 2010, a new phase of evolutionary development will be available to SCALE users within the TRITON and NEWT modules. The SCALE (Standardized Computer Analyses for Licensing Evaluation) code system developed by Oak Ridge National Laboratory (ORNL) provides a comprehensive and integrated package of codes and nuclear data for a wide range of applications in criticality safety, reactor physics, shielding, isotopic depletion and decay, and sensitivity/uncertainty (S/U) analysis. Over the last three years, since the release of version 5.1 in 2006, several important new codes have been introduced within SCALE, and significant advances applied to existing codes. Many of these new features became available with the release of SCALE 6.0 in early 2009. However, beginning with SCALE 6.1, a first generation of parallel computing is being introduced. In addition to near-term improvements, a plan for longer term SCALE enhancement

  9. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2014-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Initial results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  10. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2013-10-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and it is widely used by fusion community. Parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the present MARS algorithm using parallel libraries and procedures. Preliminary results of the code parallelization will be reported. Work is supported by the U.S. DOE SBIR program.

  11. Fully Parallel MHD Stability Analysis Tool

    NASA Astrophysics Data System (ADS)

    Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

    2015-11-01

    Progress on full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. It is a powerful tool for studying MHD and MHD-kinetic instabilities and is widely used by the fusion community. The parallel version of MARS is intended for simulations on local parallel clusters. It will be an efficient tool for simulating MHD instabilities with low, intermediate and high toroidal mode numbers within both the fluid and kinetic plasma models already implemented in MARS. Parallelization of the code includes parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse iteration algorithm implemented in MARS for solving the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. The solution of the eigenvalue problem is parallelized by repeating the steps of the present MARS algorithm using parallel libraries and procedures. Results of MARS parallelization and of the development of a new fixed-boundary equilibrium code adapted for MARS input will be reported. Work is supported by the U.S. DOE SBIR program.
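
    The abstracts above name inverse iteration as the eigenvalue algorithm but give no details. The following is a minimal serial sketch of shifted inverse iteration for a small, dense, real standard eigenproblem A x = λ x. It is not MARS code: MARS works with a much larger problem built per magnetic surface, and the function names `solve` and `inverse_iteration`, the dense Gaussian elimination, and the Rayleigh-quotient estimate are all assumptions made for illustration.

```python
def solve(a, b):
    """Solve the dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented copy of the system
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def inverse_iteration(a, shift, iters=50):
    """Approximate the eigenpair of `a` whose eigenvalue lies closest to `shift`."""
    n = len(a)
    shifted = [[a[i][j] - (shift if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    x = [1.0] * n  # arbitrary start vector (hypothetical choice)
    for _ in range(iters):
        y = solve(shifted, x)  # one linear solve per step: the costly part
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    ax = [sum(a[i][j] * x[j] for j in range(n)) for i in range(n)]
    eigenvalue = sum(x[i] * ax[i] for i in range(n))  # Rayleigh quotient, x has unit length
    return eigenvalue, x
```

    Each step is dominated by the linear solve with the shifted matrix, which is why the abstracts single out that step, together with matrix construction, as the part to parallelize. The shift must not coincide with an eigenvalue, otherwise the shifted matrix is singular.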

  12. Computer-Aided Parallelizer and Optimizer

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang

    2011-01-01

    The Computer-Aided Parallelizer and Optimizer (CAPO) automates the insertion of compiler directives (see figure) to facilitate parallel processing on Shared Memory Parallel (SMP) machines. While CAPO currently is integrated seamlessly into CAPTools (developed at the University of Greenwich, now marketed as ParaWise), CAPO was independently developed at Ames Research Center as one of the components for the Legacy Code Modernization (LCM) project. The current version takes serial FORTRAN programs, performs interprocedural data dependence analysis, and generates OpenMP directives. Due to the widely supported OpenMP standard, the generated OpenMP codes have the potential to run on a wide range of SMP machines. CAPO relies on accurate interprocedural data dependence information currently provided by CAPTools. Compiler directives are generated through identification of parallel loops in the outermost level, construction of parallel regions around parallel loops and optimization of parallel regions, and insertion of directives with automatic identification of private, reduction, induction, and shared variables. Attempts also have been made to identify potential pipeline parallelism (implemented with point-to-point synchronization). Although directives are generated automatically, user interaction with the tool is still important for producing good parallel codes. A comprehensive graphical user interface is included for users to interact with the parallelization process.

  13. LaMEM: a massively parallel 3D staggered-grid finite-difference code for coupled nonlinear thermo-mechanical modeling of lithospheric deformation with visco-elasto-plastic rheology

    NASA Astrophysics Data System (ADS)

    Popov, Anton; Kaus, Boris

    2015-04-01

    This software project aims at bringing 3D lithospheric deformation modeling to a qualitatively different level. Our code LaMEM (Lithosphere and Mantle Evolution Model) is based on the following building blocks:
    * Massively parallel, data-distributed implementation model based on the PETSc library
    * Light, stable and accurate staggered-grid finite-difference spatial discretization
    * Marker-in-Cell predictor-corrector time discretization with 4th-order Runge-Kutta
    * Elastic stress rotation algorithm based on the time integration of the vorticity pseudo-vector
    * Staircase-type internal free surface boundary condition without artificial viscosity contrast
    * Geodynamically relevant visco-elasto-plastic rheology
    * Global velocity-pressure-temperature Newton-Raphson nonlinear solver
    * Local nonlinear solver based on the FZERO algorithm
    * Coupled velocity-pressure geometric multigrid preconditioner with Galerkin coarsening
    The staggered-grid finite-difference method, being an inherently Eulerian and rather complicated discretization method, provides no natural treatment of the free surface boundary condition. The solution based on the quasi-viscous sticky-air phase introduces significant viscosity contrasts and spoils the convergence of the iterative solvers. In LaMEM we are currently implementing an approximate staircase-type free surface boundary condition which excludes the empty cells and restores the solver convergence. Because of the mutual dependence of the stress and strain-rate tensor components, and their different spatial locations in the grid, there is no straightforward way of implementing the nonlinear rheology. In LaMEM we have developed and implemented an efficient interpolation scheme for the second invariant of the strain-rate tensor that solves this problem. Scalable, efficient linear solvers are the key components of a successful nonlinear problem solution. In LaMEM we have a range of PETSc-based preconditioning techniques that either employ a block factorization of

  14. Parallel Information Processing.

    ERIC Educational Resources Information Center

    Rasmussen, Edie M.

    1992-01-01

    Examines parallel computer architecture and the use of parallel processors for text. Topics discussed include parallel algorithms; performance evaluation; parallel information processing; parallel access methods for text; parallel and distributed information retrieval systems; parallel hardware for text; and network models for information…

  15. The gene tree delusion.

    PubMed

    Springer, Mark S; Gatesy, John

    2016-01-01

    Higher-level relationships among placental mammals are mostly resolved, but several polytomies remain contentious. Song et al. (2012) claimed to have resolved three of these using shortcut coalescence methods (MP-EST, STAR) and further concluded that these methods, which assume no within-locus recombination, are required to unravel deep-level phylogenetic problems that have stymied concatenation. Here, we reanalyze Song et al.'s (2012) data and leverage these re-analyses to explore key issues in systematics including the recombination ratchet, gene tree stoichiometry, the proportion of gene tree incongruence that results from deep coalescence versus other factors, and simulations that compare the performance of coalescence and concatenation methods in species tree estimation. Song et al. (2012) reported an average locus length of 3.1 kb for the 447 protein-coding genes in their phylogenomic dataset, but the true mean length of these loci (start codon to stop codon) is 139.6 kb. Empirical estimates of recombination breakpoints in primates, coupled with consideration of the recombination ratchet, suggest that individual coalescence genes (c-genes) approach ∼12 bp or less for Song et al.'s (2012) dataset, three to four orders of magnitude shorter than the c-genes reported by these authors. This result has general implications for the application of coalescence methods in species tree estimation. We contend that it is illogical to apply coalescence methods to complete protein-coding sequences. Such analyses amalgamate c-genes with different evolutionary histories (i.e., exons separated by >100,000 bp), distort true gene tree stoichiometry that is required for accurate species tree inference, and contradict the central rationale for applying coalescence methods to difficult phylogenetic problems. In addition, Song et al.'s (2012) dataset of 447 genes includes 21 loci with switched taxonomic names, eight duplicated loci, 26 loci with non-homologous sequences that are

  17. Audubon Tree Study Program.

    ERIC Educational Resources Information Center

    National Audubon Society, New York, NY.

    Included are an illustrated student reader, "The Story of Trees," a leaders' guide, and a large tree chart with 37 colored pictures. The student reader reviews several aspects of trees: a definition of a tree; where and how trees grow; flowers, pollination and seed production; how trees make their food; how to recognize trees; seasonal changes;…

  18. Efficiency of parallel direct optimization

    NASA Technical Reports Server (NTRS)

    Janies, D. A.; Wheeler, W. C.

    2001-01-01

    Tremendous progress has been made at the level of sequential computation in phylogenetics. However, little attention has been paid to parallel computation. Parallel computing is particularly suited to phylogenetics because of the many ways large computational problems can be broken into parts that can be analyzed concurrently. In this paper, we investigate the scaling factors and efficiency of random addition and tree refinement strategies using the direct optimization software, POY, on a small (10 slave processors) and a large (256 slave processors) cluster of networked PCs running LINUX. These algorithms were tested on several data sets composed of DNA and morphology ranging from 40 to 500 taxa. Various algorithms in POY show fundamentally different properties within and between clusters. All algorithms are efficient on the small cluster for the 40-taxon data set. On the large cluster, multibuilding exhibits excellent parallel efficiency, whereas parallel building is inefficient. These results are independent of data set size. Branch swapping in parallel shows excellent speed-up for 16 slave processors on the large cluster. However, there is no appreciable speed-up for branch swapping with the further addition of slave processors (>16). This result is independent of data set size. Ratcheting in parallel is efficient with the addition of up to 32 processors in the large cluster. This result is independent of data set size. © 2001 The Willi Hennig Society.

  19. Visualizing phylogenetic trees using TreeView.

    PubMed

    Page, Roderic D M

    2002-08-01

    TreeView provides a simple way to view the phylogenetic trees produced by a range of programs, such as PAUP*, PHYLIP, TREE-PUZZLE, and ClustalX. While some phylogenetic programs (such as the Macintosh version of PAUP*) have excellent tree printing facilities, many programs do not have the ability to generate publication quality trees. TreeView addresses this need. The program can read and write a range of tree file formats, display trees in a variety of styles, print trees, and save the tree as a graphic file. Protocols in this unit cover both displaying and printing a tree. Support protocols describe how to download and install TreeView, and how to display bootstrap values in trees generated by ClustalX and PAUP*. PMID:18792942

  20. A generic fine-grained parallel C

    NASA Technical Reports Server (NTRS)

    Hamet, L.; Dorband, John E.

    1988-01-01

    With the present availability of parallel processors of vastly different architectures, there is a need for a common language interface to multiple types of machines. The parallel C compiler, currently under development, is intended to be such a language. This language is based on the belief that an algorithm designed around fine-grained parallelism can be mapped relatively easily to different parallel architectures, since a large percentage of the parallelism has been identified. The compiler generates a FORTH-like machine-independent intermediate code. A machine-dependent translator will reside on each machine to generate the appropriate executable code, taking advantage of the particular architectures. The goal of this project is to allow a user to run the same program on such machines as the Massively Parallel Processor, the CRAY, the Connection Machine, and the CYBER 205 as well as serial machines such as VAXes, Macintoshes and Sun workstations.

  1. Parallelized nested sampling

    NASA Astrophysics Data System (ADS)

    Henderson, R. Wesley; Goggans, Paul M.

    2014-12-01

    One of the important advantages of nested sampling as an MCMC technique is its ability to draw representative samples from multimodal distributions and distributions with other degeneracies. This coverage is accomplished by maintaining a number of so-called live samples within a likelihood constraint. In usual practice, at each step, only the sample with the least likelihood is discarded from this set of live samples and replaced. In [1], Skilling shows that for a given number of live samples, discarding only one sample yields the highest precision in estimation of the log-evidence. However, if we increase the number of live samples, more samples can be discarded at once while still maintaining the same precision. For computer code running only serially, this modification would considerably increase the wall clock time necessary to reach convergence. However, if we use a computer with parallel processing capabilities, and we write our code to take advantage of this parallelism to replace multiple samples concurrently, the performance penalty can be eliminated entirely and possibly reversed. In this case, we must use the more general equation in [1] for computing the expectation of the shrinkage distribution: $E[-\log t] = (N_r - r + 1)^{-1} + (N_r - r + 2)^{-1} + \cdots + N_r^{-1}$, for shrinkage $t$ with $N_r$ live samples and $r$ samples discarded at each iteration. The equation for the variance, $\mathrm{Var}(-\log t) = (N_r - r + 1)^{-2} + (N_r - r + 2)^{-2} + \cdots + N_r^{-2}$, is used to find the appropriate number of live samples $N_r$ to use with $r > 1$ to match the variance achieved with $N_1$ live samples and $r = 1$. In this paper, we show that by replacing multiple discarded samples in parallel, we are able to achieve a more thorough sampling of the constrained prior distribution, reduce runtime, and increase precision.
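
    The two sums in the abstract above are straightforward to evaluate numerically. The sketch below computes the expectation and variance of -log t for N_r live samples with r discarded per iteration, and then searches for the smallest N_r whose per-iteration variance does not exceed that of N_1 live samples with r = 1. That matching rule is one literal reading of the abstract and is an assumption; the paper may normalize the comparison differently. All function names are hypothetical.

```python
def shrinkage_mean(n_live, r):
    """E[-log t]: sum of 1/k for k = n_live - r + 1, ..., n_live."""
    return sum(1.0 / k for k in range(n_live - r + 1, n_live + 1))


def shrinkage_var(n_live, r):
    """Var(-log t): sum of 1/k**2 for k = n_live - r + 1, ..., n_live."""
    return sum(1.0 / k ** 2 for k in range(n_live - r + 1, n_live + 1))


def matching_live_samples(n1, r):
    """Smallest N_r with Var(-log t | N_r, r) <= Var(-log t | n1, 1).

    Assumed criterion: per-iteration variance only, as described in the lead-in.
    """
    target = shrinkage_var(n1, 1)
    n = max(n1, r)  # need at least r live samples to discard r of them
    while shrinkage_var(n, r) > target:
        n += 1
    return n
```

    Under this reading, discarding two samples per iteration instead of one requires 15 live samples to stay within the per-iteration variance obtained with 10 live samples and r = 1, since 1/14² + 1/15² is just below 1/10².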

  2. Hybrid parallel programming with MPI and Unified Parallel C.

    SciTech Connect

    Dinan, J.; Balaji, P.; Lusk, E.; Sadayappan, P.; Thakur, R.; Mathematics and Computer Science; The Ohio State Univ.

    2010-01-01

    The Message Passing Interface (MPI) is one of the most widely used programming models for parallel computing. However, the amount of memory available to an MPI process is limited by the amount of local memory within a compute node. Partitioned Global Address Space (PGAS) models such as Unified Parallel C (UPC) are growing in popularity because of their ability to provide a shared global address space that spans the memories of multiple compute nodes. However, taking advantage of UPC can require a large recoding effort for existing parallel applications. In this paper, we explore a new hybrid parallel programming model that combines MPI and UPC. This model allows MPI programmers incremental access to a greater amount of memory, enabling memory-constrained MPI codes to process larger data sets. In addition, the hybrid model offers UPC programmers an opportunity to create static UPC groups that are connected over MPI. As we demonstrate, the use of such groups can significantly improve the scalability of locality-constrained UPC codes. This paper presents a detailed description of the hybrid model and demonstrates its effectiveness in two applications: a random access benchmark and the Barnes-Hut cosmological simulation. Experimental results indicate that the hybrid model can greatly enhance performance; using hybrid UPC groups that span two cluster nodes, RA performance increases by a factor of 1.33 and using groups that span four cluster nodes, Barnes-Hut experiences a twofold speedup at the expense of a 2% increase in code size.

  3. Tree harvesting

    SciTech Connect

    Badger, P.C.

    1995-12-31

    Short rotation intensive culture tree plantations have been a major part of biomass energy concepts since the beginning. One aspect receiving less attention than it deserves is harvesting. This article describes a method of harvesting somewhere between agricultural mowing machines and huge feller-bunchers of the pulpwood and lumber industries.

  4. Aspen Trees.

    ERIC Educational Resources Information Center

    Canfield, Elaine

    2002-01-01

    Describes a fifth-grade art activity that offers a new approach to creating pictures of Aspen trees. Explains that the students learned about art concepts, such as line and balance, in this lesson. Discusses the process in detail for creating the pictures. (CMK)

  5. Automatic Multilevel Parallelization Using OpenMP

    NASA Technical Reports Server (NTRS)

    Jin, Hao-Qiang; Jost, Gabriele; Yan, Jerry; Ayguade, Eduard; Gonzalez, Marc; Martorell, Xavier; Biegel, Bryan (Technical Monitor)

    2002-01-01

    In this paper we describe the extension of the CAPO parallelization support tool to support multilevel parallelism based on OpenMP directives. CAPO generates OpenMP directives with extensions supported by the NanosCompiler to allow for directive nesting and definition of thread groups. We report first results for several benchmark codes and one full application that have been parallelized using our system.

  6. Parallel multiscale simulations of a brain aneurysm

    NASA Astrophysics Data System (ADS)

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier-Stokes solver NɛκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NɛκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in future

  7. Parallel multiscale simulations of a brain aneurysm

    SciTech Connect

    Grinberg, Leopold; Fedosov, Dmitry A.; Karniadakis, George Em

    2013-07-01

    Cardiovascular pathologies, such as a brain aneurysm, are affected by the global blood circulation as well as by the local microrheology. Hence, developing computational models for such cases requires the coupling of disparate spatial and temporal scales often governed by diverse mathematical descriptions, e.g., by partial differential equations (continuum) and ordinary differential equations for discrete particles (atomistic). However, interfacing atomistic-based with continuum-based domain discretizations is a challenging problem that requires both mathematical and computational advances. We present here a hybrid methodology that enabled us to perform the first multiscale simulations of platelet depositions on the wall of a brain aneurysm. The large scale flow features in the intracranial network are accurately resolved by using the high-order spectral element Navier–Stokes solver NεκTαr. The blood rheology inside the aneurysm is modeled using a coarse-grained stochastic molecular dynamics approach (the dissipative particle dynamics method) implemented in the parallel code LAMMPS. The continuum and atomistic domains overlap with interface conditions provided by effective forces computed adaptively to ensure continuity of states across the interface boundary. A two-way interaction is allowed with the time-evolving boundary of the (deposited) platelet clusters tracked by an immersed boundary method. The corresponding heterogeneous solvers (NεκTαr and LAMMPS) are linked together by a computational multilevel message passing interface that facilitates modularity and high parallel efficiency. Results of multiscale simulations of clot formation inside the aneurysm in a patient-specific arterial tree are presented. We also discuss the computational challenges involved and present scalability results of our coupled solver on up to 300 K computer processors. Validation of such coupled atomistic-continuum models is a main open issue that has to be addressed in

  8. Unimodular trees versus Einstein trees

    NASA Astrophysics Data System (ADS)

    Álvarez, Enrique; González-Martín, Sergio; Martín, Carmelo P.

    2016-10-01

    The maximally helicity violating tree-level scattering amplitudes involving three, four or five gravitons are worked out in Unimodular Gravity. They are found to coincide with the corresponding amplitudes in General Relativity. This is a remarkable result, insofar as both the propagators and the vertices are quite different in the two theories.

  9. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1997-01-01

    The multiblock reacting Navier-Stokes flow solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  10. Parallelization of the Implicit RPLUS Algorithm

    NASA Technical Reports Server (NTRS)

    Orkwis, Paul D.

    1994-01-01

    The multiblock reacting Navier-Stokes flow-solver RPLUS2D was modified for parallel implementation. Results for non-reacting flow calculations of this code indicate parallelization efficiencies greater than 84% are possible for a typical test problem. Results tend to improve as the size of the problem increases. The convergence rate of the scheme is degraded slightly when additional artificial block boundaries are included for the purpose of parallelization. However, this degradation virtually disappears if the solution is converged near to machine zero. Recommendations are made for further code improvements to increase efficiency, correct bugs in the original version, and study decomposition effectiveness.

  11. Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

    DOE PAGESBeta

    Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; Shende, Sameer; Kassinos, Stavros C.

    2015-01-01

    This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were done using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.
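
    The replacement of a sequential summation by a binary tree algorithm, as described in the abstract above, can be illustrated with a short serial simulation. This is a sketch of the general pairwise-reduction idea, not the authors' Fortran coarray code: each list element stands in for the partial sum held by one image, and the name `tree_sum` is hypothetical.

```python
def tree_sum(partials):
    """Combine one partial sum per image in ceil(log2(n)) pairwise rounds.

    Returns (total, rounds). Requires at least one partial sum.
    """
    values = list(partials)
    n = len(values)
    stride = 1
    rounds = 0
    while stride < n:
        # Within one round the additions touch disjoint pairs, so a parallel
        # implementation could perform all of them at the same time.
        for i in range(0, n - stride, 2 * stride):
            values[i] += values[i + stride]
        stride *= 2
        rounds += 1
    return values[0], rounds
```

    For 32 partial sums, matching the 32-core server mentioned in the abstract, a sequential accumulation needs 31 dependent additions, whereas the tree needs only 5 rounds of independent additions.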

  12. Parallel execution of LISP programs

    SciTech Connect

    Weening, J.S.

    1989-01-01

    This dissertation considers several issues in the execution of Lisp programs on shared-memory multiprocessors. An overview of constructs for explicit parallelism in Lisp is first presented. The problems of partitioning a program into processes and scheduling these processes are then described, and a number of methods for performing these are proposed. These include cutting off process creation based on properties of the computation tree of the program, and basing partitioning decisions on the state of the system at runtime instead of the program. An experimental study of these methods has been performed using a simulator for parallel Lisp. The simulator, written in Common Lisp using a continuation-passing style, is described in detail. This is followed by a description of the experiments that were performed and an analysis of the results. Two programs are used as illustrations: a Fast Fourier Transform, which has an abundance of parallelism, and the Cocke-Younger-Kasami parsing algorithm, for which good speedup is not as easy to obtain. The difficulty of using cutoff-based partitioning methods, and the differences between various scheduling methods, are shown. A combination of partitioning and scheduling methods which the author calls dynamic partitioning is analyzed in more detail. This method is based on examining the machine's runtime state; it requires that the programmer only identify parallelism in the program, without deciding which potential parallelism is actually useful. Several theorems are proved providing upper bounds on the amount of overhead produced by this method. He concludes that for programs whose computation trees have small height relative to their total size, dynamic partitioning can achieve asymptotically minimal overhead in the cost of process creation.
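
    Cutoff-based partitioning, one of the methods named in the abstract above, can be illustrated on a toy computation tree. The sketch below evaluates a naive Fibonacci recursion and counts how many processes would be created if a process were spawned only at nodes whose argument exceeds a cutoff. Everything here is an assumption made for illustration: the dissertation's own cutoff criteria, simulator, and test programs (FFT and CYK parsing) are different, the spawn is only counted rather than executed in parallel, and `fib_with_cutoff` is a hypothetical name.

```python
def fib_with_cutoff(n, cutoff, stats):
    """Naive Fibonacci that records where a parallel Lisp might create a process.

    Nodes with n > cutoff count as spawn points; smaller subtrees run inline.
    """
    if n < 2:
        return n
    if n > cutoff:
        # Large subtree: creating a process here is worth the overhead.
        stats["spawned"] += 1
    left = fib_with_cutoff(n - 1, cutoff, stats)
    right = fib_with_cutoff(n - 2, cutoff, stats)
    return left + right
```

    For fib(10), a cutoff of 7 yields 4 spawn points (the nodes for 10, 9, and the two nodes for 8), whereas a cutoff of 1 spawns at all 88 internal nodes of the computation tree. The difficulty the abstract alludes to is that a good cutoff depends on the shape of the tree, which is generally not known in advance.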

  13. Technical Tree Climbing.

    ERIC Educational Resources Information Center

    Jenkins, Peter

    Tree climbing offers a safe, inexpensive adventure sport that can be performed almost anywhere. Using standard procedures practiced in tree surgery or rock climbing, almost any tree can be climbed. Tree climbing provides challenge and adventure as well as a vigorous upper-body workout. Tree Climbers International classifies trees using a system…

  14. Parallel Implicit Algorithms for CFD

    NASA Technical Reports Server (NTRS)

    Keyes, David E.

    1998-01-01

    The main goal of this project was efficient distributed parallel and workstation cluster implementations of Newton-Krylov-Schwarz (NKS) solvers for implicit Computational Fluid Dynamics (CFD). "Newton" refers to a quadratically convergent nonlinear iteration using gradient information based on the true residual, "Krylov" to an inner linear iteration that accesses the Jacobian matrix only through highly parallelizable sparse matrix-vector products, and "Schwarz" to a domain decomposition form of preconditioning the inner Krylov iterations with primarily neighbor-only exchange of data between the processors. Prior experience has established that Newton-Krylov methods are competitive solvers in the CFD context and that Krylov-Schwarz methods port well to distributed memory computers. The combination of the techniques into Newton-Krylov-Schwarz was implemented on 2D and 3D unstructured Euler codes on the parallel testbeds that used to be at LaRC and on several other parallel computers operated by other agencies or made available by the vendors. Early implementations were made directly in the Message Passing Interface (MPI) with parallel solvers we adapted from legacy NASA codes and enhanced for full NKS functionality. Later implementations were made in the framework of the PETSc library from Argonne National Laboratory, which now includes pseudo-transient continuation Newton-Krylov-Schwarz solver capability (as a result of demands we made upon PETSc during our early porting experiences). A secondary project pursued with funding from this contract was parallel implicit solvers in acoustics, specifically in the Helmholtz formulation. A 2D acoustic inverse problem has been solved in parallel within the PETSc framework.

  15. Interframe vector wavelet coding technique

    NASA Astrophysics Data System (ADS)

    Wus, John P.; Li, Weiping

    1997-01-01

    Wavelet coding is often used to divide an image into multi-resolution wavelet coefficients which are quantized and coded. By 'vectorizing' scalar wavelet coding and combining this with vector quantization (VQ), vector wavelet coding (VWC) can be implemented. Using a finite number of states, finite-state vector quantization (FSVQ) takes advantage of the similarity between frames by incorporating memory into the video coding system. Lattice VQ eliminates the potential mismatch that could occur using pre-trained VQ codebooks. It also eliminates the need for codebook storage in the VQ process, thereby creating a more robust coding system. Therefore, by using the VWC coding method in conjunction with the FSVQ system and lattice VQ, a high-quality, very low bit rate coding system is proposed. A coding system using a simple FSVQ system where the current state is determined by the previous channel symbol only is developed. To achieve a higher degree of compression, a tree-like FSVQ system is implemented. The groupings are done in this tree-like structure from the lower subbands to the higher subbands in order to exploit the nature of subband analysis in terms of the parent-child relationship. Class A and Class B video sequences from the MPEG-IV testing evaluations are used in the evaluation of this coding method.
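
    Two of the ingredients named in the abstract above, subband decomposition and lattice VQ, can be shown in a few lines. The sketch below applies one level of an unnormalized 1D Haar transform and then quantizes a coefficient vector to the scaled integer lattice, which needs no stored codebook. The choice of the Haar filter, the integer lattice, and all function names are assumptions for illustration; the abstract does not state which wavelet or lattice the authors use, and the finite-state (FSVQ) part is not modeled here.

```python
def haar_level(signal):
    """One level of an unnormalized Haar transform.

    Returns (low, high): pairwise averages and pairwise half-differences.
    A trailing sample of an odd-length input is ignored.
    """
    pairs = range(0, len(signal) - 1, 2)
    low = [(signal[i] + signal[i + 1]) / 2 for i in pairs]
    high = [(signal[i] - signal[i + 1]) / 2 for i in pairs]
    return low, high


def lattice_quantize(vector, step):
    """Index of the nearest point of the lattice step * Z^n, found by rounding."""
    return [round(v / step) for v in vector]


def lattice_reconstruct(indices, step):
    """Map lattice indices back to coefficient values."""
    return [i * step for i in indices]
```

    Because the nearest lattice point is found by rounding, the encoder and decoder only need to agree on the step size, which is the codebook-free property the abstract attributes to lattice VQ.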

  16. Global tree network for computing structures enabling global processing operations

    DOEpatents

    Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.; Gara, Alan G.; Giampapa, Mark E.; Heidelberger, Philip; Hoenicke, Dirk; Steinmacher-Burow, Burkhard D.; Takken, Todd E.; Vranas, Pavlos M.

    2010-01-19

    A system and method for enabling high-speed, low-latency global tree network communications among processing nodes interconnected according to a tree network structure. The global tree network enables collective reduction operations to be performed during parallel algorithm operations executing in a computer structure having a plurality of the interconnected processing nodes. Router devices are included that interconnect the nodes of the tree via links to facilitate performance of low-latency global processing operations at nodes of the virtual tree and sub-tree structures. The global operations performed include one or more of: broadcast operations downstream from a root node to leaf nodes of a virtual tree, reduction operations upstream from leaf nodes to the root node in the virtual tree, and point-to-point message passing from any node to the root node. The global tree network is configurable to provide global barrier and interrupt functionality in asynchronous or synchronized manner, and, is physically and logically partitionable.
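
    The upstream reduction and downstream broadcast described in this record can be illustrated with a small sequential simulation. This is only a sketch under stated assumptions: the heap-style node numbering, the function names, and the `fanout` parameter are hypothetical and say nothing about the patented hardware, which performs these steps in router devices rather than in software.

```python
from typing import Callable, List, TypeVar

T = TypeVar("T")


def tree_reduce(values: List[T], op: Callable[[T, T], T], fanout: int = 2) -> T:
    """Fold one value per node upstream to the root (node 0).

    Assumed numbering: node i has children fanout*i + 1 ... fanout*i + fanout.
    `op` should be associative and commutative (e.g. sum, max).
    """
    partial = list(values)
    # Children always have higher indices than their parent, so walking
    # from the last node down to node 1 folds every subtree before its root.
    for node in range(len(partial) - 1, 0, -1):
        parent = (node - 1) // fanout
        partial[parent] = op(partial[parent], partial[node])
    return partial[0]


def tree_broadcast(root_value: T, n_nodes: int, fanout: int = 2) -> List[T]:
    """Push the root's value downstream so that every node receives it."""
    received: List[T] = [root_value] * n_nodes
    for node in range(1, n_nodes):
        parent = (node - 1) // fanout
        # Each node copies from its parent, which was filled earlier.
        received[node] = received[parent]
    return received


if __name__ == "__main__":
    total = tree_reduce([1, 2, 3, 4, 5, 6, 7], lambda a, b: a + b)
    print(total, tree_broadcast(total, 7))
```

    A reduce followed by a broadcast of the result gives every node the global value, which is the usual way an all-reduce is built on a tree.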

  17. Phonological coding during reading

    PubMed Central

    Leinenger, Mallorie

    2014-01-01

    The exact role that phonological coding (the recoding of written, orthographic information into a sound-based code) plays during silent reading has been extensively studied for more than a century. Despite the large body of research surrounding the topic, varying theories as to the time course and function of this recoding still exist. The present review synthesizes this body of research, addressing the topics of time course and function in tandem. The varying theories surrounding the function of phonological coding (e.g., that phonological codes aid lexical access, that phonological codes aid comprehension and bolster short-term memory, or that phonological codes are largely epiphenomenal in skilled readers) are first outlined, and the time courses that each maps onto (e.g., that phonological codes come online early (pre-lexical) or that phonological codes come online late (post-lexical)) are discussed. Next the research relevant to each of these proposed functions is reviewed, discussing the varying methodologies that have been used to investigate phonological coding (e.g., response time methods, reading while eyetracking or recording EEG and MEG, concurrent articulation) and highlighting the advantages and limitations of each with respect to the study of phonological coding. In response to the view that phonological coding is largely epiphenomenal in skilled readers, research on the use of phonological codes in prelingually, profoundly deaf readers is reviewed. Finally, implications for current models of word identification (activation-verification model (Van Orden, 1987), dual-route model (e.g., Coltheart, Rastle, Perry, Langdon, & Ziegler, 2001), parallel distributed processing model (Seidenberg & McClelland, 1989)) are discussed. PMID:25150679

  18. Data-Parallel Halo Finder Operator in PISTON

    SciTech Connect

    Widanagamaachchi, W. N.

    2012-08-01

    PISTON is a portable framework which supports the development of visualization and analysis operators using a platform-independent, data-parallel programming model. Operators such as isosurface, cut-surface and threshold have been implemented in this framework, with the exact same operator code achieving good parallel performance on different architectures. An important analysis operator in cosmology is the halo finder. A halo is a cluster of particles and is considered a common feature of interest found in cosmology data. As the number of cosmological simulations carried out in the recent past has increased, the resultant data of these simulations and the required analysis tasks have increased as well. As a consequence, there is a need to develop scalable and efficient tools to carry out the needed analysis. Therefore, we are currently implementing a halo finder operator using PISTON. Researchers have developed a wide variety of techniques to identify halos in raw particle data. The most basic algorithm is the friend-of-friends (FOF) halo finder, where the particles are clustered based on two parameters: linking length and halo size. In a FOF halo finder, all particles which lie within the linking length are considered as one halo and the halos are filtered based on the halo size parameter. A naive implementation of a FOF halo finder compares each and every particle pair, requiring O(n^2) operations. Our data-parallel halo finder operator uses a balanced k-d tree to reduce this number of operations in the average case, and implements the algorithm using only the data-parallel primitives in order to achieve portability and performance.
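
    The naive friend-of-friends procedure described in this record can be sketched as follows. This is a minimal serial illustration, not the PISTON operator: it uses the all-pairs O(n^2) comparison with a union-find structure, and the function name `fof_halos` and its parameters are assumptions made for the example. A k-d tree would replace the all-pairs loop with neighbor queries.

```python
from itertools import combinations
from math import dist
from typing import Dict, List, Sequence


def fof_halos(points: Sequence[Sequence[float]],
              linking_length: float,
              min_size: int) -> List[List[int]]:
    """Group particle indices into halos with naive friend-of-friends."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Naive step: test every particle pair against the linking length.
    for i, j in combinations(range(len(points)), 2):
        if dist(points[i], points[j]) <= linking_length:
            root_i, root_j = find(i), find(j)
            if root_i != root_j:
                parent[root_j] = root_i

    # Collect connected groups, then filter on the halo size parameter.
    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(g for g in groups.values() if len(g) >= min_size)


if __name__ == "__main__":
    particles = [(0, 0, 0), (0.5, 0, 0), (1.0, 0, 0),
                 (10, 10, 10), (10.4, 10, 10), (50, 50, 50)]
    print(fof_halos(particles, linking_length=0.6, min_size=2))
```

    Particles 0 and 2 in the usage example are farther apart than the linking length but still share a halo, because both are linked to particle 1.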

  19. Parallel auto-correlative statistics with VTK.

    SciTech Connect

    Pebay, Philippe Pierre; Bennett, Janine Camille

    2013-08-01

    This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.

  20. Locating hardware faults in a parallel computer

    DOEpatents

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-04-13

    Locating hardware faults in a parallel computer, including defining within a tree network of the parallel computer two or more sets of non-overlapping test levels of compute nodes of the network that together include all the data communications links of the network, each non-overlapping test level comprising two or more adjacent tiers of the tree; defining test cells within each non-overlapping test level, each test cell comprising a subtree of the tree including a subtree root compute node and all descendant compute nodes of the subtree root compute node within a non-overlapping test level; performing, separately on each set of non-overlapping test levels, an uplink test on all test cells in a set of non-overlapping test levels; and performing, separately from the uplink tests and separately on each set of non-overlapping test levels, a downlink test on all test cells in a set of non-overlapping test levels.

  1. Multitasking TORT Under UNICOS: Parallel Performance Models and Measurements

    SciTech Connect

    Azmy, Y.Y.; Barnett, D.A.

    1999-09-27

    The existing parallel algorithms in the TORT discrete ordinates code were updated to function in a UNICOS environment. A performance model for the parallel overhead was derived for the existing algorithms. The largest contributors to the parallel overhead were identified and a new algorithm was developed. A parallel overhead model was also derived for the new algorithm. The parallel performance models were compared to applications of the code to two TORT standard test problems and a large production problem. The parallel performance models agree well with the measured parallel overhead.

  2. Two Level Parallel Grammatical Evolution

    NASA Astrophysics Data System (ADS)

    Ošmera, Pavel

    This paper describes a Two Level Parallel Grammatical Evolution (TLPGE) that can evolve complete programs using a variable length linear genome to govern the mapping of a Backus Naur Form grammar definition. To increase the efficiency of Grammatical Evolution (GE) the influence of backward processing was tested and a second level with differential evolution was added. The significance of backward coding (BC) and the comparison with standard coding of GEs is presented. The new method is based on parallel grammatical evolution (PGE) with a backward processing algorithm, which is further extended with a differential evolution algorithm. Thus a two-level optimization method was formed in an attempt to take advantage of the benefits of both original methods and avoid their difficulties. Both methods used are discussed and the architecture of their combination is described. Applications are also discussed, and results on a real-world application are described.

  3. Parallelization of ARC3D with Computer-Aided Tools

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    A series of efforts have been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SGI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided tools, CAPTools. Steps of parallelizing this code and requirements of achieving better performance are discussed. The generated parallel version has achieved reasonably good performance, for example, a speedup of 30 on 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving serial code and performing necessary code transformations are important parts of the automated parallelization process, although user intervention in many of these parts is still necessary. Nevertheless, development and improvement of useful software tools, such as CAPTools, can help trim down many tedious parallelization details and improve the processing efficiency.

  4. GAMER: A GRAPHIC PROCESSING UNIT ACCELERATED ADAPTIVE-MESH-REFINEMENT CODE FOR ASTROPHYSICS

    SciTech Connect

    Schive, H.-Y.; Tsai, Y.-C.; Chiueh, Tzihong

    2010-02-01

    We present the newly developed code, GPU-accelerated Adaptive-MEsh-Refinement code (GAMER), which adopts a novel approach in improving the performance of adaptive-mesh-refinement (AMR) astrophysical simulations by a large factor with the use of the graphic processing unit (GPU). The AMR implementation is based on a hierarchy of grid patches with an oct-tree data structure. We adopt a three-dimensional relaxing total variation diminishing scheme for the hydrodynamic solver and a multi-level relaxation scheme for the Poisson solver. Both solvers have been implemented in GPU, by which hundreds of patches can be advanced in parallel. The computational overhead associated with the data transfer between the CPU and GPU is carefully reduced by utilizing the capability of asynchronous memory copies in GPU, and the computing time of the ghost-zone values for each patch is diminished by overlapping it with the GPU computations. We demonstrate the accuracy of the code by performing several standard test problems in astrophysics. GAMER is a parallel code that can be run in a multi-GPU cluster system. We measure the performance of the code by performing purely baryonic cosmological simulations in different hardware implementations, in which detailed timing analyses provide comparison between the computations with and without GPU(s) acceleration. Maximum speed-up factors of 12.19 and 10.47 are demonstrated using one GPU with 4096^3 effective resolution and 16 GPUs with 8192^3 effective resolution, respectively.

  5. HEATR project: ATR algorithm parallelization

    NASA Astrophysics Data System (ADS)

    Deardorf, Catherine E.

    1998-09-01

    High Performance Computing (HPC) Embedded Application for Target Recognition (HEATR) is a project funded by the High Performance Computing Modernization Office through the Common HPC Software Support Initiative (CHSSI). The goal of CHSSI is to produce portable, parallel, multi-purpose, freely distributable, support software to exploit emerging parallel computing technologies and enable application of scalable HPCs for various critical DoD applications. Specifically, the CHSSI goal for HEATR is to provide portable, parallel versions of several existing ATR detection and classification algorithms to the ATR-user community to achieve near real-time capability. The HEATR project will create parallel versions of existing automatic target recognition (ATR) detection and classification algorithms and generate reusable code that will support porting and software development process for ATR HPC software. The HEATR Team has selected detection/classification algorithms from both the model-based and training-based (template-based) arena in order to consider the parallelization requirements for detection/classification algorithms across ATR technology. This would allow the Team to assess the impact that parallelization would have on detection/classification performance across ATR technology. A field demo is included in this project. Finally, any parallel tools produced to support the project will be refined and returned to the ATR user community along with the parallel ATR algorithms. This paper will review: (1) HPCMP structure as it relates to HEATR, (2) Overall structure of the HEATR project, (3) Preliminary results for the first algorithm Alpha Test, (4) CHSSI requirements for HEATR, and (5) Project management issues and lessons learned.

  6. The dynamics of strangling among forest trees.

    PubMed

    Okamoto, Kenichi W

    2015-11-01

    Strangler trees germinate and grow on other trees, eventually enveloping and potentially even girdling their hosts. This allows them to mitigate fitness costs otherwise incurred by germinating and competing with other trees on the forest floor, as well as minimize risks associated with host tree-fall. If stranglers can themselves host other strangler trees, they may not even seem to need non-stranglers to persist. Yet despite their high fitness potential, strangler trees neither dominate the communities in which they occur nor is the strategy particularly common outside of figs (genus Ficus). Here we analyze how dynamic interactions between strangling and non-strangling trees can shape the adaptive landscape for strangling mutants and mutant trees that have lost the ability to strangle. We find a threshold which strangler germination rates must exceed for selection to favor the evolution of strangling, regardless of how effectively hemiepiphytic stranglers may subsequently replace their hosts. This condition describes the magnitude of the phenotypic displacement in the ability to germinate on other trees necessary for invasion by a mutant tree that could potentially strangle its host following establishment as an epiphyte. We show how the relative abilities of strangling and non-strangling trees to occupy empty sites can govern whether strangling is an evolutionarily stable strategy, and obtain the conditions for strangler coexistence with non-stranglers. We then elucidate when the evolution of strangling can disrupt stable coexistence between commensal epiphytic ancestors and their non-strangling host trees. This allows us to highlight parallels between the invasion fitness of strangler trees arising from commensalist ancestors, and cases where strangling can arise in concert with the evolution of hemiepiphytism among free-standing ancestors. Finally, we discuss how our results can inform the evolutionary ecology of antagonistic interactions more generally.

  7. Parallel computation of three-dimensional nonlinear magnetostatic problems.

    SciTech Connect

    Levine, D.; Gropp, W.; Forsman, K.; Kettunen, L.; Mathematics and Computer Science; Tampere Univ. of Tech.

    1999-02-01

    We describe a general-purpose parallel electromagnetic code for computing accurate solutions to large computationally demanding, 3D, nonlinear magnetostatic problems. The code, CORAL, is based on a volume integral equation formulation. Using an IBM SP parallel computer and iterative solution methods, we successfully solved the dense linear systems inherent in such formulations. A key component of our work was the use of the PETSc library, which provides parallel portability and access to the latest linear algebra solution technology.

  8. Low Density Parity Check Codes: Bandwidth Efficient Channel Coding

    NASA Technical Reports Server (NTRS)

    Fong, Wai; Lin, Shu; Maki, Gary; Yeh, Pen-Shu

    2003-01-01

    Low Density Parity Check (LDPC) Codes provide near-Shannon Capacity performance for NASA Missions. These codes have high coding rates R=0.82 and 0.875 with moderate code lengths, n=4096 and 8176. Their decoders have inherently parallel structures, which allow for high-speed implementation. Two codes based on Euclidean Geometry (EG) were selected for flight ASIC implementation. These codes are cyclic and quasi-cyclic in nature and therefore have a simple encoder structure. This results in power and size benefits. These codes also have a large minimum distance, as much as d_min = 65, giving them powerful error correcting capabilities and error floors less than 10^-… BER. This paper will present development of the LDPC flight encoder and decoder, its applications and status.

  9. Inevitable self-similar topology of binary trees and their diverse hierarchical density

    NASA Astrophysics Data System (ADS)

    Paik, K.; Kumar, P.

    2007-11-01

    Self-similar topology, which can be characterized as power law size distribution, has been found in diverse tree networks ranging from river networks to taxonomic trees. In this study, we find that the statistical self-similar topology is an inevitable consequence of any full binary tree organization. We show this by coding a binary tree as a unique bifurcation string. This coding scheme allows us to investigate trees over the realm from deterministic to entirely random trees. To obtain partial random trees, partial random perturbation is added to the deterministic trees by an operator similar to that used in genetic algorithms. Our analysis shows that the hierarchical density of binary trees is more diverse than has been described in earlier studies. We find that the connectivity structure of river networks is far from strict self-similar trees. On the other hand, organization of some social networks is close to deterministic supercritical trees.

  10. Multicast Reduction Network Source Code

    2006-12-19

    MRNet is a software tree-based overlay network developed at the University of Wisconsin, Madison that provides a scalable communication mechanism for parallel tools. MRNet uses a tree topology of networked processes between a user tool and distributed tool daemons. This tree topology allows scalable multicast communication from the tool to the daemons. The internal nodes of the tree can be used to distribute computation and analysis on data sent from the tool daemons to the tool. This release covers minor implementation changes to port this software to the BlueGene/L architecture and for use with a new implementation of the Dynamic Probe Class Library.

  11. Special parallel processing workshop

    SciTech Connect

    1994-12-01

    This report contains viewgraphs from the Special Parallel Processing Workshop. These viewgraphs deal with topics such as parallel processing performance, message passing, queue structure, and other basic concepts dealing with parallel processing.

  12. The Tree Worker's Manual.

    ERIC Educational Resources Information Center

    Smithyman, S. J.

    This manual is designed to prepare students for entry-level positions as tree care professionals. Addressed in the individual chapters of the guide are the following topics: the tree service industry; clothing, equipment, and tools; tree workers; basic tree anatomy; techniques of pruning; procedures for climbing and working in the tree; aerial…

  13. Parallel Eclipse Project Checkout

    NASA Technical Reports Server (NTRS)

    Crockett, Thomas M.; Joswig, Joseph C.; Shams, Khawaja S.; Powell, Mark W.; Bachmann, Andrew G.

    2011-01-01

    Parallel Eclipse Project Checkout (PEPC) is a program written to leverage parallelism and to automate the checkout process of plug-ins created in Eclipse RCP (Rich Client Platform). Eclipse plug-ins can be aggregated in a feature project. This innovation digests a feature description (XML file) and automatically checks out all of the plug-ins listed in the feature. This resolves the issue of manually checking out each plug-in required to work on the project. To minimize the amount of time necessary to check out the plug-ins, this program makes the plug-in checkouts parallel. After the feature is parsed, a checkout request is inserted for each plug-in in the feature. These requests are handled by a thread pool with a configurable number of threads. By checking out the plug-ins in parallel, the checkout process is streamlined before getting started on the project. For instance, projects that took 30 minutes to check out now take less than 5 minutes. The effect is especially clear on a Mac, which has a network monitor displaying the bandwidth use. When running the client from a developer's home, the checkout process now saturates the bandwidth in order to get all the plug-ins checked out as fast as possible. For comparison, a checkout process that ranged from 8-200 Kbps from a developer's home is now able to saturate a pipe of 1.3 Mbps, resulting in significantly faster checkouts. Eclipse IDE (integrated development environment) tries to build a project as soon as it is downloaded. As part of another optimization, this innovation programmatically tells Eclipse to stop building while checkouts are happening, which dramatically reduces lock contention and enables plug-ins to continue downloading until all of them finish. Furthermore, the software re-enables automatic building, and forces Eclipse to do a clean build once it finishes checking out all of the plug-ins. This software is fully generic and does not contain any NASA-specific code. 
It can be applied to any
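
    The parse-then-dispatch pattern described in this record (read the feature description, then hand one checkout request per plug-in to a thread pool of configurable size) can be sketched in Python. This is not PEPC itself: the function names, the `checkout_one` callback, and the simplified feature XML in the usage example are assumptions made for illustration, and the actual version-control call is left to the caller.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def plugin_ids(feature_xml: str) -> List[str]:
    """Return the id of every <plugin> element in a feature description."""
    root = ET.fromstring(feature_xml)
    return [plugin.attrib["id"] for plugin in root.iter("plugin")]


def checkout_all(feature_xml: str,
                 checkout_one: Callable[[str], str],
                 max_workers: int = 4) -> Dict[str, str]:
    """Run one checkout request per plug-in on a thread pool.

    `checkout_one` is a caller-supplied (hypothetical) function that checks
    out a single plug-in and returns a status string.
    """
    ids = plugin_ids(feature_xml)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps results in the same order as the submitted ids.
        statuses = pool.map(checkout_one, ids)
        return dict(zip(ids, statuses))


if __name__ == "__main__":
    feature = ('<feature id="demo.feature">'
               '<plugin id="demo.core"/><plugin id="demo.ui"/>'
               '</feature>')
    # Stand-in for a real checkout command.
    print(checkout_all(feature, lambda pid: "checked out " + pid, max_workers=2))
```

    Threads suit this workload because each checkout spends most of its time waiting on the network, so several can overlap even with a small pool.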

  14. Integrated Task and Data Parallel Programming

    NASA Technical Reports Server (NTRS)

    Grimshaw, A. S.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers 1995 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. 
Additional 1995 Activities During the fall I collaborated

  15. A join algorithm for combining AND parallel solutions in AND/OR parallel systems

    SciTech Connect

    Ramkumar, B.; Kale, L.V.

    1992-02-01

    When two or more literals in the body of a Prolog clause are solved in (AND) parallel, their solutions need to be joined to compute solutions for the clause. This is often a difficult problem in parallel Prolog systems that exploit OR and independent AND parallelism in Prolog programs. In several AND/OR parallel systems proposed recently, this problem is side-stepped at the cost of unexploited OR parallelism in the program, in part due to the complexity of the backtracking algorithm beneath AND parallel branches. In some cases, the data dependency graphs used by these systems cannot represent all the exploitable independent AND parallelism known at compile time. In this paper, we describe the compile time analysis for an optimized join algorithm for supporting independent AND parallelism in logic programs efficiently without leaving any OR parallelism unexploited. We then discuss how this analysis can be used to yield very efficient runtime behavior. We also discuss problems associated with a tree representation of the search space when arbitrarily complex data dependency graphs are permitted. We describe how these problems can be resolved by mapping the search space onto data dependency graphs themselves. The algorithm has been implemented in a compiler for parallel Prolog based on the reduce-OR process model. The algorithm is suitable for the implementation of AND/OR systems on both shared and nonshared memory machines. Performance results on benchmark programs are presented.
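
    The basic join that this record refers to can be illustrated with a plain, unoptimized sketch: each literal solved in AND parallel yields a set of variable bindings, and a clause solution is any combination of one binding per literal that agrees on shared variables. This is not the paper's optimized algorithm or its compile-time analysis; the function name `join_solutions` and the dictionary representation of bindings are assumptions made for the example.

```python
from itertools import product
from typing import Dict, List

Binding = Dict[str, object]


def join_solutions(*solution_sets: List[Binding]) -> List[Binding]:
    """Join the solution sets of literals that were solved in AND parallel.

    For independent literals (no shared variables) this is a cross product;
    combinations that disagree on a shared variable are dropped.
    """
    joined: List[Binding] = []
    for combo in product(*solution_sets):
        merged: Binding = {}
        consistent = True
        for binding in combo:
            for var, value in binding.items():
                # setdefault returns the earlier value if var is already bound.
                if merged.setdefault(var, value) != value:
                    consistent = False
                    break
            if not consistent:
                break
        if consistent:
            joined.append(merged)
    return joined


if __name__ == "__main__":
    p_solutions = [{"X": 1}, {"X": 2}]        # solutions for p(X)
    q_solutions = [{"Y": "a"}, {"Y": "b"}]    # solutions for q(Y)
    print(join_solutions(p_solutions, q_solutions))
```

    The cross product grows multiplicatively with the number of solutions per literal, which is one reason an incremental, optimized join matters when OR parallelism produces many solutions.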

  16. HOPSPACK: Hybrid Optimization Parallel Search Package.

    SciTech Connect

    Gray, Genetha Anne.; Kolda, Tamara G.; Griffin, Joshua; Taddy, Matt; Martinez-Canales, Monica L.

    2008-12-01

    In this paper, we describe the technical details of HOPSPACK (Hybrid Optimization Parallel Search Package), a new software platform which facilitates combining multiple optimization routines into a single, tightly-coupled, hybrid algorithm that supports parallel function evaluations. The framework is designed such that existing optimization source code can be easily incorporated with minimal code modification. By maintaining the integrity of each individual solver, the strengths and code sophistication of the original optimization package are retained and exploited.

  17. Computational electromagnetics and parallel dense matrix computations

    SciTech Connect

    Forsman, K.; Kettunen, L.; Gropp, W.; Levine, D.

    1995-06-01

    We present computational results using CORAL, a parallel, three-dimensional, nonlinear magnetostatic code based on a volume integral equation formulation. A key feature of CORAL is the ability to solve, in parallel, the large, dense systems of linear equations that are inherent in the use of integral equation methods. Using the Chameleon and PSLES libraries ensures portability and access to the latest linear algebra solution technology.

  18. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, sphere, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderers use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  19. Constructions for finite-state codes

    NASA Technical Reports Server (NTRS)

    Pollara, F.; Mceliece, R. J.; Abdel-Ghaffar, K.

    1987-01-01

    A class of codes called finite-state (FS) codes is defined and investigated. These codes, which generalize both block and convolutional codes, are defined by their encoders, which are finite-state machines with parallel inputs and outputs. A family of upper bounds on the free distance of a given FS code is derived from known upper bounds on the minimum distance of block codes. A general construction for FS codes is then given, based on the idea of partitioning a given linear block code into cosets of one of its subcodes, and it is shown that in many cases the FS codes constructed in this way have a d_free which is as large as possible. These codes are found without the need for lengthy computer searches, and have potential applications for future deep-space coding systems. The issue of catastrophic error propagation (CEP) for FS codes is also investigated.

  20. Utilizing GPUs to Accelerate Turbomachinery CFD Codes

    NASA Technical Reports Server (NTRS)

    MacCalla, Weylin; Kulkarni, Sameer

    2016-01-01

    GPU computing has established itself as a way to accelerate parallel codes in the high performance computing world. This work focuses on speeding up APNASA, a legacy CFD code used at NASA Glenn Research Center, while also drawing conclusions about the nature of GPU computing and the requirements to make GPGPU worthwhile on legacy codes. Rewriting and restructuring of the source code was avoided to limit the introduction of new bugs. The code was profiled and investigated for parallelization potential, then OpenACC directives were used to indicate parallel parts of the code. The use of OpenACC directives was not able to reduce the runtime of APNASA on either the NVIDIA Tesla discrete graphics card, or the AMD accelerated processing unit. Additionally, it was found that in order to justify the use of GPGPU, the amount of parallel work being done within a kernel would have to greatly exceed the work being done by any one portion of the APNASA code. It was determined that in order for an application like APNASA to be accelerated on the GPU, it should not be modular in nature, and the parallel portions of the code must contain a large portion of the code's computation time.

  1. Uplink Coding

    NASA Technical Reports Server (NTRS)

    Pollara, Fabrizio; Hamkins, Jon; Dolinar, Sam; Andrews, Ken; Divsalar, Dariush

    2006-01-01

    This viewgraph presentation reviews uplink coding. The purpose and goals of the briefing are to: (1) show a plan for using uplink coding and describe benefits; (2) define possible solutions and their applicability to different types of uplink, including emergency uplink; (3) concur with our conclusions so we can embark on a plan to use the proposed uplink system; (4) identify the need for the development of appropriate technology and infusion in the DSN; and (5) gain advocacy to implement uplink coding in flight projects. Action Item EMB04-1-14: Show a plan for using uplink coding, including showing where it is useful or not (include discussion of emergency uplink coding).

  2. Interfacing Computer Aided Parallelization and Performance Analysis

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Jin, Haoqiang; Labarta, Jesus; Gimenez, Judit; Biegel, Bryan A. (Technical Monitor)

    2003-01-01

    When porting sequential applications to parallel computer architectures, the program developer will typically go through several cycles of source code optimization and performance analysis. We have started a project to develop an environment where the user can jointly navigate through program structure and performance data information in order to make efficient optimization decisions. In a prototype implementation we have interfaced the CAPO computer aided parallelization tool with the Paraver performance analysis tool. We describe both tools and their interface and give an example for how the interface helps within the program development cycle of a benchmark code.

  3. An efficient parallel algorithm for accelerating computational protein design

    PubMed Central

    Zhou, Yichao; Xu, Wei; Donald, Bruce R.; Zeng, Jianyang

    2014-01-01

    Motivation: Structure-based computational protein design (SCPR) is an important topic in protein engineering. Under the assumption of a rigid backbone and a finite set of discrete conformations of side-chains, various methods have been proposed to address this problem. A popular method is to combine the dead-end elimination (DEE) and A* tree search algorithms, which provably finds the global minimum energy conformation (GMEC) solution. Results: In this article, we improve the efficiency of computing A* heuristic functions for protein design and propose a variant of A* algorithm in which the search process can be performed on a single GPU in a massively parallel fashion. In addition, we make some efforts to address the memory exceeding problem in A* search. As a result, our enhancements can achieve a significant speedup of the A*-based protein design algorithm by four orders of magnitude on large-scale test data through pre-computation and parallelization, while still maintaining an acceptable memory overhead. We also show that our parallel A* search algorithm could be successfully combined with iMinDEE, a state-of-the-art DEE criterion, for rotamer pruning to further improve SCPR with the consideration of continuous side-chain flexibility. Availability: Our software is available and distributed open-source under the GNU Lesser General License Version 2.1 (GNU, February 1999). The source code can be downloaded from http://www.cs.duke.edu/donaldlab/osprey.php or http://iiis.tsinghua.edu.cn/~compbio/software.html. Contact: zengjy321@tsinghua.edu.cn Supplementary information: Supplementary data are available at Bioinformatics online. PMID:24931991

  4. An Integrated Procedure for Tree N-body Simulations: FLY and AstroMD

    NASA Astrophysics Data System (ADS)

    Becciani, U.; Antonuccio-Delogu, V.; Buonomo, F.; Gheller, C.

    We present a new code for evolving three-dimensional self-gravitating collisionless systems with a large number of particles N >= 10^7. FLY (Fast Level-based N-bodY code) is a fully parallel code based on a tree algorithm. It adopts periodic boundary conditions implemented by means of the Ewald summation technique. FLY is based on the one-side communication paradigm for sharing data among the processors that access remote private data, avoiding any kind of synchronization. The code was originally developed on a CRAY T3E system using the SHMEM library and it was ported to SGI ORIGIN 2000 and IBM SP (on the latter making use of the LAPI library). FLY version 1.1 is open source, freely available code. FLY output data can be analysed with AstroMD, an analysis and visualization tool specifically designed for astrophysical data. AstroMD can manage different physical quantities. It can find structures without well defined shape or symmetries, and perform quantitative calculations on selected regions. AstroMD is freely available.

  5. Xyce parallel electronic simulator : users' guide.

    SciTech Connect

    Mei, Ting; Rankin, Eric Lamont; Thornquist, Heidi K.; Santarelli, Keith R.; Fixel, Deborah A.; Coffey, Todd Stirling; Russo, Thomas V.; Schiek, Richard Louis; Warrender, Christina E.; Keiter, Eric Richard; Pawlowski, Roger Patrick

    2011-05-01

    This manual describes the use of the Xyce Parallel Electronic Simulator. Xyce has been designed as a SPICE-compatible, high-performance analog circuit simulator, and has been written to support the simulation needs of the Sandia National Laboratories electrical designers. This development has focused on improving capability over the current state-of-the-art in the following areas: (1) Capability to solve extremely large circuit problems by supporting large-scale parallel computing platforms (up to thousands of processors). Note that this includes support for most popular parallel and serial computers; (2) Improved performance for all numerical kernels (e.g., time integrator, nonlinear and linear solvers) through state-of-the-art algorithms and novel techniques. (3) Device models which are specifically tailored to meet Sandia's needs, including some radiation-aware devices (for Sandia users only); and (4) Object-oriented code design and implementation using modern coding practices that ensure that the Xyce Parallel Electronic Simulator will be maintainable and extensible far into the future. Xyce is a parallel code in the most general sense of the phrase - a message passing parallel implementation - which allows it to run efficiently on the widest possible number of computing platforms. These include serial, shared-memory and distributed-memory parallel as well as heterogeneous platforms. Careful attention has been paid to the specific nature of circuit-simulation problems to ensure that optimal parallel efficiency is achieved as the number of processors grows. The development of Xyce provides a platform for computational research and development aimed specifically at the needs of the Laboratory. With Xyce, Sandia has an 'in-house' capability with which both new electrical (e.g., device model development) and algorithmic (e.g., faster time-integration methods, parallel solver algorithms) research and development can be performed. As a result, Xyce is a unique

  6. PARAMESH: A Parallel Adaptive Mesh Refinement Community Toolkit

    NASA Technical Reports Server (NTRS)

    MacNeice, Peter; Olson, Kevin M.; Mobarry, Clark; deFainchtein, Rosalinda; Packer, Charles

    1999-01-01

    In this paper, we describe a community toolkit which is designed to provide parallel support with adaptive mesh capability for a large and important class of computational models, those using structured, logically cartesian meshes. The package of Fortran 90 subroutines, called PARAMESH, is designed to provide an application developer with an easy route to extend an existing serial code which uses a logically cartesian structured mesh into a parallel code with adaptive mesh refinement. Alternatively, in its simplest use, and with minimal effort, it can operate as a domain decomposition tool for users who want to parallelize their serial codes, but who do not wish to use adaptivity. The package can provide them with an incremental evolutionary path for their code, converting it first to uniformly refined parallel code, and then later if they so desire, adding adaptivity.

  7. Stable feature selection for clinical prediction: exploiting ICD tree structure using Tree-Lasso.

    PubMed

    Kamkar, Iman; Gupta, Sunil Kumar; Phung, Dinh; Venkatesh, Svetha

    2015-02-01

    Modern healthcare is being reshaped by growing Electronic Medical Records (EMR). Recently, these records have been shown to be of great value for building clinical prediction models. In EMR data, patients' diseases and hospital interventions are captured through a set of diagnosis and procedure codes. These codes are usually represented in a tree form (e.g. ICD-10 tree) and the codes within a tree branch may be highly correlated. These codes can be used as features to build a prediction model, and an appropriate feature selection can inform a clinician about important risk factors for a disease. Traditional feature selection methods (e.g. Information Gain, T-test, etc.) consider each variable independently and usually end up with a long feature list. Recently, Lasso and related l1-penalty based feature selection methods have become popular due to their joint feature selection property. However, Lasso is known to select one of many correlated features at random. This prevents clinicians from arriving at a stable feature set, which is crucial for the clinical decision-making process. In this paper, we solve this problem by using a recently proposed Tree-Lasso model. Since the stability behavior of Tree-Lasso is not well understood, we study the stability behavior of Tree-Lasso and compare it with other feature selection methods. Using a synthetic and two real-world datasets (Cancer and Acute Myocardial Infarction), we show that Tree-Lasso based feature selection is significantly more stable than Lasso and comparable to other methods, e.g. Information Gain, ReliefF and T-test. We further show that, using different types of classifiers such as logistic regression, naive Bayes, support vector machines, decision trees and Random Forest, the classification performance of Tree-Lasso is comparable to Lasso and better than other methods. Our result has implications in identifying stable risk factors for many healthcare problems and therefore can

  8. Tree Tectonics

    NASA Astrophysics Data System (ADS)

    Vogt, Peter R.

    2004-09-01

    Nature often replicates her processes at different scales of space and time in differing media. Here a tree-trunk cross section I am preparing for a dendrochronological display at the Battle Creek Cypress Swamp Nature Sanctuary (Calvert County, Maryland) dried and cracked in a way that replicates practically all the planform features found along the Mid-Oceanic Ridge (see Figure 1). The left-lateral offset of saw marks, contrasting with the right-lateral "rift" offset, even illustrates the distinction between transcurrent (strike-slip) and transform faults, the latter only recognized as a geologic feature, by J. Tuzo Wilson, in 1965. However, wood cracking is but one of many examples of natural processes that replicate one or several elements of lithospheric plate tectonics. Many of these examples occur in everyday venues and thus make great teaching aids, "teachable" from primary school to university levels. Plate tectonics, the dominant process of Earth geology, also occurs in miniature on the surface of some lava lakes, and as "ice plate tectonics" on our frozen seas and lakes. Ice tectonics also happens at larger spatial and temporal scales on the Jovian moons Europa and perhaps Ganymede. Tabletop plate tectonics, in which a molten-paraffin "asthenosphere" is surfaced by a skin of congealing wax "plates," first replicated Mid-Oceanic Ridge type seafloor spreading more than three decades ago. A seismologist (J. Brune, personal communication, 2004) discovered wax plate tectonics by casually and serendipitously pulling a stick across a container of molten wax his wife and daughters had used in making candles. Brune and his student D. Oldenburg followed up and mirabile dictu published the results in Science (178, 301-304).

  9. Parallel execution of Lisp programs. Doctoral thesis

    SciTech Connect

    Weening, J.S.

    1989-06-01

    This dissertation considers several issues in the execution of Lisp programs on shared-memory multiprocessors. An overview of constructs for explicit parallelism in Lisp is first presented. The problems of partitioning a program into processes and scheduling these processes are then described, and a number of methods for performing these are proposed. These include cutting off process creation based on properties of the computation tree of the program, and basing partitioning decisions on the state of the system at runtime instead of the program. An experimental study of these methods has been performed using a simulator for parallel Lisp. This is followed by a description of the experiments that were performed and an analysis of the results. Two programs are used as illustrations: a Fast Fourier Transform, which has an abundance of parallelism, and the Cocke-Younger-Kasami parsing algorithm, for which good speedup is not as easy to obtain. The difficulty of using cutoff-based partitioning methods, and the differences between various scheduling methods, are shown. A combination of partitioning and scheduling methods which we call dynamic partitioning is analyzed in more detail. This method is based on examining the machine's runtime state; it requires that the programmer only identify parallelism in the program, without deciding which potential parallelism is actually useful. We conclude that for programs whose computation trees have small height relative to their total size, dynamic partitioning can achieve asymptotically minimal overhead in the cost of process creation.
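
    As a rough illustration of cutoff-based partitioning, the sketch below stops creating new workers once a fixed depth in the computation tree is reached. It is written in Python rather than Lisp, uses depth as the cutoff property, and the names tree_sum and cutoff are hypothetical; none of this is taken from the dissertation. Python threads only show the control structure here and give no real speedup for this workload.

```python
import threading


def tree_sum(node, depth=0, cutoff=2):
    """Sum a binary tree given as nested (left, right) tuples with int leaves."""
    if isinstance(node, int):
        return node
    left, right = node
    if depth >= cutoff:
        # At or below the cutoff depth: create no new worker, run inline.
        return (tree_sum(left, depth + 1, cutoff)
                + tree_sum(right, depth + 1, cutoff))

    result = {}

    def work():
        # The spawned worker evaluates the left subtree.
        result["left"] = tree_sum(left, depth + 1, cutoff)

    worker = threading.Thread(target=work)
    worker.start()
    # The current thread evaluates the right subtree in the meantime.
    right_value = tree_sum(right, depth + 1, cutoff)
    worker.join()
    return result["left"] + right_value


tree = (((1, 2), (3, 4)), ((5, 6), (7, 8)))
total = tree_sum(tree, cutoff=2)
```

    With cutoff=0 the whole computation runs sequentially; larger values trade more process-creation overhead for more potential parallelism, which is the tension the abstract describes.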

  10. Applications of Parallel Processing in Configuration Analyses

    NASA Technical Reports Server (NTRS)

    Sundaram, Ppchuraman; Hager, James O.; Biedron, Robert T.

    1999-01-01

    The paper presents the recent progress made towards developing an efficient and user-friendly parallel environment for routine analysis of large CFD problems. The coarse-grain parallel version of the CFL3D Euler/Navier-Stokes analysis code, CFL3Dhp, has been ported onto most available parallel platforms. The CFL3Dhp solution accuracy on these parallel platforms has been verified with the CFL3D sequential analyses. User-friendly pre- and post-processing tools that enable a seamless transfer from sequential to parallel processing have been written. A static load balancing tool for CFL3Dhp analysis has also been implemented for achieving good parallel efficiency. For large problems, load balancing efficiency as high as 95% can be achieved even when a large number of processors is used. Linear scalability of the CFL3Dhp code with increasing number of processors has also been shown using a large installed transonic nozzle boattail analysis. To highlight the fast turn-around time of parallel processing, the TCA full configuration in sideslip Navier-Stokes drag polar at supersonic cruise has been obtained in a day. CFL3Dhp is currently being used as a production analysis tool.

  11. Portable parallel programming in a Fortran environment

    SciTech Connect

    May, E.N.

    1989-01-01

    Experience using the Argonne-developed PARMACs macro package to implement a portable parallel programming environment is described. Fortran programs with intrinsic parallelism of coarse and medium granularity are easily converted to parallel programs which are portable among a number of commercially available parallel processors in the class of shared-memory bus-based and local-memory network based MIMD processors. The parallelism is implemented using standard UNIX (tm) tools and a small number of easily understood synchronization concepts (monitors and message-passing techniques) to construct and coordinate multiple cooperating processes on one or many processors. Benchmark results are presented for parallel computers such as the Alliant FX/8, the Encore MultiMax, the Sequent Balance, the Intel iPSC/2 Hypercube and a network of Sun 3 workstations. These parallel machines are typical MIMD types with from 8 to 30 processors, each rated at from 1 to 10 MIPS processing power. The demonstration code used for this work is a Monte Carlo simulation of the response to photons of a "nearly realistic" lead, iron and plastic electromagnetic and hadronic calorimeter, using the EGS4 code system. 6 refs., 2 figs., 2 tabs.

  12. Performance issues for engineering analysis on MIMD parallel computers

    SciTech Connect

    Fang, H.E.; Vaughan, C.T.; Gardner, D.R.

    1994-08-01

    We discuss how engineering analysts can obtain greater computational resolution in a more timely manner from applications codes running on MIMD parallel computers. Both processor speed and memory capacity are important to achieving better performance than a serial vector supercomputer. To obtain good performance, a parallel applications code must be scalable. In addition, the aspect ratios of the subdomains in the decomposition of the simulation domain onto the parallel computer should be of order 1. We demonstrate these conclusions using simulations conducted with the PCTH shock wave physics code running on a Cray Y-MP, a 1024-node nCUBE 2, and an 1840-node Paragon.
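
    The aspect-ratio recommendation can be motivated by a standard surface-to-volume argument; this is an illustration under simplifying assumptions, not necessarily the analysis in the report. Assume a two-dimensional rectangular subdomain with sides a and b, fixed area A = ab and aspect ratio r = a/b, with computation proportional to the area and communication proportional to the perimeter:

```latex
\frac{\text{communication}}{\text{computation}}
  \;\propto\; \frac{2(a + b)}{ab}
  \;=\; \frac{2}{\sqrt{A}}\left(\sqrt{r} + \frac{1}{\sqrt{r}}\right)
  \;\ge\; \frac{4}{\sqrt{A}}
```

    Equality holds only at r = 1, so for a fixed amount of work per processor, square subdomains minimize the relative communication cost.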

  13. Parallel distributed computing using Python

    NASA Astrophysics Data System (ADS)

    Dalcin, Lisandro D.; Paz, Rodrigo R.; Kler, Pablo A.; Cosimo, Alejandro

    2011-09-01

    This work presents two software components aimed to relieve the costs of accessing high-performance parallel computing resources within a Python programming environment: MPI for Python and PETSc for Python. MPI for Python is a general-purpose Python package that provides bindings for the Message Passing Interface (MPI) standard using any back-end MPI implementation. Its facilities allow parallel Python programs to easily exploit multiple processors using the message passing paradigm. PETSc for Python provides access to the Portable, Extensible Toolkit for Scientific Computation (PETSc) libraries. Its facilities allow sequential and parallel Python applications to exploit state of the art algorithms and data structures readily available in PETSc for the solution of large-scale problems in science and engineering. MPI for Python and PETSc for Python are fully integrated to PETSc-FEM, an MPI and PETSc based parallel, multiphysics, finite elements code developed at CIMEC laboratory. This software infrastructure supports research activities related to simulation of fluid flows with applications ranging from the design of microfluidic devices for biochemical analysis to modeling of large-scale stream/aquifer interactions.

  14. The Needs of Trees

    ERIC Educational Resources Information Center

    Boyd, Amy E.; Cooper, Jim

    2004-01-01

    Tree rings can be used not only to look at plant growth, but also to make connections between plant growth and resource availability. In this lesson, students in 2nd-4th grades use role-play to become familiar with basic requirements of trees and how availability of those resources is related to tree ring sizes and tree growth. These concepts can…

  15. Sharing code.

    PubMed

    Kubilius, Jonas

    2014-01-01

    Sharing code is becoming increasingly important in the wake of Open Science. In this review I describe and compare two popular code-sharing utilities, GitHub and Open Science Framework (OSF). GitHub is a mature, industry-standard tool but lacks focus towards researchers. In comparison, OSF offers a one-stop solution for researchers but a lot of functionality is still under development. I conclude by listing alternative lesser-known tools for code and materials sharing.

  16. Sussing merger trees: stability and convergence

    NASA Astrophysics Data System (ADS)

    Wang, Yang; Pearce, Frazer R.; Knebe, Alexander; Schneider, Aurel; Srisawat, Chaichalit; Tweed, Dylan; Jung, Intae; Han, Jiaxin; Helly, John; Onions, Julian; Elahi, Pascal J.; Thomas, Peter A.; Behroozi, Peter; Yi, Sukyoung K.; Rodriguez-Gomez, Vicente; Mao, Yao-Yuan; Jing, Yipeng; Lin, Weipeng

    2016-06-01

    Merger trees are routinely used to follow the growth and merging history of dark matter haloes and subhaloes in simulations of cosmic structure formation. Srisawat et al. compared a wide range of merger-tree-building codes. Here we test the influence of output strategies and mass resolution on tree-building. We find that, somewhat surprisingly, building the tree from more snapshots does not generally produce more complete trees; instead, it tends to shorten them. Significant improvements are seen for patching schemes that attempt to bridge over occasional dropouts in the underlying halo catalogues or schemes that combine the halo-finding and tree-building steps seamlessly. The adopted output strategy does not affect the average number of branches (bushiness) of the resultant merger trees. However, mass resolution has an influence on both main branch length and the bushiness. As the resolution increases, a halo with the same mass can be traced back further in time and will encounter more small progenitors during its evolutionary history. Given these results, we recommend that, for simulations intended as precursors for galaxy formation models where of the order of 100 or more snapshots are analysed, the tree-building routine should be integrated with the halo finder, or at the very least be able to patch over multiple adjacent snapshots.

  17. Status and Verification of Edge Plasma Turbulence Code BOUT

    SciTech Connect

    Umansky, M V; Xu, X Q; Dudson, B; LoDestro, L L; Myra, J R

    2009-01-08

    The BOUT code is a detailed numerical model of tokamak edge turbulence based on collisional plasma fluid equations. BOUT solves for time evolution of plasma fluid variables: plasma density N_i, parallel ion velocity V_{∥i}, electron temperature T_e, ion temperature T_i, electric potential φ, parallel current j_∥, and parallel vector potential A_∥, in realistic 3D divertor tokamak geometry. The current status of the code, physics model, algorithms, and implementation is described. Results of verification testing are presented along with illustrative applications to tokamak edge turbulence.

  18. MPP parallel forth

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1987-01-01

    Massively Parallel Processor (MPP) Parallel FORTH is a derivative of FORTH-83 and Unified Software Systems' Uni-FORTH. The extension of FORTH into the realm of parallel processing on the MPP is described. With few exceptions, Parallel FORTH was made to follow the description of Uni-FORTH as closely as possible. Likewise, the parallel FORTH extensions were designed as philosophically similar to serial FORTH as possible. The MPP hardware characteristics, as viewed by the FORTH programmer, is discussed. Then a description is presented of how parallel FORTH is implemented on the MPP.

  19. Finite Element Analysis Code

    2006-03-08

    MAPVAR-KD is designed to transfer solution results from one finite element mesh to another. MAPVAR-KD draws heavily from the structure and coding of MERLIN II, but it employs a new finite element data base, EXODUS II, and offers enhanced speed and new capabilities not available in MERLIN II. In keeping with the MERLIN II documentation, the computational algorithms used in MAPVAR-KD are described. User instructions are presented. Example problems are included to demonstrate the operation of the code and the effects of various input options. MAPVAR-KD is a modification of MAPVAR in which the search algorithm was replaced by a kd-tree-based search for better performance on large problems.

  20. Finite Element Analysis Code

    SciTech Connect

    Sjaardema, G.; Wellman, G.; Gartling, D.

    2006-03-08

    MAPVAR-KD is designed to transfer solution results from one finite element mesh to another. MAPVAR-KD draws heavily from the structure and coding of MERLIN II, but it employs a new finite element data base, EXODUS II, and offers enhanced speed and new capabilities not available in MERLIN II. In keeping with the MERLIN II documentation, the computational algorithms used in MAPVAR-KD are described. User instructions are presented. Example problems are included to demonstrate the operation of the code and the effects of various input options. MAPVAR-KD is a modification of MAPVAR in which the search algorithm was replaced by a kd-tree-based search for better performance on large problems.
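
    To make the kd-tree idea concrete, here is a minimal nearest-point search in Python. It is a generic textbook sketch, not MAPVAR-KD's actual search (which the abstract does not detail); the function names and the sample points are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Build a kd-tree (nested dicts) from a list of equal-length tuples."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinates
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # median point splits the set
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(node, target, best=None):
    """Return the stored point closest to target."""
    if node is None:
        return best
    if best is None or sq_dist(node["point"], target) < sq_dist(best, target):
        best = node["point"]
    diff = target[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, target, best)
    # Visit the far side only if the splitting plane is closer than the
    # best match found so far; this pruning is where the speedup comes from.
    if diff ** 2 < sq_dist(best, target):
        best = nearest(far, target, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kdtree(points)
closest = nearest(tree, (9, 2))
```

    For well-distributed points a query of this kind typically inspects on the order of log n nodes rather than all n, which is the kind of gain on large problems that the abstract attributes to the kd-tree-based search.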

  1. Computational fluid dynamics on a massively parallel computer

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The code uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.
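
    The abstract does not spell out the solver, so the following is only a sketch of the closely related odd-even (cyclic) reduction for a tridiagonal system, which may differ from the exact variant the authors used. It assumes nonzero pivots (for example a diagonally dominant matrix); the function name and argument layout are hypothetical. Row i reads a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

```python
def solve_tridiagonal_odd_even(a, b, c, d):
    """Solve a tridiagonal system by recursive odd-even reduction.

    a, b, c are the sub-, main and super-diagonals (a[0] and c[-1] are
    ignored), d is the right-hand side. Returns the solution as a list.
    """
    n = len(b)
    if n == 1:
        return [d[0] / b[0]]

    # Reduction: eliminate the even-indexed unknowns from each odd row.
    # Every iteration is independent of the others, which is what makes
    # the method attractive on a massively parallel machine.
    ra, rb, rc, rd = [], [], [], []
    for i in range(1, n, 2):
        has_next = i + 1 < n
        alpha = a[i] / b[i - 1]
        gamma = c[i] / b[i + 1] if has_next else 0.0
        ra.append(-alpha * a[i - 1])
        rb.append(b[i] - alpha * c[i - 1]
                  - (gamma * a[i + 1] if has_next else 0.0))
        rc.append(-gamma * c[i + 1] if has_next else 0.0)
        rd.append(d[i] - alpha * d[i - 1]
                  - (gamma * d[i + 1] if has_next else 0.0))

    # The odd unknowns form a tridiagonal system of about half the size.
    x = [0.0] * n
    x[1::2] = solve_tridiagonal_odd_even(ra, rb, rc, rd)

    # Back substitution: recover each even unknown from its own row.
    for i in range(0, n, 2):
        left = a[i] * x[i - 1] if i > 0 else 0.0
        right = c[i] * x[i + 1] if i + 1 < n else 0.0
        x[i] = (d[i] - left - right) / b[i]
    return x
```

    Each level halves the number of unknowns, so about log2(n) reduction steps are needed, in contrast to the n sequential steps of the usual forward-elimination (Thomas) algorithm.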

  2. Parallel programming interface for distributed data

    NASA Astrophysics Data System (ADS)

    Wang, Manhui; May, Andrew J.; Knowles, Peter J.

    2009-12-01

    The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.

    Program summary
    Program title: PPIDD
    Catalogue identifier: AEEF_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEEF_1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
    No. of lines in distributed program, including test data, etc.: 17 698
    No. of bytes in distributed program, including test data, etc.: 166 173
    Distribution format: tar.gz
    Programming language: Fortran, C
    Computer: Many parallel systems
    Operating system: Various
    Has the code been vectorised or parallelized?: Yes. 2-256 processors used
    RAM: 50 Mbytes
    Classification: 6.5
    External routines: Global Arrays or MPI-2
    Nature of problem: Many scientific applications require management and communication of data that is global, and the standard MPI-2 protocol provides only low-level methods for the required one-sided remote memory access.
    Solution method: The Parallel Programming Interface for Distributed Data (PPIDD) library provides an interface, suitable for use in parallel scientific applications, that delivers communications and global data management. The library can be built either using the Global Arrays (GA) toolkit, or a standard MPI-2 library. This abstraction allows the programmer to write portable parallel codes that can utilise the best, or only, communications library that is available on a particular computing platform.
    Running time: Problem dependent. The test provided with

  3. Parallel community climate model: Description and user's guide

    SciTech Connect

    Drake, J.B.; Flanery, R.E.; Semeraro, B.D.; Worley, P.H.

    1996-07-15

    This report gives an overview of a parallel version of the NCAR Community Climate Model, CCM2, implemented for MIMD massively parallel computers using a message-passing programming paradigm. The parallel implementation was developed on an Intel iPSC/860 with 128 processors and on the Intel Delta with 512 processors, and the initial target platform for the production version of the code is the Intel Paragon with 2048 processors. Because the implementation uses standard, portable message-passing libraries, the code has been easily ported to other multiprocessors supporting a message-passing programming paradigm. The parallelization strategy used is to decompose the problem domain into geographical patches and assign each processor the computation associated with a distinct subset of the patches. With this decomposition, the physics calculations involve only grid points and data local to a processor and are performed in parallel. Using parallel algorithms developed for the semi-Lagrangian transport, the fast Fourier transform and the Legendre transform, both physics and dynamics are computed in parallel with minimal data movement and modest change to the original CCM2 source code. Sequential or parallel history tapes are written and input files (in history tape format) are read sequentially by the parallel code to promote compatibility with production use of the model on other computer systems. A validation exercise has been performed with the parallel code and is detailed along with some performance numbers on the Intel Paragon and the IBM SP2. A discussion of reproducibility of results is included. A user's guide for the PCCM2 version 2.1 on the various parallel machines completes the report. Procedures for compilation, setup and execution are given. A discussion of code internals is included for those who may wish to modify and use the program in their own research.
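
    The patch decomposition can be pictured with a small Python sketch. It assumes a regular latitude-longitude grid, one rectangular patch per processor and a row-major rank numbering; the function names, the grid size and the processor layout are hypothetical and are not taken from the PCCM2 report.

```python
def split_range(n, parts):
    """Split range(n) into contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))   # half-open interval
        start += size
    return bounds


def decompose(nlat, nlon, plat, plon):
    """Map each processor rank to its (lat_range, lon_range) patch."""
    lat_chunks = split_range(nlat, plat)
    lon_chunks = split_range(nlon, plon)
    return {
        i * plon + j: (lat_chunks[i], lon_chunks[j])
        for i in range(plat)
        for j in range(plon)
    }


# Hypothetical 64 x 128 grid on a 4 x 8 processor layout (32 ranks).
patches = decompose(64, 128, 4, 8)
```

    Because each grid column's physics depends only on data inside its own patch, every rank can process its patch without communication, which matches the abstract's statement that the physics calculations are local to a processor.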

  4. Distributed game-tree searching

    SciTech Connect

    Schaeffer, J.

    1989-02-01

    Conventional parallelizations of the alpha-beta (αβ) algorithm have met with limited success. Implementations suffer primarily from the synchronization and search overheads of parallelization. This paper describes a parallel αβ searching program that achieves high performance through the use of four different types of processes: Controllers, Searchers, Table Managers, and Scouts. Synchronization is reduced by having a Controller process reassign idle processes to help out busy ones. Search overhead is reduced by having two types of parallel table management: global Table Managers and the periodic merging and redistribution of local tables. Experiments show that nine processors can achieve 5.67-fold speedups but beyond that, additional processors provide diminishing returns. Given that additional resources are of little benefit, speculative computing is introduced as a means of extending the effective number of processors that can be utilized. Scout processes speculatively search ahead in the tree looking for interesting features and communicate this information back to the αβ program. In this way, the effective search depth is extended. These ideas have been tested experimentally and empirically as part of the chess program ParaPhoenix.

  5. Massively parallel computational fluid dynamics calculations for aerodynamics and aerothermodynamics applications

    SciTech Connect

    Payne, J.L.; Hassan, B.

    1998-09-01

    Massively parallel computers have enabled the analyst to solve complicated flow fields (turbulent, chemically reacting) that were previously intractable. Calculations are presented using a massively parallel CFD code called SACCARA (Sandia Advanced Code for Compressible Aerothermodynamics Research and Analysis) currently under development at Sandia National Laboratories as part of the Department of Energy (DOE) Accelerated Strategic Computing Initiative (ASCI). Computations were made on a generic reentry vehicle in a hypersonic flowfield utilizing three different distributed parallel computers to assess the parallel efficiency of the code with increasing numbers of processors. The parallel efficiencies for the SACCARA code will be presented for cases using 1, 150, 100 and 500 processors. Computations were also made on a subsonic/transonic vehicle using both 236 and 521 processors on a grid containing approximately 14.7 million grid points. Ongoing and future plans to implement a parallel overset grid capability and couple SACCARA with other mechanics codes in a massively parallel environment are discussed.

  6. Parallel flow diffusion battery

    DOEpatents

    Yeh, H.C.; Cheng, Y.S.

    1984-01-01

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  7. Parallel flow diffusion battery

    DOEpatents

    Yeh, Hsu-Chi; Cheng, Yung-Sung

    1984-08-07

    A parallel flow diffusion battery for determining the mass distribution of an aerosol has a plurality of diffusion cells mounted in parallel to an aerosol stream, each diffusion cell including a stack of mesh wire screens of different density.

  8. Theory and practice of parallel direct optimization.

    PubMed

    Janies, Daniel A; Wheeler, Ward C

    2002-01-01

    Our ability to collect and distribute genomic and other biological data is growing at a staggering rate (Pagel, 1999). However, the synthesis of these data into knowledge of evolution is incomplete. Phylogenetic systematics provides a unifying intellectual approach to understanding evolution but presents formidable computational challenges. A fundamental goal of systematics, the generation of evolutionary trees, is typically approached as two distinct NP-complete problems: multiple sequence alignment and phylogenetic tree search. The number of cells in a multiple alignment matrix is exponentially related to sequence length. In addition, the number of evolutionary trees expands combinatorially with respect to the number of organisms or sequences to be examined. Biologically interesting datasets are currently comprised of hundreds of taxa and thousands of nucleotides and morphological characters. This standard will continue to grow with the advent of highly automated sequencing and development of character databases. Three areas of innovation are changing how evolutionary computation can be addressed: (1) novel concepts for determination of sequence homology, (2) heuristics and shortcuts in tree-search algorithms, and (3) parallel computing. In this paper and the online software documentation we describe the basic usage of parallel direct optimization as implemented in the software POY (ftp://ftp.amnh.org/pub/molecular/poy).

  9. A distributed particle simulation code in C++

    SciTech Connect

    Forslund, D.W.; Wingate, C.A.; Ford, P.S.; Junkins, J.S.; Pope, S.C.

    1992-03-01

    Although C++ has been successfully used in a variety of computer science applications, it has just recently begun to be used in scientific applications. We have found that the object-oriented properties of C++ lend themselves well to scientific computations by making maintenance of the code easier, by making the code easier to understand, and by providing a better paradigm for distributed memory parallel codes. We describe here aspects of developing a particle plasma simulation code using object-oriented techniques for use in a distributed computing environment. We initially designed and implemented the code for serial computation and then used the distributed programming toolkit ISIS to run it in parallel. In this connection we describe some of the difficulties presented by using C++ for doing parallel and scientific computation.

  10. Parallel simulation today

    NASA Technical Reports Server (NTRS)

    Nicol, David; Fujimoto, Richard

    1992-01-01

    This paper surveys topics that presently define the state of the art in parallel simulation. Included in the tutorial are discussions on new protocols, mathematical performance analysis, time parallelism, hardware support for parallel simulation, load balancing algorithms, and dynamic memory management for optimistic synchronization.

  11. Verbal and Visual Parallelism

    ERIC Educational Resources Information Center

    Fahnestock, Jeanne

    2003-01-01

    This study investigates the practice of presenting multiple supporting examples in parallel form. The elements of parallelism and its use in argument were first illustrated by Aristotle. Although real texts may depart from the ideal form for presenting multiple examples, rhetorical theory offers a rationale for minimal, parallel presentation. The…

  12. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  13. Research in Parallel Algorithms and Software for Computational Aerosciences

    NASA Technical Reports Server (NTRS)

    Domel, Neal D.

    1996-01-01

    Phase 1 is complete for the development of a computational fluid dynamics (CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.

  14. Parallel versus Sequential Processing of Pictures and Words

    ERIC Educational Resources Information Center

    Snodgrass, Joan Gay; Antone, George

    1974-01-01

    The purpose of this experiment was to test a proposal by Paivio (1971) that visual memory images are specialized for parallel or spatial processing, whereas verbal memory codes are specialized for sequential or temporal processing. (Author)

  15. The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

    NASA Technical Reports Server (NTRS)

    Woo, Alex C.; Hill, Kueichien C.

    1996-01-01

    The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Projects Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.

  16. LEWICE droplet trajectory calculations on a parallel computer

    NASA Technical Reports Server (NTRS)

    Caruso, Steven C.

    1993-01-01

    A parallel computer implementation (128 processors) of LEWICE, a NASA Lewis code used to predict the time-dependent ice accretion process for two-dimensional aerodynamic bodies of simple geometries, is described. Two-dimensional parallel droplet trajectory calculations are performed to demonstrate the potential benefits of applying parallel processing to ice accretion analysis. Parallel performance is evaluated as a function of the number of trajectories and the number of processors. For comparison, similar trajectory calculations are performed on single-processor Cray computers, and the best parallel results are found to be 33 and 23 times faster, respectively, than those of the Cray XMP and YMP.

  17. Translating network models to parallel hardware in NEURON

    PubMed Central

    Hines, M.L.; Carnevale, N.T.

    2008-01-01

    The increasing complexity of network models poses a growing computational burden. At the same time, computational neuroscientists are finding it easier to access parallel hardware, such as multiprocessor personal computers, workstation clusters, and massively parallel supercomputers. The practical question is how to move a working network model from a single processor to parallel hardware. Here we show how to make this transition for models implemented with NEURON, in such a way that the final result will run and produce numerically identical results on either serial or parallel hardware. This allows users to develop and debug models on readily available local resources, then run their code without modification on a parallel supercomputer. PMID:17997162

  18. A parallelized Python based Multi-Point Thomson Scattering analysis in NSTX-U

    NASA Astrophysics Data System (ADS)

    Miller, Jared; Diallo, Ahmed; Leblanc, Benoit

    2014-10-01

    Multi-Point Thomson Scattering (MPTS) is a reliable and accurate method of finding the temperature, density, and pressure of a magnetically confined plasma. Nd:YAG (1064 nm) lasers are fired into the plasma with a frequency of 60 Hz, and the light is Doppler shifted by Thomson scattering. Polychromators on the midplane of the tokamak pick up the light at various radii/scattering angles, and the avalanche photodiode's voltages are added to an MDSplus tree for later analysis. This project ports and optimizes the prior serial IDL MPTS code into a well-documented Python package that runs in parallel. Since there are 30 polychromators in the current NSTX setup (12 more will be added when NSTX-U is completed), using parallelism offers vast savings in performance. NumPy and SciPy further accelerate numerical calculations and matrix operations, Matplotlib and PyQt make an intuitive GUI with plots of the output, and Multiprocessing parallelizes the computationally intensive calculations. The Python package was designed with portability and flexibility in mind so it can be adapted for use in any polychromator-based MPTS system.

  19. Fault-Tree Compiler

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Boerschlein, David P.

    1993-01-01

    Fault-Tree Compiler (FTC) program is software tool used to calculate probability of top event in fault tree. Gates of five different types allowed in fault tree: AND, OR, EXCLUSIVE OR, INVERT, and M OF N. High-level input language easy to understand and use. In addition, program supports hierarchical fault-tree definition feature, which simplifies tree-description process and reduces execution time. Set of programs created forming basis for reliability-analysis workstation: SURE, ASSIST, PAWS/STEM, and FTC fault-tree tool (LAR-14586). Written in PASCAL, ANSI-compliant C language, and FORTRAN 77. Other versions available upon request.
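
    The five gate types listed above have simple closed forms when every gate input is statistically independent. The Python sketch below shows those textbook formulas only; it is not FTC's input language or solution method, the function names are hypothetical, and the independence assumption fails whenever a basic event feeds more than one branch of the tree.

```python
import math


def and_gate(probs):
    """All inputs occur (independent inputs assumed)."""
    return math.prod(probs)


def or_gate(probs):
    """At least one input occurs."""
    return 1.0 - math.prod(1.0 - p for p in probs)


def invert_gate(p):
    """The input does not occur."""
    return 1.0 - p


def xor_gate(p, q):
    """Exactly one of two inputs occurs."""
    return p * (1.0 - q) + q * (1.0 - p)


def m_of_n_gate(m, probs):
    """At least m of the n inputs occur.

    dist[k] holds the probability that exactly k of the inputs seen so
    far have occurred; each new input shifts mass from k to k + 1.
    """
    dist = [1.0]
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for k, mass in enumerate(dist):
            nxt[k] += mass * (1.0 - p)
            nxt[k + 1] += mass * p
        dist = nxt
    return sum(dist[m:])


if __name__ == "__main__":
    # Hypothetical tree: top = OR(AND(a, b), 2-of-3(c, d, e)).
    top = or_gate([and_gate([0.1, 0.2]), m_of_n_gate(2, [0.5, 0.5, 0.5])])
    print(top)
```

    Because each gate returns a plain probability, a hierarchical tree is evaluated by nesting calls, as in the `top` expression.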

  20. Optimal parallel solution of sparse triangular systems

    NASA Technical Reports Server (NTRS)

    Alvarado, Fernando L.; Schreiber, Robert

    1990-01-01

    A method for the parallel solution of triangular sets of equations is described that is appropriate when there are many right-hand sides. By preprocessing, the method can reduce the number of parallel steps required to solve Lx = b compared to parallel forward or backsolve. Applications are to iterative solvers with triangular preconditioners, to structural analysis, or to power systems applications, where there may be many right-hand sides (not all available a priori). The inverse of L is represented as a product of sparse triangular factors. The problem is to find a factored representation of this inverse of L with the smallest number of factors (or partitions), subject to the requirement that no new nonzero elements be created in the formation of these inverse factors. A method from an earlier reference is shown to solve this problem. This method is improved upon by constructing a permutation of the rows and columns of L that preserves triangularity and allows for the best possible such partition. A number of practical examples and algorithmic details are presented. The parallelism attainable is illustrated by means of elimination trees and clique trees.
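
    As a baseline for the "number of parallel steps" mentioned above, the step count of an ordinary parallel forward solve can be read off the sparsity pattern of L by level scheduling. The Python sketch below illustrates only that baseline, not the partitioned-inverse method of the paper; the row-wise pattern encoding and the name `level_schedule` are assumptions for the example.

```python
def level_schedule(lower_pattern):
    """Group the rows of a sparse lower-triangular L into parallel steps.

    lower_pattern[i] lists the columns j < i where L[i][j] is nonzero
    (the diagonal is implied). Row i can be solved one step after the
    last row it depends on, so all rows in a group can run concurrently.
    """
    level = []
    for cols in lower_pattern:
        # A row with no off-diagonal entries lands in step 0.
        level.append(1 + max((level[j] for j in cols), default=-1))
    groups = [[] for _ in range(max(level, default=-1) + 1)]
    for row, lv in enumerate(level):
        groups[lv].append(row)
    return groups


if __name__ == "__main__":
    # Rows 0 and 2 are independent; row 1 needs row 0; row 3 needs rows 1 and 2.
    pattern = [[], [0], [], [1, 2]]
    print(level_schedule(pattern))
```

    The number of groups is the number of parallel steps a forward solve needs for this pattern. This is the quantity the preprocessing described in the abstract aims to reduce.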

  1. Locating hardware faults in a data communications network of a parallel computer

    DOEpatents

    Archer, Charles J.; Megerian, Mark G.; Ratterman, Joseph D.; Smith, Brian E.

    2010-01-12

    Hardware faults location in a data communications network of a parallel computer. Such a parallel computer includes a plurality of compute nodes and a data communications network that couples the compute nodes for data communications and organizes the compute nodes as a tree. Locating hardware faults includes identifying a next compute node as a parent node and a root of a parent test tree, identifying for each child compute node of the parent node a child test tree having the child compute node as root, running a same test suite on the parent test tree and each child test tree, and identifying the parent compute node as having a defective link connected from the parent compute node to a child compute node if the test suite fails on the parent test tree and succeeds on all the child test trees.
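
    The decision rule in this abstract (the suite fails on the parent test tree but succeeds on every child test tree) can be sketched in a few lines of Python. Everything concrete here is a hypothetical stand-in: the dictionary tree encoding, the function names, and the simulated test suite, which simply fails whenever a known-broken link lies inside the tree under test.

```python
def subtree_links(tree, root):
    """Yield every (parent, child) link in the test tree rooted at `root`."""
    stack = [root]
    while stack:
        node = stack.pop()
        for child in tree.get(node, []):
            yield (node, child)
            stack.append(child)


def make_test_suite(tree, broken_links):
    """Simulated suite: passes iff the test tree contains no broken link."""
    def run(root):
        return not any(link in broken_links for link in subtree_links(tree, root))
    return run


def locate_faulty_parents(tree, run_test_suite):
    """Return parents whose own link to some child must be defective.

    A parent is flagged when the suite fails on its test tree but
    succeeds on the test tree of each of its children.
    """
    faulty = []
    for parent, children in tree.items():
        if not children:
            continue  # a leaf has no downward links to test
        if not run_test_suite(parent) and all(run_test_suite(c) for c in children):
            faulty.append(parent)
    return faulty


if __name__ == "__main__":
    tree = {0: [1, 2], 1: [3, 4], 2: [], 3: [], 4: []}
    suite = make_test_suite(tree, broken_links={(1, 3)})
    print(locate_faulty_parents(tree, suite))
```

    Node 0 is not flagged in this example even though its test tree fails, because the failure is reproduced in the child test tree rooted at node 1. Only node 1 fails while all of its children pass.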

  2. Efficient Helicopter Aerodynamic and Aeroacoustic Predictions on Parallel Computers

    NASA Technical Reports Server (NTRS)

    Wissink, Andrew M.; Lyrintzis, Anastasios S.; Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    This paper presents parallel implementations of two codes used in a combined CFD/Kirchhoff methodology to predict the aerodynamic and aeroacoustic properties of helicopters. The rotorcraft Navier-Stokes code, TURNS, computes the aerodynamic flowfield near the helicopter blades and the Kirchhoff acoustics code computes the noise in the far field, using the TURNS solution as input. The overall parallel strategy adds MPI message passing calls to the existing serial codes to allow for communication between processors. As a result, the total code modifications required for parallel execution are relatively small. The biggest bottleneck in running the TURNS code in parallel comes from the LU-SGS algorithm that solves the implicit system of equations. We use a new hybrid domain decomposition implementation of LU-SGS to obtain good parallel performance on the SP2. TURNS demonstrates excellent parallel speedups for quasi-steady and unsteady three-dimensional calculations of a helicopter blade in forward flight. The execution rate attained by the code on 114 processors is six times faster than the same cases run on one processor of the Cray C-90. The parallel Kirchhoff code also shows excellent parallel speedups and fast execution rates. As a performance demonstration, unsteady acoustic pressures are computed at 1886 far-field observer locations for a sample acoustics problem. The calculation requires over two hundred hours of CPU time on one C-90 processor but takes only a few hours on 80 processors of the SP2. The resultant far-field acoustic field is analyzed with state-of-the-art audio and video rendering of the propagating acoustic signals.

  3. Shift: A Massively Parallel Monte Carlo Radiation Transport Package

    SciTech Connect

    Pandya, Tara M; Johnson, Seth R; Davidson, Gregory G; Evans, Thomas M; Hamilton, Steven P

    2015-01-01

    This paper discusses the massively-parallel Monte Carlo radiation transport package, Shift, developed at Oak Ridge National Laboratory. It reviews the capabilities, implementation, and parallel performance of this code package. Scaling results demonstrate very good strong and weak scaling behavior of the implemented algorithms. Benchmark results from various reactor problems show that Shift results compare well to other contemporary Monte Carlo codes and experimental results.

  4. An object-oriented approach to nested data parallelism

    NASA Technical Reports Server (NTRS)

    Sheffler, Thomas J.; Chatterjee, Siddhartha

    1994-01-01

    This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism, however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
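
    The idea of flattening can be illustrated outside C++ with a small Python sketch: a nested collection is stored as one flat data vector plus a list of segment lengths, so that a nested elementwise loop becomes a single flat pass. This is a generic illustration of the technique under assumed names (`flatten`, `unflatten`, `nested_foreach`), not the authors' compiler transformation or their 'vector' template.

```python
def flatten(nested):
    """Split a list of lists into (flat data, segment lengths)."""
    flat = [x for segment in nested for x in segment]
    lengths = [len(segment) for segment in nested]
    return flat, lengths


def unflatten(flat, lengths):
    """Rebuild the nested structure from flat data and segment lengths."""
    nested, start = [], 0
    for n in lengths:
        nested.append(flat[start:start + n])
        start += n
    return nested


def nested_foreach(f, nested):
    """Apply f to every inner element using one flat elementwise pass.

    The list comprehension stands in for a single data-parallel vector
    operation; the nesting is carried entirely by `lengths`.
    """
    flat, lengths = flatten(nested)
    result = [f(x) for x in flat]
    return unflatten(result, lengths)


if __name__ == "__main__":
    print(nested_foreach(lambda x: x * x, [[1, 2], [], [3]]))
```

    Irregular inner sizes, including empty segments, cost nothing extra in the flat pass. This is the practical appeal of flattening on vector machines.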

  5. TreSpEx—Detection of Misleading Signal in Phylogenetic Reconstructions Based on Tree Information

    PubMed Central

    Struck, Torsten H

    2014-01-01

    Phylogenies of species or genes are commonplace nowadays in many areas of comparative biological studies. However, for phylogenetic reconstructions one must refer to artificial signals such as paralogy, long-branch attraction, saturation, or conflict between different datasets. These signals might eventually mislead the reconstruction even in phylogenomic studies employing hundreds of genes. Unfortunately, there has been no program allowing the detection of such effects in combination with an implementation into automatic process pipelines. TreSpEx (Tree Space Explorer) now combines different approaches (including statistical tests), which utilize tree-based information like nodal support or patristic distances (PDs) to identify misleading signals. The program enables the parallel analysis of hundreds of trees and/or predefined gene partitions, and being command-line driven, it can be integrated into automatic process pipelines. TreSpEx is implemented in Perl and supported on Linux, Mac OS X, and MS Windows. Source code, binaries, and additional material are freely available at http://www.annelida.de/research/bioinformatics/software.html. PMID:24701118

  6. Xyce parallel electronic simulator design.

    SciTech Connect

    Thornquist, Heidi K.; Rankin, Eric Lamont; Mei, Ting; Schiek, Richard Louis; Keiter, Eric Richard; Russo, Thomas V.

    2010-09-01

    This document is the Xyce Circuit Simulator developer guide. Xyce has been designed from the 'ground up' to be a SPICE-compatible, distributed memory parallel circuit simulator. While it is in many respects a research code, Xyce is intended to be a production simulator. As such, having software quality engineering (SQE) procedures in place to ensure a high level of code quality and robustness is essential. Version control, issue tracking, customer support, C++ style guidelines and the Xyce release process are all described. The Xyce Parallel Electronic Simulator has been under development at Sandia since 1999. Historically, Xyce has mostly been funded by ASC, and the original focus of Xyce development has primarily been related to circuits for nuclear weapons. However, this has not been the only focus and it is expected that the project will diversify. Like many ASC projects, Xyce is a group development effort, which involves a number of researchers, engineers, scientists, mathematicians and computer scientists. In addition to diversity of background, it is to be expected on long term projects for there to be a certain amount of staff turnover, as people move on to different projects. As a result, it is very important that the project maintain high software quality standards. The point of this document is to formally document a number of the software quality practices followed by the Xyce team in one place. Also, it is hoped that this document will be a good source of information for new developers.

  7. The NIMROD Code

    NASA Astrophysics Data System (ADS)

    Schnack, D. D.; Glasser, A. H.

    1996-11-01

    NIMROD is a new code system that is being developed for the analysis of modern fusion experiments. It is being designed from the beginning to make the maximum use of massively parallel computer architectures and computer graphics. The NIMROD physics kernel solves the three-dimensional, time-dependent two-fluid equations with neo-classical effects in toroidal geometry of arbitrary poloidal cross section. The NIMROD system also includes a pre-processor, a grid generator, and a post processor. User interaction with NIMROD is facilitated by a modern graphical user interface (GUI). The NIMROD project is using Quality Function Deployment (QFD) team management techniques to minimize re-engineering and reduce code development time. This paper gives an overview of the NIMROD project. Operation of the GUI is demonstrated, and the first results from the physics kernel are given.

  8. A simple double error correcting BCH codes

    NASA Astrophysics Data System (ADS)

    Sinha, V.

    1983-07-01

    With the availability of various cost effective digital hardware components, error correcting codes are realized in hardware in simpler fashion than was hitherto possible. Instead of computing error locations in BCH decoding by the Berlekamp algorithm, syndrome to error location mapping using an EPROM for double error correcting BCH code is described. The processing is parallel instead of serial. Possible applications are given.
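
    The table-lookup idea can be imitated in software. The Python sketch below uses the (15,7) double error correcting BCH code with generator g(x) = x^8 + x^7 + x^6 + x^4 + 1 and a dictionary in place of the EPROM. The choice of code length, the use of the remainder modulo g(x) as the table address, and all function names are assumptions for illustration; the abstract does not give the paper's parameters or table layout.

```python
from itertools import combinations

N = 15             # codeword length in bits
PARITY_BITS = 8    # degree of the generator polynomial
GEN = 0b111010001  # g(x) = x^8 + x^7 + x^6 + x^4 + 1, as a bit mask


def poly_mod(value, gen=GEN):
    """Remainder of the GF(2) polynomial `value` divided by `gen`."""
    gen_len = gen.bit_length()
    while value.bit_length() >= gen_len:
        value ^= gen << (value.bit_length() - gen_len)
    return value


def encode(message):
    """Systematic encoding of a 7-bit message into a 15-bit codeword."""
    shifted = message << PARITY_BITS
    return shifted | poly_mod(shifted)


def build_syndrome_table():
    """Map each syndrome to the error pattern of weight <= 2 causing it."""
    table = {0: 0}
    for weight in (1, 2):
        for positions in combinations(range(N), weight):
            error = sum(1 << p for p in positions)
            table[poly_mod(error)] = error
    return table


SYNDROME_TABLE = build_syndrome_table()  # plays the role of the EPROM


def decode(word):
    """Correct up to two bit errors with one table lookup."""
    error = SYNDROME_TABLE.get(poly_mod(word))
    if error is None:
        raise ValueError("more than two bit errors detected")
    return (word ^ error) >> PARITY_BITS


if __name__ == "__main__":
    sent = encode(0b1011001)
    received = sent ^ (1 << 3) ^ (1 << 12)  # two bit errors
    print(decode(received) == 0b1011001)
```

    The table has 121 entries (1 + 15 + 105 error patterns). They are all distinct because the code has minimum distance 5, which is why a single lookup can replace the iterative error-locator computation.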

  9. Highly parallel sparse Cholesky factorization

    NASA Technical Reports Server (NTRS)

    Gilbert, John R.; Schreiber, Robert

    1990-01-01

    Several fine grained parallel algorithms were developed and compared to compute the Cholesky factorization of a sparse matrix. The experimental implementations are on the Connection Machine, a distributed memory SIMD machine whose programming model conceptually supplies one processor per data element. In contrast to special purpose algorithms in which the matrix structure conforms to the connection structure of the machine, the focus is on matrices with arbitrary sparsity structure. The most promising algorithm is one whose inner loop performs several dense factorizations simultaneously on a 2-D grid of processors. Virtually any massively parallel dense factorization algorithm can be used as the key subroutine. The sparse code attains execution rates comparable to those of the dense subroutine. Although at present architectural limitations prevent the dense factorization from realizing its potential efficiency, it is concluded that a regular data parallel architecture can be used efficiently to solve arbitrarily structured sparse problems. A performance model is also presented and it is used to analyze the algorithms.

  10. Parallel programming with PCN. Revision 1

    SciTech Connect

    Foster, I.; Tuecke, S.

    1991-12-01

    PCN is a system for developing and executing parallel programs. It comprises a high-level programming language, tools for developing and debugging programs in this language, and interfaces to Fortran and C that allow the reuse of existing code in multilingual parallel programs. Programs developed using PCN are portable across many different workstations, networks, and parallel computers. This document provides all the information required to develop parallel programs with the PCN programming system. It includes both tutorial and reference material. It also presents the basic concepts that underlie PCN, particularly where these are likely to be unfamiliar to the reader, and provides pointers to other documentation on the PCN language, programming techniques, and tools. PCN is in the public domain. The latest version of both the software and this manual can be obtained by anonymous FTP from Argonne National Laboratory in the directory pub/pcn at info.mcs.anl.gov (cf. Appendix A).

  11. NAS Parallel Benchmarks, Multi-Zone Versions

    NASA Technical Reports Server (NTRS)

    vanderWijngaart, Rob F.; Haopiang, Jin

    2003-01-01

    We describe an extension of the NAS Parallel Benchmarks (NPB) suite that involves solving the application benchmarks LU, BT and SP on collections of loosely coupled discretization meshes. The solutions on the meshes are updated independently, but after each time step they exchange boundary value information. This strategy, which is common among structured-mesh production flow solver codes in use at NASA Ames and elsewhere, provides relatively easily exploitable coarse-grain parallelism between meshes. Since the individual application benchmarks also allow fine-grain parallelism themselves, this NPB extension, named NPB Multi-Zone (NPB-MZ), is a good candidate for testing hybrid and multi-level parallelization tools and strategies.

  12. Chem-Is-Tree.

    ERIC Educational Resources Information Center

    Barry, Dana M.

    1997-01-01

    Provides details on the chemical composition of trees including a definition of wood. Also includes an activity on anthocyanins as well as a discussion of the resistance of wood to solvents and chemicals. Lists interesting products from trees. (DDR)

  13. Tree Classification Software

    NASA Technical Reports Server (NTRS)

    Buntine, Wray

    1993-01-01

    This paper introduces the IND Tree Package to prospective users. IND does supervised learning using classification trees. This learning task is a basic tool used in the development of diagnosis, monitoring and expert systems. The IND Tree Package was developed as part of a NASA project to semi-automate the development of data analysis and modelling algorithms using artificial intelligence techniques. The IND Tree Package integrates features from CART and C4 with newer Bayesian and minimum encoding methods for growing classification trees and graphs. The IND Tree Package also provides an experimental control suite on top. The newer features give improved probability estimates often required in diagnostic and screening tasks. The package comes with a manual, Unix 'man' entries, and a guide to tree methods and research. The IND Tree Package is implemented in C under Unix and was beta-tested at university and commercial research laboratories in the United States.

  14. Parallel indexing technique for spatio-temporal data

    NASA Astrophysics Data System (ADS)

    He, Zhenwen; Kraak, Menno-Jan; Huisman, Otto; Ma, Xiaogang; Xiao, Jing

    2013-04-01

    The requirements for efficient access and management of massive multi-dimensional spatio-temporal data in geographical information system and its applications are well recognized and researched. The most popular spatio-temporal access method is the R-Tree and its variants. However, it is difficult to use them for parallel access to multi-dimensional spatio-temporal data because R-Trees, and variants thereof, are hierarchical structures which have severe overlapping problems in high dimensional space. We extended a two-dimensional interval space representation of intervals to a multi-dimensional parallel space, and present a set of formulae to transform spatio-temporal queries into parallel interval set operations. This transformation reduces problems of multi-dimensional object relationships to simpler two-dimensional spatial intersection problems. Experimental results show that the new parallel approach presented in this paper has better range query performance than R*-trees for handling multi-dimensional spatio-temporal data and multi-dimensional interval data. When the number of CPU cores is larger than that of the space dimensions, the insertion performance of this new approach is also superior to R*-trees. The proposed approach provides a potential parallel indexing solution for fast data retrieval of massive four-dimensional or higher dimensional spatio-temporal data.

  15. Code Optimization Techniques

    SciTech Connect

    MAGEE,GLEN I.

    2000-08-03

    Computers transfer data in a number of different ways. Whether through a serial port, a parallel port, over a modem, over an ethernet cable, or internally from a hard disk to memory, some data will be lost. To compensate for that loss, numerous error detection and correction algorithms have been developed. One of the most common error correction codes is the Reed-Solomon code, which is a special subset of BCH (Bose-Chaudhuri-Hocquenghem) linear cyclic block codes. In the AURA project, an unmanned aircraft sends the data it collects back to earth so it can be analyzed during flight and possible flight modifications made. To counter possible data corruption during transmission, the data is encoded using a multi-block Reed-Solomon implementation with a possibly shortened final block. In order to maximize the amount of data transmitted, it was necessary to reduce the computation time of a Reed-Solomon encoding to three percent of the processor's time. To achieve such a reduction, many code optimization techniques were employed. This paper outlines the steps taken to reduce the processing time of a Reed-Solomon encoding and the insight into modern optimization techniques gained from the experience.

  16. Illumination Under Trees

    SciTech Connect

    Max, N

    2002-08-19

    This paper is a survey of the author's work on illumination and shadows under trees, including the effects of sky illumination, sun penumbras, scattering in a misty atmosphere below the trees, and multiple scattering and transmission between leaves. It also describes a hierarchical image-based rendering method for trees.

  17. Winter Birch Trees

    ERIC Educational Resources Information Center

    Sweeney, Debra; Rounds, Judy

    2011-01-01

    Trees are great inspiration for artists. Many art teachers find themselves inspired and maybe somewhat obsessed with the natural beauty and elegance of the lofty tree, and how it changes through the seasons. One such tree that grows in several regions and always looks magnificent, regardless of the time of year, is the birch. In this article, the…

  18. Minnesota's Forest Trees. Revised.

    ERIC Educational Resources Information Center

    Miles, William R.; Fuller, Bruce L.

    This bulletin describes 46 of the more common trees found in Minnesota's forests and windbreaks. The bulletin contains two tree keys, a summer key and a winter key, to help the reader identify these trees. Besides the two keys, the bulletin includes an introduction, instructions for key use, illustrations of leaf characteristics and twig…

  19. The Wish Tree Project

    ERIC Educational Resources Information Center

    Brooks, Sarah DeWitt

    2010-01-01

    This article describes the author's experience in implementing a Wish Tree project in her school in an effort to bring the school community together with a positive art-making experience during a potentially stressful time. The concept of a wish tree is simple: plant a tree; provide tags and pencils for writing wishes; and encourage everyone to…

  20. Material model library for explicit numerical codes

    SciTech Connect

    Hofmann, R.; Dial, B.W.

    1982-08-01

    A material model logic structure has been developed which is useful for most explicit finite-difference and explicit finite-element Lagrange computer codes. This structure has been implemented and tested in the STEALTH codes to provide an example for researchers who wish to implement it in generically similar codes. In parallel with these models, material parameter libraries have been created for the implemented models for materials which are often needed in DoD applications.

  1. TRACKING CODE DEVELOPMENT FOR BEAM DYNAMICS OPTIMIZATION

    SciTech Connect

    Yang, L.

    2011-03-28

    Dynamic aperture (DA) optimization with direct particle tracking is a straightforward approach when computing power permits. It can include various realistic errors and is closer to reality than theoretical estimates. In this approach, a fast, parallel tracking code could be very helpful. In this presentation, we describe an implementation of the storage ring particle tracking code TESLA for beam dynamics optimization. It supports MPI-based parallel computing and is robust as a DA calculation engine. This code has been used in the NSLS-II dynamics optimizations and showed promising performance.

  2. Parallel digital forensics infrastructure.

    SciTech Connect

    Liebrock, Lorie M.; Duggan, David Patrick

    2009-10-01

    This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexico Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the architecture and implementation of the parallel digital forensics (PDF) infrastructure.

  3. Speech coding

    SciTech Connect

    Ravishankar, C., Hughes Network Systems, Germantown, MD

    1998-05-08

    Speech is the predominant means of communication between human beings, and since the invention of the telephone by Alexander Graham Bell in 1876, speech services have remained the core service in almost all telecommunication systems. Original analog methods of telephony had the disadvantage of the speech signal getting corrupted by noise, cross-talk and distortion. Long haul transmissions which use repeaters to compensate for the loss in signal strength on transmission links also increase the associated noise and distortion. On the other hand, digital transmission is relatively immune to noise, cross-talk and distortion, primarily because of the capability to faithfully regenerate the digital signal at each repeater purely based on a binary decision. Hence the end-to-end performance of the digital link essentially becomes independent of the length and operating frequency bands of the link. Hence, from a transmission point of view, digital transmission has been the preferred approach due to its higher immunity to noise. The need to carry digital speech became extremely important from a service provision point of view as well. Modern requirements have introduced the need for robust, flexible and secure services that can carry a multitude of signal types (such as voice, data and video) without a fundamental change in infrastructure. Such a requirement could not have been easily met without the advent of digital transmission systems, thereby requiring speech to be coded digitally. The term Speech Coding often refers to techniques that represent or code speech signals either directly as a waveform or as a set of parameters by analyzing the speech signal. In either case, the codes are transmitted to the distant end where speech is reconstructed or synthesized using the received set of codes. A more generic term that is applicable to these techniques and is often used interchangeably with speech coding is the term voice coding. This term is more generic in the sense that the

  4. A Spectrum Tree Kernel

    NASA Astrophysics Data System (ADS)

    Kuboyama, Tetsuji; Hirata, Kouichi; Kashima, Hisashi; F. Aoki-Kinoshita, Kiyoko; Yasuda, Hiroshi

    Learning from tree-structured data has received increasing interest with the rapid growth of tree-encodable data in the World Wide Web, in biology, and in other areas. Our kernel function measures the similarity between two trees by counting the number of shared sub-patterns called tree q-grams, and runs, in effect, in linear time with respect to the number of tree nodes. We apply our kernel function with a support vector machine (SVM) to classify biological data, the glycans of several blood components. The experimental results show that our kernel function performs as well as one exclusively tailored to glycan properties.
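
    The idea of counting shared sub-patterns can be illustrated with a minimal sketch. It is not the authors' algorithm: the abstract does not define tree q-grams, so this sketch assumes a simplified stand-in, downward label paths of exactly q nodes, and takes the kernel value as the inner product of the two count vectors. The tuple tree layout and the names path_qgrams and spectrum_kernel are hypothetical.

```python
from collections import Counter

def path_qgrams(tree, q):
    """Count downward label paths with exactly q nodes.

    A tree is a (label, [children]) tuple. These paths are an assumed,
    simplified stand-in for the tree q-grams named in the abstract.
    """
    counts = Counter()

    def walk(node, trail):
        label, children = node
        # Keep only the last q labels on the path from the root.
        trail = (trail + (label,))[-q:]
        if len(trail) == q:
            counts[trail] += 1
        for child in children:
            walk(child, trail)

    walk(tree, ())
    return counts

def spectrum_kernel(t1, t2, q):
    """Inner product of the q-gram count vectors of two trees."""
    c1, c2 = path_qgrams(t1, q), path_qgrams(t2, q)
    return sum(c1[g] * c2[g] for g in c1.keys() & c2.keys())

t1 = ("a", [("b", []), ("c", [("b", [])])])
t2 = ("a", [("b", []), ("b", [])])
print(spectrum_kernel(t1, t2, 2))
```

    Each node is visited once, so for a fixed q the counting step in this sketch is linear in the number of tree nodes.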

  5. PCLIPS: Parallel CLIPS

    NASA Technical Reports Server (NTRS)

    Hall, Lawrence O.; Bennett, Bonnie H.; Tello, Ivan

    1994-01-01

    A parallel version of CLIPS 5.1 has been developed to run on Intel Hypercubes. The user interface is the same as that for CLIPS with some added commands to allow for parallel calls. A complete version of CLIPS runs on each node of the hypercube. The system has been instrumented to display the time spent in the match, recognize, and act cycles on each node. Only rule-level parallelism is supported. Parallel commands enable the assertion and retraction of facts to/from remote nodes' working memory. Parallel CLIPS was used to implement a knowledge-based command, control, communications, and intelligence (C³I) system to demonstrate the fusion of high-level, disparate sources. We discuss the nature of the information fusion problem, our approach, and implementation. Parallel CLIPS has also been used to run several benchmark parallel knowledge bases, such as one to set up a cafeteria. Results from running Parallel CLIPS with parallel knowledge base partitions indicate that significant speed increases, including superlinear in some cases, are possible.

  6. Parallel MR Imaging

    PubMed Central

    Deshmane, Anagha; Gulani, Vikas; Griswold, Mark A.; Seiberlich, Nicole

    2015-01-01

    Parallel imaging is a robust method for accelerating the acquisition of magnetic resonance imaging (MRI) data, and has made possible many new applications of MR imaging. Parallel imaging works by acquiring a reduced amount of k-space data with an array of receiver coils. These undersampled data can be acquired more quickly, but the undersampling leads to aliased images. One of several parallel imaging algorithms can then be used to reconstruct artifact-free images from either the aliased images (SENSE-type reconstruction) or from the under-sampled data (GRAPPA-type reconstruction). The advantages of parallel imaging in a clinical setting include faster image acquisition, which can be used, for instance, to shorten breath-hold times resulting in fewer motion-corrupted examinations. In this article the basic concepts behind parallel imaging are introduced. The relationship between undersampling and aliasing is discussed and two commonly used parallel imaging methods, SENSE and GRAPPA, are explained in detail. Examples of artifacts arising from parallel imaging are shown and ways to detect and mitigate these artifacts are described. Finally, several current applications of parallel imaging are presented and recent advancements and promising research in parallel imaging are briefly reviewed. PMID:22696125

  7. How to write fast and clear parallel programs using algebra

    SciTech Connect

    Stiller, L. (Johns Hopkins Univ., Baltimore, MD)

    1992-01-01

    An algebraic method for the design of efficient and easy to port codes for parallel machines is described. The method was applied to speed up and to clarify certain communication functions, n-body codes, a biomolecular analysis, and a chess problem.

  8. How to write fast and clear parallel programs using algebra

    SciTech Connect

    Stiller, L.

    1992-10-01

    An algebraic method for the design of efficient and easy to port codes for parallel machines is described. The method was applied to speed up and to clarify certain communication functions, n-body codes, a biomolecular analysis, and a chess problem.

  9. A Comparison of Automatic Parallelization Tools/Compilers on the SGI Origin 2000 Using the NAS Benchmarks

    NASA Technical Reports Server (NTRS)

    Saini, Subhash; Frumkin, Michael; Hribar, Michelle; Jin, Hao-Qiang; Waheed, Abdul; Yan, Jerry

    1998-01-01

    Porting applications to new high performance parallel and distributed computing platforms is a challenging task. Since writing parallel code by hand is extremely time consuming and costly, porting codes would ideally be automated by using some parallelization tools and compilers. In this paper, we compare the performance of the hand-written NAS Parallel Benchmarks against three parallel versions generated with the help of tools and compilers: 1) CAPTools, an interactive computer-aided parallelization tool that generates message passing code, 2) the Portland Group's HPF compiler and 3) compiler directives with the native FORTRAN77 compiler on the SGI Origin2000.

  10. Growth of a Pine Tree

    ERIC Educational Resources Information Center

    Rollinson, Susan Wells

    2012-01-01

    The growth of a pine tree is examined by preparing "tree cookies" (cross-sectional disks) between whorls of branches. The use of Christmas trees allows the tree cookies to be obtained with inexpensive, commonly available tools. Students use the tree cookies to investigate the annual growth of the tree and how it corresponds to the number of whorls…

  11. Eclipse Parallel Tools Platform

    2005-02-18

    Designing and developing parallel programs is an inherently complex task. Developers must choose from the many parallel architectures and programming paradigms that are available, and face a plethora of tools that are required to execute, debug, and analyze parallel programs in these environments. Few, if any, of these tools provide any degree of integration, or indeed any commonality in their user interfaces at all. This further complicates the parallel developer's task, hampering software engineering practices, and ultimately reducing productivity. One consequence of this complexity is that best practice in parallel application development has not advanced to the same degree as more traditional programming methodologies. The result is that there is currently no open-source, industry-strength platform that provides a highly integrated environment specifically designed for parallel application development. Eclipse is a universal tool-hosting platform that is designed to provide a robust, full-featured, commercial-quality, industry platform for the development of highly integrated tools. It provides a wide range of core services for tool integration that allow tool producers to concentrate on their tool technology rather than on platform specific issues. The Eclipse Integrated Development Environment is an open-source project that is supported by over 70 organizations, including IBM, Intel and HP. The Eclipse Parallel Tools Platform (PTP) plug-in extends the Eclipse framework by providing support for a rich set of parallel programming languages and paradigms, and a core infrastructure for the integration of a wide variety of parallel tools. The first version of the PTP is a prototype that only provides minimal functionality for parallel tool integration, support for a small number of parallel architectures

  12. Experience in highly parallel processing using DAP

    NASA Technical Reports Server (NTRS)

    Parkinson, D.

    1987-01-01

    Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.

  13. Xyce parallel electronic simulator release notes.

    SciTech Connect

    Keiter, Eric R; Hoekstra, Robert John; Mei, Ting; Russo, Thomas V.; Schiek, Richard Louis; Thornquist, Heidi K.; Rankin, Eric Lamont; Coffey, Todd S; Pawlowski, Roger P; Santarelli, Keith R.

    2010-05-01

    The Xyce Parallel Electronic Simulator has been written to support, in a rigorous manner, the simulation needs of the Sandia National Laboratories electrical designers. Specific requirements include, among others, the ability to solve extremely large circuit problems by supporting large-scale parallel computing platforms, improved numerical performance and object-oriented code design and implementation. The Xyce release notes describe: hardware and software requirements; new features and enhancements; any defects fixed since the last release; and current known defects and defect workarounds. For up-to-date information not available at the time these notes were produced, please visit the Xyce web page at http://www.cs.sandia.gov/xyce.

  14. Parallelization of the CI Program PEDICI

    NASA Astrophysics Data System (ADS)

    Thorsteinsson, Thorstein; Rettrup, Sten

    The general CI code PEDICI has been parallelized by decomposing the occurring summation over two-electron integrals. The parallelization was formulated in terms of a "master/slave" model, and realized through use of the "PVM" message passing facility. We have aimed at achieving a reasonably simple implementation for use on machines with intermediate numbers of processors. Exploratory test runs on an IBM SP supercomputer (consisting of RS/6000 model P2SC (120 MHz) nodes) show a very satisfactory performance increase with the number of processors used, as well as encouraging balancing of the workload. Our largest 32-processor test case gives a speed-up factor of 30.27.
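
    To put the reported speed-up in context, the short sketch below converts it into parallel efficiency (speed-up divided by processor count) and into the Karp-Flatt experimentally determined serial fraction. Neither figure is given in the abstract; the function names are hypothetical, and only the two inputs (30.27 and 32) come from the record above.

```python
def parallel_efficiency(speedup, processors):
    """Fraction of ideal linear speed-up actually achieved."""
    return speedup / processors

def karp_flatt(speedup, processors):
    """Karp-Flatt metric: experimentally determined serial fraction."""
    return (1.0 / speedup - 1.0 / processors) / (1.0 - 1.0 / processors)

# Inputs taken from the abstract: speed-up 30.27 on 32 processors.
eff = parallel_efficiency(30.27, 32)
serial = karp_flatt(30.27, 32)
print(f"efficiency = {eff:.4f}, serial fraction = {serial:.5f}")
```

    For the reported run the efficiency works out to about 0.946, that is, roughly 95% of ideal linear scaling.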

  15. Reconstruction of coded aperture images

    NASA Technical Reports Server (NTRS)

    Bielefeld, Michael J.; Yin, Lo I.

    1987-01-01

    The balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure, is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM in future coded-aperture experiments with the help of the MPP.

  16. Serial-Turbo-Trellis-Coded Modulation with Rate-1 Inner Code

    NASA Technical Reports Server (NTRS)

    Divsalar, Dariush; Dolinar, Sam; Pollara, Fabrizio

    2004-01-01

    Serially concatenated turbo codes have been proposed to satisfy requirements for low bit- and word-error rates and for low (in comparison with related previous codes) complexity of coding and decoding algorithms and thus low complexity of coding and decoding circuitry. These codes are applicable to such high-level modulations as octonary phase-shift keying (8PSK) and 16-state quadrature amplitude modulation (16QAM); the signal product obtained by applying one of these codes to one of these modulations is denoted, generally, as serially concatenated trellis-coded modulation (SCTCM). These codes could be particularly beneficial for communication systems that must be designed and operated subject to limitations on bandwidth and power. Some background information is prerequisite to a meaningful summary of this development. Trellis-coded modulation (TCM) is now a well-established technique in digital communications. A turbo code combines binary component codes (which typically include trellis codes) with interleaving. A turbo code of the type that has been studied prior to this development is composed of parallel concatenated convolutional codes (PCCCs) implemented by two or more constituent systematic encoders joined through one or more interleavers. The input information bits feed the first encoder and, after having been scrambled by the interleaver, enter the second encoder. A code word of a parallel concatenated code consists of the input bits to the first encoder followed by the parity check bits of both encoders. The suboptimal iterative decoding structure for such a code is modular, and consists of a set of concatenated decoding modules one for each constituent code connected through an interleaver identical to the one in the encoder side. Each decoder performs weighted soft decoding of the input sequence. PCCCs yield very large coding gains at the cost of a reduction in the data rate and/or an increase in bandwidth.
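
    The structure of a parallel concatenated code word described above (input bits, then the parity bits of both encoders, with the second encoder fed through an interleaver) can be sketched as a toy example. Everything specific here is an assumption for illustration: the constituent code is a 4-state recursive systematic convolutional code with octal generators (1, 5/7), the interleaver is a fixed hypothetical permutation, and no trellis termination or puncturing is applied. It illustrates the PCCC background only, not the serially concatenated scheme that is the subject of the record.

```python
def rsc_parity(bits):
    """Parity stream of a rate-1/2 recursive systematic encoder.

    Assumed generators: feedback 1 + D + D^2 (octal 7),
    feedforward 1 + D^2 (octal 5). The state starts at zero.
    """
    s1 = s2 = 0
    parity = []
    for u in bits:
        a = u ^ s1 ^ s2          # feedback bit entering the register
        parity.append(a ^ s2)    # feedforward taps: 1 + D^2
        s1, s2 = a, s1           # shift the register
    return parity

def pccc_encode(bits, perm):
    """Toy parallel concatenated code word: systematic bits,
    parity of encoder 1, parity of encoder 2 (interleaved input)."""
    interleaved = [bits[i] for i in perm]
    return list(bits) + rsc_parity(bits) + rsc_parity(interleaved)

message = [1, 0, 0, 0]
perm = [3, 0, 1, 2]              # hypothetical interleaver
print(pccc_encode(message, perm))
# → [1, 0, 0, 0, 1, 1, 1, 0, 0, 1, 1, 1]
```

    Four input bits produce twelve code bits, so this unpunctured toy code has rate 1/3, which illustrates the data-rate cost mentioned at the end of the abstract.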

  17. Parallel Lisp simulator

    SciTech Connect

    Weening, J.S.

    1988-05-01

    CSIM is a simulator for parallel Lisp, based on a continuation passing interpreter. It models a shared-memory multiprocessor executing programs written in Common Lisp, extended with several primitives for creating and controlling processes. This paper describes the structure of the simulator, measures its performance, and gives an example of its use with a parallel Lisp program.

  18. User's Guide for ENSAERO_FE Parallel Finite Element Solver

    NASA Technical Reports Server (NTRS)

    Eldred, Lloyd B.; Guruswamy, Guru P.

    1999-01-01

    A high fidelity parallel static structural analysis capability is created and interfaced to the multidisciplinary analysis package ENSAERO-MPI of Ames Research Center. This new module replaces ENSAERO's lower fidelity simple finite element and modal modules. Full aircraft structures may be more accurately modeled using the new finite element capability. Parallel computation is performed by breaking the full structure into multiple substructures. This approach is conceptually similar to ENSAERO's multizonal fluid analysis capability. The new substructure code is used to solve the structural finite element equations for each substructure in parallel. NASTRAN/COSMIC is utilized as a front end for this code. Its full library of elements can be used to create an accurate and realistic aircraft model. It is used to create the stiffness matrices for each substructure. The new parallel code then uses an iterative preconditioned conjugate gradient method to solve the global structural equations for the substructure boundary nodes.
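
    As a reminder of what an iterative preconditioned conjugate gradient solve looks like, here is a minimal serial sketch. It is not the ENSAERO_FE implementation: the abstract does not name the preconditioner, so a Jacobi (diagonal) preconditioner is assumed, the matrix is a small dense symmetric positive definite example, and the function name pcg is hypothetical.

```python
def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for symmetric positive definite A using
    conjugate gradients with an assumed Jacobi preconditioner."""
    n = len(b)

    def matvec(v):
        return [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(x * y for x, y in zip(u, v))

    x = [0.0] * n
    r = list(b)                                # residual b - A x with x = 0
    z = [r[i] / A[i][i] for i in range(n)]     # apply M^-1 = diag(A)^-1
    p = list(z)
    rz = dot(r, z)
    for _ in range(max_iter):
        if dot(r, r) ** 0.5 < tol:             # converged on residual norm
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * Ap[i] for i in range(n)]
        z = [r[i] / A[i][i] for i in range(n)]
        rz_new = dot(r, z)
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# Small stiffness-like test system; exact solution is [1/11, 7/11].
print(pcg([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0]))
```

    In a substructured solver the matrix-vector product would be assembled from per-substructure contributions computed in parallel; this sketch keeps everything on one process for clarity.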

  19. Tree encoding for symmetric sources with a distortion measure

    NASA Technical Reports Server (NTRS)

    Gallager, R. G.

    1974-01-01

    A simple algorithm is developed for mapping the outputs of a source into a set of code sequences generated by a tree code. The algorithm is analyzed for the case of a source producing discrete independent equiprobable letters when the distortion measure satisfies a certain symmetry condition. It is shown that the algorithm is capable of achieving an average distortion as close as desired to the minimum average distortion for the code rate given by Shannon's rate-distortion theorem.

  20. MCNP code

    SciTech Connect

    Cramer, S.N.

    1984-01-01

    The MCNP code is the major Monte Carlo coupled neutron-photon transport research tool at the Los Alamos National Laboratory, and it represents the most extensive Monte Carlo development program in the United States which is available in the public domain. The present code is the direct descendant of the original Monte Carlo work of Fermi, von Neumann, and Ulam at Los Alamos in the 1940s. Development has continued uninterrupted since that time, and the current version of MCNP (or its predecessors) has always included state-of-the-art methods in the Monte Carlo simulation of radiation transport, basic cross section data, geometry capability, variance reduction, and estimation procedures. The authors of the present code have oriented its development toward general user application. The documentation, though extensive, is presented in a clear and simple manner with many examples, illustrations, and sample problems. In addition to providing the desired results, the output listings give a wealth of detailed information (some optional) concerning each state of the calculation. The code system is continually updated to take advantage of advances in computer hardware and software, including interactive modes of operation, diagnostic interrupts and restarts, and a variety of graphical and video aids.

  1. QR Codes

    ERIC Educational Resources Information Center

    Lai, Hsin-Chih; Chang, Chun-Yen; Li, Wen-Shiane; Fan, Yu-Lin; Wu, Ying-Tien

    2013-01-01

    This study presents an m-learning method that incorporates Integrated Quick Response (QR) codes. This learning method not only achieves the objectives of outdoor education, but it also increases applications of Cognitive Theory of Multimedia Learning (CTML) (Mayer, 2001) in m-learning for practical use in a diverse range of outdoor locations. When…

  2. ANTLR Tree Grammar Generator and Extensions

    NASA Technical Reports Server (NTRS)

    Craymer, Loring

    2005-01-01

    A computer program implements two extensions of ANTLR (Another Tool for Language Recognition), which is a set of software tools for translating source codes between different computing languages. ANTLR supports predicated-LL(k) lexer and parser grammars, a notation for annotating parser grammars to direct tree construction, and predicated tree grammars. [LL(k) signifies left-right, leftmost derivation with k tokens of look-ahead, referring to certain characteristics of a grammar.] One of the extensions is a syntax for tree transformations. The other extension is the generation of tree grammars from annotated parser or input tree grammars. These extensions can simplify the process of generating source-to-source language translators and they make possible an approach, called "polyphase parsing," to translation between computing languages. The typical approach to translator development is to identify high-level semantic constructs such as "expressions," "declarations," and "definitions" as fundamental building blocks in the grammar specification used for language recognition. The polyphase approach is to lump ambiguous syntactic constructs during parsing and then disambiguate the alternatives in subsequent tree transformation passes. Polyphase parsing is believed to be useful for generating efficient recognizers for C++ and other languages that, like C++, have significant ambiguities.

  3. The Fault Tree Compiler (FTC): Program and mathematics

    NASA Technical Reports Server (NTRS)

    Butler, Ricky W.; Martensen, Anna L.

    1989-01-01

    The Fault Tree Compiler Program is a new reliability tool used to predict the top-event probability for a fault tree. Five different gate types are allowed in the fault tree: AND, OR, EXCLUSIVE OR, INVERT, and m OF n gates. The high-level input language is easy to understand and use when describing the system tree. In addition, the use of the hierarchical fault tree capability can simplify the tree description and decrease program execution time. The current solution technique provides an answer precisely (within the limits of double precision floating point arithmetic) within a user specified number of digits accuracy. The user may vary one failure rate or failure probability over a range of values and plot the results for sensitivity analyses. The solution technique is implemented in FORTRAN; the remaining program code is implemented in Pascal. The program is written to run on a Digital Equipment Corporation (DEC) VAX computer with the VMS operating system.
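
    The five gate types can be illustrated with a small probability sketch. This is not FTC's solution technique, which the abstract does not describe: it assumes all gate inputs are statistically independent, treats EXCLUSIVE OR as a two-input gate, and reads "m OF n" as "at least m of the n inputs occur". All function names are hypothetical.

```python
from math import prod

def p_and(ps):
    """All inputs occur (independent inputs assumed)."""
    return prod(ps)

def p_or(ps):
    """At least one input occurs."""
    return 1.0 - prod(1.0 - p for p in ps)

def p_xor(p, q):
    """Exactly one of two inputs occurs."""
    return p * (1.0 - q) + q * (1.0 - p)

def p_invert(p):
    """The input does not occur."""
    return 1.0 - p

def p_m_of_n(m, ps):
    """At least m of the inputs occur (assumed reading of 'm OF n')."""
    dist = [1.0]  # dist[k] = P(exactly k of the inputs seen so far occur)
    for p in ps:
        nxt = [0.0] * (len(dist) + 1)
        for k, w in enumerate(dist):
            nxt[k] += w * (1.0 - p)
            nxt[k + 1] += w * p
        dist = nxt
    return sum(dist[m:])

# Top event: (A AND B) OR (at least 2 of C, D, E), with made-up probabilities.
top = p_or([p_and([0.1, 0.2]), p_m_of_n(2, [0.05, 0.05, 0.05])])
print(top)
```

    Composing the gate functions bottom-up, as in the last lines, mirrors how a hierarchical tree description reduces to a single top-event probability when no basic event is shared between branches.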

  4. Parallel computing works

    SciTech Connect

    Not Available

    1991-10-23

    An account of the Caltech Concurrent Computation Program (C³P), a five-year project that focused on answering the question: "Can parallel computers be used to do large-scale scientific computations?" As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C³P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of many computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C³P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.

  5. Totally parallel multilevel algorithms

    NASA Technical Reports Server (NTRS)

    Frederickson, Paul O.

    1988-01-01

    Four totally parallel algorithms for the solution of a sparse linear system have common characteristics which become quite apparent when they are implemented on a highly parallel hypercube such as the CM2. These four algorithms are Parallel Superconvergent Multigrid (PSMG) of Frederickson and McBryan, Robust Multigrid (RMG) of Hackbusch, the FFT-based Spectral Algorithm, and Parallel Cyclic Reduction. In fact, all four can be formulated as particular cases of the same totally parallel multilevel algorithm, which is referred to as TPMA. In certain cases the spectral radius of TPMA is zero, and it is recognized to be a direct algorithm. In many other cases the spectral radius, although not zero, is small enough that a single iteration per timestep keeps the local error within the required tolerance.

  6. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
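
    A minimal sketch of how a sieve can be split into independent pieces follows. It is not the paper's method: it uses a contiguous block decomposition rather than the scattered decomposition the abstract mentions, and it runs the blocks one after another in a single process. The names simple_sieve, sieve_block and blocked_sieve, and the workers parameter, are hypothetical.

```python
from math import isqrt

def simple_sieve(limit):
    """All primes <= limit by the classic Sieve of Eratosthenes."""
    if limit < 2:
        return []
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, isqrt(limit) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [i for i, flag in enumerate(is_prime) if flag]

def sieve_block(lo, hi, seeds):
    """Primes in [lo, hi), given every prime up to sqrt(hi - 1)."""
    flags = [True] * (hi - lo)
    for p in seeds:
        # First multiple of p inside the block, but never p itself.
        start = max(p * p, ((lo + p - 1) // p) * p)
        for m in range(start, hi, p):
            flags[m - lo] = False
    return [lo + i for i, f in enumerate(flags) if f and lo + i >= 2]

def blocked_sieve(n, workers):
    """Primes <= n, sieving one contiguous block per (simulated) worker."""
    seeds = simple_sieve(isqrt(n))
    size = -(-(n + 1) // workers)  # ceiling division
    primes = []
    for w in range(workers):
        lo, hi = w * size, min((w + 1) * size, n + 1)
        if lo < hi:
            primes.extend(sieve_block(lo, hi, seeds))
    return primes

print(blocked_sieve(30, 4))
# → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

    Each block needs only the shared seed primes, so the calls to sieve_block are independent and could be handed to separate processing elements; how the work is distributed among them is exactly the trade-off the paper studies.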

  7. SMARTS: Exploiting Temporal Locality and Parallelism through Vertical Execution

    SciTech Connect

    Beckman, P.; Crotinger, J.; Karmesin, S.; Malony, A.; Oldehoeft, R.; Shende, S.; Smith, S.; Vajracharya, S.

    1999-01-04

    In the solution of large-scale numerical problems, parallel computing is becoming simultaneously more important and more difficult. The complex organization of today's multiprocessors with several memory hierarchies has forced the scientific programmer to make a choice between simple but unscalable code and scalable but extremely complex code that does not port to other architectures. This paper describes how the SMARTS runtime system and the POOMA C++ class library for high-performance scientific computing work together to exploit data parallelism in scientific applications while hiding the details of managing parallelism and data locality from the user. We present innovative algorithms, based on the macro-dataflow model, for detecting data parallelism and efficiently executing data-parallel statements on shared-memory multiprocessors. We also describe how these algorithms can be implemented on clusters of SMPs.

  8. Parallel paving: An algorithm for generating distributed, adaptive, all-quadrilateral meshes on parallel computers

    SciTech Connect

    Lober, R.R.; Tautges, T.J.; Vaughan, C.T.

    1997-03-01

    Paving is an automated mesh generation algorithm which produces all-quadrilateral elements. It can additionally generate these elements in varying sizes such that the resulting mesh adapts to a function distribution, such as an error function. While powerful, conventional paving is a very serial algorithm in its operation. Parallel paving is the extension of serial paving into parallel environments to perform the same meshing functions as conventional paving only on distributed, discretized models. This extension allows large, adaptive, parallel finite element simulations to take advantage of paving's meshing capabilities for h-remap remeshing. A significantly modified version of the CUBIT mesh generation code has been developed to host the parallel paving algorithm and demonstrate its capabilities on both two dimensional and three dimensional surface geometries and compare the resulting parallel produced meshes to conventionally paved meshes for mesh quality and algorithm performance. Sandia's "tiling" dynamic load balancing code has also been extended to work with the paving algorithm to retain parallel efficiency as subdomains undergo iterative mesh refinement.

  9. Driver Code for Adaptive Optics

    NASA Technical Reports Server (NTRS)

    Rao, Shanti

    2007-01-01

    A special-purpose computer code for a deformable-mirror adaptive-optics control system transmits pixel-registered control from (1) a personal computer running software that generates the control data to (2) a circuit board with 128 digital-to-analog converters (DACs) that generate voltages to drive the deformable-mirror actuators. This program reads control-voltage codes from a text file, then sends them, via the computer's parallel port, to a circuit board with four AD5535 (or equivalent) chips. Whereas a similar prior computer program was capable of transmitting data to only one chip at a time, this program can send data to four chips simultaneously. This program is in the form of C-language code that can be compiled and linked into an adaptive-optics software system. The program as supplied includes source code for integration into the adaptive-optics software, documentation, and a component that provides a demonstration of loading DAC codes from a text file. On a standard Windows desktop computer, the software can update 128 channels in 10 ms. On Real-Time Linux with a digital I/O card, the software can update 1024 channels (8 boards in parallel) every 8 ms.

  10. Monte Carlo radiation transport & parallelism

    SciTech Connect

    Cox, L. J.; Post, S. E.

    2002-01-01

    This talk summarizes the main aspects of the LANL ASCI Eolus project and its major unclassified code project, MCNP. The MCNP code provides state-of-the-art Monte Carlo radiation transport to approximately 3000 users world-wide. Almost all hardware platforms are supported because we strictly adhere to the FORTRAN-90/95 standard. For parallel processing, MCNP uses a mixture of OpenMP combined with either MPI or PVM (shared and distributed memory). This talk summarizes our experiences on various platforms using MPI with and without OpenMP. These platforms include PC-Windows, Intel-LINUX, BlueMountain, Frost, ASCI-Q and others.

  11. CFD Optimization on Network-Based Parallel Computer System

    NASA Technical Reports Server (NTRS)

    Cheung, Samson H.; VanDalsem, William (Technical Monitor)

    1994-01-01

    Combining multiple engineering workstations into a network-based heterogeneous parallel computer allows application of aerodynamic optimization with advanced computational fluid dynamics codes, which is computationally expensive on a mainframe supercomputer. This paper introduces a nonlinear quasi-Newton optimizer designed for this network-based heterogeneous parallel computer, running on software called Parallel Virtual Machine. This paper will introduce the methodology behind coupling a Parabolized Navier-Stokes flow solver to the nonlinear optimizer. This parallel optimization package has been applied to reduce the wave drag of a body of revolution and a wing/body configuration with results of 5% to 6% drag reduction.

  12. Parallel CFD design on network-based computer

    NASA Technical Reports Server (NTRS)

    Cheung, Samson

    1995-01-01

    Combining multiple engineering workstations into a network-based heterogeneous parallel computer allows application of aerodynamic optimization with advanced computational fluid dynamics codes, which can be computationally expensive on mainframe supercomputers. This paper introduces a nonlinear quasi-Newton optimizer designed for this network-based heterogeneous parallel computing environment using software called Parallel Virtual Machine. This paper will introduce the methodology behind coupling a Parabolized Navier-Stokes flow solver to the nonlinear optimizer. This parallel optimization package is applied to reduce the wave drag of a body of revolution and a wing/body configuration with results of 5% to 6% drag reduction.

  13. Programming Probabilistic Structural Analysis for Parallel Processing Computer

    NASA Technical Reports Server (NTRS)

    Sues, Robert H.; Chen, Heh-Chyun; Twisdale, Lawrence A.; Chamis, Christos C.; Murthy, Pappu L. N.

    1991-01-01

    The ultimate goal of this research program is to make Probabilistic Structural Analysis (PSA) computationally efficient and hence practical for the design environment by achieving large scale parallelism. The paper identifies the multiple levels of parallelism in PSA, identifies methodologies for exploiting this parallelism, describes the development of a parallel stochastic finite element code, and presents results of two example applications. It is demonstrated that speeds within five percent of those theoretically possible can be achieved. A special-purpose numerical technique, the stochastic preconditioned conjugate gradient method, is also presented and demonstrated to be extremely efficient for certain classes of PSA problems.

  14. HPC Infrastructure for Solid Earth Simulation on Parallel Computers

    NASA Astrophysics Data System (ADS)

    Nakajima, K.; Chen, L.; Okuda, H.

    2004-12-01

    Recently, various types of parallel computers with various types of architectures and processing elements (PE) have emerged, which include PC clusters and the Earth Simulator. Moreover, users can easily access these computer resources through the network in a Grid environment. It is well known that thorough tuning is required for programmers to achieve excellent performance on each computer. The method for tuning strongly depends on the type of PE and architecture. Optimization by tuning is very hard work, especially for developers of applications. Moreover, parallel programming using a message passing library such as MPI is another big task for application programmers. In the GeoFEM project (http://gefeom.tokyo.rist.or.jp), the authors have developed a parallel FEM platform for solid earth simulation on the Earth Simulator, which supports parallel I/O, parallel linear solvers and parallel visualization. This platform can efficiently hide complicated procedures for parallel programming and optimization on vector processors from application programmers. This type of infrastructure is very useful. Source code developed on a PC with a single processor is easily optimized on a massively parallel computer by linking the source code to the parallel platform installed on the target computer. This parallel platform, called HPC Infrastructure, will provide dramatic efficiency, portability and reliability in the development of scientific simulation codes. For example, the number of lines of source code is expected to be less than 10,000, and porting legacy codes to a parallel computer takes 2 or 3 weeks. The original GeoFEM platform supports only I/O, linear solvers and visualization. In the present work, further development for adaptive mesh refinement (AMR) and dynamic load-balancing (DLB) has been carried out. In this presentation, examples of large-scale solid earth simulation using the Earth Simulator will be demonstrated.
Moreover, recent results of a parallel computational steering tool using an

  15. The NAS parallel benchmarks

    NASA Technical Reports Server (NTRS)

    Bailey, David (Editor); Barton, John (Editor); Lasinski, Thomas (Editor); Simon, Horst (Editor)

    1993-01-01

    A new set of benchmarks was developed for the performance evaluation of highly parallel supercomputers. These benchmarks consist of a set of kernels, the 'Parallel Kernels,' and a simulated application benchmark. Together they mimic the computation and data movement characteristics of large scale computational fluid dynamics (CFD) applications. The principal distinguishing feature of these benchmarks is their 'pencil and paper' specification - all details of these benchmarks are specified only algorithmically. In this way many of the difficulties associated with conventional benchmarking approaches on highly parallel systems are avoided.

  16. Species integrity in trees.

    PubMed

    Ortiz-Barrientos, Daniel; Baack, Eric J

    2014-09-01

    From California sequoia, to Australian eucalyptus, to the outstanding diversity of Amazonian forests, trees are fundamental to many processes in ecology and evolution. Trees define the communities that they inhabit, are host to a multiplicity of other organisms and can determine the ecological dynamics of other plants and animals. Trees are also at the heart of major patterns of biodiversity such as the latitudinal gradient of species diversity and thus are important systems for studying the origin of new plant species. Although the role of trees in community assembly and ecological succession is partially understood, the origin of tree diversity remains largely opaque. For instance, the relative importance of differing habitats and phenologies as barriers to hybridization between closely related species is still largely uncharacterized in trees. Consequently, we know very little about the origin of tree species and their integrity. Similarly, studies on the interplay between speciation and tree community assembly are in their infancy, and so are studies on how processes like forest maturation modify the context in which reproductive isolation evolves. In this issue of Molecular Ecology, Lindtke et al. (2014) and Lagache et al. (2014) overcome some traditional difficulties in studying mating systems and sexual isolation in the iconic oaks and poplars, providing novel insights about the integrity of tree species and on how ecology leads to variation in selection on reproductive isolation over time and space. PMID:25155715

  17. Partitioning problems in parallel, pipelined and distributed computing

    NASA Technical Reports Server (NTRS)

    Bokhari, S.

    1985-01-01

    The problem of optimally assigning the modules of a parallel program over the processors of a multiple computer system is addressed. A Sum-Bottleneck path algorithm is developed that permits the efficient solution of many variants of this problem under some constraints on the structure of the partitions. In particular, the following problems are solved optimally for a single-host, multiple satellite system: partitioning multiple chain structured parallel programs, multiple arbitrarily structured serial programs and single tree structured parallel programs. In addition, the problems of partitioning chain structured parallel programs across chain connected systems and across shared memory (or shared bus) systems are also solved under certain constraints. All solutions for parallel programs are equally applicable to pipelined programs. These results extend prior research in this area by explicitly taking concurrency into account and permit the efficient utilization of multiple computer architectures for a wide range of problems of practical interest.

  18. A parallel, portable and versatile treecode

    SciTech Connect

    Warren, M.S.; Salmon, J.K.

    1994-10-01

    Portability and versatility are important characteristics of a computer program which is meant to be generally useful. We describe how we have developed a parallel N-body treecode to meet these goals. A variety of applications to which the code can be applied are mentioned. Performance of the program is also measured on several machines. A 512 processor Intel Paragon can solve for the forces on 10 million gravitationally interacting particles to 0.5% rms accuracy in 28.6 seconds.

  19. Parallel Molecular Dynamics Program for Molecules

    SciTech Connect

    Plimpton, Steve

    1995-03-07

    ParBond is a parallel classical molecular dynamics code that models bonded molecular systems, typically of an organic nature. It uses classical force fields for both non-bonded Coulombic and Van der Waals interactions and for 2-, 3-, and 4-body bonded (bond, angle, dihedral, and improper) interactions. It integrates Newton's equations of motion for the molecular system and evaluates various thermodynamic properties of the system as it progresses.

  20. Parallel computing techniques for rotorcraft aerodynamics

    NASA Astrophysics Data System (ADS)

    Ekici, Kivanc

    The modification of unsteady three-dimensional Navier-Stokes codes for application on massively parallel and distributed computing environments is investigated. The Euler/Navier-Stokes code TURNS (Transonic Unsteady Rotor Navier-Stokes) was chosen as a test bed because of its wide use by universities and industry. For the efficient implementation of TURNS on parallel computing systems, two algorithmic changes are developed. First, main modifications to the implicit operator, Lower-Upper Symmetric Gauss Seidel (LU-SGS), originally used in TURNS, are performed. Second, application of an inexact Newton method, coupled with a Krylov subspace iterative method (Newton-Krylov method), is carried out. Both techniques have been tried previously for the Euler equations mode of the code. In this work, we have extended the methods to the Navier-Stokes mode. Several new implicit operators were tried because of convergence problems of traditional operators with the high cell aspect ratio (CAR) grids needed for viscous calculations on structured grids. Promising results for both Euler and Navier-Stokes cases are presented for these operators. For the efficient implementation of Newton-Krylov methods in the Navier-Stokes mode of TURNS, efficient preconditioners must be used. The parallel implicit operators used in the previous step are employed as preconditioners and the results are compared. The Message Passing Interface (MPI) protocol has been used because of its portability to various parallel architectures. It should be noted that the proposed methodology is general and can be applied to several other CFD codes (e.g. OVERFLOW).

  1. Incremental Parallelization of Non-Data-Parallel Programs Using the Charon Message-Passing Library

    NASA Technical Reports Server (NTRS)

    VanderWijngaart, Rob F.

    2000-01-01

    Message passing is among the most popular techniques for parallelizing scientific programs on distributed-memory architectures. The reasons for its success are wide availability (MPI), efficiency, and full tuning control provided to the programmer. A major drawback, however, is that incremental parallelization, as offered by compiler directives, is not generally possible, because all data structures have to be changed throughout the program simultaneously. Charon remedies this situation through mappings between distributed and non-distributed data. It allows breaking up the parallelization into small steps, guaranteeing correctness at every stage. Several tools are available to help convert legacy codes into high-performance message-passing programs. They usually target data-parallel applications, whose loops carrying most of the work can be distributed among all processors without much dependency analysis. Others do a full dependency analysis and then convert the code virtually automatically. Even more toolkits are available that aid construction from scratch of message passing programs. None, however, allows piecemeal translation of codes with complex data dependencies (i.e. non-data-parallel programs) into message passing codes. The Charon library (available in both C and Fortran) provides incremental parallelization capabilities by linking legacy code arrays with distributed arrays. During the conversion process, non-distributed and distributed arrays exist side by side, and simple mapping functions allow the programmer to switch between the two in any location in the program. Charon also provides wrapper functions that leave the structure of the legacy code intact, but that allow execution on truly distributed data. Finally, the library provides a rich set of communication functions that support virtually all patterns of remote data demands in realistic structured grid scientific programs, including transposition, nearest-neighbor communication, pipelining

  2. A Data Parallel Algorithm for XML DOM Parsing

    NASA Astrophysics Data System (ADS)

    Shah, Bhavik; Rao, Praveen R.; Moon, Bongki; Rajagopalan, Mohan

    The extensible markup language XML has become the de facto standard for information representation and interchange on the Internet. XML parsing is a core operation performed on an XML document for it to be accessed and manipulated. This operation is known to cause performance bottlenecks in applications and systems that process large volumes of XML data. We believe that parallelism is a natural way to boost performance. Leveraging multicore processors can offer a cost-effective solution, because future multicore processors will support hundreds of cores, and will offer a high degree of parallelism in hardware. We propose a data parallel algorithm called ParDOM for XML DOM parsing that builds an in-memory tree structure for an XML document. ParDOM has two phases. In the first phase, an XML document is partitioned into chunks and parsed in parallel. In the second phase, partial DOM node tree structures created during the first phase are linked together (in parallel) to build a complete DOM node tree. ParDOM offers fine-grained parallelism by adopting a flexible chunking scheme - each chunk can contain an arbitrary number of start and end XML tags that are not necessarily matched. ParDOM can be conveniently implemented using a data parallel programming model that supports map and sort operations. Through empirical evaluation, we show that ParDOM yields better scalability than PXP [23] - a recently proposed parallel DOM parsing algorithm - on commodity multicore processors. Furthermore, ParDOM can process a wide variety of XML datasets with complex structures which PXP fails to parse.
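
    The two-phase idea in this abstract can be illustrated with a much-simplified sketch. This is not the ParDOM algorithm itself: the token format, the `Node` class, the `END` marker and every function name below are assumptions made for illustration, the input is taken to be an already tokenized, well-formed sequence of start and end tags, and the linking phase is done sequentially here, whereas the abstract says ParDOM links in parallel.

```python
from concurrent.futures import ThreadPoolExecutor

END = object()  # marker: an end tag whose start tag lies in an earlier chunk


class Node:
    """Hypothetical minimal DOM node: a tag name and an ordered child list."""
    def __init__(self, name):
        self.name = name
        self.children = []


def parse_chunk(tokens):
    """Phase 1: parse one chunk on its own; tags need not be matched."""
    items = []  # top-level results in document order: Node or END
    stack = []  # nodes opened in this chunk and not yet closed
    for kind, name in tokens:
        if kind == "start":
            node = Node(name)
            if stack:
                stack[-1].children.append(node)
            else:
                items.append(node)
            stack.append(node)
        elif stack:
            stack.pop()        # end tag matched inside this chunk
        else:
            items.append(END)  # end tag closes a node from an earlier chunk
    return items, stack


def link(chunk_results):
    """Phase 2 (sequential in this sketch): stitch partial trees together."""
    root = Node("#document")
    open_nodes = [root]
    for items, still_open in chunk_results:
        for item in items:
            if item is END:
                open_nodes.pop()
            else:
                open_nodes[-1].children.append(item)
        open_nodes.extend(still_open)
    return root


def parse_parallel(tokens, chunk_size):
    """Split the token list into fixed-size chunks, parse them, then link."""
    chunks = [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(parse_chunk, chunks))
    return link(results)


def to_tuple(node):
    """Convert a tree to nested tuples so two trees can be compared."""
    return (node.name, [to_tuple(child) for child in node.children])
```

    The thread pool only demonstrates that the chunks are independent of one another; in CPython it would not be expected to give a real speed-up for this kind of work. The point of the sketch is that the same tree comes out whatever chunk size is chosen, even when a chunk boundary falls between a start tag and its end tag.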

  3. Parallelization of the Lagrangian Particle Dispersion Model

    SciTech Connect

    Buckley, R.L.; O'Steen, B.L.

    1997-08-01

    An advanced stochastic Lagrangian Particle Dispersion Model (LPDM) is used by the Atmospheric Technologies Group (ATG) to simulate contaminant transport. The model uses time-dependent three-dimensional fields of wind and turbulence to determine the location of individual particles released into the atmosphere. This report describes modifications to LPDM using the Message Passing Interface (MPI) which allows for execution in a parallel configuration on the Cray Supercomputer facility at the SRS. Use of a parallel version allows for many more particles to be released in a given simulation, with little or no increase in computational time. This significantly lowers (greater than an order of magnitude) the minimum resolvable concentration levels without ad hoc averaging schemes and/or without reducing spatial resolution. The general changes made to LPDM are discussed and a series of tests are performed comparing the serial (single processor) and parallel versions of the code.

  4. Parallel contact detection algorithm for transient solid dynamics simulations using PRONTO3D

    SciTech Connect

    Attaway, S.W.; Hendrickson, B.A.; Plimpton, S.J.

    1996-09-01

    An efficient, scalable, parallel algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for MIMD parallel computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRONTO3D has been extended for use in parallel computation by devising a dynamic (adaptive) processor load balancing scheme.

  5. Parallel and Portable Monte Carlo Particle Transport

    NASA Astrophysics Data System (ADS)

    Lee, S. R.; Cummings, J. C.; Nolen, S. D.; Keen, N. D.

    1997-08-01

    We have developed a multi-group, Monte Carlo neutron transport code in C++ using object-oriented methods and the Parallel Object-Oriented Methods and Applications (POOMA) class library. This transport code, called MC++, currently computes k and α eigenvalues of the neutron transport equation on a rectilinear computational mesh. It is portable to and runs in parallel on a wide variety of platforms, including MPPs, clustered SMPs, and individual workstations. It contains appropriate classes and abstractions for particle transport and, through the use of POOMA, for portable parallelism. Current capabilities are discussed, along with physics and performance results for several test problems on a variety of hardware, including all three Accelerated Strategic Computing Initiative (ASCI) platforms. Current parallel performance indicates the ability to compute α-eigenvalues in seconds or minutes rather than days or weeks. Current and future work on the implementation of a general transport physics framework (TPF) is also described. This TPF employs modern C++ programming techniques to provide simplified user interfaces, generic STL-style programming, and compile-time performance optimization. Physics capabilities of the TPF will be extended to include continuous energy treatments, implicit Monte Carlo algorithms, and a variety of convergence acceleration techniques such as importance combing.

  6. The Parallel Axiom

    ERIC Educational Resources Information Center

    Rogers, Pat

    1972-01-01

    Criteria for a reasonable axiomatic system are discussed. A discussion of the historical attempts to prove the independence of Euclid's parallel postulate introduces non-Euclidean geometries. Poincaré's model for a non-Euclidean geometry is defined and analyzed. (LS)

  7. Parallel interactive data analysis with PROOF

    NASA Astrophysics Data System (ADS)

    Ballintijn, Maarten; Biskup, Marek; Brun, René; Canal, Philippe; Feichtinger, Derek; Ganis, Gerardo; Kickinger, Günter; Peters, Andreas; Rademakers, Fons

    2006-04-01

    The Parallel ROOT Facility, PROOF, enables the analysis of much larger data sets on a shorter time scale. It exploits the inherent parallelism in data of uncorrelated events via a multi-tier architecture that optimizes I/O and CPU utilization in heterogeneous clusters with distributed storage. The system provides transparent and interactive access to gigabytes today. Being part of the ROOT framework PROOF inherits the benefits of a performant object storage system and a wealth of statistical and visualization tools. This paper describes the data analysis model of ROOT and the latest developments on closer integration of PROOF into that model and the ROOT user environment, e.g. support for PROOF-based browsing of trees stored remotely, and the popular TTree::Draw() interface. We also outline the ongoing developments aimed to improve the flexibility and user-friendliness of the system.

  8. Scalable parallel communications

    NASA Technical Reports Server (NTRS)

    Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

    1992-01-01

    Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulation studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth

  9. Automatic generation of tree level helicity amplitudes

    NASA Astrophysics Data System (ADS)

    Stelzer, T.; Long, W. F.

    1994-11-01

    The program MadGraph is presented which automatically generates postscript Feynman diagrams and Fortran code to calculate arbitrary tree level helicity amplitudes by calling HELAS[1] subroutines. The program is written in Fortran and is available in Unix and VMS versions. MadGraph currently includes standard model interactions of QCD and QFD, but is easily modified to include additional models such as supersymmetry.

  10. Automatic generation of tree level helicity amplitudes

    NASA Astrophysics Data System (ADS)

    Stelzer, T.; Long, W. F.

    1994-07-01

    The program MadGraph is presented which automatically generates postscript Feynman diagrams and Fortran code to calculate arbitrary tree level helicity amplitudes by calling HELAS[1] subroutines. The program is written in Fortran and is available in Unix and VMS versions. MadGraph currently includes standard model interactions of QCD and QFD, but is easily modified to include additional models such as supersymmetry.

  11. Parallel image compression

    NASA Technical Reports Server (NTRS)

    Reif, John H.

    1987-01-01

    A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.

  12. Artificial intelligence in parallel

    SciTech Connect

    Waldrop, M.M.

    1984-08-10

    The current rage in the Artificial Intelligence (AI) community is parallelism: the idea is to build machines with many independent processors doing many things at once. The upshot is that about a dozen parallel machines are now under development for AI alone. As might be expected, the approaches are diverse yet there are a number of fundamental issues in common: granularity, topology, control, and algorithms.

  13. Continuous parallel coordinates.

    PubMed

    Heinrich, Julian; Weiskopf, Daniel

    2009-01-01

    Typical scientific data is represented on a grid with appropriate interpolation or approximation schemes, defined on a continuous domain. The visualization of such data in parallel coordinates may reveal patterns latently contained in the data and thus can improve the understanding of multidimensional relations. In this paper, we adopt the concept of continuous scatterplots for the visualization of spatially continuous input data to derive a density model for parallel coordinates. Based on the point-line duality between scatterplots and parallel coordinates, we propose a mathematical model that maps density from a continuous scatterplot to parallel coordinates and present different algorithms for both numerical and analytical computation of the resulting density field. In addition, we show how the 2-D model can be used to successively construct continuous parallel coordinates with an arbitrary number of dimensions. Since continuous parallel coordinates interpolate data values within grid cells, a scalable and dense visualization is achieved, which will be demonstrated for typical multi-variate scientific data.
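
    The point-line duality mentioned in this abstract can be checked numerically with a small sketch. It shows only the standard geometric duality, not the paper's density model. The layout (two vertical axes at horizontal positions 0 and 1) and the function names are assumptions made for illustration.

```python
def pc_height(x, y, t):
    """Height at horizontal position t of the parallel-coordinates line
    drawn for the scatterplot point (x, y), with the axes at t=0 and t=1."""
    return (1 - t) * x + t * y


def dual_point(m, b):
    """Parallel-coordinates point shared by all lines that come from
    scatterplot points on y = m*x + b (undefined for slope m == 1)."""
    if m == 1:
        raise ValueError("lines of slope 1 map to parallel lines, no common point")
    return 1 / (1 - m), b / (1 - m)


if __name__ == "__main__":
    m, b = -2.0, 3.0
    t, h = dual_point(m, b)
    for x in (-1.0, 0.0, 2.5):
        y = m * x + b
        # every line passes through the same point (t, h)
        print(x, y, abs(pc_height(x, y, t) - h) < 1e-12)
```

    Each scatterplot point becomes a line between the two axes, and all points taken from one scatterplot line produce lines through a single common point. For negative slopes that point lies between the axes, which is why negatively correlated data shows the familiar crossing pattern.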

  14. Programming with a high degree of parallelism in fortran

    NASA Astrophysics Data System (ADS)

    Jesshope, C. R.

    1982-06-01

    Many parallel extensions to FORTRAN have been proposed by 'supercomputer' manufacturers. The major differences between these language extensions are reviewed briefly. The Principle of Conservation of Parallelism is also introduced, which is argued to be a desirable foundation on which to base the development of code for parallel computers. Simply stated, it requires that the degree of parallelism should not increase during the translation of an algorithm from a concept to a high level language (FORTRAN, say) and finally into the machine code of the target computer. Cray FORTRAN and other vectorising compilers do not adhere to this principle, as the parallelism increases from 1 to some greater degree during the compilation process. A simple example will be used to illustrate the implications of this principle, which shows that it will reduce operations at the expense of storage locations. Vectorising compilers may reduce this storage requirement but will increase the number of operations. Two further examples of highly parallel and practical codes are also presented. These illustrate the compactness of code and the close relationship between the mathematical description of the problem and the FORTRAN implementation. The examples show the matrix multiplication and fast Fourier transform algorithms.

  15. Performance of a parallel algorithm for standard cell placement on the Intel Hypercube

    NASA Technical Reports Server (NTRS)

    Jones, Mark; Banerjee, Prithviraj

    1987-01-01

    A parallel simulated annealing algorithm for standard cell placement on the Intel Hypercube is presented. A novel tree broadcasting strategy is used extensively for updating cell locations in the parallel environment. Studies on the performance of the algorithm on example industrial circuits show that it is faster and gives better final placement results than uniprocessor simulated annealing algorithms.

  16. A parallel implementation of kriging with a trend

    SciTech Connect

    Gajraj, A.; Joubert, W.; Jones, J.

    1997-11-01

    This paper describes the parallelization of the GSLIB ktb3dm code. The code is parallelized using the message passing paradigm, Parallel Virtual Machine (PVM), under a Multiple Instructions, Multiple Data (MIMD) architecture. The code performance is analyzed using different grid sizes of 5x5x1, 50x50x1, 100x100x1 and 500x500x1 with 1, 2, 4, 8 and in some cases 16 processors on the Cray T3D supercomputer. The parallelization effort focused on the main kriging do loop. The results confirm that there is a substantial benefit to be derived in terms of CPU time savings (or execution speed) by using the parallel version of the code, especially when considering larger grids. Additionally, speed-up and scalability analyses show that actual speed-up is close to theoretical, while the code scales appropriately within the 1 to 16 processor range tested. The kriging of a quarter-million grid cell system fell from over 9 CPU minutes on one Cray T3D processor to about 1.25 CPU minutes on 16 processors on the same machine.
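
    The timing figures quoted in this abstract can be turned into a rough speed-up and parallel-efficiency estimate. The sketch below uses the rounded values from the abstract ("over 9" and "about 1.25" CPU minutes on 1 and 16 processors), so the result is approximate. The abstract does not say how its "theoretical" speed-up is defined, so the comparison with ideal linear scaling is an assumption, and the function names are illustrative.

```python
def speedup(t_serial, t_parallel):
    """Ratio of single-processor time to multi-processor time."""
    return t_serial / t_parallel


def efficiency(t_serial, t_parallel, processors):
    """Speed-up divided by processor count (1.0 would be ideal linear scaling)."""
    return speedup(t_serial, t_parallel) / processors


if __name__ == "__main__":
    # Rounded values taken from the abstract, in CPU minutes.
    t1, t16 = 9.0, 1.25
    print(f"speed-up   ~ {speedup(t1, t16):.1f}")
    print(f"efficiency ~ {efficiency(t1, t16, 16):.2f}")
```

    Because the single-processor time is given only as "over 9" minutes, the speed-up computed this way is a lower bound on what those two figures imply.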

  17. The Flame Tree

    ERIC Educational Resources Information Center

    Lewis, Richard

    2004-01-01

    Lewis's own experiences living in Indonesia are fertile ground for telling "a ripping good story," one found in "The Flame Tree." He hopes people will enjoy the tale and appreciate the differences of an unfamiliar culture. The excerpt from "The Flame Tree" will reel readers in quickly.

  18. CSI for Trees

    ERIC Educational Resources Information Center

    Rubino, Darrin L.; Hanson, Deborah

    2009-01-01

    The circles and patterns in a tree's stem tell a story, but that story can be a mystery. Interpreting the story of tree rings provides a way to heighten the natural curiosity of students and help them gain insight into the interaction of elements in the environment. It also represents a wonderful opportunity to incorporate the nature of science.…

  19. Trees Are Terrific!

    ERIC Educational Resources Information Center

    Braus, Judy, Ed.

    1992-01-01

    Ranger Rick's NatureScope is a creative education series dedicated to inspiring in children an understanding and appreciation of the natural world while developing the skills they will need to make responsible decisions about the environment. Contents are organized into the following sections: (1) "What Makes a Tree a Tree?," including information…

  20. Tree Topology Estimation.

    PubMed

    Estrada, Rolando; Tomasi, Carlo; Schmidler, Scott C; Farsiu, Sina

    2015-08-01

    Tree-like structures are fundamental in nature, and it is often useful to reconstruct the topology of a tree - what connects to what - from a two-dimensional image of it. However, the projected branches often cross in the image: the tree projects to a planar graph, and the inverse problem of reconstructing the topology of the tree from that of the graph is ill-posed. We regularize this problem with a generative, parametric tree-growth model. Under this model, reconstruction is possible in linear time if one knows the direction of each edge in the graph - which edge endpoint is closer to the root of the tree - but becomes NP-hard if the directions are not known. For the latter case, we present a heuristic search algorithm to estimate the most likely topology of a rooted, three-dimensional tree from a single two-dimensional image. Experimental results on retinal vessel, plant root, and synthetic tree data sets show that our methodology is both accurate and efficient. PMID:26353004

  1. Tree nut oils

    Technology Transfer Automated Retrieval System (TEKTRAN)

    The major tree nuts include almonds, Brazil nuts, cashew nuts, hazelnuts, macadamia nuts, pecans, pine nuts, pistachio nuts, and walnuts. Tree nut oils are appreciated in food applications because of their flavors and are generally more expensive than other gourmet oils. Research during the last de...

  2. Trees for Mother Earth.

    ERIC Educational Resources Information Center

    Greer, Sandy

    1993-01-01

    Describes Trees for Mother Earth, a program in which secondary students raise funds to buy fruit trees to plant during visits to the Navajo Reservation. Benefits include developing feelings of self-worth among participants, promoting cultural exchange and understanding, and encouraging self-sufficiency among the Navajo. (LP)

  3. Structural Equation Model Trees

    ERIC Educational Resources Information Center

    Brandmaier, Andreas M.; von Oertzen, Timo; McArdle, John J.; Lindenberger, Ulman

    2013-01-01

    In the behavioral and social sciences, structural equation models (SEMs) have become widely accepted as a modeling tool for the relation between latent and observed variables. SEMs can be seen as a unification of several multivariate analysis techniques. SEM Trees combine the strengths of SEMs and the decision tree paradigm by building tree…

  4. Parallelism for quantum computation with qudits

    SciTech Connect

    O'Leary, Dianne P.; Brennen, Gavin K.; Bullock, Stephen S.

    2006-09-15

    Robust quantum computation with d-level quantum systems (qudits) poses two requirements: fast, parallel quantum gates and high-fidelity two-qudit gates. We first describe how to implement parallel single-qudit operations. It is by now well known that any single-qudit unitary can be decomposed into a sequence of Givens rotations on two-dimensional subspaces of the qudit state space. Using a coupling graph to represent physically allowed couplings between pairs of qudit states, we then show that the logical depth (time) of the parallel gate sequence is equal to the height of an associated tree. The implementation of a given unitary can then optimize the tradeoff between gate time and resources used. These ideas are illustrated for qudits encoded in the ground hyperfine states of the alkali-metal atoms ⁸⁷Rb and ¹³³Cs. Second, we provide a protocol for implementing parallelized nonlocal two-qudit gates using the assistance of entangled qubit pairs. Using known protocols for qubit entanglement purification, this offers the possibility of high-fidelity two-qudit gates.

  5. Parallel image computation in clusters with task-distributor.

    PubMed

    Baun, Christian

    2016-01-01

    Distributed systems, especially clusters, can be used to execute ray tracing tasks in parallel for speeding up the image computation. Because ray tracing is a computationally expensive and memory-consuming task, ray tracing can also be used to benchmark clusters. This paper introduces task-distributor, a free software solution for the parallel execution of ray tracing tasks in distributed systems. The ray tracing solution used for this work is the Persistence Of Vision Raytracer (POV-Ray). Task-distributor does not require any modification of the POV-Ray source code or the installation of an additional message passing library like the Message Passing Interface or Parallel Virtual Machine to allow parallel image computation, in contrast to various other projects. By analyzing the runtime of the sequential and parallel program parts of task-distributor, it becomes clear how the problem size and available hardware resources influence the scaling of the parallel application.
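
    The influence of the sequential and parallel runtime parts on scaling, as described in this record, can be illustrated with Amdahl's law. The sketch below is not taken from task-distributor: the function name and the sample timings are hypothetical, and it assumes the parallel part divides evenly across workers with no communication overhead.

```python
def amdahl_speedup(sequential_s: float, parallel_s: float, workers: int) -> float:
    """Estimated speedup when only the parallel part scales with the worker count."""
    if workers < 1:
        raise ValueError("workers must be >= 1")
    single_worker = sequential_s + parallel_s
    many_workers = sequential_s + parallel_s / workers
    return single_worker / many_workers


# Hypothetical timings: 10 s sequential part, 90 s parallelizable part.
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(10.0, 90.0, n), 2))
```

    Under these assumptions the speedup saturates at (sequential + parallel) / sequential, which is why a larger problem size (a larger parallel part) tends to scale better on the same hardware.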

  6. The BLAZE language: A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, P.; Vanrosendale, J.

    1985-01-01

    A Pascal-like scientific programming language, Blaze, is described. Blaze contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus Blaze should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of Blaze is portability across a broad range of parallel architectures. The multiple levels of parallelism present in Blaze code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of Blaze are described, and it is shown how this language would be used in typical scientific programming.

  7. The BLAZE language - A parallel language for scientific programming

    NASA Technical Reports Server (NTRS)

    Mehrotra, Piyush; Van Rosendale, John

    1987-01-01

    A Pascal-like scientific programming language, BLAZE, is described. BLAZE contains array arithmetic, forall loops, and APL-style accumulation operators, which allow natural expression of fine grained parallelism. It also employs an applicative or functional procedure invocation mechanism, which makes it easy for compilers to extract coarse grained parallelism using machine specific program restructuring. Thus BLAZE should allow one to achieve highly parallel execution on multiprocessor architectures, while still providing the user with conceptually sequential control flow. A central goal in the design of BLAZE is portability across a broad range of parallel architectures. The multiple levels of parallelism present in BLAZE code, in principle, allow a compiler to extract the types of parallelism appropriate for the given architecture while neglecting the remainder. The features of BLAZE are described and it is shown how this language would be used in typical scientific programming.

  8. Parallel image computation in clusters with task-distributor.

    PubMed

    Baun, Christian

    2016-01-01

    Distributed systems, especially clusters, can be used to execute ray tracing tasks in parallel for speeding up the image computation. Because ray tracing is a computationally expensive and memory-consuming task, ray tracing can also be used to benchmark clusters. This paper introduces task-distributor, a free software solution for the parallel execution of ray tracing tasks in distributed systems. The ray tracing solution used for this work is the Persistence Of Vision Raytracer (POV-Ray). Task-distributor does not require any modification of the POV-Ray source code or the installation of an additional message passing library like the Message Passing Interface or Parallel Virtual Machine to allow parallel image computation, in contrast to various other projects. By analyzing the runtime of the sequential and parallel program parts of task-distributor, it becomes clear how the problem size and available hardware resources influence the scaling of the parallel application. PMID:27330898

  9. The tree of eukaryotes.

    PubMed

    Keeling, Patrick J; Burger, Gertraud; Durnford, Dion G; Lang, B Franz; Lee, Robert W; Pearlman, Ronald E; Roger, Andrew J; Gray, Michael W

    2005-12-01

    Recent advances in resolving the tree of eukaryotes are converging on a model composed of a few large hypothetical 'supergroups', each comprising a diversity of primarily microbial eukaryotes (protists, or protozoa and algae). The process of resolving the tree involves the synthesis of many kinds of data, including single-gene trees, multigene analyses, and other kinds of molecular and structural characters. Here, we review the recent progress in assembling the tree of eukaryotes, describing the major evidence for each supergroup, and where gaps in our knowledge remain. We also consider other factors emerging from phylogenetic analyses and comparative genomics, in particular lateral gene transfer, and whether such factors confound our understanding of the eukaryotic tree.

  10. From Family Trees to Decision Trees.

    ERIC Educational Resources Information Center

    Trobian, Helen R.

    This paper is a preliminary inquiry by a non-mathematician into graphic methods of sequential planning and ways in which hierarchical analysis and tree structures can be helpful in developing interest in the use of mathematical modeling in the search for creative solutions to real-life problems. Highlights include a discussion of hierarchical…

  11. A Code for Probabilistic Safety Assessment

    1997-10-10

    An integrated fault-event tree software package PSAPACK was developed for level-1 PSA using personal computers. It is a menu driven interactive modular system which permits different choices, depending on the user's purposes and needs. The event tree development module is capable of developing the logic accident sequences based on the user's specified relations between event tree headings. Identification of success sequences and core damage sequences is done automatically by the code based on the success function input by the user. It links minimum cut sets (MCS) from system fault trees and performs the Boolean reduction. It can also retrieve data from the reliability data base to perform the quantification of accident sequences.
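
    The Boolean reduction of cut sets mentioned in this record can be sketched with the absorption law: a cut set that contains another cut set is redundant. The code below is an illustrative sketch only and is not PSAPACK's implementation; the function name and the event labels are hypothetical.

```python
def minimal_cut_sets(cut_sets):
    """Apply the absorption law: drop every cut set that contains another one."""
    unique = {frozenset(cs) for cs in cut_sets}
    minimal = []
    # Visiting smaller sets first guarantees any absorbing subset is already kept.
    for candidate in sorted(unique, key=len):
        if not any(kept <= candidate for kept in minimal):
            minimal.append(candidate)
    return minimal


# Hypothetical basic events A, B, C.
raw = [{"A", "B"}, {"A"}, {"B", "C"}, {"A", "B", "C"}]
for cut_set in minimal_cut_sets(raw):
    print(sorted(cut_set))
```

    Here {A, B} and {A, B, C} are absorbed by {A}, leaving {A} and {B, C} as the minimal cut sets.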

  12. Walking tree heuristics for biological string alignment, gene location, and phylogenies

    NASA Astrophysics Data System (ADS)

    Cull, P.; Holloway, J. L.; Cavener, J. D.

    1999-03-01

    Basic biological information is stored in strings of nucleic acids (DNA, RNA) or amino acids (proteins). Teasing out the meaning of these strings is a central problem of modern biology. Matching and aligning strings brings out their shared characteristics. Although string matching is well-understood in the edit-distance model, biological strings with transpositions and inversions violate this model's assumptions. We propose a family of heuristics called walking trees to align biologically reasonable strings. Both edit-distance and walking tree methods can locate specific genes within a large string when the genes' sequences are given. When we attempt to match whole strings, the walking tree matches most genes, while the edit-distance method fails. We also give examples in which the walking tree matches substrings even if they have been moved or inverted. The edit-distance method was not designed to handle these problems. We include an example in which the walking tree "discovered" a gene. Calculating scores for whole genome matches gives a method for approximating evolutionary distance. We show two evolutionary trees for the picornaviruses which were computed by the walking tree heuristic. Both of these trees show great similarity to previously constructed trees. The point of this demonstration is that WHOLE genomes can be matched and distances calculated. The first tree was created on a Sequent parallel computer and demonstrates that the walking tree heuristic can be efficiently parallelized. The second tree was created using a network of workstations and demonstrates that there is sufficient parallelism in the phylogenetic tree calculation that the sequential walking tree can be used effectively on a network.
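
    For context, the edit-distance model that this record contrasts with walking trees can be sketched as the standard Levenshtein dynamic program. This is the baseline model only, not the walking tree heuristic, and the example strings are hypothetical. Because the model allows only single-character insertions, deletions, and substitutions, a moved block of characters has no cheap operation of its own.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # deleting the first i characters of a
        for j, cb in enumerate(b, start=1):
            substitute = prev[j - 1] + (0 if ca == cb else 1)
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, substitute))
        prev = curr
    return prev[-1]


print(edit_distance("kitten", "sitting"))
# Hypothetical sequences whose two halves have been swapped (a transposition).
print(edit_distance("AAAACCCC", "CCCCAAAA"))
```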

  13. Tree encoding of Gaussian sources. [in data compression]

    NASA Technical Reports Server (NTRS)

    Dick, R. J.; Berger, T.; Jelinek, F.

    1974-01-01

    Tree codes are known to be capable of performing arbitrarily close to the rate-distortion function for any memoryless source and single-letter fidelity criterion. Tree coding and tree search strategies are investigated for the discrete-time memoryless Gaussian source encoded for a signal-power-to-mean-squared-error ratio of about 30 dB (about 5 binary digits per source output). Also, a theoretical lower bound on average search effort is derived. Two code search strategies (the Viterbi algorithm and the stack algorithm) were simulated in assembly language on a large digital computer. After suitable modifications, both strategies yielded encoding with a signal-to-distortion ratio about 1 dB below the limit set by the rate-distortion function. Although this performance is better than that of any previously known instrumentable scheme, it unfortunately requires search computation of the order of 100,000 machine cycles per source output encoded.
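
    The "about 5 binary digits per source output" figure in this record follows from the rate-distortion function of a memoryless Gaussian source under squared-error distortion, R(D) = 0.5 * log2(variance / D). The short check below is a worked calculation, not code from the paper; the function name is hypothetical.

```python
import math


def gaussian_rate_bits(snr_db: float) -> float:
    """Rate in bits per sample from R(D) = 0.5 * log2(variance / D)."""
    power_ratio = 10.0 ** (snr_db / 10.0)  # convert dB to a linear power ratio
    return 0.5 * math.log2(power_ratio)


print(round(gaussian_rate_bits(30.0), 2))  # 4.98
```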

  14. Integrated Task And Data Parallel Programming: Language Design

    NASA Technical Reports Server (NTRS)

    Grimshaw, Andrew S.; West, Emily A.

    1998-01-01

    This research investigates the combination of task and data parallel language constructs within a single programming language. There are a number of applications that exhibit properties which would be well served by such an integrated language. Examples include global climate models, aircraft design problems, and multidisciplinary design optimization problems. Our approach incorporates data parallel language constructs into an existing, object oriented, task parallel language. The language will support creation and manipulation of parallel classes and objects of both types (task parallel and data parallel). Ultimately, the language will allow data parallel and task parallel classes to be used either as building blocks or managers of parallel objects of either type, thus allowing the development of single and multi-paradigm parallel applications. 1995 Research Accomplishments: In February I presented a paper at Frontiers '95 describing the design of the data parallel language subset. During the spring I wrote and defended my dissertation proposal. Since that time I have developed a runtime model for the language subset. I have begun implementing the model and hand-coding simple examples which demonstrate the language subset. I have identified an astrophysical fluid flow application which will validate the data parallel language subset. 1996 Research Agenda: Milestones for the coming year include implementing a significant portion of the data parallel language subset over the Legion system. Using simple hand-coded methods, I plan to demonstrate (1) concurrent task and data parallel objects and (2) task parallel objects managing both task and data parallel objects. My next steps will focus on constructing a compiler and implementing the fluid flow application with the language. Concurrently, I will conduct a search for a real-world application exhibiting both task and data parallelism within the same program. Additional 1995 Activities: During the fall I collaborated

  15. Relative Debugging of Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2002-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular, the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.
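
    The core idea of relative debugging described in this record, comparing a serial and a parallel execution to find where they begin to differ, can be sketched as follows. This is a simplified illustration and not the authors' system: it assumes each run has already been reduced to a list of per-step snapshots (dicts mapping a variable name to a float, with the same keys in both runs), and the function name, variable names, and tolerance are hypothetical.

```python
import math


def first_divergence(serial_trace, parallel_trace, rel_tol=1e-9):
    """Return (step, name, serial_value, parallel_value) for the first mismatch, or None."""
    for step, (serial, parallel) in enumerate(zip(serial_trace, parallel_trace)):
        for name in serial:
            if not math.isclose(serial[name], parallel[name], rel_tol=rel_tol):
                return step, name, serial[name], parallel[name]
    return None


# Hypothetical snapshots: the parallel run drifts in variable "flux" at step 2.
serial_run = [{"rho": 1.0, "flux": 0.5}, {"rho": 1.1, "flux": 0.6}, {"rho": 1.2, "flux": 0.7}]
parallel_run = [{"rho": 1.0, "flux": 0.5}, {"rho": 1.1, "flux": 0.6}, {"rho": 1.2, "flux": 0.9}]
print(first_divergence(serial_run, parallel_run))
```

    If the serial run is trusted, the reported step and variable mark where an error introduced by parallelization first becomes visible.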

  16. Support for Debugging Automatically Parallelized Programs

    NASA Technical Reports Server (NTRS)

    Jost, Gabriele; Hood, Robert; Biegel, Bryan (Technical Monitor)

    2001-01-01

    We describe a system that simplifies the process of debugging programs produced by computer-aided parallelization tools. The system uses relative debugging techniques to compare serial and parallel executions in order to show where the computations begin to differ. If the original serial code is correct, errors due to parallelization will be isolated by the comparison. One of the primary goals of the system is to minimize the effort required of the user. To that end, the debugging system uses information produced by the parallelization tool to drive the comparison process. In particular the debugging system relies on the parallelization tool to provide information about where variables may have been modified and how arrays are distributed across multiple processes. User effort is also reduced through the use of dynamic instrumentation. This allows us to modify the program execution without changing the way the user builds the executable. The use of dynamic instrumentation also permits us to compare the executions in a fine-grained fashion and only involve the debugger when a difference has been detected. This reduces the overhead of executing instrumentation.

  17. A parallel adaptive mesh refinement algorithm

    NASA Technical Reports Server (NTRS)

    Quirk, James J.; Hanebutte, Ulf R.

    1993-01-01

    Over recent years, Adaptive Mesh Refinement (AMR) algorithms which dynamically match the local resolution of the computational grid to the numerical solution being sought have emerged as powerful tools for solving problems that contain disparate length and time scales. In particular, several workers have demonstrated the effectiveness of employing an adaptive, block-structured hierarchical grid system for simulations of complex shock wave phenomena. Unfortunately, from the parallel algorithm developer's viewpoint, this class of scheme is quite involved; these schemes cannot be distilled down to a small kernel upon which various parallelizing strategies may be tested. However, because of their block-structured nature such schemes are inherently parallel, so all is not lost. In this paper we describe the method by which Quirk's AMR algorithm has been parallelized. This method is built upon just a few simple message passing routines and so it may be implemented across a broad class of MIMD machines. Moreover, the method of parallelization is such that the original serial code is left virtually intact, and so we are left with just a single product to support. The importance of this fact should not be underestimated given the size and complexity of the original algorithm.

  18. Parallel algorithms for the spectral transform method

    SciTech Connect

    Foster, I.T.; Worley, P.H.

    1994-04-01

    The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, we describe these different parallel algorithms and report on computational experiments that we have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. We focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional FFTs and other parallel transforms.

  19. Parallel algorithms for the spectral transform method

    SciTech Connect

    Foster, I.T.; Worley, P.H.

    1997-05-01

    The spectral transform method is a standard numerical technique for solving partial differential equations on a sphere and is widely used in atmospheric circulation models. Recent research has identified several promising algorithms for implementing this method on massively parallel computers; however, no detailed comparison of the different algorithms has previously been attempted. In this paper, the authors describe these different parallel algorithms and report on computational experiments that they have conducted to evaluate their efficiency on parallel computers. The experiments used a testbed code that solves the nonlinear shallow water equations on a sphere; considerable care was taken to ensure that the experiments provide a fair comparison of the different algorithms and that the results are relevant to global models. The authors focus on hypercube- and mesh-connected multicomputers with cut-through routing, such as the Intel iPSC/860, DELTA, and Paragon, and the nCUBE/2, but they also indicate how the results extend to other parallel computer architectures. The results of this study are relevant not only to the spectral transform method but also to multidimensional fast Fourier transforms (FFTs) and other parallel transforms.

  20. Parallel time integration software

    SciTech Connect

    2014-07-01

    This package implements an optimal-scaling multigrid solver for the (non)linear systems that arise from the discretization of problems with evolutionary behavior. Typically, solution algorithms for evolution equations are based on a time-marching approach, solving sequentially for one time step after the other. Parallelism in these traditional time-integration techniques is limited to spatial parallelism. However, current trends in computer architectures are leading towards systems with more, but not faster, processors. Therefore, faster compute speeds must come from greater parallelism. One approach to achieve parallelism in time is with multigrid, but extending classical multigrid methods for elliptic operators to this setting is a significant achievement. In this software, we implement a non-intrusive, optimal-scaling time-parallel method based on multigrid reduction techniques. The examples in the package demonstrate optimality of our multigrid-reduction-in-time algorithm (MGRIT) for solving a variety of parabolic equations in two and three spatial dimensions. These examples can also be used to show that MGRIT can achieve significant speedup in comparison to sequential time marching on modern architectures.
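
    The sequential time-marching baseline that this record contrasts with MGRIT can be sketched with backward Euler on the scalar test problem u' = -lam * u. This is not code from the package and does not implement MGRIT; the function name and parameter values are hypothetical. It only shows why classical time stepping offers no parallelism in time: each step needs the result of the previous one.

```python
def backward_euler_march(u0: float, lam: float, dt: float, steps: int) -> list:
    """March u' = -lam * u forward in time, one dependent step after another."""
    u = u0
    history = [u]
    for _ in range(steps):
        # Backward Euler: u_new = u_old - dt * lam * u_new, solved for u_new.
        u = u / (1.0 + lam * dt)
        history.append(u)
    return history


# Hypothetical parameters: decay rate 2.0, step size 0.1, 5 steps.
print(backward_euler_march(1.0, 2.0, 0.1, 5))
```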