Science.gov

Sample records for 3-d massively parallel

  1. 3D seismic imaging on massively parallel computers

    SciTech Connect

    Womble, D.E.; Ober, C.C.; Oldfield, R.

    1997-02-01

    The ability to image complex geologies such as salt domes in the Gulf of Mexico and thrusts in mountainous regions is key to reducing the risk and cost associated with oil and gas exploration. Imaging these structures, however, is computationally expensive. Datasets can be terabytes in size, and the multiple iterations needed to produce a velocity model can take months of processing time, even on the massively parallel computers available today. Some algorithms, such as 3D finite-difference prestack depth migration, remain beyond the capacity of production seismic processing. Massively parallel processors (MPPs) and algorithms research are the tools that will enable this project to provide new seismic processing capabilities to the oil and gas industry. The goals of this work are to (1) develop finite-difference algorithms for 3D prestack depth migration; (2) develop efficient computational approaches for seismic imaging and for processing terabyte datasets on massively parallel computers; and (3) develop a modular, portable seismic imaging code.

  2. Time efficient 3-D electromagnetic modeling on massively parallel computers

    SciTech Connect

    Alumbaugh, D.L.; Newman, G.A.

    1995-08-01

    A numerical modeling algorithm has been developed to simulate the electromagnetic response of a three-dimensional earth to a dipole source for frequencies ranging from 100 to 100 MHz. The numerical problem is formulated in terms of a frequency-domain, modified vector Helmholtz equation for the scattered electric fields. The resulting differential equation is approximated on a staggered finite-difference grid, which yields a linear system of equations whose matrix is sparse and complex symmetric. The system is solved using a preconditioned quasi-minimum-residual method. Dirichlet boundary conditions are employed at the edges of the mesh by setting the tangential electric fields equal to zero. At frequencies below 1 MHz, normal grid stretching is employed to mitigate unwanted reflections off the grid boundaries. At higher frequencies, absorbing boundary conditions are required; these are implemented by making the stretching parameters of the modified vector Helmholtz equation complex, which introduces loss at the boundaries. To allow faster calculation of realistic models, the original serial version of the code has been modified to run on a massively parallel architecture. This modification involves three distinct tasks: (1) mapping the finite-difference stencil to a processor stencil, so that processors containing adjacent nodes of the model can exchange the necessary information; (2) determining the most efficient way to input the model, which is done by dividing the input into "global" and "local" data and reading the two sets differently; and (3) deciding how to output the data, which is an inherently nonparallel process.
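    The processor-stencil mapping in task (1) amounts to a halo (ghost-cell) exchange. The sketch below is a serial Python/NumPy stand-in and rests on several assumptions: a 2-D real-valued grid split into slabs along one axis, and a plain 5-point Laplacian with zero values outside the model. The code described above instead uses a 3-D staggered grid of complex fields and exchanges real messages between processors. All function names are hypothetical. The sketch checks that the decomposed result matches the single-domain result exactly.

```python
import numpy as np

def split_with_halos(field, nparts):
    """Split a 2-D array along axis 0 into slabs, each padded with one ghost row above and below."""
    return [np.pad(s, ((1, 1), (0, 0))) for s in np.array_split(field, nparts, axis=0)]

def exchange_halos(slabs):
    """Fill ghost rows from neighbouring slabs; each copy stands in for one message between processors."""
    for lower, upper in zip(slabs[:-1], slabs[1:]):
        upper[0, :] = lower[-2, :]   # last interior row of the lower slab -> upper slab's ghost row
        lower[-1, :] = upper[1, :]   # first interior row of the upper slab -> lower slab's ghost row

def laplacian(padded):
    """5-point Laplacian (unit spacing) of the interior rows; values outside the model are zero."""
    p = np.pad(padded, ((0, 0), (1, 1)))
    return p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * p[1:-1, 1:-1]

def distributed_laplacian(field, nparts):
    slabs = split_with_halos(field, nparts)
    exchange_halos(slabs)
    return np.vstack([laplacian(s) for s in slabs])

rng = np.random.default_rng(0)
u = rng.standard_normal((12, 7))
serial = laplacian(np.pad(u, ((1, 1), (0, 0))))   # whole model on one "processor"
parallel = distributed_laplacian(u, nparts=4)
print(np.allclose(serial, parallel))  # True
```

    Only one ghost row per side is needed because the stencil reaches one node in each direction. A wider stencil would need correspondingly wider halos.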

  3. Massively parallel regularized 3D inversion of potential fields on CPUs and GPUs

    NASA Astrophysics Data System (ADS)

    Čuma, Martin; Zhdanov, Michael S.

    2014-01-01

    We have recently introduced a massively parallel regularized 3D inversion of potential-field data. The program takes as input gravity or magnetic vector, tensor, and Total Magnetic Intensity (TMI) measurements and produces a 3D volume of density, susceptibility, or three-dimensional magnetization vector, the latter also including magnetic remanence information. The code uses a combined MPI and OpenMP approach that maps well onto current multiprocessor, multicore clusters and exhibits nearly linear strong and weak parallel scaling. It has been used on large clusters to invert regional- to continental-size data sets with up to a billion cells of the 3D Earth's volume, for the interpretation of large airborne gravity and magnetic surveys. In this paper we explain the features that made this massive parallelization feasible and extend the code with GPU support in the form of OpenACC directives. This implementation resulted in up to a 22x speedup compared with the scalar multithreaded implementation on a 12-core Intel CPU-based compute node. Furthermore, we introduce a mixed single-double precision approach, which allows us to perform most of the calculation in single-precision floating point while keeping the result as precise as if double precision had been used. This approach provides an additional 40% speedup on the GPUs compared with the pure double-precision implementation, and it has about half the memory footprint of the fully double-precision version.
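    The abstract does not say how the mixed single-double precision scheme works. One standard way to obtain double-precision accuracy from mostly single-precision work is iterative refinement, sketched below for a dense linear system. This is a swapped-in illustration, not necessarily the authors' method. A production version would factor the float32 matrix once and reuse the factors, rather than calling `np.linalg.solve` on every pass as this short sketch does.

```python
import numpy as np

def mixed_precision_solve(A, b, iters=5):
    """Solve A x = b with float32 solves and float64 residuals (classical iterative refinement)."""
    A32 = A.astype(np.float32)
    x = np.linalg.solve(A32, b.astype(np.float32)).astype(np.float64)
    for _ in range(iters):
        r = b - A @ x                                     # residual computed in double precision
        dx = np.linalg.solve(A32, r.astype(np.float32))   # cheap single-precision correction
        x += dx.astype(np.float64)
    return x

def rel_error(x, x_ref):
    return np.linalg.norm(x - x_ref) / np.linalg.norm(x_ref)

rng = np.random.default_rng(1)
n = 200
A = rng.standard_normal((n, n)) + n * np.eye(n)   # well-conditioned test matrix
x_true = rng.standard_normal(n)
b = A @ x_true

x_single = np.linalg.solve(A.astype(np.float32), b.astype(np.float32))
x_mixed = mixed_precision_solve(A, b)
print(f"single only: {rel_error(x_single, x_true):.1e}, refined: {rel_error(x_mixed, x_true):.1e}")
```

    Refinement converges only when the matrix is not too ill-conditioned relative to single-precision round-off, which is why the test matrix here is made diagonally dominant.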

  4. 3-D readout-electronics packaging for high-bandwidth massively paralleled imager

    DOEpatents

    Kwiatkowski, Kris; Lyke, James

    2007-12-18

    Dense, massively parallel signal-processing electronics are co-packaged behind the associated sensor pixels. Microchips containing a linear or bilinear arrangement of photosensors, together with the associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photosensitive cells is disposed on a stacked CMOS chip's surface at a 45° angle from light-reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image-processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of the stacked CMOS chip layers to a distribution grid; these connections distribute power and signals to the components associated with each stacked CMOS chip layer.

  5. PORTA: A Massively Parallel Code for 3D Non-LTE Polarized Radiative Transfer

    NASA Astrophysics Data System (ADS)

    Štěpán, J.

    2014-10-01

    The interpretation of the Stokes profiles of solar (stellar) spectral line radiation requires solving a non-LTE radiative transfer problem that can be very complex, especially when the main interest lies in modeling the linear polarization signals produced by scattering processes and their modification by the Hanle effect. One of the main difficulties is that the plasma of a stellar atmosphere can be highly inhomogeneous and dynamic, so the non-equilibrium problem of the generation and transfer of polarized radiation must be solved in realistic three-dimensional stellar atmospheric models. Here we present PORTA, a computer program we have developed for solving, in three-dimensional (3D) models of stellar atmospheres, the problem of the generation and transfer of spectral line polarization, taking into account anisotropic radiation pumping and the Hanle and Zeeman effects in multilevel atoms. The numerical method of solution is based on a highly convergent iterative algorithm, whose convergence rate is insensitive to the grid size, and on an accurate short-characteristics formal solver of the Stokes-vector transfer equation that uses monotonic Bézier interpolation. In addition to the iterative method and the 3D formal solver, another important feature of PORTA is a novel parallelization strategy suited to massively parallel computers. Because the solution scales linearly with the number of processors, the solution time can be reduced by several orders of magnitude. We present useful benchmarks and a few illustrations of applications using a 3D model of the solar chromosphere resulting from MHD simulations. Finally, we present our conclusions with a view to future research. For more details see Štěpán & Trujillo Bueno (2013).
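    The core of a short-characteristics formal solver is the integral of the transfer equation along one short ray segment, $I_O = I_M e^{-\Delta\tau} + \int_0^{\Delta\tau} S(t)\, e^{-t}\, dt$, from the upwind point M to the current point O. The sketch below makes two simplifying assumptions. It treats the scalar (unpolarized) case, and it takes a source function that varies linearly in optical depth along the segment, in place of PORTA's Stokes-vector transfer and monotonic Bézier interpolation. Interpolating the upwind intensity on the 3D grid is also left out. The function name is hypothetical.

```python
import math

def short_characteristic_step(i_up, s_up, s_here, dtau):
    """Intensity at a grid point from the upwind intensity along one short characteristic.

    Scalar case; the source function is assumed linear in optical depth t, measured from the
    current point (t = 0) back to the upwind point (t = dtau > 0).
    """
    att = math.exp(-dtau)
    e0 = -math.expm1(-dtau)          # integral of exp(-t) over [0, dtau] = 1 - exp(-dtau)
    e1 = e0 - dtau * att             # integral of t * exp(-t) over [0, dtau]
    # S(t) = s_here + (s_up - s_here) * t / dtau
    return i_up * att + s_here * e0 + (s_up - s_here) * e1 / dtau

print(short_characteristic_step(2.0, 2.0, 2.0, 0.7))   # equilibrium: I = S stays at 2 (up to rounding)
print(short_characteristic_step(0.0, 1.0, 0.0, 1.0))   # 1 - 2/e for S(t) = t
```

    For very small `dtau`, the `e1 / dtau` term suffers cancellation; practical solvers switch to a Taylor expansion there.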

  6. The implementation of the upwind leapfrog scheme for 3D electromagnetic scattering on massively parallel computers

    SciTech Connect

    Nguyen, B.T.; Hutchinson, S.A.

    1995-07-01

    The upwind leapfrog scheme for electromagnetic scattering is briefly described, and its application to the 3D time-domain Maxwell's equations is shown in detail. Because the scheme uses upwind characteristic variables and a narrow stencil, it requires less communication, which makes it well suited to distributed-memory parallel computers. The algorithm's implementation on two message-passing computers, a 1024-processor nCUBE 2 and a 1840-processor Intel Paragon, is described. Performance evaluation shows that the scheme scales well and achieves high efficiency on both machines.

  7. 3D frequency modeling of elastic seismic wave propagation via a structured massively parallel direct Helmholtz solver

    NASA Astrophysics Data System (ADS)

    Wang, S.; De Hoop, M. V.; Xia, J.; Li, X.

    2011-12-01

    We consider the modeling of elastic seismic wave propagation on a rectangular domain via the discretization and solution of the inhomogeneous coupled Helmholtz equation in 3D, exploiting a parallel multifrontal sparse direct solver equipped with Hierarchically Semi-Separable (HSS) structure to reduce computational complexity and storage. In particular, we are concerned with solving this equation on a large domain for a large number of different forcing terms, in the context of seismic problems in general and modeling in particular. We use a parsimonious mixed-grid finite-difference scheme to discretize the Helmholtz operator and Perfectly Matched Layer boundaries, resulting in a non-Hermitian matrix. We make use of a nested-dissection-based domain decomposition and introduce an approximate direct solver by developing a parallel HSS matrix compression, factorization, and solution approach. We cast our massive parallelization in the framework of the multifrontal method. The assembly tree is partitioned into local trees and a global tree. The local trees are eliminated independently on each processor, while the global tree is eliminated through massive communication. The solver for the inhomogeneous equation is a parallel hybrid between the multifrontal method and HSS structure. The computational complexity of the factorization is almost linear in the size of the Helmholtz matrix. Our numerical approach can be compared with the spectral element method in 3D seismic applications.

  8. Massively parallel computation of 3D flow and reactions in chemical vapor deposition reactors

    SciTech Connect

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.; Hennigan, G.L.; Devine, K.D.; Moffat, H.K.

    1997-12-01

    Computer modeling of Chemical Vapor Deposition (CVD) reactors can greatly aid in the understanding, design, and optimization of these complex systems. Modeling is particularly attractive for these systems because experimentally evaluating many design alternatives can be prohibitively expensive, time-consuming, and even dangerous when working with toxic chemicals such as arsine (AsH₃). Until now, predictive modeling has not been possible for most systems, since the behavior is three-dimensional and governed by complex reaction mechanisms. In addition, CVD reactors often exhibit large thermal gradients, large changes in physical properties over regions of the domain, and significant thermal diffusion for gas mixtures with widely varying molecular weights. As a result, models have relied on significant simplifications that erode the accuracy of their predictions. In this paper, the authors demonstrate how the vast computational resources of massively parallel computers can be exploited to make possible the analysis of models that include coupled fluid flow and detailed chemistry in three-dimensional domains. For the most part, models have either simplified the reaction mechanisms and concentrated on the fluid flow, or simplified the fluid flow and concentrated on rigorous reactions. An important CVD research thrust has been detailed modeling of fluid flow and heat transfer in the reactor vessel, treating transport and reaction of chemical species either very simply or as a totally decoupled problem. Using the analogy between heat transfer and mass transfer, and the fact that deposition is often diffusion limited, much can be learned from these calculations; however, such models do not include the effects of thermal diffusion, the change in physical properties with composition, or surface reaction mechanisms, nor can they detect transitions to three-dimensional flows.

  9. 3-D prestack Kirchhoff depth migration: From prototype to production in a massively parallel processor environment

    SciTech Connect

    Chang, H.; Solano, M.; VanDyke, J.P.; McMechan, G.A.; Epili, D.

    1998-03-01

    Portable, production-scale 3-D prestack Kirchhoff depth migration software capable of full-volume imaging has been successfully implemented and applied to a six-million-trace (46.9-Gbyte) marine data set from a salt/subsalt play in the Gulf of Mexico. Velocity model building and updates used an image-driven strategy and were performed in a Sun SPARC environment. Images obtained by 3-D prestack migration after three velocity iterations are substantially better focused and reveal drilling targets that were not visible in images obtained from conventional 3-D poststack time migration. Amplitudes are well preserved, so anomalies associated with known reservoirs conform to the petrophysical predictions. Prototype development was on an 8-node Intel iPSC/860 computer; the production version was run on an 1824-node Intel Paragon computer. The code has been successfully ported to Cray T3D and Unix workstation (PVM) environments.
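    The core of Kirchhoff depth migration is a diffraction summation: each trace sample is spread over all image points that share its source-to-point-to-receiver travel time. The sketch below assumes a 2-D constant-velocity medium, nearest-sample lookup, and no amplitude, obliquity, or anti-aliasing weights. It therefore illustrates the idea rather than the production code described above, which would use traveltime tables for a variable-velocity model. The function name and survey geometry are hypothetical. A synthetic point diffractor should focus back to its true position.

```python
import numpy as np

def kirchhoff_migrate(traces, src_x, rec_x, dt, v, x_img, z_img):
    """2-D prestack Kirchhoff depth migration for a constant velocity v (plain diffraction summation)."""
    X, Z = np.meshgrid(x_img, z_img)                  # image grid, shape (nz, nx)
    image = np.zeros(X.shape)
    nt = traces.shape[1]
    for trace, sx, rx in zip(traces, src_x, rec_x):
        t = (np.hypot(X - sx, Z) + np.hypot(X - rx, Z)) / v   # source -> image point -> receiver
        it = np.rint(t / dt).astype(int)              # nearest time sample
        inside = it < nt
        image[inside] += trace[it[inside]]            # spread each sample along its isochron
    return image

# Synthetic data: one point diffractor recorded by every source-receiver pair.
v, dt, nt = 2000.0, 0.002, 1000
x0, z0 = 500.0, 400.0
pairs = [(s, r) for s in np.arange(0.0, 1001.0, 100.0) for r in np.arange(0.0, 1001.0, 50.0)]
src_x = np.array([s for s, _ in pairs])
rec_x = np.array([r for _, r in pairs])
traces = np.zeros((len(pairs), nt))
for k, (s, r) in enumerate(pairs):
    t0 = (np.hypot(x0 - s, z0) + np.hypot(x0 - r, z0)) / v
    traces[k, int(np.rint(t0 / dt))] = 1.0

x_img = np.arange(0.0, 1001.0, 10.0)
z_img = np.arange(0.0, 801.0, 10.0)
image = kirchhoff_migrate(traces, src_x, rec_x, dt, v, x_img, z_img)
iz, ix = np.unravel_index(np.argmax(image), image.shape)
print(x_img[ix], z_img[iz])  # 500.0 400.0
```

    Each image point costs one traveltime evaluation per trace, so the work is proportional to traces times image points. This is why full-volume 3-D prestack migration motivated the massively parallel implementation.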

  10. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    NASA Astrophysics Data System (ADS)

    Schultz, A.

    2010-12-01

    3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and, in specific settings, the presence of groundwater, geothermal resources, oil/gas, or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency-domain 3D EM problems have not achieved wide-scale adoption and have emphasized fairly coarse-grained parallelism using MPI and similar approaches. The communications bandwidth, as well as the latency of sending and receiving network packets, limits fine-grained parallel strategies and has inhibited wide adoption of these algorithms. Leading Graphics Processing Unit (GPU) companies now produce GPUs with hundreds of processor cores per die. The silicon footprint of a GPU's restricted instruction set is much smaller than that of the general-purpose instruction set required of a CPU; consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers, and high-speed communication with host CPUs, usually through PCIe-type interconnects. The extremely low cost and high computational power of GPUs provide the EM geophysics community with an opportunity to achieve fine-grained (i.e., massive) parallelization of codes on low-cost hardware. The current generation of GPUs (e.g., NVIDIA Fermi) provides 3 billion transistors per die, with nearly 500 processor cores and up to 6 GB of fast (GDDR5) GPU memory. This generation of GPU supports fast hardware double-precision (64-bit) floating-point operations of the type required for frequency-domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system.
We

  11. Massively parallel visualization: Parallel rendering

    SciTech Connect

    Hansen, C.D.; Krogh, M.; White, W.

    1995-12-01

    This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygon, sphere, and volume data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  12. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that achieves computational speedups of over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
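    A minimal Python sketch of a block-decomposed sieve follows. It assumes that "scattered decomposition" means dealing fixed-size blocks of the integer range out to workers cyclically, so block k goes to worker k mod P. The workers are simulated sequentially, and the block size and function names are hypothetical. Once the seed primes up to √n are known, each block can be sieved with no further communication.

```python
import math

def seed_primes(limit):
    """Serial Sieve of Eratosthenes up to `limit`; these primes would be broadcast to every worker."""
    is_prime = [True] * (limit + 1)
    primes = []
    for n in range(2, limit + 1):
        if is_prime[n]:
            primes.append(n)
            for m in range(n * n, limit + 1, n):
                is_prime[m] = False
    return primes

def sieve_block(lo, hi, seeds):
    """Primes in [lo, hi): cross off multiples of each seed p from max(p*p, first multiple >= lo)."""
    keep = [True] * (hi - lo)
    for p in seeds:
        start = max(p * p, -(-lo // p) * p)      # -(-lo // p) is ceil(lo / p)
        for m in range(start, hi, p):
            keep[m - lo] = False
    return [lo + i for i, k in enumerate(keep) if k and lo + i >= 2]

def scattered_sieve(n, nworkers, block=16):
    """Primes <= n; block k of [0, n] is owned by worker k % nworkers (scattered decomposition)."""
    seeds = seed_primes(math.isqrt(n))
    found = [[] for _ in range(nworkers)]        # one result list per simulated worker
    for k, lo in enumerate(range(0, n + 1, block)):
        hi = min(lo + block, n + 1)
        found[k % nworkers].extend(sieve_block(lo, hi, seeds))   # blocks are independent
    return sorted(p for worker in found for p in worker)

print(len(scattered_sieve(1000, nworkers=8)))  # 168
```

    Dealing blocks out cyclically, rather than giving each worker one contiguous range, spreads any position-dependent variation in work evenly across the workers.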

  13. Massively Parallel QCD

    SciTech Connect

    Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G

    2007-04-11

    The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites, while the BlueGene supercomputer is a discretization of space into compute nodes, and both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect, with sustained performance of about 20% of peak, corresponding to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong-interaction physics theoretical results.
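    The locality argument can be checked with a few lines of Python. The sketch assumes a hypothetical 4-D lattice cut into equal sublattices, one per node of a hypothetical 4-D periodic grid of nodes. BlueGene/L's network is a 3-D torus, so real LQCD codes fold the lattice onto it differently. The sketch confirms that every nearest-neighbour term in the lattice needs at most one network hop.

```python
from itertools import product

def node_of(site, lattice, nodes):
    """Node that owns `site` when each lattice dimension is cut into equal blocks."""
    return tuple(x // (L // n) for x, L, n in zip(site, lattice, nodes))

def neighbour(site, mu, step, lattice):
    """Site reached by moving `step` (+1 or -1) in direction mu, with periodic boundaries."""
    s = list(site)
    s[mu] = (s[mu] + step) % lattice[mu]
    return tuple(s)

def torus_hops(a, b, nodes):
    """Network distance between two nodes of a periodic torus."""
    return sum(min(abs(x - y), n - abs(x - y)) for x, y, n in zip(a, b, nodes))

lattice = (8, 8, 8, 16)   # hypothetical 4-D lattice (x, y, z, t)
nodes = (2, 2, 2, 4)      # hypothetical node grid; each node holds a 4x4x4x4 sublattice
worst = max(
    torus_hops(node_of(s, lattice, nodes), node_of(neighbour(s, mu, step, lattice), lattice, nodes), nodes)
    for s in product(*(range(L) for L in lattice))
    for mu in range(4)
    for step in (+1, -1)
)
print(worst)  # 1: every nearest-neighbour term needs at most one network hop
```

    Only sites on the faces of each sublattice need data from another node. The ratio of face sites to interior sites therefore sets how much communication accompanies each unit of local work.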

  14. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms that take advantage of the large memory and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications, rendering on the MPP tends to be preferable to rendering on a graphics workstation because of the MPP's abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper describes recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. It presents rendering algorithms, developed for MPPs, for polygon, sphere, and volume data. The polygon algorithm uses a data-parallel approach, whereas the sphere and volume renderers use a MIMD approach. Implementations of these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.

  15. Holographic renormalization of 3D minimal massive gravity

    NASA Astrophysics Data System (ADS)

    Alishahiha, Mohsen; Qaemmaqami, Mohammad M.; Naseh, Ali; Shirzad, Ahmad

    2016-01-01

    We study holographic renormalization of 3D minimal massive gravity using the Chern-Simons-like formulation of the model. We explicitly present the Gibbons-Hawking term as well as all counterterms needed to make the action finite, in terms of the dreibein and spin connection. These results can be used to find correlation functions of the stress tensor of the holographic dual field theory.

  16. Finding evidence for massive neutrinos using 3D weak lensing

    SciTech Connect

    Kitching, T. D.; Heavens, A. F.; Verde, L.; Serra, P.; Melchiorri, A.

    2008-05-15

    In this paper we investigate the potential of 3D cosmic shear to constrain massive neutrino parameters. We find that if the total mass is substantial (near the upper limits from large-scale structure, but setting aside the Lyman-$\alpha$ limit for now), then 3D cosmic shear + Planck is very sensitive to neutrino mass, and one may expect that a next-generation photometric redshift survey could constrain the number of neutrinos $N_\nu$ and the sum of their masses $m_\nu = \sum_i m_i$ to an accuracy of $\Delta N_\nu \approx 0.08$ and $\Delta m_\nu \approx 0.03$ eV, respectively. If in fact the masses are close to zero, then the errors weaken to $\Delta N_\nu \approx 0.10$ and $\Delta m_\nu \approx 0.07$ eV. In either case there is a factor of 4 improvement over Planck alone. We use a Bayesian evidence method to predict the joint expected evidence for $N_\nu$ and $m_\nu$. We find that 3D cosmic shear combined with a Planck prior could provide 'substantial' evidence for massive neutrinos and be able to distinguish 'decisively' between many competing massive neutrino models. This technique should 'decisively' distinguish between models in which there are no massive neutrinos and models in which there are massive neutrinos with $|N_\nu - 3| \gtrsim 0.35$ and $m_\nu \gtrsim 0.25$ eV. We introduce the notion of marginalized and conditional evidence when considering evidence for individual parameter values within a multiparameter model.

  17. Efficient, massively parallel eigenvalue computation

    NASA Technical Reports Server (NTRS)

    Huo, Yan; Schreiber, Robert

    1993-01-01

    In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on the massively parallel SIMD computers MasPar MP-1 and MP-2 is described. Results of numerical tests and timings are also presented.
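    For a concrete sense of the workload, the sketch below builds one common kind of random Hamiltonian and diagonalizes it with NumPy's dense symmetric eigensolver. The Hamiltonian is a 1-D Anderson tight-binding chain with uniformly distributed on-site energies. The model choice, parameters, and function name are illustrative assumptions: the abstract does not specify the Hamiltonians, and the MasPar routine itself is not reproduced. The inverse participation ratio (IPR) is a standard measure of how localized each eigenfunction is.

```python
import numpy as np

def anderson_hamiltonian(n, disorder, hopping=1.0, rng=None):
    """Real symmetric 1-D tight-binding Hamiltonian with random on-site energies in [-W/2, W/2]."""
    rng = np.random.default_rng() if rng is None else rng
    H = np.diag(rng.uniform(-disorder / 2, disorder / 2, size=n))
    i = np.arange(n - 1)
    H[i, i + 1] = -hopping      # nearest-neighbour hopping on an open chain
    H[i + 1, i] = -hopping
    return H

n, W = 400, 2.0
H = anderson_hamiltonian(n, W, rng=np.random.default_rng(7))
energies, states = np.linalg.eigh(H)          # dense real symmetric diagonalization
ipr = np.sum(states**4, axis=0)               # IPR: 1/n if fully extended, 1 if on a single site
print(energies.min(), energies.max(), ipr.mean())
```

    By Gershgorin's theorem, all eigenvalues lie within W/2 + 2|t| of zero. This gives a quick sanity check on any parallel eigensolver's output.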

  18. Massively parallel MRI detector arrays

    NASA Astrophysics Data System (ADS)

    Keil, Boris; Wald, Lawrence L.

    2013-04-01

    Originally proposed as a method to increase sensitivity by extending the high local sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has affected clinical imaging to the point where many examinations are performed with an array of multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts, relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and the changes in RF methodology needed to construct massively parallel MRI detector arrays, and show some state-of-the-art examples of highly accelerated imaging with the resulting highly parallel arrays.

  19. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the high local sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has affected clinical imaging to the point where many examinations are performed with an array of multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts, relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and the changes in RF methodology needed to construct massively parallel MRI detector arrays, and show some state-of-the-art examples of highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  20. Massively parallel MRI detector arrays.

    PubMed

    Keil, Boris; Wald, Lawrence L

    2013-04-01

    Originally proposed as a method to increase sensitivity by extending the high local sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has affected clinical imaging to the point where many examinations are performed with an array of multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts, relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and the changes in RF methodology needed to construct massively parallel MRI detector arrays, and show some state-of-the-art examples of highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
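    The Keil and Wald review above mentions optimally combining array data. A widely used approach weights the coil signals by the inverse noise covariance (the Roemer-style combination). The single-pixel sketch below assumes that the coil sensitivities and the noise covariance are known; in practice both must be estimated. The function names and test data are hypothetical.

```python
import numpy as np

def optimal_combine(m, s, psi):
    """SNR-optimal, unbiased estimate of one pixel from coil signals m = s * rho + noise.

    s: complex coil sensitivities at the pixel; psi: Hermitian noise covariance between coils.
    """
    w = np.linalg.solve(psi, s)              # psi^{-1} s
    return np.vdot(w, m) / np.vdot(w, s)     # (s^H psi^{-1} m) / (s^H psi^{-1} s)

def optimal_snr(rho, s, psi):
    """SNR of the optimal combination: |rho| * sqrt(s^H psi^{-1} s)."""
    return abs(rho) * np.sqrt(np.vdot(s, np.linalg.solve(psi, s)).real)

def single_coil_snr(rho, s, psi, k):
    """SNR obtained from coil k alone."""
    return abs(rho) * abs(s[k]) / np.sqrt(psi[k, k].real)

rng = np.random.default_rng(3)
ncoils = 8
s = rng.standard_normal(ncoils) + 1j * rng.standard_normal(ncoils)
a = rng.standard_normal((ncoils, ncoils)) + 1j * rng.standard_normal((ncoils, ncoils))
psi = a @ a.conj().T / ncoils + np.eye(ncoils)   # Hermitian positive-definite noise covariance
rho = 0.8 - 0.3j
print(optimal_combine(s * rho, s, psi))          # noiseless data: recovers rho up to rounding
```

    By the Cauchy-Schwarz inequality, the combined SNR is never worse than that of the best single coil. Each added channel can only help, provided its noise correlations are accounted for.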

  21. LaMEM: a massively parallel 3D staggered-grid finite-difference code for coupled nonlinear thermo-mechanical modeling of lithospheric deformation with visco-elasto-plastic rheology

    NASA Astrophysics Data System (ADS)

    Popov, Anton; Kaus, Boris

    2015-04-01

    This software project aims at bringing 3D lithospheric deformation modeling to a qualitatively different level. Our code LaMEM (Lithosphere and Mantle Evolution Model) is based on the following building blocks:
    * Massively parallel data-distributed implementation model based on the PETSc library
    * Light, stable, and accurate staggered-grid finite-difference spatial discretization
    * Marker-in-cell predictor-corrector time discretization with fourth-order Runge-Kutta
    * Elastic stress rotation algorithm based on the time integration of the vorticity pseudo-vector
    * Staircase-type internal free-surface boundary condition without artificial viscosity contrast
    * Geodynamically relevant visco-elasto-plastic rheology
    * Global velocity-pressure-temperature Newton-Raphson nonlinear solver
    * Local nonlinear solver based on the FZERO algorithm
    * Coupled velocity-pressure geometric multigrid preconditioner with Galerkin coarsening
    The staggered-grid finite-difference method, being an inherently Eulerian and rather complicated discretization, provides no natural treatment of the free-surface boundary condition. The solution based on a quasi-viscous sticky-air phase introduces significant viscosity contrasts and spoils the convergence of the iterative solvers. In LaMEM we are currently implementing an approximate staircase-type free-surface boundary condition that excludes the empty cells and restores solver convergence. Because of the mutual dependence of the stress and strain-rate tensor components, and their different spatial locations in the grid, there is no straightforward way of implementing the nonlinear rheology. In LaMEM we have developed and implemented an efficient interpolation scheme for the second invariant of the strain-rate tensor that solves this problem. Scalable, efficient linear solvers are the key components of a successful nonlinear solution.
In LaMEM we have a range of PETSc-based preconditioning techniques that either employ a block factorization of
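    The abstract notes that stress and strain-rate components sit at different grid locations, which complicates evaluating the second invariant. The sketch below shows one generic way to handle this in 2-D. Normal strain rates sit at cell centres and the shear strain rate at cell corners, with the squared shear rate averaged to the centres. This is an illustrative assumption, not LaMEM's interpolation scheme, which the abstract does not describe. It handles interior cells only and is checked against a linear velocity field whose invariant is known exactly. The function name is hypothetical.

```python
import numpy as np

def strain_rate_invariant(vx, vy, dx, dy):
    """Effective strain rate sqrt(0.5 * e_ij e_ij) at interior cell centres of a 2-D staggered grid.

    vx: shape (ny, nx+1), on vertical cell faces; vy: shape (ny+1, nx), on horizontal cell faces.
    """
    exx = np.diff(vx, axis=1) / dx                     # (ny, nx) at cell centres
    eyy = np.diff(vy, axis=0) / dy                     # (ny, nx) at cell centres
    dvx_dy = np.diff(vx, axis=0) / dy                  # (ny-1, nx+1) on interior horizontal grid lines
    dvy_dx = np.diff(vy, axis=1) / dx                  # (ny+1, nx-1) on interior vertical grid lines
    exy = 0.5 * (dvx_dy[:, 1:-1] + dvy_dx[1:-1, :])    # (ny-1, nx-1) at interior corners
    exy2 = exy**2
    # average the squared shear rate from the four corners of each cell not touching the boundary
    exy2_c = 0.25 * (exy2[:-1, :-1] + exy2[1:, :-1] + exy2[:-1, 1:] + exy2[1:, 1:])
    j2 = 0.5 * (exx[1:-1, 1:-1] ** 2 + eyy[1:-1, 1:-1] ** 2) + exy2_c
    return np.sqrt(j2)

nx, ny, dx, dy = 10, 8, 0.5, 0.25
a, b, c = 1.0, 2.0, 0.5                       # vx = a x + b y, vy = c x - a y (divergence free)
xf, yc = np.arange(nx + 1) * dx, (np.arange(ny) + 0.5) * dy
xc, yf = (np.arange(nx) + 0.5) * dx, np.arange(ny + 1) * dy
vx = a * xf[None, :] + b * yc[:, None]
vy = c * xc[None, :] - a * yf[:, None]
eps_II = strain_rate_invariant(vx, vy, dx, dy)
print(eps_II.shape, np.allclose(eps_II, np.sqrt(a**2 + 0.25 * (b + c) ** 2)))  # (6, 8) True
```

    Averaging the squares, rather than squaring an averaged shear rate, keeps the invariant from being underestimated where the shear rate changes sign between neighbouring corners.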

  22. Seismic imaging on massively parallel computers

    SciTech Connect

    Ober, C.C.; Oldfield, R.A.; Womble, D.E.; Mosher, C.C.

    1997-07-01

    A key to reducing the risks and costs associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions onshore in the US. Prestack depth migration generally yields the most accurate images, and one approach to it is to solve the scalar wave equation using finite differences. Current industry computational capabilities are insufficient for applying finite-difference 3-D prestack depth-migration algorithms; high-performance computers and state-of-the-art algorithms and software are required to meet this need. As part of an ongoing ACTI project funded by the US Department of Energy, the authors have developed a finite-difference 3-D prestack depth-migration code for massively parallel computer systems. The goal of this work is to demonstrate that massively parallel computers (thousands of processors) can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite-difference prestack depth migration practical for oil and gas exploration.
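    A basic building block of finite-difference wave-equation imaging is explicit time stepping of the scalar wave equation $p_{tt} = v^2 \nabla^2 p$. The sketch below assumes 2-D, constant velocity, second-order differences, and rigid ($p = 0$) edges with no absorbing layer. It performs forward modeling only, not the authors' 3-D prestack migration, which additionally extrapolates source and receiver wavefields and applies an imaging condition. The time step is set below the 2-D stability limit $v\,\Delta t / h \le 1/\sqrt{2}$. The function name is hypothetical.

```python
import numpy as np

def wave_step(p_prev, p_curr, vel, dt, h):
    """One explicit leapfrog step of p_tt = v^2 (p_xx + p_zz), second order in time and space.

    Edges are held at p = 0 (rigid boundaries, no absorbing layer).
    """
    lap = np.zeros_like(p_curr)
    lap[1:-1, 1:-1] = (p_curr[:-2, 1:-1] + p_curr[2:, 1:-1] + p_curr[1:-1, :-2]
                       + p_curr[1:-1, 2:] - 4.0 * p_curr[1:-1, 1:-1]) / h**2
    p_next = 2.0 * p_curr - p_prev + (vel * dt) ** 2 * lap
    p_next[0, :] = p_next[-1, :] = 0.0
    p_next[:, 0] = p_next[:, -1] = 0.0
    return p_next

n, h, v = 101, 10.0, 2000.0                  # grid points per side, spacing (m), velocity (m/s)
dt = 0.5 * h / (v * np.sqrt(2.0))            # half the 2-D stability limit
z, x = np.mgrid[0:n, 0:n] * h
p0 = np.exp(-((x - 500.0) ** 2 + (z - 500.0) ** 2) / (2 * 30.0**2))   # Gaussian pulse at the centre
p_prev, p_curr = p0.copy(), p0.copy()        # zero initial velocity
for _ in range(200):
    p_prev, p_curr = p_curr, wave_step(p_prev, p_curr, v, dt, h)
print(np.abs(p_curr).max())
```

    Each step touches every grid point a fixed number of times. In 3-D, with many shots and long records, this adds up to the computational load that the abstract says exceeds current industry capability.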

  23. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  24. Massively parallel quantum computer simulator

    NASA Astrophysics Data System (ADS)

    De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

    2007-01-01

    We describe portable software to simulate universal quantum computers on massively parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as an IBM BlueGene/L, an IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, an SGI Altix 3700, and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as a benchmark for testing high-end parallel computers.
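    A state-vector simulator stores $2^n$ complex amplitudes for n qubits and applies each gate as a small linear map on one axis of that vector. The NumPy sketch below is single-process. It assumes complex128 amplitudes and a least-significant-bit qubit ordering; both are illustrative choices, not taken from the paper. The distributed version would split the vector across processors. At 16 bytes per amplitude, 36 qubits need $2^{40}$ bytes, consistent with the 1 TB of memory quoted above. The function names are hypothetical.

```python
import numpy as np

def apply_gate(state, gate, target, n_qubits):
    """Apply a 2x2 unitary to qubit `target` (qubit 0 = least significant bit of the basis index)."""
    psi = state.reshape([2] * n_qubits)                 # C order: axis 0 is the most significant qubit
    axis = n_qubits - 1 - target
    psi = np.tensordot(gate, psi, axes=([1], [axis]))   # contracted axis ends up first
    return np.moveaxis(psi, 0, axis).reshape(-1)

def state_vector_bytes(n_qubits, bytes_per_amplitude=16):
    """Memory for a full state vector of complex128 amplitudes."""
    return bytes_per_amplitude * 2**n_qubits

H = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)   # Hadamard gate
X = np.array([[0, 1], [1, 0]], dtype=complex)                 # NOT gate

n = 3
psi = np.zeros(2**n, dtype=complex)
psi[0] = 1.0                                          # |000>
for q in range(n):
    psi = apply_gate(psi, H, q, n)                    # uniform superposition
print(np.allclose(psi, 1 / np.sqrt(2**n)))            # True
print(state_vector_bytes(36) / 2**40)                 # 1.0, i.e. 36 qubits need 1 TiB
```

    Memory doubles with every added qubit, so the largest simulable system is set by aggregate memory rather than processor speed. This is one reason such a simulator doubles as a benchmark for large parallel machines.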

  25. Parallel 3-D method of characteristics in MPACT

    SciTech Connect

    Kochunas, B.; Downar, T. J.; Liu, Z.

    2013-07-01

    A new parallel 3-D method of characteristics (MOC) kernel has been developed and implemented in MPACT; it uses the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model uses both distributed- and shared-memory parallelism, implemented with the MPI and OpenMP standards, respectively. The kernel can decompose problems in space, angle, and by characteristic rays on up to $O(10^4)$ processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT $k_\text{eff}$ differs from the benchmark results for the rodded and unrodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors, all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time was 231 seconds for the 500-processor case and 19 seconds for the 15625-processor case. Ongoing work is focused on developing theoretical performance models and implementing acceleration techniques to minimize the number of iterations to convergence. (authors)
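    Each characteristic ray is swept through the regions it crosses. In each segment, the angular flux is attenuated and picks up a contribution from the local source: $\psi_\text{out} = \psi_\text{in} e^{-\tau} + (q/\Sigma_t)(1 - e^{-\tau})$ with $\tau = \Sigma_t \ell$. The sketch below shows this standard flat-source segment update, assuming an isotropic source that is constant within each region. MPACT's modular ray tracing, its 3-D track geometry, and the space/angle/ray decomposition that distributes these sweeps over processors are not shown. The function name and track data are hypothetical.

```python
import math

def moc_segment(psi_in, q, sigma_t, length):
    """Flat-source MOC update of the angular flux across one track segment in one region.

    psi_in: incoming angular flux; q: isotropic source per steradian (constant in the region);
    sigma_t: total cross section (1/cm); length: segment length (cm).
    Returns (outgoing flux, segment-averaged flux).
    """
    tau = sigma_t * length
    att = -math.expm1(-tau)                       # 1 - exp(-tau), accurate for small tau
    psi_out = psi_in - (psi_in - q / sigma_t) * att
    psi_avg = q / sigma_t + (psi_in - psi_out) / tau   # from the segment balance equation
    return psi_out, psi_avg

# Sweep one track across three regions (hypothetical data: source, cross section, length).
track = [(1.0, 0.5, 1.2), (0.0, 2.0, 0.4), (1.0, 0.5, 1.2)]
psi = 0.0                                         # vacuum boundary: no incoming flux
for q, sig, ell in track:
    psi, avg = moc_segment(psi, q, sig, ell)
    print(f"out={psi:.4f} avg={avg:.4f}")
```

    The segment-averaged fluxes, accumulated over all tracks and angles, update the region scalar fluxes for the next source iteration. Tracks and angles are independent within one sweep, which is what makes decomposition by angle and by ray possible.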

  6. The 3D Death of a Massive Star

    NASA Astrophysics Data System (ADS)

    Kohler, Susanna

    2015-07-01

    What happens at the very end of a massive star's life, just before its core collapses? A group led by Sean Couch (California Institute of Technology and Michigan State University) claims to have carried out the first three-dimensional simulations of these final few minutes, revealing new clues about the factors that can lead a massive star to explode in a catastrophic supernova at the end of its life.

    A Giant Collapses

    In dying massive stars, in-falling matter bounces off the collapsed core, creating a shock wave. If the shock wave loses too much energy as it expands into the star, it can stall out, but further energy input can revive it and result in a successful explosion of the star as a core-collapse supernova. In simulations of this process, however, theorists have trouble getting the stars to explode consistently: the shocks often stall out and fail to revive. Couch and his group suggest that one reason might be that these simulations usually start at core collapse assuming spherical symmetry of the progenitor star.

    Adding Turbulence

    Couch and his collaborators suspect that the key lies in the final minutes just before the star collapses. Models that assume a spherically symmetric star can't include the effects of convection as the final shell of silicon is burned around the core, and those effects might have a significant impact! To test this hypothesis, the group ran fully 3D simulations of the final three minutes of the life of a 15-solar-mass star, ending with core collapse, bounce, and shock revival. The outcome was striking: the 3D modeling introduced powerful turbulent convection (with speeds of several hundred km/s!) in the last few minutes of silicon-shell burning. As a result, the initial structure and motions in the star just before core collapse were very different from those in core-collapse simulations that use spherically symmetric initial conditions. The turbulence was then further amplified during collapse and formation of the shock

  7. Parallelization of Program to Optimize Simulated Trajectories (POST3D)

    NASA Technical Reports Server (NTRS)

    Hammond, Dana P.; Korte, John J. (Technical Monitor)

    2001-01-01

    This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) approach on a distributed memory NUMA (non-uniform memory access) architecture. The SGI Origin2000 was used for the tests presented.
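    The abstract states that the gradient calculations were parallelized in SPMD style but gives no code. A minimal sketch of one common way to do this, assuming forward-difference gradients (POST3D's actual gradient scheme is not described here): every rank runs the same program, evaluates only its own block of perturbed design points, and the blocks are then combined. A serial loop stands in for the gather step; in an MPI code an allgather would play that role. All function names and the objective are hypothetical.

```python
def block_range(n, rank, nranks):
    """Half-open index range [lo, hi) owned by `rank` when n items are split into nranks blocks."""
    base, extra = divmod(n, nranks)
    lo = rank * base + min(rank, extra)
    hi = lo + base + (1 if rank < extra else 0)
    return lo, hi

def partial_gradient(f, x, rank, nranks, h=1e-6):
    """What each SPMD rank computes: forward differences for its own components only."""
    f0 = f(x)  # every rank evaluates the base point (redundant but communication-free)
    lo, hi = block_range(len(x), rank, nranks)
    part = {}
    for i in range(lo, hi):
        xp = list(x)
        xp[i] += h
        part[i] = (f(xp) - f0) / h
    return part

def gathered_gradient(f, x, nranks, h=1e-6):
    """Serial stand-in for the gather: run every rank's share and merge the results."""
    grad = [0.0] * len(x)
    for rank in range(nranks):
        for i, g in partial_gradient(f, x, rank, nranks, h).items():
            grad[i] = g
    return grad

# Hypothetical objective: f(x) = sum_i (x_i - i)^2, whose gradient is 2 (x_i - i).
def objective(x):
    return sum((xi - i) ** 2 for i, xi in enumerate(x))

print(gathered_gradient(objective, [0.0] * 5, nranks=3))
```

    Splitting by contiguous blocks keeps the bookkeeping trivial; since each component costs one objective evaluation, the load is balanced to within one evaluation per rank.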

  8. Massively parallel femtosecond laser processing.

    PubMed

    Hasegawa, Satoshi; Ito, Haruyasu; Toyoda, Haruyoshi; Hayasaki, Yoshio

    2016-08-01

    Massively parallel femtosecond laser processing with more than 1000 beams was demonstrated. Parallel beams were generated by a computer-generated hologram (CGH) displayed on a spatial light modulator (SLM). The key to this technique is to optimize the CGH in the laser processing system using a scheme called in-system optimization. It was analytically demonstrated that the number of beams is determined by the horizontal number of SLM pixels, N_SLM, imaged at the pupil plane of an objective lens and by a distance parameter p_d, obtained by dividing the distance between adjacent beams by the diffraction-limited beam diameter. A performance limitation of parallel laser processing in our system was estimated at N_SLM = 250 and p_d = 7.0. Based on these parameters, the maximum number of beams in a hexagonal close-packed structure was calculated to be 1189 using an analytical equation. PMID:27505815

  9. Massive fermion model in 3d and higher spin currents

    NASA Astrophysics Data System (ADS)

    Bonora, L.; Cvitan, M.; Prester, P. Dominis; de Souza, B. Lima; Smolić, I.

    2016-05-01

    We analyze the 3d free massive fermion theory coupled to external sources. The presence of a mass explicitly breaks parity invariance. We calculate two- and three-point functions of a gauge current and the energy momentum tensor and, for instance, obtain the well-known result that in the IR limit (but also in the UV one) we reconstruct the relevant Chern-Simons (CS) action. We then couple the model to higher spin currents and explicitly work out the spin 3 case. In the UV limit we obtain an effective action which was proposed many years ago as a possible generalization of the spin 3 CS action. In the IR limit we derive a different higher spin action. This analysis can evidently be generalized to higher spins. We also discuss the conservation and properties of the correlators we obtain in the intermediate steps of our derivation.

  10. Parallel algorithm for computing 3-D reachable workspaces

    NASA Astrophysics Data System (ADS)

    Alameldin, Tarek K.; Sobh, Tarek M.

    1992-03-01

    The problem of computing the 3-D workspace for redundant articulated chains has applications in a variety of fields such as robotics, computer aided design, and computer graphics. The workspace problem is at least NP-hard. The recent advent of parallel computers has made practical solutions for the workspace problem possible. Parallel algorithms for computing the 3-D workspace for redundant articulated chains with joint limits are presented. The first phase of these algorithms computes workspace points in parallel. The second phase uses the workspace points computed in the first phase and fits a 3-D surface around the volume that encompasses them. The second phase also maps the 3-D points into slices, uses region filling to detect the holes and voids in the workspace, extracts the workspace boundary points by testing the neighboring cells, and tiles the consecutive contours with triangles. The proposed algorithms are efficient for computing the 3-D reachable workspace for articulated linkages, not only those with redundant degrees of freedom but also those with joint limits.
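    The two phases described above can be sketched in a few lines of Python. This is a simplified illustration under our own assumptions, not the authors' algorithm: a hypothetical 3-joint arm (base yaw plus two pitch joints, so not redundant) with made-up link lengths, uniform sampling of joint space within the limits for phase 1, and only the first step of phase 2 (binning points into voxel slices); hole detection, boundary extraction and contour tiling are omitted. Threads keep the sketch short; in CPython, CPU-bound work would need `ProcessPoolExecutor` or a compiled language to actually run in parallel.

```python
import math
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

# Hypothetical link lengths for an illustrative 3-joint arm.
L1, L2 = 1.0, 0.8

def forward_kinematics(t0, t1, t2):
    """End-effector position: base yaw t0 about z, then two pitch joints t1, t2."""
    r = L1 * math.cos(t1) + L2 * math.cos(t1 + t2)
    z = L1 * math.sin(t1) + L2 * math.sin(t1 + t2)
    return (r * math.cos(t0), r * math.sin(t0), z)

def linspace(lo, hi, n):
    return [lo + (hi - lo) * k / (n - 1) for k in range(n)]

def phase1_points(limits, n, workers=4):
    """Phase 1: sample each joint at n values within its limits; each worker
    handles an interleaved subset of base angles."""
    (a0, b0), (a1, b1), (a2, b2) = limits
    t0s, t1s, t2s = linspace(a0, b0, n), linspace(a1, b1, n), linspace(a2, b2, n)

    def work(t0_chunk):
        return [forward_kinematics(t0, t1, t2)
                for t0 in t0_chunk for t1 in t1s for t2 in t2s]

    chunks = [t0s[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return [p for part in pool.map(work, chunks) for p in part]

def phase2_slices(points, cell):
    """First step of phase 2: map points to voxels, grouped by z-slice index."""
    slices = defaultdict(set)
    for x, y, z in points:
        slices[math.floor(z / cell)].add((math.floor(x / cell), math.floor(y / cell)))
    return dict(slices)

limits = ((-math.pi, math.pi), (0.0, math.pi / 2), (-math.pi / 2, 0.0))
points = phase1_points(limits, 12)
slices = phase2_slices(points, 0.1)
```

    Each z-slice is a set of occupied (x, y) cells, which is the input that region filling and contour extraction would work on.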

  11. Parallelization of ARC3D with Computer-Aided Tools

    NASA Technical Reports Server (NTRS)

    Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)

    1998-01-01

    A series of efforts has been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SGI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided tools CAPTools. The steps of parallelizing this code and the requirements for achieving better performance are discussed. The generated parallel version has achieved reasonably good performance, for example, a speedup of 30 on 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving the serial code and performing necessary code transformations are important parts of the automated parallelization process, although user intervention in many of these parts is still necessary. Nevertheless, the development and improvement of useful software tools, such as CAPTools, can help trim down many tedious parallelization details and improve processing efficiency.
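    One way to interpret the reported speedup of 30 on 36 processors is the Karp-Flatt metric, which backs out the serial fraction e that would explain a measured speedup S on p processors under Amdahl's law: e = (1/S - 1/p) / (1 - 1/p). The sketch below is our own arithmetic, not an analysis from the report; note that e lumps communication overhead together with genuinely serial work.

```python
def karp_flatt(speedup, p):
    """Experimentally determined serial fraction e = (1/S - 1/p) / (1 - 1/p)."""
    return (1.0 / speedup - 1.0 / p) / (1.0 - 1.0 / p)

e = karp_flatt(30, 36)
print(f"efficiency {30 / 36:.0%}, serial fraction {e:.4f}")
# prints: efficiency 83%, serial fraction 0.0057
```

    A serial fraction well under 1% is consistent with the report's point that the serial code itself had to be modified before good scaling was possible.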

  12. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation have culminated in a new neutronics code, CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementations of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  13. Implementation of parallel matrix decomposition for NIKE3D on the KSR1 system

    SciTech Connect

    Su, Philip S.; Fulton, R.E.; Zacharia, T.

    1995-06-01

    New massively parallel computer architecture has revolutionized the design of computer algorithms and promises to have significant influence on algorithms for engineering computations. Realistic engineering problems using finite element analysis typically imply excessively large computational requirements. Parallel supercomputers that have the potential for significantly increasing calculation speeds can meet these computational requirements. This report explores the potential for the parallel Cholesky (U^T D U) matrix decomposition algorithm on NIKE3D through actual computations. The examples of two- and three-dimensional nonlinear dynamic finite element problems are presented on the Kendall Square Research (KSR1) multiprocessor system, with 64 processors, at Oak Ridge National Laboratory. The numerical results indicate that the parallel Cholesky (U^T D U) matrix decomposition algorithm is attractive for NIKE3D under multi-processor system environments.
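    For readers unfamiliar with the U^T D U form, the sketch below factors a small symmetric matrix as A = U^T D U, with U unit upper triangular and D diagonal, and then solves A x = b with one forward substitution, one diagonal scaling, and one back substitution. It is a dense, serial, unpivoted illustration under our own assumptions; NIKE3D's actual storage scheme and parallel decomposition on the KSR1 are not described in the abstract. The comment marks the loop whose iterations are independent and could be shared among processors.

```python
def utdu(A):
    """Factor symmetric A as U^T D U, U unit upper triangular (no pivoting)."""
    n = len(A)
    U = [[float(i == j) for j in range(n)] for i in range(n)]
    D = [0.0] * n
    for i in range(n):
        D[i] = A[i][i] - sum(U[k][i] ** 2 * D[k] for k in range(i))
        # The entries U[i][j], j > i, depend only on earlier rows:
        # these iterations are independent and could be split among processors.
        for j in range(i + 1, n):
            U[i][j] = (A[i][j] - sum(U[k][i] * D[k] * U[k][j] for k in range(i))) / D[i]
    return U, D

def solve_utdu(U, D, b):
    """Solve U^T D U x = b."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):  # U^T y = b (U^T is unit lower triangular)
        y[i] = b[i] - sum(U[k][i] * y[k] for k in range(i))
    z = [y[i] / D[i] for i in range(n)]  # D z = y
    x = [0.0] * n
    for i in reversed(range(n)):  # U x = z
        x[i] = z[i] - sum(U[i][k] * x[k] for k in range(i + 1, n))
    return x

A = [[4.0, 1.0, 2.0], [1.0, 3.0, 0.0], [2.0, 0.0, 5.0]]
U, D = utdu(A)
x = solve_utdu(U, D, [1.0, 2.0, 3.0])
```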

  14. A parallel algorithm for solving the 3d Schroedinger equation

    SciTech Connect

    Strickland, Michael; Yager-Elorriaga, David

    2010-08-20

    We describe a parallel algorithm for solving the time-independent 3d Schroedinger equation using the finite difference time domain (FDTD) method. We introduce an optimized parallelization scheme that reduces communication overhead between computational nodes. We demonstrate that the compute time, t, scales inversely with the number of computational nodes as t ∝ (N_nodes)^(-0.95 ± 0.04). This makes it possible to solve the 3d Schroedinger equation on extremely large spatial lattices using a small computing cluster. In addition, we present a new method for precisely determining the energy eigenvalues and wavefunctions of quantum states based on a symmetry constraint on the FDTD initial condition. Finally, we discuss the usage of multi-resolution techniques in order to speed up convergence on extremely large lattices.
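    The abstract does not spell out how a time-domain method yields stationary states. A common route, and the one we assume here, is to evolve in imaginary time (t → -iτ), which damps each eigenstate as exp(-E_n τ) so that repeated steps plus renormalization converge to the ground state. The 1D serial sketch below applies this to a harmonic oscillator in units with ħ = m = ω = 1, for which the exact ground-state energy is 1/2. A parallel 3D version would split the lattice into slabs and exchange boundary planes between nodes; the grid, time step and starting wavefunction are our choices.

```python
import math

def ground_state_imaginary_time(v, x, dt, steps):
    """Relax toward the ground state by explicit imaginary-time steps.

    Units: hbar = m = 1, so H = -1/2 d^2/dx^2 + V(x); psi = 0 at both walls.
    """
    n = len(x)
    dx = x[1] - x[0]
    V = [v(xi) for xi in x]
    psi = [math.exp(-xi * xi / 4) for xi in x]  # any start overlapping the ground state
    psi[0] = psi[-1] = 0.0

    def h_apply(p):
        out = [0.0] * n
        for i in range(1, n - 1):
            lap = (p[i - 1] - 2 * p[i] + p[i + 1]) / (dx * dx)
            out[i] = -0.5 * lap + V[i] * p[i]
        return out

    for _ in range(steps):
        hp = h_apply(psi)
        psi = [p - dt * h for p, h in zip(psi, hp)]  # psi <- (1 - dt H) psi
        norm = math.sqrt(sum(p * p for p in psi) * dx)
        psi = [p / norm for p in psi]

    hp = h_apply(psi)
    energy = sum(p * h for p, h in zip(psi, hp)) / sum(p * p for p in psi)
    return energy, psi

x = [-8.0 + 0.1 * k for k in range(161)]
energy, psi = ground_state_imaginary_time(lambda xi: 0.5 * xi * xi, x, dt=0.004, steps=2500)
print(energy)  # close to 0.5, up to O(dx^2) discretization error
```

    The time step must satisfy the explicit stability limit (here dt is well below dx^2), otherwise the high-frequency modes grow instead of decaying.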

  15. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU

    PubMed Central

    Xia, Yong; Wang, Kuanquan; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This poses a major challenge for traditional CPU-based computing resources, which either cannot meet the full computational demand or are not readily available because of their cost. GPUs, as a parallel computing environment, therefore provide an alternative for solving the large-scale computational problems of whole-heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: the single-cell model (an ordinary differential equation) and the diffusion term of the monodomain model (a partial differential equation). This decoupling enabled the realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup compared with a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economical and powerful platform for 3D whole-heart simulations. PMID:26581957
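    The ODE/PDE decoupling described above is an operator-splitting scheme. The sketch below shows the idea on a 1D cable in Python, under stated assumptions: FitzHugh-Nagumo kinetics stand in for the detailed sheep atrial cell model, parameters and grid are illustrative, and both substeps use explicit Euler. The per-cell ODE update is the part that maps naturally to one GPU thread per cell; the diffusion update is a nearest-neighbour stencil.

```python
def fhn_reaction(v, w, a=0.1, eps=0.01, gamma=0.5):
    """FitzHugh-Nagumo kinetics, a stand-in for a detailed cell model."""
    dv = v * (1.0 - v) * (v - a) - w
    dw = eps * (v - gamma * w)
    return dv, dw

def simulate_cable(n=50, dx=0.5, dt=0.05, diff=1.0, t_end=100.0, probe=25):
    """Split-step monodomain cable; returns the peak v seen at cell `probe`."""
    v = [1.0 if i < 5 else 0.0 for i in range(n)]  # stimulus at the left end
    w = [0.0] * n
    peak = 0.0
    for _ in range(int(round(t_end / dt))):
        # Substep 1 (ODE): each cell advances independently of its neighbours.
        for i in range(n):
            dv, dw = fhn_reaction(v[i], w[i])
            v[i] += dt * dv
            w[i] += dt * dw
        # Substep 2 (PDE): explicit diffusion stencil with no-flux ends.
        new_v = v[:]
        for i in range(n):
            left = v[i - 1] if i > 0 else v[1]
            right = v[i + 1] if i < n - 1 else v[n - 2]
            new_v[i] = v[i] + dt * diff * (left - 2.0 * v[i] + right) / (dx * dx)
        v = new_v
        peak = max(peak, v[probe])
    return peak

# dt * diff / dx**2 = 0.2, inside the explicit-diffusion stability limit of 0.5.
print(simulate_cable())
```

    Because the ODE substep has no neighbour coupling, it carries no data dependencies at all, which is presumably why the decoupling makes the GPU mapping straightforward.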

  16. Seismic imaging on massively parallel computers

    SciTech Connect

    Ober, C.C.; Oldfield, R.; Womble, D.E.; VanDyke, J.; Dosanjh, S.

    1996-03-01

    Fast, accurate imaging of complex, oil-bearing geologies, such as overthrusts and salt domes, is the key to reducing the costs of domestic oil and gas exploration. Geophysicists say that the known oil reserves in the Gulf of Mexico could be significantly increased if accurate seismic imaging beneath salt domes were possible. A range of techniques exists for imaging these regions, but the highly accurate techniques involve the solution of the wave equation and are characterized by large data sets and large computational demands. Massively parallel computers can provide the computational power for these highly accurate imaging techniques. A brief introduction to seismic processing will be presented, and the implementation of a seismic-imaging code for distributed memory computers will be discussed. The portable code, Salvo, performs wave equation-based, 3-D, prestack, depth imaging and currently runs on the Intel Paragon and the Cray T3D. It uses MPI for portability and has sustained 22 Mflop/s per processor (compiled FORTRAN) on the Intel Paragon.

  17. Parallel PAB3D: Experiences with a Prototype in MPI

    NASA Technical Reports Server (NTRS)

    Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul

    1998-01-01

    PAB3D is a three-dimensional Navier-Stokes solver that has gained acceptance in the research and industrial communities. It takes as its computational domain a set of disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We briefly discuss the characteristics of the code and define a prototype for testing. The principal data structure used for communication is derived from preprocessing "patching". We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be encountered when working on problems of this nature. Last, we identify levels of improvement from the current version and outline future work.

  18. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver

    NASA Astrophysics Data System (ADS)

    Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre

    2014-06-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool.
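    DOMINO's core kernel is the transport sweep. As a much-reduced illustration (our own, not DOMINO's code), the Python sketch below solves a 1D slab problem with S2 Gauss-Legendre quadrature, diamond differencing, vacuum boundaries and source iteration on isotropic scattering. Each direction is swept cell by cell in the direction of travel, which is the data dependency that 3D sweep implementations organize into wavefronts and vectorize over; the cross sections and source are illustrative values.

```python
import math

def sn_slab(n_cells=100, width=10.0, sigma_t=1.0, sigma_s=0.5, source=1.0,
            tol=1e-10, max_iter=500):
    """1D slab, S2 Gauss-Legendre quadrature, diamond difference, vacuum boundaries."""
    dx = width / n_cells
    mus = (-1 / math.sqrt(3), 1 / math.sqrt(3))
    weights = (1.0, 1.0)  # sum to 2, so phi = sum_n w_n psi_n
    phi = [0.0] * n_cells
    for _ in range(max_iter):  # source iteration
        q = [0.5 * (sigma_s * p + source) for p in phi]  # isotropic emission per unit mu
        new_phi = [0.0] * n_cells
        for mu, w in zip(mus, weights):
            order = range(n_cells) if mu > 0 else range(n_cells - 1, -1, -1)
            psi_in = 0.0  # vacuum boundary: no incoming flux
            for i in order:  # the sweep: each cell needs its upwind neighbour first
                psi_c = (q[i] + 2 * abs(mu) / dx * psi_in) / (sigma_t + 2 * abs(mu) / dx)
                new_phi[i] += w * psi_c
                psi_in = 2 * psi_c - psi_in  # diamond relation gives the outgoing flux
        change = max(abs(a - b) for a, b in zip(new_phi, phi))
        phi = new_phi
        if change < tol:
            break
    return phi

phi = sn_slab()
print(max(phi))
```

    For these parameters the infinite-medium flux is S / (σ_t - σ_s) = 2, so the slab solution should approach 2 near the centre and stay below it because of leakage at the boundaries.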

  19. 3D finite-difference seismic migration with parallel computers

    SciTech Connect

    Ober, C.C.; Gjertsen, R.; Minkoff, S.; Womble, D.E.

    1998-11-01

    The ability to image complex geologies such as salt domes in the Gulf of Mexico and thrusts in mountainous regions is essential for reducing the risk associated with oil exploration. Imaging these structures, however, is computationally expensive, as datasets can be terabytes in size. Traditional ray-tracing migration methods cannot handle the complex velocity variations commonly found near such salt structures. Instead, the authors use the full 3D acoustic wave equation, discretized via a finite difference algorithm. They reduce the cost of solving the paraxial wave equation through a number of numerical techniques, including the method of fractional steps and pipelining the tridiagonal solves. The imaging code, Salvo, uses both frequency parallelism (generally 90% efficient) and spatial parallelism (65% efficient). Salvo has been tested on synthetic and real data and produces clear images of the subsurface even beneath complicated salt structures.
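    The method of fractional steps mentioned above turns each implicit finite-difference step into many independent tridiagonal systems, one per grid line. The standard serial solver for one such system is the Thomas algorithm, sketched below in Python as an assumption-level illustration (no pivoting, so a diagonally dominant system is assumed); Salvo's pipelined parallel version is not reproduced here. The same code works unchanged with complex coefficients, which is what frequency-domain migration would produce.

```python
def thomas(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i]; a[0] and c[-1] are ignored.

    No pivoting: assumes a diagonally dominant (or otherwise safe) matrix.
    """
    n = len(d)
    cp = [0.0] * n
    dp = [0.0] * n
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    for i in range(1, n):  # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 10
x = thomas([-1.0] * n, [4.0] * n, [-1.0] * n, [float(i) for i in range(n)])
```

    The forward sweep is inherently sequential along a line, which is why pipelining many independent lines across processors is a natural way to parallelize it.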

  20. A parallel algorithm for 3D dislocation dynamics

    NASA Astrophysics Data System (ADS)

    Wang, Zhiqiang; Ghoniem, Nasr; Swaminarayan, Sriram; LeSar, Richard

    2006-12-01

    Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for the investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fcc metals.
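    To illustrate how particle-like dislocation segments can be spread across domains, the sketch below uses recursive coordinate bisection, a simple form of binary space partitioning: split the points at the median along x, then y, then z, and recurse. This is our own minimal example of the general technique; the abstract does not give the details of the authors' partitioning or how it tracks node connectivity across domains, and the sample points are made up.

```python
def bisect_domains(points, depth, axis=0):
    """Recursive coordinate bisection of 3D points into 2**depth near-equal domains."""
    if depth == 0:
        return [points]
    pts = sorted(points, key=lambda p: p[axis])  # median split along the current axis
    mid = len(pts) // 2
    nxt = (axis + 1) % 3  # cycle x -> y -> z
    return bisect_domains(pts[:mid], depth - 1, nxt) + bisect_domains(pts[mid:], depth - 1, nxt)

points = [((7 * i) % 64, (11 * i) % 64, (13 * i) % 64) for i in range(64)]
domains = bisect_domains(points, depth=3)
print([len(d) for d in domains])  # [8, 8, 8, 8, 8, 8, 8, 8]
```

    Median splits balance particle counts rather than volumes, which keeps the work per processor even when dislocation density is highly non-uniform.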

  1. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High-order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes, which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mortar Element Method (MEM) to handle the nonconforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation are demonstrated by the reduction of the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel

  2. IM3D: A parallel Monte Carlo code for efficient simulations of primary radiation displacements and damage in 3D geometry

    PubMed Central

    Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju

    2015-01-01

    SRIM-like codes have limitations in describing general 3D geometries for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ~10^2 times faster in serial execution and > 10^4 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond-to-picosecond timescales in defining displacement versus damage and the limitations of the displacements per atom (DPA) unit in quantifying radiation damage (such as its inadequacy in quantifying the degree of chemical mixing) are discussed. PMID:26658477
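    As background for the Kinchin-Pease discussion, the widely used Norgett-Robinson-Torrens (NRT) form of the Kinchin-Pease model estimates the number of displaced atoms from the damage energy T_dam and the displacement threshold E_d. SRIM's "Quick Kinchin-Pease" option is based on a Kinchin-Pease-type estimate of this kind; the sketch below is a plain transcription of the NRT formula, not code from IM3D, and the 40 eV threshold in the example is only illustrative.

```python
def nrt_displacements(t_dam, e_d):
    """NRT displacement count for damage energy t_dam and threshold e_d (same units, e.g. eV)."""
    if t_dam < e_d:
        return 0.0  # below threshold: no stable displacement
    if t_dam < 2 * e_d / 0.8:
        return 1.0  # single-displacement regime
    return 0.8 * t_dam / (2 * e_d)  # linear regime with displacement efficiency 0.8

print(nrt_displacements(1000.0, 40.0))  # 10.0
```

    The three branches join continuously at T_dam = 2 E_d / 0.8, where the linear branch also gives exactly one displacement.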

  3. IM3D: A parallel Monte Carlo code for efficient simulations of primary radiation displacements and damage in 3D geometry

    NASA Astrophysics Data System (ADS)

    Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju

    2015-12-01

    SRIM-like codes have limitations in describing general 3D geometries for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ~10^2 times faster in serial execution and > 10^4 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond-to-picosecond timescales in defining displacement versus damage and the limitations of the displacements per atom (DPA) unit in quantifying radiation damage (such as its inadequacy in quantifying the degree of chemical mixing) are discussed.

  4. Multigrid on massively parallel architectures

    SciTech Connect

    Falgout, R D; Jones, J E

    1999-09-17

    The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.
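    The grid hierarchy behind the performance models above can be seen in a small serial V-cycle. The Python sketch below solves -u'' = f on [0, 1] with zero boundary values, using weighted Jacobi smoothing, full-weighting restriction and linear interpolation; these are common textbook choices and our assumptions, not the specific algorithms analyzed in the paper. Each coarser level has half as many points, which is why, on thousands of processors, the coarse levels leave most processors with little or no work.

```python
import math

def residual(u, f, h):
    """r = f - A u for A u = (-u[i-1] + 2 u[i] - u[i+1]) / h^2 (boundary entries 0)."""
    n = len(u)
    r = [0.0] * n
    for i in range(1, n - 1):
        r[i] = f[i] - (-u[i - 1] + 2.0 * u[i] - u[i + 1]) / (h * h)
    return r

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Weighted Jacobi sweeps; returns a new list."""
    n = len(u)
    for _ in range(sweeps):
        new = u[:]
        for i in range(1, n - 1):
            jac = 0.5 * (u[i - 1] + u[i + 1] + h * h * f[i])
            new[i] = (1.0 - omega) * u[i] + omega * jac
        u = new
    return u

def v_cycle(u, f, h):
    """One V-cycle; len(u) must be 2**k + 1 (Dirichlet boundary points included)."""
    n = len(u)
    if n == 3:  # coarsest grid: one unknown, solve exactly
        return [u[0], 0.5 * (u[0] + u[2] + h * h * f[1]), u[2]]
    u = smooth(u, f, h, 3)
    r = residual(u, f, h)
    nc = (n - 1) // 2 + 1
    rc = [0.0] * nc
    for j in range(1, nc - 1):  # full-weighting restriction
        rc[j] = 0.25 * r[2 * j - 1] + 0.5 * r[2 * j] + 0.25 * r[2 * j + 1]
    ec = v_cycle([0.0] * nc, rc, 2.0 * h)  # coarse-grid error equation
    for j in range(nc - 1):  # linear interpolation of the correction
        u[2 * j] += ec[j]
        u[2 * j + 1] += 0.5 * (ec[j] + ec[j + 1])
    return smooth(u, f, h, 3)

# -u'' = pi^2 sin(pi x) on [0, 1], exact solution u = sin(pi x).
n = 65
h = 1.0 / (n - 1)
xs = [i * h for i in range(n)]
f = [math.pi ** 2 * math.sin(math.pi * x) for x in xs]
u = [0.0] * n
for cycle in range(10):
    u = v_cycle(u, f, h)
    print(cycle, max(abs(r) for r in residual(u, f, h)))
```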

  5. Computational fluid dynamics on a massively parallel computer

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The code uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.
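    Odd-even elimination solves a tridiagonal system in about log2(n) levels, and at every level all equations are updated independently, which suits a machine with one (virtual) processor per grid point. The Python sketch below is a serial transcription of that idea for a scalar tridiagonal system; treating the Beam-Warming systems as scalar rather than block tridiagonal is a simplifying assumption of ours, and no claim is made about how the original Connection Machine code stored its data.

```python
def odd_even_elimination(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0], c[-1] ignored)."""
    n = len(d)
    a, b, c, d = list(a), list(b), list(c), list(d)
    a[0], c[-1] = 0.0, 0.0
    s = 1
    while s < n:
        na, nb, ncoef, nd = [0.0] * n, b[:], [0.0] * n, d[:]
        for i in range(n):  # independent updates: one processor per equation
            if i - s >= 0:  # eliminate x[i-s] using equation i-s
                alpha = -a[i] / b[i - s]
                na[i] = alpha * a[i - s]
                nb[i] += alpha * c[i - s]
                nd[i] += alpha * d[i - s]
            if i + s < n:  # eliminate x[i+s] using equation i+s
                gamma = -c[i] / b[i + s]
                ncoef[i] = gamma * c[i + s]
                nb[i] += gamma * a[i + s]
                nd[i] += gamma * d[i + s]
        a, b, c, d = na, nb, ncoef, nd
        s *= 2  # each level doubles the coupling distance
    # Every equation is now decoupled: b[i] * x[i] = d[i].
    return [d[i] / b[i] for i in range(n)]

n = 37
x = odd_even_elimination([-1.0] * n, [4.0] * n, [-1.0] * n, [float(i + 1) for i in range(n)])
```

    Compared with the Thomas algorithm this does roughly log2(n) times more arithmetic, a trade that pays off when there is one processor per equation.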

  6. Warped black holes in 3D general massive gravity

    NASA Astrophysics Data System (ADS)

    Tonni, Erik

    2010-08-01

    We study regular spacelike warped black holes in the three dimensional general massive gravity model, which contains both the gravitational Chern-Simons term and the linear combination of curvature squared terms characterizing the new massive gravity besides the Einstein-Hilbert term. The parameters of the metric are found by solving a quartic equation, constrained by an inequality that imposes the absence of closed timelike curves. Explicit expressions for the central charges are suggested by exploiting the fact that these black holes are discrete quotients of spacelike warped AdS_3 and a known formula for the entropy. Previous results obtained separately in topological massive gravity and in new massive gravity are recovered as special cases.

  7. Massively parallel neurocomputing for aerospace applications

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Barhen, Jacob; Toomarian, Nikzad

    1993-01-01

    An innovative hybrid, analog-digital charge-domain technology, for the massively parallel VLSI implementation of certain large scale matrix-vector operations, has recently been introduced. It employs arrays of Charge Coupled/Charge Injection Device cells holding an analog matrix of charge, which process digital vectors in parallel by means of binary, non-destructive charge transfer operations. The impact of this technology on massively parallel processing is discussed. Fundamentally new classes of algorithms, specifically designed for this emerging technology, as applied to signal processing, are derived.

  8. Type D solutions of 3D new massive gravity

    SciTech Connect

    Ahmedov, Haji; Aliev, Alikram N.

    2011-04-15

    In a recent reformulation of three-dimensional new massive gravity, the field equations of the theory consist of a massive (tensorial) Klein-Gordon type equation with a curvature-squared source term and a constraint equation. Using this framework, we present all algebraic type D solutions of new massive gravity with constant and nonconstant scalar curvatures. For constant scalar curvature, they include homogeneous anisotropic solutions which encompass both solutions originating from topologically massive gravity, Bianchi types II, VIII, IX, and those of non-topologically massive gravity origin, Bianchi types VI_0 and VII_0. For a special relation between the cosmological and mass parameters, λ = m^2, they also include conformally flat solutions, and, in particular, those being locally isometric to the previously known Kaluza-Klein type AdS_2 × S^1 or dS_2 × S^1 solutions. For nonconstant scalar curvature, all the solutions are conformally flat and exist only for λ = m^2. We find two general metrics which possess at least one Killing vector and comprise all such solutions. We also discuss some properties of these solutions, delineating among them black hole type solutions.

  9. Matter coupling in 3D ‘minimal massive gravity’

    NASA Astrophysics Data System (ADS)

    Arvanitakis, Alex S.; Routh, Alasdair J.; Townsend, Paul K.

    2014-12-01

    The ‘minimal massive gravity’ model of massive gravity in three spacetime dimensions (which has the same anti-de Sitter (AdS) bulk properties as ‘topologically massive gravity’ but improved boundary properties) is coupled to matter. Consistency requires a particular matter source tensor, which is quadratic in the stress tensor. The consequences are explored for an ideal fluid in the context of asymptotically de Sitter (dS) cosmological solutions, which bounce smoothly from contraction to expansion. Various vacuum solutions are also found, including warped (A)dS, and (for special values of parameters) static black holes and an (A)dS_2 × S^1 vacuum.

  10. Massively Parallel Computing: A Sandia Perspective

    SciTech Connect

    Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

    1999-05-06

    The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant breakthroughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.

  11. A 3D parallel model of Ganymede's exosphere

    NASA Astrophysics Data System (ADS)

    Leclercq, Ludivine; Turc, Lucile; Leblanc, François; Modolo, Ronan

    2013-04-01

    Ganymede is a unique object: it is the biggest moon of our solar system, and the only satellite that has its own intrinsic magnetic field. Its surface is covered by water ice and by regolith. Some previous observations suggest that an ocean of liquid water may exist below its surface. The atmosphere of the moon is poorly known but should be composed essentially of water, hydrogen and oxygen (Marconi et al., Icarus, 2007). These atmospheric particles mainly originate from the surface through sublimation of water ice and sputtering, a process driven by Jovian magnetospheric particles impacting Ganymede's surface and leading to the ejection of atoms and molecules into Ganymede's atmosphere. We developed a model of Ganymede's atmosphere based on a 3D Monte Carlo description of the fate of the particles ejected from the surface. This model has been parallelized, allowing a much better statistical, spatial and temporal description of Ganymede's environment. This model includes the main sources of the neutral atmosphere and is able to calculate all its characteristics. It was successfully compared to the few known observations as well as to previous modeling. In this presentation, we will present the main characteristics of this model and what it tells us about Ganymede's atmosphere, in terms of spatial structure, composition, temporal variability and relations with both the magnetosphere and the surface.

  12. Performance analysis of high quality parallel preconditioners applied to 3D finite element structural analysis

    SciTech Connect

    Kolotilina, L.; Nikishin, A.; Yeremin, A.

    1994-12-31

    The solution of large systems of linear equations is a crucial bottleneck when performing 3D finite element analysis of structures. Also, in many cases the reliability and robustness of iterative solution strategies, and their efficiency when exploiting hardware resources, fully determine the scope of industrial applications which can be solved on a particular computer platform. This is especially true for modern vector/parallel supercomputers with large vector length and for modern massively parallel supercomputers. Preconditioned iterative methods have been successfully applied to industrial class finite element analysis of structures. The construction and application of high quality preconditioners constitutes a high percentage of the total solution time. Parallel implementation of high quality preconditioners on such architectures is a formidable challenge. Two common types of existing preconditioners are implicit preconditioners and explicit preconditioners. Implicit preconditioners (e.g., incomplete factorizations of several types) are generally of high quality but require the solution of lower and upper triangular systems of equations at each iteration, which are difficult to parallelize without deteriorating the convergence rate. Explicit preconditioners (e.g., polynomial or Jacobi-like preconditioners) require only sparse matrix-vector multiplications and can be parallelized, but their preconditioning quality is less than desirable. The authors present results of numerical experiments with Factorized Sparse Approximate Inverses (FSAI) for symmetric positive definite linear systems. These are high quality preconditioners that possess a large resource of parallelism by construction without increasing the serial complexity.
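    The parallelism by construction claimed for FSAI comes from the fact that each row of the approximate inverse factor is computed independently. The Python sketch below builds a lower-triangular FSAI factor G whose sparsity pattern is taken to be the lower triangle of A (a common choice, and our assumption), so that G A G^T has a unit diagonal and G^T G approximates A^{-1}. The small dense solver is included only to keep the example self-contained; the authors' implementation is not described in the abstract.

```python
import math

def solve_dense(M, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(M)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(aug[r][k]))
        aug[k], aug[p] = aug[p], aug[k]
        for r in range(k + 1, n):
            factor = aug[r][k] / aug[k][k]
            for col in range(k, n + 1):
                aug[r][col] -= factor * aug[k][col]
    x = [0.0] * n
    for k in range(n - 1, -1, -1):
        x[k] = (aug[k][n] - sum(aug[k][j] * x[j] for j in range(k + 1, n))) / aug[k][k]
    return x

def fsai(A):
    """Lower-triangular FSAI factor G (pattern = lower triangle of A), so G A G^T ~ I."""
    n = len(A)
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):  # rows are independent: trivially parallel
        J = [j for j in range(i + 1) if A[i][j] != 0.0]  # includes i for SPD A
        sub = [[A[r][col] for col in J] for r in J]
        e = [0.0] * len(J)
        e[-1] = 1.0  # i is the last index in J
        g = solve_dense(sub, e)
        scale = 1.0 / math.sqrt(g[-1])  # g[-1] > 0 because sub is SPD
        for k, j in enumerate(J):
            G[i][j] = g[k] * scale
    return G

n = 6
A = [[4.0 if i == j else (-1.0 if abs(i - j) == 1 else 0.0) for j in range(n)] for i in range(n)]
G = fsai(A)
```

    Applying the preconditioner then needs only the sparse products G v and G^T v, which parallelize as easily as the matrix-vector products mentioned above.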

  13. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. The tests also explored the effects of scaling the simulation with the number of processors, in addition to distributing a constant-size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with Ethernet and ATM networking, plus a shared memory SGI Power Challenge L workstation. The results indicate that network capabilities significantly influence parallel efficiency: the shared memory machine is fastest, and ATM networking provides acceptable performance. The limitations of Ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
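
    The two kinds of test above correspond to the standard strong-scaling (fixed problem size) and weak-scaling (problem grows with processor count) efficiency measures. A minimal sketch of both, with hypothetical timings that are not measurements from this study:

```python
def strong_scaling_efficiency(t_serial: float, t_parallel: float, procs: int) -> float:
    """Fixed total problem size: E = T_1 / (p * T_p)."""
    return t_serial / (procs * t_parallel)


def weak_scaling_efficiency(t_one: float, t_parallel: float) -> float:
    """Constant work per processor (problem grows with p): E = T_1 / T_p."""
    return t_one / t_parallel


# Hypothetical timings in seconds, for illustration only.
e_strong = strong_scaling_efficiency(100.0, 30.0, 4)
e_weak = weak_scaling_efficiency(100.0, 125.0)
```

    Slow interconnects show up directly in T_p, which is why the Ethernet runs lower both efficiencies.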

  14. Massive parallelism in the future of science

    NASA Technical Reports Server (NTRS)

    Denning, Peter J.

    1988-01-01

    Massive parallelism appears in three domains of action of concern to scientists, where it produces collective action that is not possible from any individual agent's behavior. In the domain of data parallelism, computers comprising very large numbers of processing agents, one for each data item in the result, will be designed. These agents collectively can solve problems thousands of times faster than current supercomputers. In the domain of distributed parallelism, computations comprising large numbers of resources attached to the world network will be designed. The network will support computations far beyond the power of any one machine. In the domain of people parallelism, collaborations will be designed among large groups of scientists around the world who participate in projects that endure well past the sojourns of individuals within them. Computing and telecommunications technology will support the large, long projects that will characterize big science by the turn of the century. Scientists must become masters in these three domains during the coming decade.

  15. Massively parallel sequencing and rare disease

    PubMed Central

    Ng, Sarah B.; Nickerson, Deborah A.; Bamshad, Michael J.; Shendure, Jay

    2010-01-01

    Massively parallel sequencing has enabled the rapid, systematic identification of variants on a large scale. This has, in turn, accelerated the pace of gene discovery and disease diagnosis on a molecular level and has the potential to revolutionize methods, particularly for the analysis of Mendelian disease. Massively parallel sequencing has enabled investigators to interrogate variants both in the context of linkage intervals and on a genome-wide scale, entirely in the absence of linkage information. The primary challenge now is to distinguish between background polymorphisms and pathogenic mutations. Recently developed strategies for rare monogenic disorders have met with some early success. These strategies include filtering for potential causal variants based on frequency and function, and ranking variants based on conservation scores and predicted deleteriousness to protein structure. Here, we review the recent literature on the use of high-throughput sequence data and its analysis in the discovery of causal mutations for rare disorders. PMID:20846941
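
    The filter-then-rank strategy described above can be sketched as follows. This is a toy illustration only: the `Variant` fields, the consequence labels in `DAMAGING`, and the 1% frequency cutoff are hypothetical assumptions, and real pipelines take these values from annotation tools and also use deleteriousness predictors, which are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    # Hypothetical fields; real pipelines obtain these from annotation tools.
    gene: str
    allele_frequency: float  # population frequency of the alternate allele
    consequence: str         # predicted functional effect
    conservation: float      # higher means a more conserved site


# Hypothetical set of consequences treated as potentially damaging.
DAMAGING = {"missense", "nonsense", "frameshift", "splice_site"}


def prioritize(variants, max_freq=0.01):
    """Filter by frequency and function, then rank by conservation score."""
    candidates = [
        v for v in variants
        if v.allele_frequency <= max_freq and v.consequence in DAMAGING
    ]
    return sorted(candidates, key=lambda v: v.conservation, reverse=True)
```

    Common polymorphisms are discarded by the frequency filter, and neutral variants by the function filter. Ranking then orders what remains for follow-up.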

  16. Associative massively parallel processor for video processing

    NASA Astrophysics Data System (ADS)

    Krikelis, Argy; Tawiah, T.

    1996-03-01

    Massively parallel processing architectures have matured primarily through image processing and computer vision applications. The similarity of processing requirements between these areas and video processing suggests that they should be very appropriate for video processing applications. This research describes the use of an associative massively parallel processing based system for video compression, including an architectural and system description, a discussion of the implementation of compression tasks such as DCT/IDCT, motion estimation and quantization, and a system evaluation. The core of the processing system is the ASP (Associative String Processor) architecture, a modular, massively parallel, programmable and inherently fault-tolerant fine-grain SIMD processing architecture incorporating a string of identical APEs (Associative Processing Elements), a reconfigurable inter-processor communication network and a Vector Data Buffer for fully overlapped data input-output. For video compression applications, a prototype system has been developed that uses ASP modules to implement the required compression tasks. This scheme leads to a linear speedup of the computation by simply adding more APEs to the modules.
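
    Motion estimation, one of the compression tasks named above, is often done by block matching. Below is a plain NumPy sketch of full-search block matching with the sum of absolute differences (SAD). It is not ASP code. The 8x8 block size, the +/-4 search range and the function name are assumptions. Every candidate SAD is independent, which is what makes this step a good fit for fine-grain SIMD hardware.

```python
import numpy as np


def best_motion_vector(prev, curr, top, left, block=8, search=4):
    """Full-search block matching using the sum of absolute differences (SAD).

    Returns ((dy, dx), sad) such that the block of `prev` at (top + dy, left + dx)
    best matches the block of `curr` at (top, left).
    """
    target = curr[top:top + block, left:left + block].astype(np.int64)
    best, best_sad = (0, 0), None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + block > prev.shape[0] or x + block > prev.shape[1]:
                continue  # candidate block falls outside the reference frame
            cand = prev[y:y + block, x:x + block].astype(np.int64)
            sad = int(np.abs(target - cand).sum())
            if best_sad is None or sad < best_sad:
                best, best_sad = (dy, dx), sad
    return best, best_sad


# Example: the current frame is the previous one shifted by (2, -1).
rng = np.random.default_rng(0)
prev = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)
curr = np.roll(prev, shift=(2, -1), axis=(0, 1))
mv, sad = best_motion_vector(prev, curr, top=8, left=8)
```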

  17. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel supercomputer system using a parallel variation of the rsync protocol and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored, for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional lossless data compression algorithms to further reduce the overall checkpoint size.
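
    The template comparison can be sketched for a single node as follows. This is a simplified illustration, not the patented method. It assumes fixed-size blocks, SHA-1 checksums and zlib compression, and the function names and `BLOCK_SIZE` are hypothetical. The rsync rolling checksum, which also detects shifted data, and the network broadcast are omitted, so only in-place changes are detected here.

```python
import hashlib
import zlib

BLOCK_SIZE = 4096  # hypothetical block size in bytes


def block_checksums(data: bytes, block_size: int = BLOCK_SIZE):
    """Checksum of each fixed-size block (the last block may be shorter)."""
    return [
        hashlib.sha1(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def delta_checkpoint(node_data: bytes, template_sums, block_size: int = BLOCK_SIZE):
    """Keep only blocks whose checksum differs from the template's.

    Returns {block_index: compressed_block}; unchanged blocks are omitted.
    """
    delta = {}
    for idx, start in enumerate(range(0, len(node_data), block_size)):
        block = node_data[start:start + block_size]
        digest = hashlib.sha1(block).hexdigest()
        if idx >= len(template_sums) or template_sums[idx] != digest:
            delta[idx] = zlib.compress(block)  # lossless compression
    return delta


def restore(template: bytes, delta, total_len: int, block_size: int = BLOCK_SIZE) -> bytes:
    """Rebuild a node's checkpoint from the template plus its changed blocks."""
    out = bytearray()
    for idx, start in enumerate(range(0, total_len, block_size)):
        if idx in delta:
            out += zlib.decompress(delta[idx])
        else:
            out += template[start:start + block_size]
    return bytes(out)
```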

  18. Efficient communication in massively parallel computers

    SciTech Connect

    Cypher, R.E.

    1989-01-01

    A fundamental operation in parallel computation is sorting. Sorting is important not only because it is required by many algorithms, but also because it can be used to implement irregular, pointer-based communication. The author studies two algorithms for sorting in massively parallel computers. First, he examines Shellsort. Shellsort is a sorting algorithm that is based on a sequence of parameters called increments. Shellsort can be used to create a parallel sorting device known as a sorting network. Researchers have suggested that if the correct increment sequence is used, an optimal size sorting network can be obtained. All published increment sequences have been monotonically decreasing. He shows that no monotonically decreasing increment sequence will yield an optimal size sorting network. Second, he presents a sorting algorithm called Cubesort. Cubesort is the fastest known sorting algorithm for a variety of parallel computers over a wide range of parameters. He also presents a paradigm for developing parallel algorithms that have efficient communication. The paradigm, called the data reduction paradigm, consists of using a divide-and-conquer strategy. Both the division and combination phases of the divide-and-conquer algorithm may require irregular, pointer-based communication between processors. However, the problem is divided so as to limit the amount of data that must be communicated. As a result, the communication can be performed efficiently. He presents data reduction algorithms for the image component labeling problem, the closest pair problem and four versions of the parallel prefix problem.
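
    To make the role of the increments concrete, here is a sequential Shellsort parameterized by an explicit increment sequence. It illustrates only the "sequence of parameters" idea. It is neither the sorting-network construction nor Cubesort. Each pass insertion-sorts the interleaved subsequences of elements `gap` apart.

```python
def shellsort(items, increments):
    """Sort a list in place using the given increment sequence.

    `increments` must end with 1 so the final pass is a plain insertion sort.
    """
    if not increments or increments[-1] != 1:
        raise ValueError("increment sequence must end with 1")
    for gap in increments:
        # Insertion-sort each subsequence of elements that are `gap` apart.
        for i in range(gap, len(items)):
            value = items[i]
            j = i
            while j >= gap and items[j - gap] > value:
                items[j] = items[j - gap]
                j -= gap
            items[j] = value
    return items


# Example with a monotonically decreasing sequence (Knuth's 3h+1 gaps 4, 1).
data = [9, 4, 7, 1, 8, 2, 6, 3, 5, 0]
shellsort(data, [4, 1])
```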

  19. 3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

    SciTech Connect

    Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

    1997-12-31

    The aim of the work performed is to develop a 3D parallel program for the numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS), satisfying the condition that numerical results are independent of the number of processors involved. Two basically different approaches to the structure of massively parallel computations have been developed. The first approach uses a 3D data matrix decomposition that is rebuilt at each time cycle and is a development of parallelization algorithms for multiprocessor CS with shared memory. The second approach is based on a 3D data matrix decomposition that is not rebuilt during a time cycle. The program was developed on the 8-processor CS MP-3 made at VNIIEF and was adapted to the massively parallel CS Meiko-2 at LLNL by joint efforts of VNIIEF and LLNL staff. A large number of numerical experiments have been carried out with different numbers of processors, up to 256, and the parallelization efficiency has been evaluated as a function of the number of processors and their parameters.
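
    The second approach, a fixed decomposition of the 3D data matrix, can be sketched as a block partition over a Cartesian processor grid. This is an assumption-laden illustration. The processor-grid layout and the function names are hypothetical, and halo exchange and the gas dynamics itself are not shown.

```python
from itertools import product


def split_1d(n: int, parts: int):
    """Split range(n) into `parts` contiguous chunks whose sizes differ by at most 1."""
    base, extra = divmod(n, parts)
    bounds, start = [], 0
    for k in range(parts):
        size = base + (1 if k < extra else 0)
        bounds.append((start, start + size))
        start += size
    return bounds


def decompose_3d(shape, proc_grid):
    """Map each processor coordinate (i, j, k) to its (start, stop) ranges per axis.

    The decomposition is computed once and kept for the whole run.
    """
    axes = [split_1d(n, p) for n, p in zip(shape, proc_grid)]
    return {
        coords: tuple(axes[d][c] for d, c in enumerate(coords))
        for coords in product(*(range(p) for p in proc_grid))
    }


blocks = decompose_3d((10, 8, 6), (2, 2, 2))  # 8 processors
```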

  20. New 3D parallel GILD electromagnetic modeling and nonlinear inversion using global magnetic integral and local differential equation

    SciTech Connect

    Xie, G.; Li, J.; Majer, E.; Zuo, D.

    1998-07-01

    This paper describes a new 3D parallel GILD electromagnetic (EM) modeling and nonlinear inversion algorithm. The algorithm consists of: (a) a new magnetic integral equation, used instead of the electric integral equation, to solve the electromagnetic forward modeling and inverse problem; (b) a collocation finite element method for solving the magnetic integral equation and a Galerkin finite element method for the magnetic differential equations; (c) a nonlinear regularizing optimization method to make the inversion stable and of high resolution; and (d) a new parallel 3D modeling and inversion using a global integral and local differential domain decomposition technique (GILD). The new 3D nonlinear electromagnetic inversion has been tested with synthetic data and field data. The authors obtained very good imaging for the synthetic data and reasonable subsurface EM imaging for the field data. The parallel algorithm has a parallel efficiency of over 90% and can serve as a parallel solver for elliptic, parabolic, and hyperbolic modeling and inversion. The parallel GILD algorithm can be extended to high resolution, large scale seismic and hydrology modeling and inversion on massively parallel computers.

  1. Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA.

    PubMed

    Mrozek, Dariusz; Brożek, Miłosz; Małysiak-Mrozek, Bożena

    2014-02-01

    Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm. PMID:24481593

  2. Massively Parallel Direct Simulation of Multiphase Flow

    SciTech Connect

    COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.

    2000-08-10

    The authors' understanding of multiphase physics and the associated predictive capability for multiphase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multiphase systems at the particle scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.

  3. Time sharing massively parallel machines. Draft

    SciTech Connect

    Gorda, B.; Wolski, R.

    1995-03-01

    As part of the Massively Parallel Computing Initiative (MPCI) at the Lawrence Livermore National Laboratory, the authors have developed a simple, effective and portable time sharing mechanism by scheduling gangs of processes on tightly coupled parallel machines. By time-sharing the resources, the system interleaves production and interactive jobs. Immediate priority is given to interactive use, maintaining good response time. Production jobs are scheduled during idle periods, making use of the otherwise unused resources. In this paper the authors discuss their experience with gang scheduling over the 3-year lifetime of the project. In section 2, they motivate the project and discuss some of its details. Section 3.0 describes the general scheduling problem and how gang scheduling addresses it. In section 4.0, they describe the implementation. Section 8.0 presents results culled over the lifetime of the project. They conclude this paper with some observations and possible future directions.

  4. Parallel adaptive mesh refinement within the PUMAA3D Project

    NASA Technical Reports Server (NTRS)

    Freitag, Lori; Jones, Mark; Plassmann, Paul

    1995-01-01

    To enable the solution of large-scale applications on distributed memory architectures, we are designing and implementing parallel algorithms for the fundamental tasks of unstructured mesh computation. In this paper, we discuss efficient algorithms developed for two of these tasks: parallel adaptive mesh refinement and mesh partitioning. The algorithms are discussed in the context of two-dimensional finite element solution on triangular meshes, but are suitable for use with a variety of element types and with h- or p-refinement. Results demonstrating the scalability and efficiency of the refinement algorithm and the quality of the mesh partitioning are presented for several test problems on the Intel DELTA.

  5. Arbitrary and Parallel Nanofabrication of 3D Metal Structures with Polymer Brush Resists.

    PubMed

    Chen, Chaojian; Xie, Zhuang; Wei, Xiaoling; Zheng, Zijian

    2015-12-01

    3D polymer brushes are reported for the first time as ideal resists for the alignment-free nanofabrication of complex 3D metal structures with sub-100 nm lateral resolution and sub-10 nm vertical resolution. Since 3D polymer brushes can be serially fabricated in parallel, this method is effective to generate arbitrary 3D metal structures over a large area at a high throughput. PMID:26439441

  6. Parallel deterministic neutronics with AMR in 3D

    SciTech Connect

    Clouse, C.; Ferguson, J.; Hendrickson, C.

    1997-12-31

    AMTRAN, a three-dimensional Sn neutronics code with adaptive mesh refinement (AMR), has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straightforward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.

  7. New 3D parallel SGILD modeling and inversion

    SciTech Connect

    Xie, G.; Li, J.; Majer, E.

    1998-09-01

    In this paper, a new parallel modeling and inversion algorithm using a Stochastic Global Integral and Local Differential equation (SGILD) is presented. The authors derived new acoustic integral equations and differential equations for statistical moments of the parameters and field. The new statistical moments integral equation on the boundary and local differential equations in the domain will be used together to obtain the mean wave field and its moments in the modeling. The new moments global Jacobian volume integral equation and the local Jacobian differential equations in the domain will be used together to update the mean parameters and their moments in the inversion. A new parallel multiple hierarchy substructure direct algorithm or direct-iteration hybrid algorithm will be used to solve, in parallel, the sparse matrices and one smaller full matrix from the domain to the boundary. The SGILD modeling and imaging algorithm has many advantages over conventional imaging approaches. The SGILD algorithm can be used for stochastic acoustic, electromagnetic, and flow modeling and inversion, and is important for the prediction of oil, gas, coal, and geothermal energy reservoirs in geophysical exploration.

  8. Massive hybrid parallelism for fully implicit multiphysics

    SciTech Connect

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.

    2013-07-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and consumes substantial development time, effort, and money. Software libraries that provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them both to take advantage of current HPC architectures and to prepare efficiently for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and of its internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided. (authors)

  9. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    SciTech Connect

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and consumes substantial development time, effort, and money. Software libraries that provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them both to take advantage of current HPC architectures and to prepare efficiently for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and of its internal object-oriented hybrid parallel design is given. Representative massively parallel results from several application areas are presented, and a brief discussion of future areas of research for the framework is provided.

  10. Parallel contact detection algorithm for transient solid dynamics simulations using PRONTO3D

    SciTech Connect

    Attaway, S.W.; Hendrickson, B.A.; Plimpton, S.J.

    1996-09-01

    An efficient, scalable, parallel algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for MIMD parallel computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRONTO3D has been extended for use in parallel computation by devising a dynamic (adaptive) processor load balancing scheme.

  11. Solid modeling on a massively parallel processor

    SciTech Connect

    Strip, D.; Karasick, M.

    1992-01-01

    Solid modeling underlies many technologies that are key to modern manufacturing. These range from computer-aided design systems to robot simulators, from finite element analysis to integrated circuit process modeling. The accuracy, and hence the utility, of these models is often constrained by the amount of computer time required to perform the desired operations. This paper presents a family of algorithms for solid modeling operations using the Connection Machine, a massively parallel SIMD processor. The authors describe a data structure for representing solid models and algorithms that use the representation to implement efficiently a variety of solid modeling operations. The authors give a sketch of the algorithm for intersecting solids and present computational experience using these algorithms. The data structure and algorithms are contrasted with those of serial architectures, and execution times are compared.

  12. Parallel processing for efficient 3D slope stability modelling

    NASA Astrophysics Data System (ADS)

    Marchesini, Ivan; Mergili, Martin; Alvioli, Massimiliano; Metz, Markus; Schneider-Muntau, Barbara; Rossi, Mauro; Guzzetti, Fausto

    2014-05-01

    We test the performance of the GIS-based, three-dimensional slope stability model r.slope.stability. The model was developed as a C- and Python-based raster module of the GRASS GIS software. It considers the three-dimensional geometry of the sliding surface, adopting a modification of the model proposed by Hovland (1977) and revised and extended by Xie and co-workers (2006). Given a terrain elevation map and a set of relevant thematic layers, the model evaluates the stability of slopes for a large number of randomly selected potential slip surfaces, ellipsoidal or truncated in shape. Any single raster cell may be intersected by multiple sliding surfaces, each associated with a value of the factor of safety, FS. For each pixel, the minimum value of FS and the depth of the associated slip surface are stored. This information is used to obtain a spatial overview of the potentially unstable slopes in the study area. We test the model in the Collazzone area, Umbria, central Italy, an area known to be susceptible to landslides of different types and sizes. The availability of a comprehensive and detailed landslide inventory map allowed for a critical evaluation of the model results. The r.slope.stability code automatically splits the study area into a defined number of tiles, with proper overlap in order to provide the same statistical significance for the entire study area. The tiles are then processed in parallel by a given number of processors, exploiting a multi-purpose computing environment at CNR IRPI, Perugia. The map of FS is obtained by collecting the individual results and taking the minimum values on the overlapping cells. This procedure significantly reduces the processing time. We show how the gain in processing time depends on the tile dimensions and on the number of cores.
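
    The tile-and-merge procedure can be sketched with NumPy and a thread pool. This is only an illustration of the split, parallel processing, and minimum-merge pattern, not the GRASS GIS module. `tile_fs` is a hypothetical placeholder for the per-tile slip-surface evaluation, and the tile and overlap sizes are assumptions. CPU-bound production code would more likely use separate processes.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def make_tiles(nrows, ncols, tile, overlap):
    """Row/column windows of size `tile`, each padded by `overlap` cells."""
    windows = []
    for r0 in range(0, nrows, tile):
        for c0 in range(0, ncols, tile):
            windows.append((
                slice(max(r0 - overlap, 0), min(r0 + tile + overlap, nrows)),
                slice(max(c0 - overlap, 0), min(c0 + tile + overlap, ncols)),
            ))
    return windows


def process_tiles(dem, tile, overlap, tile_fs, workers=4):
    """Run `tile_fs` on every tile in parallel and merge by cell-wise minimum FS."""
    fs = np.full(dem.shape, np.inf)
    windows = make_tiles(*dem.shape, tile, overlap)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(lambda w: tile_fs(dem[w]), windows)
        for window, tile_result in zip(windows, results):
            # Overlapping cells keep the smallest (least stable) factor of safety.
            fs[window] = np.minimum(fs[window], tile_result)
    return fs


# Toy stand-in for the slip-surface evaluation: FS decreases with elevation.
dem = np.arange(100, dtype=float).reshape(10, 10)
fs_map = process_tiles(dem, tile=4, overlap=1, tile_fs=lambda z: 2.0 - z / 100.0)
```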

  13. MPSim: A Massively Parallel General Simulation Program for Materials

    NASA Astrophysics Data System (ADS)

    Iotov, Mihail; Gao, Guanghua; Vaidehi, Nagarajan; Cagin, Tahir; Goddard, William A., III

    1997-08-01

    In this talk, we describe a general-purpose Massively Parallel Simulation (MPSim) program used for computational materials science and life sciences. We will also present scaling aspects of the program along with several case studies. The program incorporates the highly efficient CMM method to accurately calculate the interactions. For studying bulk materials, the program uses the Reduced CMM to account for infinite-range sums. The software embodies various advanced molecular dynamics algorithms and energy and structure optimization techniques, with a set of analysis tools suitable for large-scale structures. Applications of the program range from amorphous polymers, liquid-polymer interfaces, large viruses and million-atom clusters to surfaces and gas diffusion in polymers. The program was originally developed on the KSR in an object-oriented fashion and was ported to the SGI-PC and HP Exemplar. The message-passing version was originally implemented on the Intel Paragon using NX, then MPI, and was later tested on Cray T3D and IBM SP2 platforms.

  14. Prediction of parallel NIKE3D performance on the KSR1 system

    SciTech Connect

    Su, P.S.; Zacharia, T.; Fulton, R.E.

    1995-05-01

    The finite element method is one of the bases for numerical solutions to engineering problems. Complex engineering problems solved with finite element analysis typically require excessively large computational times. Parallel supercomputers have the potential to significantly increase calculation speeds in order to meet these computational requirements. This paper predicts parallel NIKE3D performance on the Kendall Square Research (KSR1) system. The first part of the prediction is based on the implementation of a parallel Cholesky (U^T D U) matrix decomposition algorithm through actual computations on the KSR1 multiprocessor system, with 64 processors, at Oak Ridge National Laboratory. The other predictions are based on actual computations for parallel element matrix generation, parallel global stiffness matrix assembly, and parallel forward/backward substitution on the BBN TC2000 multiprocessor system at Lawrence Livermore National Laboratory. The preliminary results indicate that parallel NIKE3D performance can be attractive under local/shared-memory multiprocessor system environments.
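
    For reference, here is a serial sketch of the U^T D U (root-free Cholesky) decomposition named above, with U unit upper triangular and D diagonal. The function name is hypothetical. The sketch does no pivoting and assumes a symmetric matrix with nonzero leading minors (e.g. SPD). The skyline storage and parallel column scheduling used in finite element codes are not shown.

```python
import numpy as np


def utdu(A: np.ndarray):
    """Factor symmetric A as U.T @ diag(d) @ U with U unit upper triangular."""
    n = A.shape[0]
    U = np.eye(n)
    d = np.zeros(n)
    for j in range(n):
        # d_j = a_jj - sum_{k<j} d_k * u_kj^2
        d[j] = A[j, j] - np.sum(d[:j] * U[:j, j] ** 2)
        for i in range(j + 1, n):
            # u_ji = (a_ji - sum_{k<j} d_k * u_kj * u_ki) / d_j
            U[j, i] = (A[j, i] - np.sum(d[:j] * U[:j, j] * U[:j, i])) / d[j]
    return U, d


A = np.array([[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]])
U, d = utdu(A)
```

    Solving A x = b then takes a forward substitution with U^T, a diagonal scaling by D, and a backward substitution with U.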

  15. A parallel multigrid-based preconditioner for the 3D heterogeneous high-frequency Helmholtz equation

    SciTech Connect

    Riyanti, C.D.; Kononov, A.; Erlangga, Y.A.; Vuik, C.; Oosterlee, C.W.; Plessix, R.-E.; Mulder, W.A.

    2007-05-20

    We investigate the parallel performance of an iterative solver for 3D heterogeneous Helmholtz problems related to applications in seismic wave propagation. For large 3D problems, the computation is no longer feasible on a single processor, and the memory requirements increase rapidly. Therefore, parallelization of the solver is needed. We employ a complex shifted-Laplace preconditioner combined with the Bi-CGSTAB iterative method and use a multigrid method to approximate the inverse of the resulting preconditioning operator. A 3D multigrid method with 2D semi-coarsening is employed. We show numerical results for large problems arising in geophysical applications.
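
    The solver structure (Bi-CGSTAB with a complex shifted-Laplace preconditioner) can be illustrated in a small 1D setting. In this sketch, several things are assumptions rather than details from the paper. The problem is a homogeneous 1D Helmholtz problem with Dirichlet ends, not a 3D heterogeneous one. The preconditioner is inverted exactly with a dense solve, where the paper approximates it with multigrid. The shift 1 - 0.5i is a commonly used choice, assumed here.

```python
import numpy as np


def helmholtz_1d(n: int, k: float, shift: complex = 1.0) -> np.ndarray:
    """Dense 1D operator -u'' - shift * k^2 u on (0, 1) with Dirichlet ends."""
    h = 1.0 / (n + 1)
    lap = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
    return lap - shift * k**2 * np.eye(n)


def bicgstab(A, b, precond, tol=1e-8, maxiter=1000):
    """Right-preconditioned Bi-CGSTAB; `precond(v)` approximates M^{-1} v."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()
    rho_old = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    b_norm = np.linalg.norm(b)
    for it in range(1, maxiter + 1):
        rho = np.vdot(r_hat, r)
        beta = (rho / rho_old) * (alpha / omega)
        p = r + beta * (p - omega * v)
        p_hat = precond(p)
        v = A @ p_hat
        alpha = rho / np.vdot(r_hat, v)
        s = r - alpha * v
        if np.linalg.norm(s) <= tol * b_norm:
            return x + alpha * p_hat, it
        s_hat = precond(s)
        t = A @ s_hat
        omega = np.vdot(t, s) / np.vdot(t, t)
        x = x + alpha * p_hat + omega * s_hat
        r = s - omega * t
        if np.linalg.norm(r) <= tol * b_norm:
            return x, it
        rho_old = rho
    return x, maxiter


n, k = 200, 30.0
A = helmholtz_1d(n, k).astype(complex)
M = helmholtz_1d(n, k, shift=1.0 - 0.5j)  # complex shifted Laplacian
b = np.zeros(n, dtype=complex)
b[n // 2] = 1.0  # point source
x, iters = bicgstab(A, b, precond=lambda v: np.linalg.solve(M, v))
```

    The imaginary part of the shift damps the preconditioner, which makes it far easier to invert approximately (e.g. by multigrid) than the original indefinite Helmholtz operator.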

  16. Massively parallel neural network intelligent browse

    NASA Astrophysics Data System (ADS)

    Maxwell, Thomas P.; Zion, Philip M.

    1992-04-01

    A massively parallel neural network architecture is currently being developed as a potential component of a distributed information system in support of NASA's Earth Observing System. This architecture can be trained, via an iterative learning process, to recognize objects in images based on texture features, allowing scientists to search for all patterns which are similar to a target pattern in a database of images. It may facilitate scientific inquiry by allowing scientists to automatically search for physical features of interest in a database through computer pattern recognition, alleviating the need for exhaustive visual searches through possibly thousands of images. The architecture is implemented on a Connection Machine such that each physical processor contains a simulated 'neuron' which views a feature vector derived from a subregion of the input image. Each of these neurons is trained, via the perceptron rule, to identify the same pattern. The network output gives a probability distribution over the input image of finding the target pattern in a given region. In initial tests the architecture was trained to separate regions containing clouds from clear regions in 512 by 512 pixel AVHRR images. We found that in about 10 minutes we can train a network to perform with high accuracy in recognizing clouds which were texturally similar to a target cloud group. These promising results suggest that this type of architecture may play a significant role in coping with the forthcoming flood of data from the Earth-monitoring missions of the major space-faring nations.
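
    The perceptron rule that each simulated neuron is trained with can be sketched in NumPy as follows. The two-element feature vectors below are hypothetical stand-ins for texture features. Feature extraction and the mapping of neurons onto Connection Machine processors are not shown.

```python
import numpy as np


def train_perceptron(features, labels, epochs=50, lr=1.0):
    """Classic perceptron rule w += lr * (y - y_hat) * x, with labels in {0, 1}."""
    x = np.hstack([features, np.ones((len(features), 1))])  # bias column
    w = np.zeros(x.shape[1])
    for _ in range(epochs):
        errors = 0
        for xi, yi in zip(x, labels):
            y_hat = 1 if xi @ w > 0 else 0
            if y_hat != yi:
                w += lr * (yi - y_hat) * xi
                errors += 1
        if errors == 0:  # every training vector is classified correctly
            break
    return w


def predict(w, features):
    x = np.hstack([features, np.ones((len(features), 1))])
    return (x @ w > 0).astype(int)


# Hypothetical texture features: label 1 = "cloud-like", 0 = "clear".
feats = np.array([[0.9, 0.8], [0.8, 0.95], [0.1, 0.2], [0.2, 0.05]])
labs = np.array([1, 1, 0, 0])
w = train_perceptron(feats, labs)
```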

  17. Multiplexed microsatellite recovery using massively parallel sequencing

    USGS Publications Warehouse

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
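
    Screening reads for di- and trinucleotide microsatellites can be sketched with back-referencing regular expressions. The minimum repeat counts in `MIN_REPEATS` are hypothetical thresholds, not the ones used in the study. Homopolymer runs are skipped so that, for example, AAAAAA is not reported as an "AA" dinucleotide repeat.

```python
import re

# Hypothetical thresholds: motif length -> minimum number of tandem copies.
MIN_REPEATS = {2: 6, 3: 4}


def find_microsatellites(read: str):
    """Return (motif, start, copies) for di-/trinucleotide repeats in a read."""
    hits = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A unit followed by at least (min_rep - 1) further copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for m in pattern.finditer(read):
            motif = m.group(1)
            if len(set(motif)) == 1:
                continue  # skip homopolymer runs such as AAAAAA
            hits.append((motif, m.start(), (m.end() - m.start()) // unit_len))
    return hits
```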

  18. Fault tolerant massively parallel processing architecture

    SciTech Connect

    Balasubramanian, V.; Banerjee, P.

    1987-08-01

    This paper presents two massively parallel processing architectures suitable for solving a wide variety of algorithms of the divide-and-conquer type, for problems such as the discrete Fourier transform, production systems, design automation, and others. The first architecture, called the Chain-structured Butterfly ARchitecture (CBAR), consists of a two-dimensional array of N = L·(log_2(L)+1) processing elements (PEs) organized as L levels of log_2(L)+1 stages, with butterfly connections between PEs in consecutive stages and straight-through feedback between PEs in the last and first stages. This connection system has the desirable property of allowing thousands of PEs to be connected with O(N) connection cost, O(log_2(N/log_2 N)) communication paths, and a small number (4) of I/O ports per PE. However, this architecture is not fault tolerant. The authors therefore propose a second architecture, called the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), which is a modified version of the CBAR. The RECBAR possesses all the desirable features of the CBAR, with the number of I/O ports per PE increased to six, and uses O((log_2 N)/N) overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. Reliability improvements of the RECBAR over the CBAR are studied. This paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and structure itself accordingly within 2·log_2(L)+1 time steps, thus making it a truly fault tolerant architecture.

  19. The DANTE Boltzmann transport solver: An unstructured mesh, 3-D, spherical harmonics algorithm compatible with parallel computer architectures

    SciTech Connect

    McGhee, J.M.; Roberts, R.M.; Morel, J.E.

    1997-06-01

    A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
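
    The core solver above is a preconditioned conjugate gradient iteration. A minimal Python sketch of that iteration follows, assuming a symmetric positive definite matrix. It swaps in a simple diagonal (Jacobi) preconditioner for illustration; DANTE's diffusion-based preconditioner is not shown, and the test matrix is hypothetical.

```python
import numpy as np

def pcg(A, b, m_inv_diag, tol=1e-10, max_iter=500):
    """Preconditioned conjugate gradient for a symmetric positive definite A.

    m_inv_diag holds the inverse of a diagonal (Jacobi) preconditioner, used
    here as a stand-in for a more problem-specific preconditioner.
    """
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x                 # initial residual
    z = m_inv_diag * r            # preconditioned residual
    p = z.copy()                  # first search direction
    rz = r @ z
    b_norm = np.linalg.norm(b)
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rz / (p @ Ap)     # step length along p
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * b_norm:
            break
        z = m_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p  # next A-conjugate direction
        rz = rz_new
    return x

# Hypothetical test problem: 1D diffusion plus absorption, an SPD tridiagonal matrix.
n, sigma_a = 50, 0.1
A = (np.diag(np.full(n, 2.0 + sigma_a))
     - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1))
b = np.ones(n)
x = pcg(A, b, 1.0 / np.diag(A))
print(np.linalg.norm(A @ x - b))
```

    Only matrix-vector products, dot products, and vector updates appear inside the loop, which is one reason CG-type solvers map well onto parallel machines.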

  20. Linking 1D evolutionary to 3D hydrodynamical simulations of massive stars

    NASA Astrophysics Data System (ADS)

    Cristini, A.; Meakin, C.; Hirschi, R.; Arnett, D.; Georgy, C.; Viallet, M.

    2016-03-01

    Stellar evolution models of massive stars are important for many areas of astrophysics, for example nucleosynthesis yields, supernova progenitor models and understanding physics under extreme conditions. Turbulence occurs in stars primarily due to nuclear burning at different mass coordinates within the star. The understanding and correct treatment of turbulence and turbulent mixing at convective boundaries in stellar models has been studied for decades but still lacks a definitive solution. This paper presents initial results of a study on convective boundary mixing (CBM) in massive stars. The ‘stiffness’ of a convective boundary can be quantified using the bulk Richardson number ($\mathrm{Ri}_B$), the ratio of the potential energy for restoration of the boundary to the kinetic energy of turbulent eddies. A ‘stiff’ boundary ($\mathrm{Ri}_B \sim 10^4$) will suppress CBM, whereas in the opposite case a ‘soft’ boundary ($\mathrm{Ri}_B \sim 10$) will be more susceptible to CBM. One of the key results obtained so far is that lower convective boundaries (closer to the centre) of nuclear burning shells are ‘stiffer’ than the corresponding upper boundaries, implying limited CBM at lower shell boundaries. This is in agreement with 3D hydrodynamic simulations carried out by Meakin and Arnett (2007 Astrophys. J. 667 448-75). This result also has implications for new CBM prescriptions in massive stars as well as for nuclear burning flame front propagation in super-asymptotic giant branch stars and also the onset of novae.
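
    The verbal definition of $\mathrm{Ri}_B$ above can be made concrete with one form common in the convective-boundary-mixing literature, $\mathrm{Ri}_B = \Delta b\, L / \sigma^2$. Here $\Delta b = \int N^2\, dr$ is the buoyancy jump across the boundary ($N$ the buoyancy frequency), $L$ a turbulent length scale, and $\sigma$ the rms turbulent velocity. This exact form is an assumption (the abstract states the ratio only in words), and the dimensionless profile below is hypothetical.

```python
import numpy as np

def bulk_richardson(r, n2, length_scale, sigma):
    """Ri_B = delta_b * L / sigma**2 (assumed form).

    delta_b is the buoyancy jump, approximated by trapezoidal integration of
    the squared buoyancy frequency n2 over the boundary region r.
    """
    delta_b = np.sum(0.5 * (n2[1:] + n2[:-1]) * np.diff(r))
    return delta_b * length_scale / sigma**2

# Hypothetical dimensionless profile: the same buoyancy jump probed by
# weak and strong turbulence.
r = np.linspace(0.0, 1.0, 101)
n2 = np.full_like(r, 2.0)
for sigma in (0.01, 1.0):
    print(sigma, bulk_richardson(r, n2, length_scale=5.0, sigma=sigma))
```

    Weaker turbulence against the same buoyancy jump gives a much larger $\mathrm{Ri}_B$, i.e. a stiffer boundary. The values quoted in the abstract ($\sim 10^4$ versus $\sim 10$) are order-of-magnitude labels, not cutoffs built into this sketch.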

  1. The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

    NASA Technical Reports Server (NTRS)

    Woo, Alex C.; Hill, Kueichien C.

    1996-01-01

    The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Projects Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large-scale radar signature predictions. The EMCC/ARPA project consisted of three parts.

  2. Visualization on massively parallel computers using CM/AVS

    SciTech Connect

    Krogh, M.F.; Hansen, C.D.

    1993-09-01

    CM/AVS is a visualization environment for the massively parallel CM-5 from Thinking Machines. It provides a backend to the standard commercially available AVS visualization product. At the Advanced Computing Laboratory at Los Alamos National Laboratory, we have been experimenting with and using this software within our visualization environment. This paper describes our experiences with CM/AVS. The conclusions reached are applicable to any implementation of visualization software within a massively parallel computing environment.

  3. Experimental free-space optical network for massively parallel computers

    NASA Astrophysics Data System (ADS)

    Araki, S.; Kajita, M.; Kasahara, K.; Kubota, K.; Kurihara, K.; Redmond, I.; Schenfeld, E.; Suzaki, T.

    1996-03-01

    A free-space optical interconnection scheme is described for massively parallel processors based on the interconnection-cached network architecture. The optical network operates in a circuit-switching mode. Combined with a packet-switching operation among the circuit-switched optical channels, a high-bandwidth, low-latency network for massively parallel processing results. The design and assembly of a 64-channel experimental prototype is discussed, and operational results are presented.

  4. Three-dimensional radiative transfer on a massively parallel computer

    NASA Astrophysics Data System (ADS)

    Vath, H. M.

    1994-04-01

    We perform 3D radiative transfer calculations in non-local thermodynamic equilibrium (NLTE) in the simple two-level atom approximation on the MasPar MP-1, which contains 8192 processors and is a single instruction multiple data (SIMD) machine, an example of the new generation of massively parallel computers. On such a machine, all processors execute the same command at a given time, but on different data. To make radiative transfer calculations efficient, we must reconsider the numerical methods and storage of data. To solve the transfer equation, we adopt the short characteristic method and examine different acceleration methods to obtain the source function. We use the ALI method and test local and non-local operators. Furthermore, we compare the Ng and the orthomin methods of acceleration. We also investigate the use of multi-grid methods to get fast solutions for the NLTE case. In order to test these numerical methods, we apply them to two problems with and without periodic boundary conditions.
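
    Ng acceleration, one of the methods compared above, extrapolates from a few successive iterates of a slowly converging fixed-point iteration. Below is a minimal Python sketch of the first-order variant (three iterates); the paper may use a higher-order form. The toy iteration stands in, hypothetically, for a Lambda-type source-function iteration. Every operation is elementwise or a dot product, which suits a SIMD machine.

```python
import numpy as np

def ng_accelerate(s0, s1, s2):
    """First-order Ng extrapolation from three successive iterates (oldest first).

    Returns (1 - a) * s2 + a * s1 with the least-squares coefficient
    a = (q1 . q3) / (q1 . q1); for a single geometric error mode this
    extrapolation is exact.
    """
    q1 = s2 - 2.0 * s1 + s0   # second difference of the iterates
    q3 = s2 - s1              # most recent correction
    denom = q1 @ q1
    if denom == 0.0:
        return s2             # iterates already stationary
    a = (q1 @ q3) / denom
    return (1.0 - a) * s2 + a * s1

# Hypothetical slowly converging fixed-point iteration S <- 0.9 S + 1 (fixed point 10),
# standing in for a Lambda-type iteration on the source function.
def step(s):
    return 0.9 * s + 1.0

s0 = np.zeros(4)
s1 = step(s0)
s2 = step(s1)
print(s2, ng_accelerate(s0, s1, s2))
```

    Plain iteration has only reached about 1.9 after two steps, while the extrapolated vector lands on the fixed point because this toy problem has a single error mode. Real source-function iterations have many modes, so Ng steps are typically interleaved with ordinary iterations.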

  5. 3D Modeling of the Massive Binary Wind Interaction Region in Eta Carinae

    NASA Astrophysics Data System (ADS)

    Madura, Thomas; Gull, T.; Owocki, S.; Okazaki, A.; Russell, C.

    2009-01-01

    We present recent work on the theoretical modeling of low excitation ([Fe II]) and high excitation ([Fe III]) wind lines observed in Eta Carinae using the HST/STIS. The spatially resolved structures seen in these lines are interpreted as the time-averaged, outer extensions of the wind from the primary star and the wind-wind interaction region of the massive binary system. For most of the orbit, the wind-wind interface can be approximated as a cone with a half-opening angle of 65° whose axis of rotation is aligned with the major axis of the binary orbit and appears to lie in the plane of the Homunculus disk. However, because the orbit is highly elliptical, this approximation breaks down at periastron and so full 3D Smoothed Particle Hydrodynamics (SPH) simulations become necessary. By analyzing the results of these 3D SPH simulations of the binary interactions and comparing them to the spectra obtained with the HST/STIS, we place further constraints on the orientation of the binary orbit, and we hope eventually to determine how and where UV light is escaping in the system, to search for any direct signatures of the companion star, and ultimately to establish a mass ratio for the system.

  6. RAMA: A file system for massively parallel computers

    NASA Technical Reports Server (NTRS)

    Miller, Ethan L.; Katz, Randy H.

    1993-01-01

    This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated into the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
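
    The abstract does not say how RAMA avoids inter-node synchronization. One general way to achieve it is for every node to compute a block's location from a hash of its identity, so no central directory has to be consulted. The Python sketch below illustrates that idea under stated assumptions: the key format, the use of SHA-256, and the function name `place_block` are hypothetical, not RAMA's documented internals.

```python
import hashlib
from collections import Counter

def place_block(file_id, block_index, num_disks):
    """Map (file, block) to a disk with a hash; every node computes the same answer
    locally, so no inter-node lookup or lock is needed to find a block."""
    key = f"{file_id}:{block_index}".encode()
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % num_disks

# Spread 10,000 blocks of a hypothetical file over 16 disks and count the load per disk.
load = Counter(place_block("file-42", i, 16) for i in range(10_000))
print(sorted(load.values()))
```

    Pseudo-random placement trades locality for balance: consecutive blocks land on different disks, spreading the load roughly evenly without any coordination between nodes.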

  7. IMPAIR: massively parallel deconvolution on the GPU

    NASA Astrophysics Data System (ADS)

    Sherry, Michael; Shearer, Andy

    2013-02-01

    The IMPAIR software is a high throughput image deconvolution tool for processing large out-of-core datasets of images, varying from large images with spatially varying PSFs to large numbers of images with spatially invariant PSFs. IMPAIR implements a parallel version of the tried and tested Richardson-Lucy deconvolution algorithm regularised via a custom wavelet thresholding library. It exploits the inherently parallel nature of the convolution operation to achieve quality results on consumer grade hardware: through the NVIDIA Tesla GPU implementation, the multi-core OpenMP implementation, and the cluster computing MPI implementation of the software. IMPAIR aims to address the problem of parallel processing in both top-down and bottom-up approaches: by managing the input data at the image level, and by managing the execution at the instruction level. These combined techniques will lead to a scalable solution with minimal resource consumption and maximal load balancing. IMPAIR is being developed as both a stand-alone tool for image processing, and as a library which can be embedded into non-parallel code to transparently provide parallel high throughput deconvolution.
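
    The Richardson-Lucy iteration at the core of IMPAIR can be sketched in a few lines of NumPy. This minimal one-dimensional version assumes a spatially invariant, odd-length PSF and omits the wavelet-thresholding regularisation the abstract mentions; the function name and test signal are hypothetical. Each step is a pair of convolutions plus elementwise operations, which is the data parallelism the abstract refers to.

```python
import numpy as np

def richardson_lucy_1d(observed, psf, iterations=50, eps=1e-12):
    """Unregularised Richardson-Lucy deconvolution for a 1D signal.

    Update: estimate <- estimate * (P^T (observed / (P estimate))), where P is
    convolution with the normalised PSF and P^T uses the mirrored PSF.
    """
    psf = psf / psf.sum()
    psf_mirror = psf[::-1]
    estimate = np.full_like(observed, observed.mean())  # flat, positive start
    for _ in range(iterations):
        blurred = np.convolve(estimate, psf, mode="same")
        ratio = observed / (blurred + eps)              # eps guards against 0/0
        estimate = estimate * np.convolve(ratio, psf_mirror, mode="same")
    return estimate

truth = np.zeros(21)
truth[10] = 1.0
psf = np.array([0.25, 0.5, 0.25])
observed = np.convolve(truth, psf, mode="same")
restored = richardson_lucy_1d(observed, psf)
print(observed[8:13], np.round(restored[8:13], 3))
```

    The multiplicative update keeps the estimate non-negative and, with a normalised PSF, preserves total flux, two properties that make the method a common default for astronomical images.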

  8. EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS

    SciTech Connect

    F. PETRINI; W. FENG

    1999-09-01

    We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of flow-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.
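
    The interplay of communication buffering, non-blocking sends, and strobing can be illustrated with a toy single-process model in Python. It is a hypothetical sketch of the idea, not the authors' runtime: the class and method names are assumptions, and real strobes are periodic global synchronizations across a machine rather than method calls.

```python
class BufferedCluster:
    """Toy model of buffered communication with periodic global strobes.

    Sends are non-blocking and only append to a per-node buffer; messages move
    only when strobe() runs, at which point the total pending traffic is known
    globally and could drive scheduling decisions.
    """

    def __init__(self, num_nodes):
        self.outbox = [[] for _ in range(num_nodes)]
        self.inbox = [[] for _ in range(num_nodes)]

    def send(self, src, dst, payload):
        # Non-blocking: buffer locally and return immediately.
        self.outbox[src].append((dst, payload))

    def strobe(self):
        # Global exchange: report the buffered traffic, then deliver it in one batch.
        traffic = [len(buf) for buf in self.outbox]
        for src, buf in enumerate(self.outbox):
            for dst, payload in buf:
                self.inbox[dst].append((src, payload))
            buf.clear()
        return traffic

cluster = BufferedCluster(3)
cluster.send(0, 1, "halo-a")
cluster.send(2, 1, "halo-b")
print(cluster.inbox[1])    # [] : nothing moves between strobes
print(cluster.strobe())    # [1, 0, 1] : per-node buffered message counts
print(cluster.inbox[1])    # [(0, 'halo-a'), (2, 'halo-b')]
```

    Because every node's pending traffic is visible at the strobe, a scheduler can make decisions from the global state of the machine, which is the advantage the abstract emphasizes.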

  9. Scan line graphics generation on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1988-01-01

    Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. Performing the pixel value calculations, balancing the load across the processors, and applying the results to the Z buffer efficiently in parallel require special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
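
    Applying a large group of pixel fragments to a Z buffer at once can be expressed as a sort followed by a pick-first-per-group step. The Python sketch below is a minimal NumPy stand-in for that sort-based idea. It is not Dorband's virtual routing scheme, and the function name and fragment data are hypothetical.

```python
import numpy as np

def zbuffer_resolve(pixel_ids, depths, colors, num_pixels, background=0):
    """Resolve many fragments per pixel at once by sorting (pixel, depth) pairs."""
    # np.lexsort sorts by the last key first: group by pixel, nearest depth first.
    order = np.lexsort((depths, pixel_ids))
    # Index of the first (nearest) fragment in each pixel group.
    _, first = np.unique(pixel_ids[order], return_index=True)
    winners = order[first]
    frame = np.full(num_pixels, background, dtype=colors.dtype)
    zbuf = np.full(num_pixels, np.inf)
    frame[pixel_ids[winners]] = colors[winners]
    zbuf[pixel_ids[winners]] = depths[winners]
    return frame, zbuf

pixel_ids = np.array([2, 0, 2, 1, 0])
depths = np.array([5.0, 3.0, 1.0, 4.0, 2.0])
colors = np.array([10, 20, 30, 40, 50])
frame, zbuf = zbuffer_resolve(pixel_ids, depths, colors, num_pixels=4)
print(frame, zbuf)
```

    Sorting turns the irregular scatter of fragments into contiguous groups, so the depth test becomes a regular operation that every processor can perform in lockstep.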

  10. Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver

    NASA Technical Reports Server (NTRS)

    Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)

    2002-01-01

    The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing reasonable agreement.

  11. An improved parallel SPH approach to solve 3D transient generalized Newtonian free surface flows

    NASA Astrophysics Data System (ADS)

    Ren, Jinlian; Jiang, Tao; Lu, Weigang; Li, Gang

    2016-08-01

    In this paper, a corrected parallel smoothed particle hydrodynamics (C-SPH) method is proposed to simulate 3D generalized Newtonian free surface flows with low Reynolds number, with particular attention to 3D viscous jet buckling problems. The proposed C-SPH method couples an improved SPH method based on the incompressible condition with the traditional SPH (TSPH): the improved SPH, with a diffusive term and a first-order kernel gradient correction scheme, is used in the interior of the fluid domain, and the TSPH is used near the free surface. The C-SPH method thus combines the advantages of both methods. Meanwhile, an effective and convenient boundary treatment is presented to deal with the 3D multiple-boundary problem, and MPI parallelization with a dynamic cell-based neighbor particle search is used to improve computational efficiency. The validity and merits of the C-SPH are first verified by solving several benchmarks and comparing with other results. Then the viscous jet folding/coiling based on the Cross model is simulated by the C-SPH method and compared with other experimental or numerical results. In particular, the influence of macroscopic parameters on the flow is discussed. All the numerical results agree well with available data and show that the C-SPH method has higher accuracy and better stability than other particle methods for solving 3D moving free surface flows.
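
    The Cross model named above gives a shear-thinning viscosity that, in an SPH code, would be evaluated per particle from the local shear-rate magnitude. A minimal Python sketch follows, assuming the common form $\mu = \mu_\infty + (\mu_0 - \mu_\infty)/(1 + (\lambda\dot\gamma)^m)$; some authors write the denominator with a different constant, and the parameter values here are hypothetical, not taken from the paper.

```python
import numpy as np

def cross_viscosity(shear_rate, mu0, mu_inf, lam, m):
    """Cross model: mu = mu_inf + (mu0 - mu_inf) / (1 + (lam * shear_rate)**m)."""
    shear_rate = np.asarray(shear_rate, dtype=float)
    return mu_inf + (mu0 - mu_inf) / (1.0 + (lam * shear_rate) ** m)

# Hypothetical shear-thinning parameters (not taken from the paper).
rates = np.array([0.0, 0.1, 1.0, 10.0, 1000.0])
mu = cross_viscosity(rates, mu0=10.0, mu_inf=0.1, lam=1.0, m=1.0)
print(mu)
```

    The viscosity falls from the zero-shear value $\mu_0$ toward $\mu_\infty$ as the shear rate grows, and equals the midpoint of the two when $\lambda\dot\gamma = 1$.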

  12. Massively Parallel Sequencing: The Next Big Thing in Genetic Medicine

    PubMed Central

    Tucker, Tracy; Marra, Marco; Friedman, Jan M.

    2009-01-01

    Massively parallel sequencing has reduced the cost and increased the throughput of genomic sequencing by more than three orders of magnitude, and it seems likely that costs will fall and throughput improve even more in the next few years. Clinical use of massively parallel sequencing will provide a way to identify the cause of many diseases of unknown etiology through simultaneous screening of thousands of loci for pathogenic mutations and by sequencing biological specimens for the genomic signatures of novel infectious agents. In addition to providing these entirely new diagnostic capabilities, massively parallel sequencing may also replace arrays and Sanger sequencing in clinical applications where they are currently being used. Routine clinical use of massively parallel sequencing will require higher accuracy, better ways to select genomic subsets of interest, and improvements in the functionality, speed, and ease of use of data analysis software. In addition, substantial enhancements in laboratory computer infrastructure, data storage, and data transfer capacity will be needed to handle the extremely large data sets produced. Clinicians and laboratory personnel will require training to use the sequence data effectively, and appropriate methods will need to be developed to deal with the incidental discovery of pathogenic mutations and variants of uncertain clinical significance. Massively parallel sequencing has the potential to transform the practice of medical genetics and related fields, but the vast amount of personal genomic data produced will increase the responsibility of geneticists to ensure that the information obtained is used in a medically and socially responsible manner. PMID:19679224

  13. Spatial parallelism of a 3D finite difference, velocity-stress elastic wave propagation code

    SciTech Connect

    Minkoff, S.E.

    1999-12-01

    Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately, finite difference simulations for 3D elastic wave propagation are expensive. The authors model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is notable for its ability to model a number of different types of seismic sources. The parallel implementation uses the MPI library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions with 3D decompositions generally producing the best parallel speedup. Because I/O is handled largely outside of the time-step loop (the most expensive part of the simulation) the authors have opted for straightforward broadcast and reduce operations to handle I/O. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as ghost cells. When this communication is balanced against computation by allocating subdomains of reasonable size, they observe excellent scaled speedup. Allocating subdomains of size 25 x 25 x 25 on each node, they achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.
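
    The 4th-order-in-space staggered operator mentioned above is commonly written with the coefficients 9/8 and -1/24. The Python sketch below applies that standard operator in one dimension. The function name is hypothetical, and the sine test field only checks the order of accuracy; it is not the authors' velocity-stress code.

```python
import numpy as np

def staggered_d4(f_half, h):
    """Fourth-order staggered first derivative.

    f_half[k] samples f at x = (k + 1/2) * h. The result approximates df/dx at
    x = i * h for i = 2 .. len(f_half) - 2 via
    (9/8 * (f[i+1/2] - f[i-1/2]) - 1/24 * (f[i+3/2] - f[i-3/2])) / h.
    """
    inner = f_half[2:-1] - f_half[1:-2]   # f[i+1/2] - f[i-1/2]
    outer = f_half[3:] - f_half[:-3]      # f[i+3/2] - f[i-3/2]
    return (9.0 / 8.0 * inner - 1.0 / 24.0 * outer) / h

h = 0.01
x_half = (np.arange(200) + 0.5) * h
x_int = np.arange(2, 199) * h           # points where the stencil fits
deriv = staggered_d4(np.sin(x_half), h)
print(np.max(np.abs(deriv - np.cos(x_int))))
```

    The derivative lives on the integer grid while the field lives on the half grid, so velocities and stresses can alternate between the two grids without extra storage. This is the benefit of staggering noted in the abstract.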

  14. Spatial Parallelism of a 3D Finite Difference, Velocity-Stress Elastic Wave Propagation Code

    SciTech Connect

    MINKOFF,SUSAN E.

    1999-12-09

    Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately, finite difference simulations for 3D elastic wave propagation are expensive. We model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is notable for its ability to model a number of different types of seismic sources. The parallel implementation uses the MPI library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions with 3D decompositions generally producing the best parallel speedup. Because I/O is handled largely outside of the time-step loop (the most expensive part of the simulation) we have opted for straightforward broadcast and reduce operations to handle I/O. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as "ghost cells". When this communication is balanced against computation by allocating subdomains of reasonable size, we observe excellent scaled speedup. Allocating subdomains of size 25 x 25 x 25 on each node, we achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.
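
    The ghost-cell exchange described above can be simulated in a single Python process. In the sketch below, a 1D field is split into subdomains, each padded with ghost cells and filled from its neighbours' face values. A stencil applied per subdomain then reproduces the global result. The list-of-arrays "ranks" and function names are hypothetical stand-ins for MPI processes. A 3-point stencil needs one ghost cell per side; the 4th-order staggered operator would need two (`g=2`).

```python
import numpy as np

def split_with_ghosts(u, nparts, g=1):
    """Split a 1D global field into nparts subdomains padded with g ghost cells per side."""
    return [np.pad(chunk, g) for chunk in np.array_split(u, nparts)]

def exchange_ghosts(subs, g=1):
    """Fill ghost cells from neighbouring subdomains' face (edge interior) values.

    In an MPI code each face would be a send/receive pair between neighbouring
    ranks; here the 'ranks' are list entries. Ghosts at the outer domain edges
    stay zero, matching zero-padding of the global field.
    """
    for left, right in zip(subs[:-1], subs[1:]):
        left[-g:] = right[g:2 * g]       # first interior cells of the right neighbour
        right[:g] = left[-2 * g:-g]      # last interior cells of the left neighbour

def laplacian_interior(v):
    """Second-difference stencil applied to the interior cells of a padded array."""
    return v[:-2] - 2.0 * v[1:-1] + v[2:]

rng = np.random.default_rng(1)
u = rng.random(30)
subs = split_with_ghosts(u, nparts=4)
exchange_ghosts(subs)
local = np.concatenate([laplacian_interior(s) for s in subs])
global_ = laplacian_interior(np.pad(u, 1))
print(np.allclose(local, global_))
```

    Communication scales with subdomain surface area while computation scales with volume, which is why the reasonably sized 25 x 25 x 25 subdomains in the abstract keep efficiency high.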

  15. Massively parallel neural encoding and decoding of visual stimuli.

    PubMed

    Lazar, Aurel A; Zhou, Yiyin

    2012-08-01

    The massively parallel nature of video Time Encoding Machines (TEMs) calls for scalable, massively parallel decoders that are implemented with neural components. The current generation of decoding algorithms is based on computing the pseudo-inverse of a matrix and does not satisfy these requirements. Here we consider video TEMs with an architecture built using Gabor receptive fields and a population of Integrate-and-Fire neurons. We show how to build a scalable architecture for video Time Decoding Machines using recurrent neural networks. Furthermore, we extend our architecture to handle the reconstruction of visual stimuli encoded with massively parallel video TEMs having neurons with random thresholds. Finally, we discuss our algorithms in detail and demonstrate their scalability and performance on a large-scale GPU cluster. PMID:22397951
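
    The encoding side of a TEM uses Integrate-and-Fire neurons. A single ideal IAF neuron maps a signal $u(t)$ to spike times satisfying $\int_{t_k}^{t_{k+1}} (u(s) + b)\, ds = \kappa\delta$, with bias $b$, integration constant $\kappa$ and threshold $\delta$. The minimal discrete-time Python sketch below encodes a one-dimensional signal this way. It omits the Gabor receptive fields, the neuron population and the random thresholds, and all parameter values are hypothetical. The bias is chosen larger than $|u|$ so the neuron keeps firing.

```python
import numpy as np

def iaf_encode(u, dt, b, kappa, delta):
    """Ideal integrate-and-fire time encoder (no leak, no refractory period).

    The neuron integrates (u(t) + b) / kappa; each time the integral reaches
    the threshold delta it records a spike time and subtracts delta, so that
    consecutive spike times satisfy, up to discretization error,
    integral_{t_k}^{t_k+1} (u(s) + b) ds = kappa * delta.
    """
    v = 0.0
    spikes = []
    for n, sample in enumerate(u):
        v += (sample + b) * dt / kappa
        if v >= delta:
            spikes.append((n + 1) * dt)   # integral now covers time up to (n + 1) * dt
            v -= delta
    return np.array(spikes)

dt = 1e-4
t = np.arange(10_000) * dt
u = 0.5 * np.sin(2 * np.pi * 3 * t)       # hypothetical band-limited stimulus
spikes = iaf_encode(u, dt, b=1.0, kappa=1.0, delta=0.02)
print(len(spikes), np.round(np.diff(spikes)[:5], 4))
```

    Spikes bunch together where the stimulus is large and spread out where it is small, so the spike train carries the signal in its timing. Decoding then recovers $u$ from the spike times, which is the step the abstract maps onto recurrent networks.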

  16. Staging memory for massively parallel processor

    NASA Technical Reports Server (NTRS)

    Batcher, Kenneth E. (Inventor)

    1988-01-01

    The invention herein relates to a computer organization capable of rapidly processing extremely large volumes of data. A staging memory is provided having a main stager portion consisting of a large number of memory banks which are accessed in parallel to receive, store, and transfer data words simultaneously. Substager portions interconnect with the main stager portion to match input and output data formats with the data format of the main stager portion. An address generator is coded for accessing the data banks for receiving or transferring the appropriate words. Input and output permutation networks arrange the linear order of data into and out of the memory banks.

  17. The Challenge of Massively Parallel Computing

    SciTech Connect

    WOMBLE,DAVID E.

    1999-11-03

    Since the mid-1980s, there have been a number of commercially available parallel computers with hundreds or thousands of processors. These machines have provided a new capability to the scientific community, and they have been used by scientists and engineers with varying degrees of success. One of the reasons for the limited success is the difficulty, or perceived difficulty, of developing code for these machines. In this paper we discuss many of the issues and challenges in developing scalable hardware, system software and algorithms for machines comprising hundreds or thousands of processors.

  18. Design and implementation of a massively parallel version of DIRECT

    SciTech Connect

    He, J.; Verstak, A.; Watson, L.; Sosonkina, M.

    2007-10-24

    This paper describes several massively parallel implementations of the global search algorithm DIRECT. Two parallel schemes take different approaches to address DIRECT's design challenges imposed by memory requirements and data dependency. Three design aspects (topology, data structures, and task allocation) are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200-processor cluster.

  19. Parallel computation of 3-D Navier-Stokes flowfields for supersonic vehicles

    NASA Technical Reports Server (NTRS)

    Ryan, James S.; Weeratunga, Sisira

    1993-01-01

    Multidisciplinary design optimization of aircraft will require unprecedented capabilities of both analysis software and computer hardware. The speed and accuracy of the analysis will depend heavily on the computational fluid dynamics (CFD) module which is used. A new CFD module has been developed to combine the robust accuracy of conventional codes with the ability to run on parallel architectures. This is achieved by parallelizing the ARC3D algorithm, a central-differenced Navier-Stokes method, on the Intel iPSC/860. The computed solutions are identical to those from conventional machines. Computational speed on 64 processors is comparable to the rate on one Cray Y-MP processor and will increase as new generations of parallel computers become available.

  20. Description of a parallel, 3D, finite element, hydrodynamics-diffusion code

    SciTech Connect

    Milovich, J L; Prasad, M K; Shestakov, A I

    1999-04-11

    We describe a parallel, 3D, unstructured grid finite element, hydrodynamics-diffusion code for inertial confinement fusion (ICF) applications and the ancillary software used to run it. The code system is divided into two entities, a controller and a stand-alone physics code. The two may reside on different computers: the controller on the user's workstation and the physics code on a supercomputer. The physics code is composed of separate hydrodynamic, equation-of-state, laser energy deposition, heat conduction, and radiation transport packages and is parallelized for distributed memory architectures. For parallelization, an SPMD model is adopted: the domain is decomposed into a disjoint collection of subdomains, one per processing element (PE). The PEs communicate using MPI. The code is used to simulate the hydrodynamic implosion of a spherical bubble.

  1. Massively parallel solution of the assignment problem. Technical report

    SciTech Connect

    Wein, J.; Zenios, S.

    1990-12-01

    In this paper we discuss the design, implementation and effectiveness of massively parallel algorithms for the solution of large-scale assignment problems. In particular, we study the auction algorithms of Bertsekas, an algorithm based on the method of multipliers of Hestenes and Powell, and an algorithm based on the alternating direction method of multipliers of Eckstein. We discuss alternative approaches to the massively parallel implementation of the auction algorithm, including Jacobi, Gauss-Seidel and a hybrid scheme. The hybrid scheme, in particular, exploits two different levels of parallelism and an efficient way of communicating the data between them without the need to perform general router operations across the hypercube network. We then study the performance of massively parallel implementations of two methods of multipliers. Implementations are carried out on the Connection Machine CM-2, and the algorithms are evaluated empirically with the solution of large scale problems. The hybrid scheme significantly outperforms all of the other methods and gives the best computational results to date for a massively parallel solution to this problem.
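
    Bertsekas' auction algorithm, studied above, can be sketched serially in Python. The version below is the Gauss-Seidel variant, in which one unassigned person bids at a time. The Jacobi variant instead lets all unassigned persons bid simultaneously, which suits a data-parallel machine like the CM-2; neither the Jacobi nor the hybrid scheme is shown. The function names and random test matrix are hypothetical. With integer benefits and $\epsilon < 1/n$, the final assignment is optimal.

```python
import itertools
import random

def auction_assignment(benefit, eps=None):
    """Forward auction, Gauss-Seidel style: one unassigned person bids at a time.

    benefit[i][j] is the value of assigning person i to object j. With integer
    benefits and eps < 1/n, the final assignment is optimal.
    """
    n = len(benefit)
    if eps is None:
        eps = 1.0 / (n + 1)
    prices = [0.0] * n
    owner = [None] * n        # owner[j] = person currently holding object j
    assigned = [None] * n     # assigned[i] = object held by person i
    unassigned = list(range(n))
    while unassigned:
        i = unassigned.pop()
        values = [benefit[i][j] - prices[j] for j in range(n)]
        best = max(range(n), key=values.__getitem__)
        best_val = values[best]
        second_val = max((values[j] for j in range(n) if j != best), default=best_val)
        # Bid: raise the price so person i is indifferent up to eps.
        prices[best] += best_val - second_val + eps
        if owner[best] is not None:
            evicted = owner[best]           # previous holder is outbid
            assigned[evicted] = None
            unassigned.append(evicted)
        owner[best] = i
        assigned[i] = best
    return assigned

def brute_force_value(benefit):
    """Optimal total benefit by enumerating all permutations (small n only)."""
    n = len(benefit)
    return max(sum(benefit[i][p[i]] for i in range(n))
               for p in itertools.permutations(range(n)))

rng = random.Random(0)
B = [[rng.randint(0, 20) for _ in range(5)] for _ in range(5)]
assignment = auction_assignment(B)
print(assignment, sum(B[i][j] for i, j in enumerate(assignment)), brute_force_value(B))
```

    The $\epsilon$ in each bid guarantees that prices strictly increase and the auction terminates. The total benefit then lies within $n\epsilon < 1$ of optimal, which for integer benefits means exactly optimal.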

  2. Shift: A Massively Parallel Monte Carlo Radiation Transport Package

    SciTech Connect

    Pandya, Tara M; Johnson, Seth R; Davidson, Gregory G; Evans, Thomas M; Hamilton, Steven P

    2015-01-01

    This paper discusses the massively-parallel Monte Carlo radiation transport package, Shift, developed at Oak Ridge National Laboratory. It reviews the capabilities, implementation, and parallel performance of this code package. Scaling results demonstrate very good strong and weak scaling behavior of the implemented algorithms. Benchmark results from various reactor problems show that Shift results compare well to other contemporary Monte Carlo codes and experimental results.
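
    Shift's algorithms are not described in the abstract. As a generic illustration of why Monte Carlo transport parallelizes well, the Python sketch below estimates uncollided transmission through a purely absorbing slab. Each particle history is independent, so histories can be split across any number of processors and the tallies summed. The slab problem and function name are hypothetical; the exact answer $e^{-\Sigma_t x}$ serves as a check.

```python
import numpy as np

def slab_transmission(sigma_t, thickness, n_histories, seed=0):
    """Uncollided transmission through a purely absorbing slab at normal incidence.

    Each history samples a distance to first collision from an exponential
    distribution with mean free path 1 / sigma_t; it is transmitted if that
    distance exceeds the slab thickness.
    """
    rng = np.random.default_rng(seed)
    distances = rng.exponential(scale=1.0 / sigma_t, size=n_histories)
    transmitted = distances > thickness
    estimate = transmitted.mean()
    std_err = np.sqrt(estimate * (1.0 - estimate) / n_histories)
    return estimate, std_err

est, err = slab_transmission(sigma_t=1.0, thickness=1.0, n_histories=200_000)
print(est, err, np.exp(-1.0))
```

    The statistical error falls as $1/\sqrt{N}$ in the number of histories, so adding processors to run more histories directly tightens the answer. This is the workload behind the weak-scaling studies that codes of this kind report.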

  3. Advanced quadratures and periodic boundary conditions in parallel 3D Sn transport

    SciTech Connect

    Manalo, K.; Yi, C.; Huang, M.; Sjoden, G.

    2013-07-01

    Significant updates in numerical quadratures have warranted investigation with 3D Sn discrete ordinates transport. We show new applications of quadrature departing from level symmetric (S20), investigating three recently developed quadratures: Even-Odd (EO), Linear-Discontinuous Finite Element - Surface Area (LDFE-SA), and the non-symmetric Icosahedral Quadrature (IC). We discuss implementation changes to 3D Sn codes (applied to Hybrid MOC-Sn TITAN and 3D parallel PENTRAN) that can be made to accommodate Icosahedral Quadrature, as this quadrature is not invariant under 90-degree rotations. In particular, as demonstrated using PENTRAN, the properties of Icosahedral Quadrature make it straightforward to apply with periodic BCs, in contrast to reflective BCs. In addition to implementing periodic BCs for 3D Sn PENTRAN, we implemented a technique termed 'angular re-sweep' which properly conditions periodic BCs for outer eigenvalue iterative loop convergence. As demonstrated by two simple transport problems (3-group fixed source and 3-group reflected/periodic eigenvalue pin cell), we remark that all of the quadratures we investigated are generally superior to level symmetric quadrature, with Icosahedral Quadrature performing the most efficiently for the problems tested. (authors)

  4. Comparison of Parallel MRI Reconstruction Methods for Accelerated 3D Fast Spin-Echo Imaging

    PubMed Central

    Xiao, Zhikui; Hoge, W. Scott; Mulkern, R.V.; Zhao, Lei; Hu, Guangshu; Kyriakos, Walid E.

    2014-01-01

    Parallel MRI (pMRI) achieves imaging acceleration by partially substituting gradient-encoding steps with spatial information contained in the component coils of the acquisition array. Variable-density subsampling in pMRI was previously shown to yield improved two-dimensional (2D) imaging in comparison to uniform subsampling, but has yet to be used routinely in clinical practice. In an effort to reduce acquisition time for 3D fast spin-echo (3D-FSE) sequences, this work explores a specific nonuniform sampling scheme for 3D imaging, subsampling along two phase-encoding (PE) directions on a rectilinear grid. We use two reconstruction methods—2D-GRAPPA-Operator and 2D-SPACE RIP—and present a comparison between them. We show that high-quality images can be reconstructed using both techniques. To evaluate the proposed sampling method and reconstruction schemes, results via simulation, phantom study, and in vivo 3D human data are shown. We find that fewer artifacts can be seen in the 2D-SPACE RIP reconstructions than in 2D-GRAPPA-Operator reconstructions, with comparable reconstruction times. PMID:18727083

  5. Efficient parallel global garbage collection on massively parallel computers

    SciTech Connect

    Kamada, Tomio; Matsuoka, Satoshi; Yonezawa, Akinori

    1994-12-31

    On distributed-memory high-performance MPPs where processors are interconnected by an asynchronous network, efficient garbage collection (GC) becomes difficult due to inter-node references and references within pending, unprocessed messages. The parallel global GC algorithm (1) takes advantage of reference locality, (2) efficiently traverses references over nodes, (3) admits minimum pause time of ongoing computations, and (4) has been shown to scale up to 1024-node MPPs. The algorithm employs a global weight counting scheme to substantially reduce message traffic. Two methods for confirming the arrival of pending messages are used: one counts numbers of messages and the other uses network "bulldozing." Performance evaluation in actual implementations on a multicomputer with 32-1024 nodes, the Fujitsu AP1000, reveals various favorable properties of the algorithm.
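
    The abstract does not detail its global weight counting scheme. The Python sketch below illustrates the general weight-throwing idea, which is a related classic technique and not the authors' algorithm. A fixed total weight of 1 is split among pending marking tasks, so spawning work needs no message to a coordinator. Work that finishes returns its weight, and the marking phase is known to be complete once all weight has come back. It runs serially, with exact fractions, and the object graph is hypothetical.

```python
from collections import deque
from fractions import Fraction

def distributed_mark(graph, roots):
    """Mark reachable objects while tracking termination by weight throwing.

    graph maps each object to the objects it references (possibly on other
    nodes). Each pending task carries a weight; the invariant is that pending
    weights plus recovered weight always sum to 1.
    """
    marked = set()
    recovered = Fraction(0)
    share = Fraction(1, len(roots))
    pending = deque((r, share) for r in roots)
    while pending:
        obj, weight = pending.popleft()
        children = [] if obj in marked else graph.get(obj, [])
        marked.add(obj)
        if children:
            # Split this task's weight among the tasks it spawns; no collector message.
            part = weight / len(children)
            pending.extend((child, part) for child in children)
        else:
            # Nothing spawned: return the weight to the collector.
            recovered += weight
    return marked, recovered == 1

graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
print(distributed_mark(graph, ["a"]))
```

    In a real distributed collector, tasks run concurrently on different nodes. The weight returned to the collector is what lets it decide that marking has finished, even with references still travelling in unprocessed messages.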

  6. Parallel Imaging of 3D Surface Profile with Space-Division Multiplexing

    PubMed Central

    Lee, Hyung Seok; Cho, Soon-Woo; Kim, Gyeong Hun; Jeong, Myung Yung; Won, Young Jae; Kim, Chang-Seok

    2016-01-01

    We have developed a modified optical frequency domain imaging (OFDI) system that performs parallel imaging of three-dimensional (3D) surface profiles by using the space division multiplexing (SDM) method with dual-area swept-source beams. We have also demonstrated that 3D surface information for two different areas can be obtained at the same time with only one camera using our method. In this study, two fields of view (FOVs) of 11.16 mm × 5.92 mm were acquired within 0.5 s. The height range for each FOV was 460 µm, and the axial and transverse resolutions were 3.6 and 5.52 µm, respectively. PMID:26805840

  7. Uniformly spaced 3D modeling of human face from two images using parallel particle swarm optimization

    NASA Astrophysics Data System (ADS)

    Chang, Yau-Zen; Hou, Jung-Fu; Tsao, Yi Hsiang; Lee, Shih-Tseng

    2011-09-01

    This paper proposes a scheme for finding the correspondence between uniformly spaced locations on images of a human face captured from different viewpoints at the same instant. The correspondence is intended for 3D reconstruction in the registration procedure for neurosurgery, where exposure to projectors must be strictly limited. The approach uses structured light to enhance patterns on the images and is initialized with the scale-invariant feature transform (SIFT). Successive locations are found according to spatial order using a parallel version of the particle swarm optimization algorithm. False locations are then singled out for correction by searching for outliers from fitted curves. Case studies show that the scheme correctly generates 456 evenly spaced 3D coordinate points in 23 seconds from a single shot of a projected human face, using a PC with a 2.66 GHz Intel Q9400 CPU and 4 GB of RAM.
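    The abstract relies on particle swarm optimization (PSO) but does not give its variant or parameters. Below is a minimal global-best PSO sketch in NumPy, with the following assumptions:

    - The cost function is a hypothetical quadratic placeholder, not the paper's image-matching cost.
    - The coefficients w = 0.7 and c1 = c2 = 1.5 are common textbook choices, not values taken from the paper.

    The per-particle cost evaluations within one iteration are independent. That step is the natural place to distribute work in a parallel PSO; here it is simply evaluated row by row.

```python
import numpy as np


def pso_minimize(cost, dim, n_particles=30, iters=200, bounds=(-5.0, 5.0),
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best particle swarm optimization (generic sketch)."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    x = rng.uniform(lo, hi, size=(n_particles, dim))  # particle positions
    v = np.zeros_like(x)                              # particle velocities
    # Independent per-particle evaluations: the part a parallel PSO distributes.
    f = np.apply_along_axis(cost, 1, x)
    pbest, pbest_f = x.copy(), f.copy()               # personal bests
    g = pbest[np.argmin(pbest_f)].copy()              # global best
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Inertia + pull toward personal best + pull toward global best.
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        x = np.clip(x + v, lo, hi)
        f = np.apply_along_axis(cost, 1, x)
        improved = f < pbest_f
        pbest[improved] = x[improved]
        pbest_f[improved] = f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()


target = np.array([1.0, -2.0])  # hypothetical optimum of the placeholder cost
best, value = pso_minimize(lambda p: float(np.sum((p - target) ** 2)), dim=2)
```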

  8. A Parallelized 3D Particle-In-Cell Method With Magnetostatic Field Solver And Its Applications

    NASA Astrophysics Data System (ADS)

    Hsu, Kuo-Hsien; Chen, Yen-Sen; Wu, Men-Zan Bill; Wu, Jong-Shinn

    2008-10-01

    A parallelized 3D self-consistent electrostatic particle-in-cell finite element (PIC-FEM) code using an unstructured tetrahedral mesh was developed. When simulating applications with an external permanent magnet set, the distribution of the magnetostatic field usually also needs to be determined accurately. In this paper, we first present the development of a 3D magnetostatic field solver on an unstructured mesh, which provides flexibility in modeling objects with complex geometry. The vector Poisson equation for the magnetostatic field is formulated using the Galerkin nodal finite element method, and the resulting matrix is solved by a parallel conjugate gradient method. A parallel adaptive mesh refinement module is coupled to this solver for better resolution. The completed solver is then verified by simulating a permanent magnet array, with results comparable to previous experimental observations and simulations. Because the solver uses the same unstructured grid format, the PIC-FEM code can directly and easily read the magnetostatic field for particle simulation. At the upcoming conference, a magnetron simulation will be presented to demonstrate the capability of this code.
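    The abstract solves the finite element system with a parallel conjugate gradient (CG) method. Below is a serial CG sketch for a symmetric positive-definite matrix. A 1D Laplacian stands in for the paper's stiffness matrix; that choice is an assumption made for testability.

    In a distributed version, two kinds of operation in each iteration need communication:

    - the matrix-vector product, which needs neighbour data through a halo exchange;
    - the dot products, which need global reductions.

    The comments mark both.

```python
import numpy as np


def conjugate_gradient(A, b, tol=1e-10, max_iter=None):
    """Solve A x = b for symmetric positive-definite A (serial sketch)."""
    n = b.shape[0]
    max_iter = max_iter or 10 * n
    x = np.zeros_like(b, dtype=float)
    b_norm = np.linalg.norm(b)
    if b_norm == 0.0:
        return x
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    rs_old = r @ r         # dot product: a global reduction in a parallel code
    for _ in range(max_iter):
        Ap = A @ p         # mat-vec: needs halo exchange in a parallel code
        alpha = rs_old / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol * b_norm:
            break
        p = r + (rs_new / rs_old) * p
        rs_old = rs_new
    return x


n = 50
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)  # SPD 1D Laplacian stand-in
b = np.ones(n)
x = conjugate_gradient(A, b)
```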

  9. A 3D parallel simulator for crystal growth and solidification in complex alloy systems

    NASA Astrophysics Data System (ADS)

    Nestler, Britta

    2005-02-01

    A 3D parallel simulator is developed to numerically solve the evolution equations of a new non-isothermal phase-field model for crystal growth and solidification in complex alloy systems. The new model and the simulator are capable of simultaneously describing the diffusion processes of multiple components, the phase transitions between multiple phases, and the development of the temperature field. Weak and facetted formulations of both surface energy and kinetic anisotropies are incorporated in the phase-field model. Multicomponent bulk diffusion effects, including interdiffusion coefficients, as well as diffusion in the interfacial region of phase or grain boundaries, are considered. We introduce our parallel simulator, which is based on a finite difference discretization and includes effective adaptive strategies and multigrid methods to reduce computation time and memory usage. The parallelization is realized for both distributed and shared memory computer architectures using MPI libraries and OpenMP. Applying the new computer model, we present a variety of simulated crystal structures such as dendrites, grains, and binary and ternary eutectics in 2D and 3D. Simulations of the influence of anisotropy on microstructure evolution show the formation of facets in preferred crystallographic directions. Phase transformations and solidification processes in a real multicomponent alloy can be described by incorporating the physical data (e.g., surface tensions, kinetic coefficients, specific heat, heat and mass diffusion coefficients) and the specific phase diagram (in particular latent heats and melting temperatures) into the diffuse interface model via the free energies.

  10. Parallel graph search: application to intraretinal layer segmentation of 3D macular OCT scans

    NASA Astrophysics Data System (ADS)

    Lee, Kyungmoo; Abràmoff, Michael D.; Garvin, Mona K.; Sonka, Milan

    2012-02-01

    Image segmentation is of paramount importance for quantitative analysis of medical image data. Recently, a 3-D graph search method which can detect globally optimal interacting surfaces with respect to the cost function of volumetric images has been introduced, and its utility demonstrated in several application areas. Although the method provides excellent segmentation accuracy, its limitation is a slow processing speed when many surfaces are simultaneously segmented in large volumetric datasets. Here, we propose a novel method of parallel graph search, which overcomes the limitation and allows the quick detection of multiple surfaces. To demonstrate the obtained performance with respect to segmentation accuracy and processing speedup, the new approach was applied to retinal optical coherence tomography (OCT) image data and compared with the performance of the former non-parallel method. Our parallel graph search methods for single and double surface detection are approximately 267 and 181 times faster than the original graph search approach in 5 macular OCT volumes (200 x 5 x 1024 voxels) acquired from the right eyes of 5 normal subjects. The resulting segmentation differences were small as demonstrated by the mean unsigned differences between the non-parallel and parallel methods of 0.0 +/- 0.0 voxels (0.0 +/- 0.0 μm) and 0.27 +/- 0.34 voxels (0.53 +/- 0.66 μm) for the single- and dual-surface approaches, respectively.

  11. Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji

    2016-03-01

    Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme (1d_Alltoall), five all-to-all communications in one dimension are carried out; in the other (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for systems with different grid sizes using a large number of processors on the K computer at RIKEN AICS. The two schemes show comparable performance, and both outperform existing 3D FFTs. The performance of 1d_Alltoall and 2d_Alltoall depends on the supercomputer network system and the number of processors in each dimension, which leaves users room to optimize performance for their own conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, because real and reciprocal space are decomposed in the same way. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
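    The decomposition schemes above need all-to-all communication because a 3D FFT factorizes into passes of 1D FFTs along each axis in turn. When a process does not hold a full line of data along the next axis, the data must be redistributed first.

    The NumPy sketch below only checks the factorization on one array. It does not model the MPI communication, and it is not the 1d_Alltoall or 2d_Alltoall scheme.

```python
import numpy as np


def fft3d_by_axes(a):
    """3D FFT computed as three passes of 1D FFTs, one axis at a time."""
    out = np.fft.fft(a, axis=0)    # pass 1: along x
    # In a distributed code, an all-to-all redistribution typically precedes
    # any pass whose axis is split across processes.
    out = np.fft.fft(out, axis=1)  # pass 2: along y
    out = np.fft.fft(out, axis=2)  # pass 3: along z
    return out


rng = np.random.default_rng(0)
grid = rng.standard_normal((8, 12, 16))
assert np.allclose(fft3d_by_axes(grid), np.fft.fftn(grid))
```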

  12. Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration

    NASA Astrophysics Data System (ADS)

    Zhang, Y.; Key, K.; Ovall, J.; Holst, M.

    2014-12-01

    We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal oriented

  13. Parallel 3-D space charge calculations in the Unified Accelerator Library

    SciTech Connect

    D'Imperio, N.L.; Luccio, A.U.; Malitsky, N.

    2006-06-26

    The paper presents the integration of the SIMBAD space charge module into the UAL framework. SIMBAD is a Particle-in-Cell (PIC) code. Its 3-D parallel approach features an optimized load-balancing scheme based on a genetic algorithm. The UAL framework enhances the standalone version of SIMBAD with the interactive ROOT-based analysis environment and an open catalog of accelerator algorithms. The composite package addresses complex high-intensity beam dynamics and has been developed as part of the FAIR SIS 100 project.

  14. The development of a scalable parallel 3-D CFD algorithm for turbomachinery. M.S. Thesis Final Report

    NASA Technical Reports Server (NTRS)

    Luke, Edward Allen

    1993-01-01

    Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.

  15. Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems

    SciTech Connect

    Martinez, E.; Monasterio, P.R.; Marian, J.

    2011-02-20

    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
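    The chessboard decomposition relies on a simple property of a simple cubic lattice with nearest-neighbour coupling. Colour each site by the parity of i + j + k; then every neighbour of a site has the other colour. All sites of one colour can therefore be updated at the same time without boundary conflicts.

    The sketch below shows this with Metropolis sweeps on a periodic 3D Ising model. This is a simpler, swapped-in technique, not spkMC: there are no null events and no kinetic rates. It assumes J = 1, k_B = 1 and an even lattice size, so that the colouring is consistent across the periodic boundary.

```python
import numpy as np


def checkerboard_sweep(spins, beta, rng):
    """One Metropolis sweep of a periodic 3D Ising lattice (J = 1) in two colour phases.

    Assumes every dimension of `spins` is even so parity colouring wraps correctly.
    """
    i, j, k = np.indices(spins.shape)
    parity = (i + j + k) % 2
    for colour in (0, 1):
        # Sum of the six nearest neighbours (periodic boundaries via np.roll).
        # Neighbours of this colour's sites all have the other colour, so they
        # do not change during this half-sweep.
        nn = sum(np.roll(spins, s, axis=a) for a in range(3) for s in (1, -1))
        dE = 2 * spins * nn                      # energy change if a spin flips
        accept = rng.random(spins.shape) < np.exp(-beta * dE)
        spins[accept & (parity == colour)] *= -1
    return spins


rng = np.random.default_rng(1)
spins = np.ones((8, 8, 8), dtype=int)
checkerboard_sweep(spins, beta=0.0, rng=rng)  # infinite temperature: every flip is accepted
```

    At beta = 0, every proposed flip is accepted, so one sweep turns an all-up lattice into an all-down one. At large beta, an ordered lattice stays ordered.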

  16. Parallel 3D Finite Element Numerical Modelling of DC Electron Guns

    SciTech Connect

    Prudencio, E.; Candel, A.; Ge, L.; Kabel, A.; Ko, K.; Lee, L.; Li, Z.; Ng, C.; Schussman, G. (SLAC)

    2008-02-04

    In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specifically at complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than currently available packages. We present simulation results for the L-band Sheet Beam Klystron DC gun, for which Gun3P reduces simulation time from days to a few hours.

  17. Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems

    NASA Astrophysics Data System (ADS)

    Martínez, E.; Monasterio, P. R.; Marian, J.

    2011-02-01

    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.

  18. Solving unstructured grid problems on massively parallel computers

    NASA Technical Reports Server (NTRS)

    Hammond, Steven W.; Schreiber, Robert

    1990-01-01

    A highly parallel graph mapping technique that enables one to efficiently solve unstructured grid problems on massively parallel computers is presented. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The cost of this communication can negate the high performance promised by massively parallel computing. To eliminate this bottleneck, the graph of the irregular problem is mapped into the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. It is shown that using the heuristic mapping algorithm significantly reduces the communication time compared to a naive assignment of processes to processors.
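    The abstract's objective is the sum of the distances that messages travel. The sketch below computes that cost for a given process-to-processor assignment and improves it with a greedy pairwise-swap local search. It assumes:

    - a 2D mesh interconnect with Manhattan hop distance;
    - a swap heuristic chosen for illustration, since the abstract does not specify the paper's heuristic.

    All function names are hypothetical.

```python
import itertools


def hop_distance(p, q, width):
    """Manhattan hop count between processors p and q on a width-wide 2D mesh."""
    return abs(p % width - q % width) + abs(p // width - q // width)


def mapping_cost(edges, assign, width):
    """Sum over problem-graph edges of the distance between their assigned processors."""
    return sum(hop_distance(assign[u], assign[v], width) for u, v in edges)


def improve_by_swaps(edges, assign, width):
    """Greedy local search: swap two vertices' processors whenever the cost drops."""
    assign = list(assign)
    best = mapping_cost(edges, assign, width)
    improved = True
    while improved:
        improved = False
        for a, b in itertools.combinations(range(len(assign)), 2):
            assign[a], assign[b] = assign[b], assign[a]
            cost = mapping_cost(edges, assign, width)
            if cost < best:
                best, improved = cost, True
            else:
                assign[a], assign[b] = assign[b], assign[a]  # undo the swap
    return assign, best


# A 4-vertex ring mapped onto a 2 x 2 mesh; the naive identity assignment
# puts two ring edges on diagonal (2-hop) processor pairs.
ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
naive = [0, 1, 2, 3]
better, cost = improve_by_swaps(ring, naive, width=2)
```

    Each cost evaluation here rescans every edge, which is fine for a sketch. A production mapper would update the cost incrementally.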

  19. Parallel 3D Multi-Stage Simulation of a Turbofan Engine

    NASA Technical Reports Server (NTRS)

    Turner, Mark G.; Topp, David A.

    1998-01-01

    A 3D multistage simulation of each component of a modern GE turbofan engine has been made. An axisymmetric view of this engine is presented in the document. The components include a fan, booster rig, high pressure compressor rig, high pressure turbine rig, and low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation uses two levels of parallelism: each blade row is run in parallel, and each blade row grid is decomposed into several domains that are also run in parallel. Twenty processors are used for the 4-blade-row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized; this is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time-marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit k-ε turbulence model that uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver scale linearly with the number of blade rows. Enough flips (between 50 and 200) are run that the solution in the entire machine is no longer changing. The k-ε equations are generally solved every other explicit time step. One of the key requirements in developing the parallel code was that the parallel solution match the serial solution exactly (bit for bit). This has helped isolate many small parallel bugs and ensure that the parallelization was done correctly. The domain decomposition is done only in the axial direction, since the number of points axially is much larger than in the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no I/O or body force

  20. Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices

    NASA Astrophysics Data System (ADS)

    Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan

    2010-07-01

    This paper introduces UNIPIC-3D, a self-developed, three-dimensional, parallel, fully electromagnetic particle simulation code. In this code, the electromagnetic fields are updated using the second-order finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic fields and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code; the numerical results agree well with theoretical ones. This code can be used to simulate high-power microwave (HPM) devices such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator. UNIPIC-3D is written in object-oriented C++ and can be run on a variety of platforms, including Windows, Linux, and UNIX. Users can create the complex geometric structures of the simulated HPM devices through the graphical user interface, and these structures are meshed automatically by UNIPIC-3D. The code has a powerful postprocessor that can display the electric field, magnetic field, current, voltage, power, spectrum, particle momentum, and other quantities. For comparison, results computed with the two-and-a-half-dimensional UNIPIC code are also provided for the same HPM device parameters; the numerical results from the two codes agree well with each other.
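    To illustrate the staggered leapfrog structure of a second-order FDTD field update, here is a 1D vacuum sketch. It is not UNIPIC-3D's 3D update: particles, currents and realistic boundaries are omitted. It assumes:

    - normalized units (c = 1, grid spacing 1);
    - periodic boundaries;
    - a sign convention chosen for simplicity, which matches Maxwell's 1D equations up to the sign of H.

    At Courant number 1 (the 1D "magic time step"), a correctly paired pulse moves exactly one cell per step.

```python
import numpy as np


def fdtd_1d(e, h, steps, courant=1.0):
    """Leapfrog Yee updates in 1D, normalized units, periodic boundaries.

    e[i] sits at x = i; h[i] sits at x = i + 1/2 and is half a time step behind e.
    Stability requires courant <= 1 in 1D.
    """
    for _ in range(steps):
        h = h + courant * (np.roll(e, -1) - e)  # H from the spatial difference E[i+1] - E[i]
        e = e + courant * (h - np.roll(h, 1))   # E from the spatial difference H[i] - H[i-1]
    return e, h


x = np.arange(64)
e0 = np.exp(-((x - 20) / 3.0) ** 2)  # Gaussian pulse
h0 = -np.roll(e0, -1)                # pairs with e0 so the pulse travels in +x only
e, h = fdtd_1d(e0, h0, steps=10)
```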

  1. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations

    PubMed Central

    Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul

    2016-01-01

    Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
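    A solver can be "stable even for large time steps, with linear computational cost" while being first-order in time. One standard way to get all three is a backward-Euler step solved with the Thomas tridiagonal algorithm.

    The sketch below takes one such step for u_t = D u_xx - λu. It assumes a single 1D line, a uniform grid and zero-flux boundaries. Applying such 1D solves along each axis in turn (operator splitting) is a common way to extend this to 3D. The abstract does not say whether this matches BioFVM's internals, and the function name is hypothetical.

```python
import numpy as np


def implicit_diffusion_decay_step(u, D, decay, dt, dx):
    """One backward-Euler step of u_t = D u_xx - decay * u on a 1D grid.

    Zero-flux boundaries; the tridiagonal system is solved in O(n) by the Thomas algorithm.
    """
    n = u.size
    r = D * dt / dx**2
    lower = np.full(n, -r)
    upper = np.full(n, -r)
    diag = np.full(n, 1.0 + 2.0 * r + decay * dt)
    diag[0] -= r   # zero flux: ghost node equals the edge node
    diag[-1] -= r
    lower[0] = 0.0
    upper[-1] = 0.0
    # Forward elimination.
    c = np.empty(n)
    d = np.empty(n)
    c[0] = upper[0] / diag[0]
    d[0] = u[0] / diag[0]
    for i in range(1, n):
        m = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / m
        d[i] = (u[i] - lower[i] * d[i - 1]) / m
    # Back substitution.
    x = np.empty(n)
    x[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


u = np.zeros(50)
u[20:30] = 1.0
big_step = implicit_diffusion_decay_step(u, D=1.0, decay=0.0, dt=1e6, dx=1.0)
```

    Even with a huge time step, the result stays non-negative and bounded by the initial maximum. With no decay, total mass is conserved. With decay, the total shrinks by a factor of 1 / (1 + decay·dt) per step.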

  2. 3D-radiative transfer in terrestrial atmosphere: An efficient parallel numerical procedure

    NASA Astrophysics Data System (ADS)

    Bass, L. P.; Germogenova, T. A.; Nikolaeva, O. V.; Kokhanovsky, A. A.; Kuznetsov, V. S.

    2003-04-01

    Light propagation and scattering in the terrestrial atmosphere is usually studied in the framework of 1D radiative transfer theory [1]. However, in reality particles (e.g., ice crystals, solid and liquid aerosols, cloud droplets) are randomly distributed in 3D space. In particular, their concentrations vary in both the vertical and horizontal directions. Therefore, 3D effects influence modern cloud and aerosol retrieval procedures, which are currently based on 1D radiative transfer theory. It should be pointed out that the standard radiative transfer equation also allows these more complex situations to be studied [2]. In recent years, a parallel version of the 2D and 3D RADUGA code has been developed. This version has been used successfully in gamma and neutron transport problems [3]. Applications of this code to atmospheric radiative transfer problems are given in [4], and the capabilities of RADUGA are presented in [5]. The RADUGA code system is a universal solver of radiative transfer problems for complicated models, including 2D and 3D aerosol and cloud fields with arbitrary scattering anisotropy, light absorption, inhomogeneous underlying surfaces, and topography. Both delta-type and distributed light sources can be accounted for in the framework of the algorithm. The accurate numerical procedure is based on the new discrete ordinates SWDD scheme [6]. The algorithm is specifically designed for parallel supercomputers. The version RADUGA 5.1(P) can run on the MBC1000M [7] (768 processors with 10 Gb of hard disk memory for each processor), whose peak performance is 1 Tflops. The corresponding scalar version, RADUGA 5.1, runs on a PC. As a first application of the algorithm, we have studied the shadowing effects of clouds on the neighboring cloudless atmosphere as a function of cloud optical thickness, surface albedo, and illumination conditions. This is important for the development of modern satellite aerosol retrieval algorithms.
[1] Sobolev

  3. A Programming Model for Massive Data Parallelism with Data Dependencies

    SciTech Connect

    Cui, Xiaohui; Mueller, Frank; Potok, Thomas E; Zhang, Yongpeng

    2009-01-01

    Accelerator processors can often be more cost- and energy-effective than general-purpose processors for a wide range of data-parallel computing problems. For graphics processing units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA's Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general-purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that far exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach: we run massively data-parallel applications on GPU clusters, and we propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from microbenchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.

  4. Design and verification of an ultra-precision 3D-coordinate measuring machine with parallel drives

    NASA Astrophysics Data System (ADS)

    Bos, Edwin; Moers, Ton; van Riel, Martijn

    2015-08-01

    An ultra-precision 3D coordinate measuring machine (CMM), the TriNano N100, has been developed. In our design, the workpiece is mounted on a 3D stage driven by three mutually orthogonal parallel drives. The linear drives support the 3D stage using vacuum-preloaded (VPL) air bearings, with each drive determining the position of the 3D stage along one translation direction only. An exactly constrained design results in highly repeatable machine behavior. Furthermore, the machine complies with the Abbe principle over its full measurement range, and the parallel drives allow excellent dynamic behavior. The design allows a 3D measurement uncertainty of 100 nanometers in a measurement range of 200 cubic centimeters. Verification measurements using a Gannen XP 3D tactile probing system on a spherical artifact show a standard deviation in single-point repeatability of around 2 nm in each direction.

  5. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  6. Three-Dimensional Radiative Transfer on a Massively Parallel Computer.

    NASA Astrophysics Data System (ADS)

    Vath, Horst Michael

    1994-01-01

    We perform three-dimensional radiative transfer calculations on the MasPar MP-1, which contains 8192 processors and is a single instruction multiple data (SIMD) machine, an example of the new generation of massively parallel computers. To make radiative transfer calculations efficient, we must reconsider the numerical methods and methods of storage of data that have been used with serial machines. We developed a numerical code which efficiently calculates images and spectra of astrophysical systems as seen from different viewing directions and at different wavelengths. We use this code to examine a number of different astrophysical systems. First we image the HI distribution of model galaxies. Then we investigate the galaxy NGC 5055, which displays a radial asymmetry in its optical appearance. This can be explained by the presence of dust in the outer HI disk far beyond the optical disk. As the formation of dust is connected to the presence of stars, the existence of dust in outer regions of this galaxy could have consequences for star formation at a time when this galaxy was just forming. Next we use the code for polarized radiative transfer. We first discuss the numerical computation of the required cyclotron opacities and use them to calculate spectra of AM Her systems, binaries containing accreting magnetic white dwarfs. Then we obtain spectra of an extended polar cap. Previous calculations did not consider the three-dimensional extension of the shock. We find that this results in a significant underestimate of the radiation emitted in the shock. Next we calculate the spectrum of the intermediate polar RE 0751+14. For this system we obtain a magnetic field of ~10 MG, which has consequences for the evolution of intermediate polars. Finally we perform 3D radiative transfer in NLTE in the two-level atom approximation. To solve the transfer equation in this case, we adapt the short characteristic method and examine different acceleration methods to obtain the

  7. Supercomputing on massively parallel bit-serial architectures

    NASA Technical Reports Server (NTRS)

    Iobst, Ken

    1985-01-01

    Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.

  8. Development of massively parallel quantum chemistry program SMASH

    SciTech Connect

    Ishimura, Kazuya

    2015-12-31

    A massively parallel program for quantum chemistry calculations, SMASH, was released under the Apache License 2.0 in September 2014. The SMASH program is written in Fortran90/95 with the MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to simplify program development. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
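    The reported speed-up implies a parallel efficiency. This assumes the speed-up is measured relative to a single core, which the abstract does not state. With S the speed-up and p the core count:

```latex
E = \frac{S}{p} = \frac{50\,499}{98\,304} \approx 0.514
```

    That is roughly 51% of ideal linear scaling. If the baseline were a larger core count, the implied efficiency would differ.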

  9. Development of massively parallel quantum chemistry program SMASH

    NASA Astrophysics Data System (ADS)

    Ishimura, Kazuya

    2015-12-01

    A massively parallel program for quantum chemistry calculations, SMASH, was released under the Apache License 2.0 in September 2014. The SMASH program is written in Fortran90/95 with the MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to simplify program development. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  10. TSE computers - A means for massively parallel computations

    NASA Technical Reports Server (NTRS)

    Strong, J. P., III

    1976-01-01

    A description is presented of hardware concepts for building a massively parallel processing system for two-dimensional data. The processing system is to use logic arrays of 128 x 128 elements which perform over 16 thousand operations simultaneously. Attention is given to image data, logic arrays, basic image logic functions, a prototype negator, an interleaver device, image logic circuits, and an image memory circuit.

  11. MIMD massively parallel methods for engineering and science problems

    SciTech Connect

    Camp, W.J.; Plimpton, S.J.

    1993-08-01

    MIMD massively parallel computers promise unique power and flexibility for engineering and scientific simulations. In this paper we review the development of a number of software methods and algorithms for scientific and engineering problems that are helping to realize that promise. We discuss new domain decomposition, load balancing, data layout and communications methods applicable to simulations in a broad range of technical fields, including signal processing, multi-dimensional structural and fluid mechanics, materials science, and chemical and biological systems.

  12. Massively parallel Wang Landau sampling on multiple GPUs

    SciTech Connect

    Yin, Junqi; Landau, D. P.

    2012-01-01

    Wang-Landau sampling is implemented on the Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA). Performance on three different GPU cards, including a new-generation Fermi-architecture card, is compared with that on a Central Processing Unit (CPU). The parameters for massively parallel Wang-Landau sampling are tuned to achieve fast convergence. For simulations of water cluster systems, we obtain an average speedup of over 50 times for a given workload.
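
    Wang-Landau sampling estimates the density of states g(E) with a random walk in energy. Each move is accepted with probability min(1, g(E_old)/g(E_new)). After every step, ln f is added to the running estimate of ln g(E), and ln f is halved whenever the visit histogram is flat.

    The sketch below is a serial, single-walker Python illustration only, not the authors' CUDA implementation. The toy system, the 80% flatness criterion and all parameter values are assumptions chosen so the result can be checked. The toy system is N independent two-state spins, whose exact g(E) is the binomial coefficient C(N, E).

```python
import math
import random


def wang_landau(n_spins, ln_f_final=1e-4, flatness=0.8, check_every=1000, seed=0):
    """Estimate ln g(E) for n_spins independent two-state spins, where E = number of up spins."""
    rng = random.Random(seed)
    spins = [0] * n_spins            # start with all spins down, so E = 0
    energy = 0
    ln_g = [0.0] * (n_spins + 1)     # running estimate of ln g(E)
    hist = [0] * (n_spins + 1)       # visit histogram for the current stage
    ln_f = 1.0                       # modification factor ln f, halved at each stage
    while ln_f > ln_f_final:
        for _ in range(check_every):
            i = rng.randrange(n_spins)
            new_energy = energy + (1 if spins[i] == 0 else -1)
            # Accept the flip with probability min(1, g(E_old) / g(E_new)).
            delta = ln_g[energy] - ln_g[new_energy]
            if delta >= 0 or rng.random() < math.exp(delta):
                spins[i] ^= 1
                energy = new_energy
            # Update the level the walker is now in, whether or not the flip was accepted.
            ln_g[energy] += ln_f
            hist[energy] += 1
        if min(hist) > flatness * (sum(hist) / len(hist)):
            ln_f /= 2.0                  # histogram is flat enough: refine f, reset histogram
            hist = [0] * (n_spins + 1)
    # ln g is only determined up to a constant; fix it with g(0) = 1.
    return [value - ln_g[0] for value in ln_g]


estimate = wang_landau(8)
for e, value in enumerate(estimate):
    print(e, round(value, 2), round(math.log(math.comb(8, e)), 2))
```

    The printed columns should agree closely with ln C(8, E). The sketch does not attempt to reproduce how the GPU version distributes work.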

  13. Time-dependent 3-D deterministic transport on parallel architectures using Dantsys/MPI

    SciTech Connect

    Baker, R.S.; Alcouffe, R.E.

    1996-12-31

    In addition to the ability to solve the static transport equation, we have also incorporated time dependence into our parallel 3-D S_N code DANTSYS/MPI. Using a semi-implicit scheme, DANTSYS/MPI is capable of performing time-dependent calculations for both fissioning and pure source-driven problems. We have applied this capability to various types of problems, such as nuclear well logging and prompt fission experiments. This paper describes the form of the time-dependent equations implemented, their solution strategies in DANTSYS/MPI (including iteration acceleration), and the strategies used for time-step control. Results are presented for a model nuclear well logging calculation.

  14. Fast parallel interferometric 3D tracking of numerous optically trapped particles and their hydrodynamic interaction.

    PubMed

    Ruh, Dominic; Tränkle, Benjamin; Rohrbach, Alexander

    2011-10-24

    Multi-dimensional, correlated particle tracking is a key technology to reveal dynamic processes in living and synthetic soft matter systems. In this paper we present a new method for tracking micron-sized beads in parallel and in all three dimensions, faster and more precisely than existing techniques. Using an acousto-optic deflector and two quadrant photodiodes, we can track numerous optically trapped beads at rates of up to tens of kHz with a precision of a few nanometers by back-focal-plane interferometry. By time-multiplexing the laser focus, we can individually calibrate all traps and all tracking signals in 3D within a few seconds. We show 3D histograms and calibration constants for nine beads in a square arrangement, although trapping and tracking are easily possible for more beads and in arbitrary 2D arrangements. As an application, we investigate the hydrodynamic coupling and diffusion anomalies of spheres trapped in a 3 × 3 arrangement. PMID:22109012

  15. Requirements for supercomputing in energy research: The transition to massively parallel computing

    SciTech Connect

    Not Available

    1993-02-01

    This report discusses the emergence of a practical path to teraflop computing and beyond; the requirements of energy research programs at DOE; the implementation of a supercomputer production computing environment on massively parallel computers; and the implementation of the user transition to massively parallel computing.

  16. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    NASA Technical Reports Server (NTRS)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  17. 3-D inversion of airborne electromagnetic data parallelized and accelerated by local mesh and adaptive soundings

    NASA Astrophysics Data System (ADS)

    Yang, Dikun; Oldenburg, Douglas W.; Haber, Eldad

    2014-03-01

    Airborne electromagnetic (AEM) methods are highly efficient tools for assessing the Earth's conductivity structures over a large area at low cost. However, the configuration of AEM measurements, which typically have widely distributed transmitter-receiver pairs, makes rigorous modelling and interpretation extremely time-consuming in 3-D. Excessive overcomputing can occur when working on a large mesh covering the entire survey area and inverting all soundings in the data set. We propose two improvements. The first is to use a locally optimized mesh for each AEM sounding for the forward modelling and calculation of sensitivity. This dedicated local mesh is small, with fine cells near the sounding location and coarse cells far away, in accordance with EM diffusion and the geometric decay of the signals. Once the forward problem is solved on the local meshes, the sensitivity for the inversion on the global mesh is available through quick interpolation. Using local meshes for AEM forward modelling avoids unnecessary computing on fine cells of a global mesh that are far away from the sounding location. Since the local meshes are highly independent, the forward modelling can be efficiently parallelized over an array of processors. The second improvement is random and dynamic down-sampling of the soundings. Each inversion iteration uses only a random subset of the soundings, and the subset is reselected for every iteration. The number of soundings in the random subset, determined by an adaptive algorithm, is tied to the degree of model regularization. This minimizes the overcomputing caused by working with redundant soundings. Our methods are compared against conventional methods and tested with a synthetic example. We also invert a field data set that was previously considered too large to be practically inverted in 3-D. These examples show that our methodology can dramatically reduce the processing time of 3-D inversion to a practical level without losing resolution.
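
    The second improvement reselects a random subset of soundings at every inversion iteration. That sampling step can be sketched in a few lines of Python. The abstract does not give the adaptive rule that sets the subset size, so here the sizes are passed in as a hypothetical schedule. The function name and its parameters are illustrative, not taken from the paper.

```python
import random
from typing import Iterator, List


def sounding_subsets(n_soundings: int, subset_sizes: List[int], seed: int = 0) -> Iterator[List[int]]:
    """Yield a freshly drawn random subset of sounding indices for each inversion iteration."""
    rng = random.Random(seed)
    for size in subset_sizes:
        # Sample without replacement; a new subset is drawn for every iteration.
        yield sorted(rng.sample(range(n_soundings), size))


# Hypothetical schedule of subset sizes, one entry per inversion iteration.
for iteration, subset in enumerate(sounding_subsets(1000, [50, 100, 200])):
    print(iteration, len(subset), subset[:5])
```

    In a full inversion, only the soundings in each subset would contribute forward responses and sensitivities for that iteration.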

  18. An efficient parallel algorithm: Poststack and prestack Kirchhoff 3D depth migration using flexi-depth iterations

    NASA Astrophysics Data System (ADS)

    Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh

    2015-07-01

    This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for the current class of multicore architectures. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism; however, for 3D data migration, the algorithm's resource requirements grow as the data size increases. This challenges its practical implementation even on current-generation high performance computing systems. Therefore, a smart parallelization approach is essential to handle 3D data for migration. The most compute-intensive part of the Kirchhoff depth migration algorithm is the calculation of traveltime tables, owing to their resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a parallel algorithm for poststack and prestack 3D Kirchhoff depth migration using hybrid MPI+OpenMP programming techniques. We introduce the concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept makes the algorithm flexible by migrating data in a number of depth iterations that depends on the available node memory and the size of the data to be migrated at runtime. Furthermore, it minimizes the requirements for storage, I/O and inter-node communication, which makes it advantageous over conventional parallelization approaches. The parallel algorithm is demonstrated and analysed on Yuva II, a PARAM-series supercomputer. Optimization, performance and scalability results, along with the migration outcome, show the effectiveness of the parallel algorithm.

  19. In situ patterned micro 3D liver constructs for parallel toxicology testing in a fluidic device.

    PubMed

    Skardal, Aleksander; Devarasetty, Mahesh; Soker, Shay; Hall, Adam R

    2015-09-01

    3D tissue models are increasingly being implemented for drug and toxicology testing. However, the creation of tissue-engineered constructs for this purpose often relies on complex biofabrication techniques that are time-consuming, expensive, and difficult to scale up. Here, we describe a strategy for realizing multiple tissue constructs in a parallel microfluidic platform using an approach that is simple and can be easily scaled for high-throughput formats. Liver cells mixed with a UV-crosslinkable hydrogel solution are introduced into parallel channels of a sealed microfluidic device and photopatterned to produce stable tissue constructs in situ. The remaining uncrosslinked material is washed away, leaving the structures in place. By using a hydrogel that specifically mimics the properties of the natural extracellular matrix, we closely emulate native tissue, resulting in constructs that remain stable and functional in the device during a 7-day culture time course under recirculating media flow. As proof of principle for toxicology analysis, we expose the constructs to ethyl alcohol (0-500 mM) and show that the cell viability and the secretion of urea and albumin decrease with increasing alcohol exposure, while markers for cell damage increase. PMID:26355538

  20. Parallel robot for micro assembly with integrated innovative optical 3D-sensor

    NASA Astrophysics Data System (ADS)

    Hesselbach, Juergen; Ispas, Diana; Pokar, Gero; Soetebier, Sven; Tutsch, Rainer

    2002-10-01

    Recent advances in the fields of MEMS and MOEMS often require precise assembly of very small parts with an accuracy of a few microns. To meet this demand, a new approach has been chosen that uses a robot based on parallel mechanisms in combination with a novel 3D vision system. The planar parallel robot structure with 2 DOF provides high resolution in the XY-plane. It carries two additional serial axes for linear and rotational movement in/about the z direction. To achieve high precision as well as good dynamic capabilities, the drive concept for the parallel (main) axes incorporates air bearings in combination with linear electric servo motors. High-accuracy position feedback is provided by optical encoders with a resolution of 0.1 μm. To allow for visualization and visual control of assembly processes, a camera module fits into the hollow tool head. It consists of a miniature CCD camera and a light source. In addition, a modular gripper support is integrated into the tool head. To increase the accuracy, a control loop based on an optoelectronic sensor will be implemented. As a result of an in-depth analysis of different approaches, a photogrammetric system using one single camera and special beam-splitting optics was chosen. A pattern of elliptical marks is applied to the surfaces of the workpiece and the gripper. Using a model-based recognition algorithm, the image processing software identifies the gripper and the workpiece and determines their relative position. A deviation vector is calculated and fed into the robot control to guide the gripper.

  1. 3D numerical calculations and synthetic observations of magnetized massive dense core collapse and fragmentation.

    NASA Astrophysics Data System (ADS)

    Commerçon, B.; Hennebelle, P.; Levrier, F.; Launhardt, R.; Henning, Th.

    2012-03-01

    I will present radiation-magneto-hydrodynamics calculations of low-mass and massive dense core collapse, focusing on the first collapse and the formation of the first hydrostatic core (first Larson core). The influence of the magnetic field and the initial mass on the fragmentation properties will be investigated. In the first part, reporting low-mass dense core collapse calculations, synthetic observations of spectral energy distributions will be derived, as well as classical observational quantities such as bolometric temperature and luminosity. I will show how the dust continuum can help to target first hydrostatic cores and to constrain the nature of VeLLOs. Last, I will present synthetic ALMA observation predictions of first hydrostatic cores, which may give an answer, if not a definitive one, to the fragmentation issue at the early Class 0 stage. In the second part, I will report the results of radiation-magneto-hydrodynamics calculations in the context of high-mass star formation. These use, for the first time, a self-consistent model for photon emission (i.e. via thermal emission and in radiative shocks), with the high resolution necessary to properly resolve magnetic braking effects and radiative shocks on scales <100 AU (Commerçon, Hennebelle & Henning ApJL 2011). In this study, we investigate the combined effects of magnetic field, turbulence, and radiative transfer on the early phases of the collapse and the fragmentation of massive dense cores (M = 100 M_⊙). We identify a new mechanism that inhibits initial fragmentation of massive dense cores, in which the magnetic field and radiative transfer interplay. We show that this interplay becomes stronger as the magnetic field strength increases. We speculate that highly magnetized massive dense cores are good candidates for isolated massive star formation, while moderately magnetized massive dense cores are better suited to forming OB associations or small star clusters. Finally we will also present synthetic observations of these

  2. Comparison of 3-D synthetic aperture phased-array ultrasound imaging and parallel beamforming.

    PubMed

    Rasmussen, Morten Fischer; Jensen, Jørgen Arendt

    2014-10-01

    This paper demonstrates that synthetic aperture imaging (SAI) can be used to achieve real-time 3-D ultrasound phased-array imaging. It investigates whether SAI increases the image quality compared with the parallel beamforming (PB) technique for real-time 3-D imaging. Data are obtained using both simulations and measurements with an ultrasound research scanner and a commercially available 3.5-MHz 1024-element 2-D transducer array. To limit the probe cable thickness, 256 active elements are used in transmit and receive for both techniques. The two imaging techniques were designed for cardiac imaging, which requires sequences designed for imaging down to 15 cm of depth and a frame rate of at least 20 Hz. The imaging quality of the two techniques is investigated through simulations as a function of depth and angle. SAI improved the full-width at half-maximum (FWHM) at low steering angles by 35%, and the 20-dB cystic resolution by up to 62%. The FWHM of the measured line spread function (LSF) at 80 mm depth showed a difference of 20% in favor of SAI. SAI reduced the cyst radius at 60 mm depth by 39% in measurements. SAI improved the contrast-to-noise ratio measured on anechoic cysts embedded in a tissue-mimicking material by 29% at 70 mm depth. The estimated penetration depth on the same tissue-mimicking phantom shows that SAI increased the penetration by 24% compared with PB. Neither SAI nor PB achieved the design goal of 15 cm penetration depth. This is likely due to the limited transducer surface area and a low SNR of the experimental scanner used. PMID:25265174

  3. Routing performance analysis and optimization within a massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen

    2013-04-16

    An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.

  4. The Massively Parallel Processor and its applications. [for environmental monitoring]

    NASA Technical Reports Server (NTRS)

    Strong, J. P.; Schaefer, D. H.; Fischer, J. R.; Wallgren, K. R.; Bracken, P. A.

    1979-01-01

    A long-term experimental development program conducted at Goddard Space Flight Center to implement an ultrahigh-speed data processing system known as the Massively Parallel Processor (MPP) is described. The MPP is a single instruction multiple data stream computer designed to perform logical, integer, and floating point arithmetic operations on variable word length data. Information is presented on system architecture, the system configuration, the array unit architecture, individual processing units, and expected operating rates for several image processing applications (including the processing of Landsat data).

  5. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    SciTech Connect

    O. Kononenko

    2015-02-17

    ACE3P is a 3D massively parallel simulation suite developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical studies. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R&D. A new frequency-domain solver to perform mechanical harmonic response analysis of accelerator components has been developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations, enabling time-efficient, accurate analysis of large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities. Such a study aims to understand the RF response and feedback requirements for the operational reliability of a particle accelerator. (auth)

  6. A biconjugate gradient type algorithm on massively parallel architectures

    NASA Technical Reports Server (NTRS)

    Freund, Roland W.; Hochbruck, Marlis

    1991-01-01

    The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG-type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm is presented. The main feature of the s-step Lanczos algorithm is that, in general, all inner products except one can be computed in parallel at the end of each block. This is unlike the standard Lanczos process, in which inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
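
    For reference, the classical BCG iteration that QMR improves upon is short enough to sketch in pure Python. This is plain, unpreconditioned BCG for a real nonsymmetric system. It is not QMR and not the s-step look-ahead Lanczos variant described above. It runs serially and simply raises an error on the breakdowns the abstract mentions. The dense list-of-lists matrix format is an assumption made for brevity.

    Each iteration needs two inner products, dot(p_hat, q) and dot(r_hat, r). Each must finish before the following vector update can proceed, and on a massively parallel machine each is a global reduction.

```python
def matvec(A, x):
    """Dense matrix-vector product for a list-of-lists matrix."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def bicg(A, b, tol=1e-10, max_iter=100):
    """Unpreconditioned biconjugate gradient (BCG) for a real nonsymmetric system A x = b."""
    n = len(b)
    At = [list(col) for col in zip(*A)]     # transpose, used by the shadow sequence
    x = [0.0] * n
    r = list(b)                             # residual b - A x for the initial guess x = 0
    r_hat = list(b)                         # shadow residual; needs dot(r_hat, r) != 0
    p, p_hat = list(r), list(r_hat)
    rho = dot(r_hat, r)
    for _ in range(max_iter):
        q = matvec(A, p)
        q_hat = matvec(At, p_hat)
        sigma = dot(p_hat, q)
        if rho == 0.0 or sigma == 0.0:
            raise ArithmeticError("BCG breakdown")
        alpha = rho / sigma
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        r_hat = [ri - alpha * qi for ri, qi in zip(r_hat, q_hat)]
        if dot(r, r) ** 0.5 < tol:
            break
        rho_next = dot(r_hat, r)
        beta = rho_next / rho
        rho = rho_next
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        p_hat = [ri + beta * pi for ri, pi in zip(r_hat, p_hat)]
    return x                                # last iterate, even if max_iter was reached


x = bicg([[4.0, 1.0, 0.0], [2.0, 5.0, 1.0], [0.0, 1.0, 3.0]], [1.0, 2.0, 3.0])
print(x)
```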

  7. 3D magnetospheric parallel hybrid multi-grid method applied to planet-plasma interactions

    NASA Astrophysics Data System (ADS)

    Leclercq, L.; Modolo, R.; Leblanc, F.; Hess, S.; Mancini, M.

    2016-03-01

    We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet-plasma interactions. This model is based on the hybrid formalism: ions are treated kinetically, whereas electrons are treated as an inertialess fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid and subsequently enter a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. To conserve velocity distribution functions and to avoid calculating average velocities, particles are not coalesced. Moreover, to keep the particles' shape-function sizes constant, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels: the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping-grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfvén wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
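
    The splitting step can be illustrated with a short Python sketch. The following are assumptions, not taken from the abstract:

    - cells are cubic;
    - the refinement rate r applies per axis, so one coarse particle becomes r³ children;
    - children are placed at the centres of a regular r × r × r sub-lattice inside the parent's volume;
    - the parent's statistical weight is shared equally among the children.

    Each child keeps the parent's velocity, which leaves the velocity distribution unchanged. No coalescing step is provided, consistent with the abstract.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Particle:
    x: float
    y: float
    z: float
    vx: float
    vy: float
    vz: float
    weight: float  # number of physical ions the macro-particle represents
    size: float    # edge length of the (cubic) particle shape, equal to the cell size


def split_particle(p: Particle, r: int) -> List[Particle]:
    """Split a coarse particle into r**3 children sized to the refined cells."""
    h = p.size / r
    children = []
    for i, j, k in itertools.product(range(r), repeat=3):
        # Child centres sit on a regular sub-lattice inside the parent's volume (assumption).
        dx, dy, dz = ((n + 0.5) * h - p.size / 2 for n in (i, j, k))
        children.append(Particle(p.x + dx, p.y + dy, p.z + dz,
                                 p.vx, p.vy, p.vz,      # velocity is copied, not averaged
                                 p.weight / r**3, h))
    return children


parent = Particle(0.0, 0.0, 0.0, 1.5, -0.2, 0.0, 8.0, 1.0)
kids = split_particle(parent, 2)
print(len(kids), sum(k.weight for k in kids))
```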

  8. A 3D Parallel Beam Dynamics Code for Modeling High Brightness Beams in Photoinjectors

    SciTech Connect

    Qiang, Ji; Lidia, S.; Ryne, R.D.; Limborg, C.; /SLAC

    2006-02-13

    In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.

  9. A 3D Parallel Beam Dynamics Code for Modeling High Brightness Beams in Photoinjectors

    SciTech Connect

    Qiang, J.; Lidia, S.; Ryne, R.; Limborg, C.

    2005-05-16

    In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.

  10. Numerical computation on massively parallel hypercubes. [Connection Machine]

    SciTech Connect

    McBryan, O.A.

    1986-01-01

    We describe numerical computations on the Connection Machine, a massively parallel hypercube architecture with 65,536 single-bit processors and 32 Mbytes of memory. A parallel extension of COMMON LISP provides access to the processors and network. The rich software environment is further enhanced by a powerful virtual processor capability, which extends the degree of fine-grained parallelism beyond 1,000,000. We briefly describe the hardware and indicate the principal features of the parallel programming environment. We then present implementations of SOR, multigrid and preconditioned conjugate gradient algorithms for solving partial differential equations on the Connection Machine. Despite the lack of floating-point hardware, computation rates above 100 megaflops have been achieved in PDE solution. Virtual processors prove to be a real advantage, easing the effort of software development while improving system performance significantly. The software development effort is also facilitated by the fact that hypercube communications prove to be fast and essentially independent of distance. 29 refs., 4 figs.
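
    As an illustration of the simplest of the three solvers, here is a serial Python sketch of SOR. It solves the 1D model problem -u'' = f on (0, 1) with zero boundary values, discretized by second-order central differences. The model problem, grid size and stopping rule are assumptions.

    The sweep shown updates points in lexicographic order. A data-parallel machine needs an ordering such as red-black, in which all points of one colour can be updated simultaneously. The abstract does not say which ordering was used.

```python
import math


def sor_poisson_1d(f, n, omega=None, tol=1e-10, max_iter=10_000):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 on n interior points using SOR.

    Returns the grid values (boundaries included) and the number of sweeps used.
    """
    h = 1.0 / (n + 1)
    if omega is None:
        # Classical optimal relaxation factor for this 1D model problem.
        omega = 2.0 / (1.0 + math.sin(math.pi * h))
    u = [0.0] * (n + 2)
    rhs = [h * h * f(i * h) for i in range(n + 2)]
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(1, n + 1):  # lexicographic order: uses the already-updated u[i - 1]
            gauss_seidel = 0.5 * (u[i - 1] + u[i + 1] + rhs[i])
            updated = (1.0 - omega) * u[i] + omega * gauss_seidel
            max_change = max(max_change, abs(updated - u[i]))
            u[i] = updated
        if max_change < tol:
            return u, sweep
    return u, max_iter


u, sweeps = sor_poisson_1d(lambda x: math.pi ** 2 * math.sin(math.pi * x), n=31)
print(sweeps, u[16])
```

    With f(x) = π² sin(πx) the exact solution is sin(πx). The maximum nodal error should therefore be of order h², roughly 8 × 10⁻⁴ for n = 31.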

  11. The performance realities of massively parallel processors: A case study

    SciTech Connect

    Lubeck, O.M.; Simmons, M.L.; Wasserman, H.J.

    1992-07-01

    This paper presents the results of an architectural comparison of SIMD massive parallelism, as implemented in the Thinking Machines Corp. CM-2 computer, and vector or concurrent-vector processing, as implemented in the Cray Research Inc. Y-MP/8. The comparison is based primarily upon three application codes that represent Los Alamos production computing. Tests were run by porting optimized CM Fortran codes to the Y-MP, so that the same level of optimization was obtained on both machines. The results for fully configured systems, using measured data rather than scaled data from smaller configurations, show that the Y-MP/8 is faster than the 64k CM-2 for all three codes. A simple model is included that accounts for the relative characteristic computational speeds of the two machines and for the reduction in overall CM-2 performance due to communication or SIMD conditional execution. The model predicts the performance of two codes well, but fails for the third code because the proportion of communications in this code is very high. Other factors, such as memory bandwidth and compiler effects, are also discussed. Finally, the paper attempts to show the equivalence of the CM-2 and Y-MP programming models, and also comments on selected future massively parallel processor designs.

  12. Comparison of massively parallel hand-print segmenters

    SciTech Connect

    Wilkinson, R.A.; Garris, M.D.

    1992-09-01

    NIST has developed a massively parallel hand-print recognition system that allows components to be interchanged. Using this system, three different character segmentation algorithms have been developed and studied. They are blob coloring, histogramming, and a hybrid of the two. The blob coloring method uses connected components to isolate characters. The histogramming method locates linear spaces, which may be slanted, to segment characters. The hybrid method is an augmented histogramming method that incorporates statistically adaptive rules to decide when a histogrammed item is too large and applies blob coloring to further segment the difficult item. The hardware configuration is a serial host computer with a 1024 processor Single Instruction Multiple Data (SIMD) machine attached to it. The data used in this comparison is 'NIST Special Database 1' which contains 2100 forms from different writers where each form contains 130 digit characters distributed across 28 fields. This gives a potential 273,000 characters to be segmented. Running the massively parallel system across the 2100 forms, blob coloring required 2.1 seconds per form with an accuracy of 97.5%, histogramming required 14.4 seconds with an accuracy of 95.3%, and the hybrid method required 13.2 seconds with an accuracy of 95.4%. The results of this comparison show that the blob coloring method on a SIMD architecture is superior.
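
    Blob coloring amounts to connected-component labelling of a binary image. A serial Python sketch using breadth-first search is shown below. The 8-connectivity rule and the list-of-lists image format are assumptions, and the SIMD implementation described in the abstract would be organized quite differently.

```python
from collections import deque


def blob_color(image):
    """Label 8-connected foreground blobs in a binary image (list of lists of 0/1).

    Returns (labels, count): labels[r][c] is 0 for background, else a blob number from 1.
    """
    rows, cols = len(image), len(image[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if image[r][c] and not labels[r][c]:
                count += 1                      # start a new blob at this unlabelled pixel
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                    # flood the blob breadth-first
                    y, x = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = y + dy, x + dx
                            if (0 <= ny < rows and 0 <= nx < cols
                                    and image[ny][nx] and not labels[ny][nx]):
                                labels[ny][nx] = count
                                queue.append((ny, nx))
    return labels, count


labels, count = blob_color([[1, 1, 0, 0, 1],
                            [0, 1, 0, 0, 1],
                            [0, 0, 0, 1, 1]])
print(count)
```

    In a segmenter, each resulting blob would then be treated as a candidate character.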

  13. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
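
    A basic ingredient of mutual-information-based inference is an estimate of I(X; Y) from paired samples. The plug-in (histogram) estimator below is a minimal Python sketch for discrete data. It is not the paper's procedure, which the abstract does not spell out. This naive estimator is also biased upward for small samples.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired samples of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x, count_y = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2[p(x, y) / (p(x) p(y))], with all probabilities taken from counts
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


print(mutual_information([0, 1, 0, 1], [0, 1, 0, 1]))  # identical binary variables
print(mutual_information([0, 0, 1, 1], [0, 1, 0, 1]))  # independent binary variables
```

    The first call should give 1 bit and the second 0 bits.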

  14. 3D parallel computations of turbofan noise propagation using a spectral element method

    NASA Astrophysics Data System (ADS)

    Taghaddosi, Farzad

    2006-12-01

    A three-dimensional code has been developed for the simulation of tone noise generated by turbofan engine inlets using computational aeroacoustics. The governing equations are the linearized Euler equations, which are further simplified to a set of equations in terms of acoustic potential, using the irrotational flow assumption, and subsequently solved in the frequency domain. Due to the special nature of acoustic wave propagation, the spatial discretization is performed using a spectral element method, where a tensor product of nth-degree polynomials based on Chebyshev orthogonal functions is used to approximate variations within hexahedral elements. Non-reflecting boundary conditions are imposed at the far field using a damping layer concept. This is done by augmenting the continuity equation with an additional term, rather than modifying the governing equations as PML methods do. Solution of the linear system of equations for the acoustic problem is based on the Schur complement method, which is a nonoverlapping domain decomposition technique. The Schur complement system is first solved using a matrix-free iterative method, whose convergence is accelerated with a novel local preconditioner. The solution in the entire domain is then obtained by finding solutions in smaller subdomains. The 3D code also contains a mean flow solver based on the full potential equation in order to take into account the effects of flow variations around the nacelle on the scattering of the radiated sound field. All aspects of the numerical simulations, including building and assembling the coefficient matrices, implementation of the Schur complement method, and solution of the system of equations for both the acoustic and mean flow problems, are performed in parallel on multiprocessors using the resources of the CLUMEQ Supercomputer Center. A large number of test cases are presented, ranging in size from 100 000 to 2 000 000 unknowns, for which, depending on the size of the problem, between 8 and 48 CPUs are
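The Schur complement step described above can be illustrated with a small dense sketch. It assumes each subdomain contributes a tuple (A_ii, A_iG, A_Gi, f_i) coupling its interior unknowns to a shared interface; the function name `schur_solve` and the dense direct solves are illustrative assumptions, whereas the abstract describes a matrix-free iterative solver with a local preconditioner for the interface system.

```python
import numpy as np

def schur_solve(subdomains, A_GG, f_G):
    """Nonoverlapping domain decomposition via the interface Schur complement.
    subdomains: list of (A_ii, A_iG, A_Gi, f_i), one tuple per subdomain, where
    A_ii couples interior unknowns and A_iG / A_Gi couple them to the interface."""
    S = np.array(A_GG, dtype=complex)  # complex, as in frequency-domain acoustics
    g = np.array(f_G, dtype=complex)
    for A_ii, A_iG, A_Gi, f_i in subdomains:
        # S = A_GG - sum_i A_Gi A_ii^{-1} A_iG ;  g = f_G - sum_i A_Gi A_ii^{-1} f_i
        S -= A_Gi @ np.linalg.solve(A_ii, A_iG)
        g -= A_Gi @ np.linalg.solve(A_ii, f_i)
    u_G = np.linalg.solve(S, g)  # dense stand-in for the matrix-free iterative solve
    # Interior unknowns then follow independently in each subdomain
    u_I = [np.linalg.solve(A_ii, f_i - A_iG @ u_G) for A_ii, A_iG, _, f_i in subdomains]
    return u_I, u_G
```

Because each subdomain's contribution to S and its back-substitution are independent, they map naturally onto separate processors.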

  15. Exact solutions and the consistency of 3D minimal massive gravity

    NASA Astrophysics Data System (ADS)

    Altas, Emel; Tekin, Bayram

    2015-07-01

    We show that all algebraic type-O, type-N and type-D and some Kundt-type solutions of topologically massive gravity are inherited by its holographically well-defined deformation, the recently found minimal massive gravity (MMG). This construction provides a large class of constant scalar curvature solutions to the theory. We also study the consistency of the field equations in both the source-free and matter-coupled cases. Since the field equations of MMG do not come from a Lagrangian that depends only on the metric and its derivatives, the theory lacks a Bianchi identity valid for all nonsingular metrics. It turns out, however, that the Bianchi identity is satisfied for the solutions of the equations. This is a necessary condition for the consistency of the classical field equations but not a sufficient one, since the rank-two tensor equations are susceptible to double divergence. We show that in the source-free case the double divergence of the field equations vanishes for the solutions. In the matter-coupled case, we show that the double divergences of the left-hand side and the right-hand side are equal to each other for the solutions of the theory. This completes the proof of the consistency of the field equations.

  16. Reconstruction for Time-Domain In Vivo EPR 3D Multigradient Oximetric Imaging—A Parallel Processing Perspective

    PubMed Central

    Dharmaraj, Christopher D.; Thadikonda, Kishan; Fletcher, Anthony R.; Doan, Phuc N.; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A.; Cook, John A.; Mitchell, James B.; Subramanian, Sankaran; Krishna, Murali C.

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. With fast imaging it is also possible to track changes in tissue oxygenation in response to the oxygen content of the breathing air. However, this involves handling gigabytes of data for each 3D oximetric imaging experiment, which requires digital band-pass filtering and background noise subtraction followed by 3D Fourier reconstruction. This process is rather slow on a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared-memory processors to significantly reduce the computational burden of the filtration task. The results show that the parallel code for filtration achieves a speedup factor of 46.66 compared with the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved for the reconstruction process and the oximetry computation, respectively, for a data set with 23 × 23 × 23 gradient steps. The execution times of the serial and parallel implementations have been measured for different data dimensions and are presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that parallel computing provides the computational power needed to obtain biophysical parameters from 3D EPR oximetric imaging almost in real time. PMID:19672315
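A minimal NumPy sketch of the processing chain (band-pass filtering of each time-domain trace, optional background subtraction, then a 3D inverse FFT of the k-space samples at one time point) may help make the workflow concrete. It is not the authors' OpenMP C++ or parallel MATLAB code: the array layout `(gx, gy, gz, nt)`, the FFT-mask filter, the hypothetical `background` input, and the thread pool standing in for OpenMP are all assumptions.

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def bandpass(trace, dt, f_lo, f_hi):
    """Zero all spectral components whose |frequency| lies outside [f_lo, f_hi] Hz."""
    spec = np.fft.fft(trace)
    freqs = np.abs(np.fft.fftfreq(trace.size, d=dt))
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0
    return np.fft.ifft(spec)

def reconstruct(data, dt, f_lo, f_hi, t_index, background=None, workers=4):
    """data: complex array (gx, gy, gz, nt) holding one time trace per gradient step.
    background: optional noise-only acquisition of the same shape (hypothetical input).
    Returns the 3D image for time point t_index."""
    if background is not None:
        data = data - background
    gx, gy, gz, nt = data.shape
    traces = data.reshape(-1, nt)
    # Each trace is filtered independently, so the work splits across threads
    with ThreadPoolExecutor(max_workers=workers) as pool:
        filtered = np.array(list(pool.map(lambda tr: bandpass(tr, dt, f_lo, f_hi), traces)))
    kspace = filtered.reshape(gx, gy, gz, nt)[..., t_index]
    # Single Point Imaging samples k-space directly; a centered 3D inverse FFT gives the image
    return np.fft.fftshift(np.fft.ifftn(np.fft.ifftshift(kspace)))
```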

  17. Reconstruction for time-domain in vivo EPR 3D multigradient oximetric imaging--a parallel processing perspective.

    PubMed

    Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. With fast imaging it is also possible to track changes in tissue oxygenation in response to the oxygen content of the breathing air. However, this involves handling gigabytes of data for each 3D oximetric imaging experiment, which requires digital band-pass filtering and background noise subtraction followed by 3D Fourier reconstruction. This process is rather slow on a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared-memory processors to significantly reduce the computational burden of the filtration task. The results show that the parallel code for filtration achieves a speedup factor of 46.66 compared with the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved for the reconstruction process and the oximetry computation, respectively, for a data set with 23 x 23 x 23 gradient steps. The execution times of the serial and parallel implementations have been measured for different data dimensions and are presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that parallel computing provides the computational power needed to obtain biophysical parameters from 3D EPR oximetric imaging almost in real time. PMID:19672315

  18. World Wide Web interface for advanced SPECT reconstruction algorithms implemented on a remote massively parallel computer.

    PubMed

    Formiconi, A R; Passeri, A; Guelfi, M R; Masoni, M; Pupi, A; Meldolesi, U; Malfetti, P; Calori, L; Guidazzoli, A

    1997-11-01

    Data from Single Photon Emission Computed Tomography (SPECT) studies are blurred by inevitable physical phenomena occurring during data acquisition. These errors may be compensated for by reconstruction algorithms that take into account accurate physical models of the data acquisition procedure. Unfortunately, this approach involves high memory requirements as well as a high computational burden, which cannot be handled by the computer systems of SPECT acquisition devices. In this work, the possibility of accessing High Performance Computing and Networking (HPCN) resources through a World Wide Web interface for the advanced reconstruction of SPECT data in a clinical environment was investigated. An iterative algorithm with an accurate model of the variable system response was ported to the Multiple Instruction Multiple Data (MIMD) parallel architecture of a Cray T3D massively parallel computer. The system was accessible even from low-cost PC-based workstations through standard TCP/IP networking. A speedup factor of 148 was predicted by the benchmarks run on the Cray T3D. A complete brain study of 30 (64 x 64) slices was reconstructed from a set of 90 (64 x 64) projections with ten iterations of the conjugate gradients algorithm in 9 s, which corresponds to an actual speedup factor of 135. The technique was extended to a more accurate 3D modeling of the system response for a true 3D reconstruction of SPECT data; the reconstruction time of the same data set with this more accurate model was 5 min. This work demonstrates the possibility of exploiting remote HPCN resources from hospital sites by means of low-cost workstations, using standard communication protocols and a user-friendly WWW interface, without particular problems for routine use. PMID:9506406
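Ten iterations of conjugate gradients on a least-squares reconstruction problem can be written compactly as CGLS (conjugate gradients applied to the normal equations). The sketch assumes a dense system matrix `A` (projection bins by voxels) that already encodes the system response model; the abstract does not specify the paper's actual operator or any preconditioning, and the name `cgls` is illustrative.

```python
import numpy as np

def cgls(A, b, n_iter=10, tol=1e-12):
    """Conjugate gradients on the normal equations A^T A x = A^T b (CGLS)."""
    x = np.zeros(A.shape[1])
    r = b - A @ x          # residual in projection space
    s = A.T @ r            # residual of the normal equations (backprojected)
    p = s.copy()
    gamma = s @ s
    for _ in range(n_iter):
        if np.sqrt(gamma) < tol:
            break
        q = A @ p                      # forward projection
        alpha = gamma / (q @ q)
        x += alpha * p
        r -= alpha * q
        s = A.T @ r                    # backprojection
        gamma_new = s @ s
        p = s + (gamma_new / gamma) * p
        gamma = gamma_new
    return x
```

Each iteration costs one forward projection and one backprojection; these matrix-vector products are the parts worth distributing across processors.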

  19. Optimal evaluation of array expressions on massively parallel machines

    NASA Technical Reports Server (NTRS)

    Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Teng, Shang-Hua

    1992-01-01

    We investigate the problem of evaluating FORTRAN 90 style array expressions on massively parallel distributed-memory machines. On such machines, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression. The choice of where to perform the operation then affects this cost. We present algorithms based on dynamic programming to solve this problem efficiently for a wide variety of interconnection schemes, including multidimensional grids and rings, hypercubes, and fat-trees. We also consider expressions containing operations that change the shape of the arrays, and show that our approach extends naturally to handle this case.
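The dynamic-programming idea can be sketched on a simplified model: arrays live at integer offsets on a ring of P processors, moving an array by one position costs one unit, and every binary operation must be performed where its operands are aligned. The tuple encoding of expressions, the unit shift cost, and the function names are illustrative assumptions, not the paper's cost model, which also covers grids, hypercubes and fat-trees.

```python
def ring_dist(a, b, P):
    """Shift distance between offsets a and b on a ring of P processors."""
    d = abs(a - b) % P
    return min(d, P - d)

def best_alignment(expr, P, dist=ring_dist):
    """expr is ('leaf', offset) or (op, left, right).
    Returns (minimum communication cost, offset at which the root is evaluated)."""
    def table(node):
        # cost[p] = cheapest way to have this subexpression's value at offset p
        if node[0] == 'leaf':
            return [dist(node[1], p, P) for p in range(P)]
        children = [table(child) for child in node[1:]]
        # evaluate at q: every operand must first be brought to q
        at = [sum(t[q] for t in children) for q in range(P)]
        # then optionally move the result from q to p
        return [min(at[q] + dist(q, p, P) for q in range(P)) for p in range(P)]
    root = table(expr)
    best = min(range(P), key=root.__getitem__)
    return root[best], best
```

Under this model each node keeps P entries and each entry minimizes over P positions, so the work is O(nodes x P^2).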

  20. Applications of massively parallel computers in telemetry processing

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek A.; Pritchard, Jim; Knoble, Gordon

    1994-01-01

    Telemetry processing refers to the reconstruction of full-resolution raw instrumentation data with the artifacts of space and ground recording and transmission removed. Because it is the first processing phase of satellite data, this process is also referred to as level-zero processing. This study investigates the use of massively parallel computing technology to provide level-zero processing for spaceflights that adhere to the recommendations of the Consultative Committee for Space Data Systems (CCSDS). The workload characteristics of level-zero processing are used to identify processing requirements for high-performance computing systems. An example of level-zero functions on a SIMD MPP, such as the MasPar, is discussed. The requirements in this paper are based in part on the Earth Observing System (EOS) Data and Operation System (EDOS).
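Level-zero processing starts by delimiting packets and removing transmission artifacts such as duplicates and out-of-order delivery. The sketch below parses the 6-byte CCSDS Space Packet primary header (3-bit version, type flag, secondary-header flag, 11-bit APID, 2-bit sequence flags, 14-bit sequence count, 16-bit data length stored as octets minus one) and then de-duplicates and reorders packets. The helper names are hypothetical, and the ordering ignores sequence-counter wraparound and time tags, which a real level-zero system must handle.

```python
import struct
from dataclasses import dataclass

@dataclass(frozen=True)
class SpacePacket:
    version: int
    ptype: int        # 0 = telemetry, 1 = telecommand
    sec_hdr: int      # secondary header flag
    apid: int         # 11-bit application process identifier
    seq_flags: int
    seq_count: int    # 14-bit source sequence count
    data: bytes

def parse_packets(stream):
    """Split a byte stream into CCSDS space packets using the 6-byte primary header."""
    packets, i = [], 0
    while i + 6 <= len(stream):
        w1, w2, w3 = struct.unpack(">HHH", stream[i:i + 6])
        n = w3 + 1  # length field holds (data field octets - 1)
        data = stream[i + 6:i + 6 + n]
        if len(data) < n:
            break  # truncated trailing packet
        packets.append(SpacePacket(w1 >> 13, (w1 >> 12) & 1, (w1 >> 11) & 1,
                                   w1 & 0x7FF, w2 >> 14, w2 & 0x3FFF, data))
        i += 6 + n
    return packets

def level_zero(packets):
    """Drop exact duplicates and sort by (APID, sequence count).
    Simplification: ignores counter wraparound at 16384 and time tags."""
    return sorted(set(packets), key=lambda p: (p.apid, p.seq_count))

def make_packet(apid, seq, payload):
    """Hypothetical test helper: build one telemetry packet (payload must be non-empty)."""
    w1 = apid & 0x7FF                   # version 0, type 0, no secondary header
    w2 = (0b11 << 14) | (seq & 0x3FFF)  # sequence flags 11 = unsegmented
    return struct.pack(">HHH", w1, w2, len(payload) - 1) + payload
```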

  1. A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine can achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.
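The abstract does not name the nonstandard solver. One classic choice for the tridiagonal systems produced by implicit finite-difference schemes on data-parallel machines is odd-even cyclic reduction, sketched here as an assumption rather than as the authors' method: each level eliminates every other unknown with fully independent, vectorizable updates, whereas the standard Thomas algorithm is inherently sequential.

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i] (a[0] and c[-1] are ignored).
    Assumes a well-conditioned, e.g. diagonally dominant, system."""
    a, b, c, d = (np.array(v, dtype=float) for v in (a, b, c, d))
    a[0], c[-1] = 0.0, 0.0
    n = b.size
    if n == 1:
        return d / b
    # Pad with decoupled dummy equations so neighbors i-1 and i+1 always exist
    ap, cp, dp = (np.concatenate(([0.0], v, [0.0])) for v in (a, c, d))
    bp = np.concatenate(([1.0], b, [1.0]))
    k = np.arange(0, n, 2) + 1   # padded indices of the even unknowns we keep
    alpha = -ap[k] / bp[k - 1]   # eliminates x[i-1] from equation i
    gamma = -cp[k] / bp[k + 1]   # eliminates x[i+1] from equation i
    # The reduced system couples even unknowns only; all updates are independent
    x = np.empty(n)
    x[0::2] = cyclic_reduction(alpha * ap[k - 1],
                               bp[k] + alpha * cp[k - 1] + gamma * ap[k + 1],
                               gamma * cp[k + 1],
                               dp[k] + alpha * dp[k - 1] + gamma * dp[k + 1])
    # Back-substitute the odd unknowns, again all at once
    j = np.arange(1, n, 2)
    x_right = np.concatenate((x, [0.0]))[j + 1]  # x[n] stand-in is multiplied by c[n-1] = 0
    x[j] = (d[j] - a[j] * x[j - 1] - c[j] * x_right) / b[j]
    return x
```

The recursion depth is about log2(n), so with one processor per unknown the parallel time grows logarithmically instead of linearly.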

  2. Beam dynamics calculations and particle tracking using massively parallel processors

    SciTech Connect

    Ryne, R.D.; Habib, S.

    1995-12-31

    During the past decade, massively parallel processors (MPPs) have slowly gained acceptance within the scientific community. At present these machines typically contain a few hundred to one thousand off-the-shelf microprocessors and a total memory of up to 32 GBytes. The potential performance of these machines is illustrated by the fact that a month-long job on a high-end workstation might require only a few hours on an MPP. The acceptance of MPPs has been slow for a variety of reasons. For example, some algorithms are not easily parallelizable. Also, in the past these machines were difficult to program. But in recent years the development of Fortran-like languages such as CM Fortran and High Performance Fortran has made MPPs much easier to use. In the following, we describe how MPPs can be used for beam dynamics calculations and long-term particle tracking.
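Long-term tracking parallelizes well because, in the single-particle approximation, particles do not interact. As a hedged illustration, the sketch below tracks many particles through a repeated FODO cell using linear 2x2 transfer matrices (thin-lens quadrupoles and drifts) in one transverse plane. The lattice, focal length, function names and matrix-only maps are assumptions; production beam-dynamics codes typically use higher-order maps.

```python
import numpy as np

def drift(L):
    return np.array([[1.0, L], [0.0, 1.0]])

def thin_quad(f):
    # f > 0 focuses, f < 0 defocuses
    return np.array([[1.0, 0.0], [-1.0 / f, 1.0]])

def fodo(f, L):
    # Matrices act right to left: focus, drift, defocus, drift
    return drift(L) @ thin_quad(-f) @ drift(L) @ thin_quad(f)

def track(particles, M, turns):
    """particles: (2, N) array of (x, x') columns. Columns never interact,
    so the N particles can be split across processors with no communication."""
    out = particles.copy()
    for _ in range(turns):
        out = M @ out
    return out
```

For this linear map, stable motion requires |trace(M)| < 2, and a determinant of 1 reflects the area-preserving (symplectic) nature of the transfer map.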

  3. Integration of IR focal plane arrays with massively parallel processor

    NASA Astrophysics Data System (ADS)

    Esfandiari, P.; Koskey, P.; Vaccaro, K.; Buchwald, W.; Clark, F.; Krejca, B.; Rekeczky, C.; Zarandy, A.

    2008-04-01

    The intent of this investigation is to replace the low fill factor visible sensor of a Cellular Neural Network (CNN) processor with an InGaAs Focal Plane Array (FPA) using both bump bonding and epitaxial layer transfer techniques for use in the Ballistic Missile Defense System (BMDS) interceptor seekers. The goal is to fabricate a massively parallel digital processor with a local as well as a global interconnect architecture. Currently, this unique CNN processor is capable of processing a target scene in excess of 10,000 frames per second with its visible sensor. What makes the CNN processor so unique is that each processing element includes memory, local data storage, local and global communication devices and a visible sensor supported by a programmable analog or digital computer program.

  4. Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing

    PubMed Central

    2015-01-01

    A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission through nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved, massively in parallel, using an ordinary transmission optical microscope, and, instead of determining a spectral shift, the brightness of each nanohole is recorded, which greatly simplifies the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10,000 nanoholes can be monitored simultaneously within the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982

  5. Development of a massively parallel parachute performance prediction code

    SciTech Connect

    Peterson, C.W.; Strickland, J.H.; Wolfe, W.P.; Sundberg, W.D.; McBride, D.D.

    1997-04-01

    The Department of Energy has given Sandia full responsibility for the complete life cycle (cradle to grave) of all nuclear weapon parachutes. Sandia National Laboratories is initiating development of a complete numerical simulation of parachute performance, beginning with parachute deployment and continuing through inflation and steady state descent. The purpose of the parachute performance code is to predict the performance of stockpile weapon parachutes as these parachutes continue to age well beyond their intended service life. A new massively parallel computer will provide unprecedented speed and memory for solving this complex problem, and new software will be written to treat the coupled fluid, structure and trajectory calculations as part of a single code. Verification and validation experiments have been proposed to provide the necessary confidence in the computations.

  6. Efficient Identification of Assembly Neurons within Massively Parallel Spike Trains

    PubMed Central

    Berger, Denise; Borgelt, Christian; Louis, Sebastien; Morrison, Abigail; Grün, Sonja

    2010-01-01

    The chance of detecting assembly activity is expected to increase if the spiking activities of large numbers of neurons are recorded simultaneously. Although such massively parallel recordings are now becoming available, methods able to analyze such data for spike correlation are still rare, as a combinatorial explosion often makes it infeasible to extend methods developed for smaller data sets. By evaluating pattern complexity distributions the existence of correlated groups can be detected, but their member neurons cannot be identified. In this contribution, we present approaches to actually identify the individual neurons involved in assemblies. Our results may complement other methods and also provide a way to reduce data sets to the “relevant” neurons, thus allowing us to carry out a refined analysis of the detailed correlation structure due to reduced computation time. PMID:19809521
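The pattern complexity distribution mentioned above can be computed by binning every spike train and counting, per time bin, how many neurons fired. This minimal NumPy sketch assumes spike times per neuron, a fixed bin width, and at most one count per neuron per bin; it shows only the detection statistic, not the authors' method for identifying assembly members, and the function name is illustrative.

```python
import numpy as np

def complexity_distribution(spike_trains, t_stop, bin_size):
    """Return counts[k] = number of time bins in which exactly k neurons spiked.
    spike_trains: one sequence of spike times per neuron, within [0, t_stop)."""
    n_bins = int(np.ceil(t_stop / bin_size))
    binned = np.zeros((len(spike_trains), n_bins), dtype=bool)
    for i, times in enumerate(spike_trains):
        idx = np.floor(np.asarray(times, dtype=float) / bin_size).astype(int)
        idx = idx[(idx >= 0) & (idx < n_bins)]
        binned[i, idx] = True  # a neuron contributes at most once per bin
    complexity = binned.sum(axis=0)  # number of active neurons in each bin
    return np.bincount(complexity, minlength=len(spike_trains) + 1)
```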

  7. Massively parallel high-order combinatorial genetics in human cells

    PubMed Central

    Wong, Alan S L; Choi, Gigi C G; Cheng, Allen A; Purcell, Oliver; Lu, Timothy K

    2016-01-01

    The systematic functional analysis of combinatorial genetics has been limited by the throughput that can be achieved and the order of complexity that can be studied. To enable massively parallel characterization of genetic combinations in human cells, we developed a technology for rapid, scalable assembly of high-order barcoded combinatorial genetic libraries that can be quantified with high-throughput sequencing. We applied this technology, combinatorial genetics en masse (CombiGEM), to create high-coverage libraries of 1,521 two-wise and 51,770 three-wise barcoded combinations of 39 human microRNA (miRNA) precursors. We identified miRNA combinations that synergistically sensitize drug-resistant cancer cells to chemotherapy and/or inhibit cancer cell proliferation, providing insights into complex miRNA networks. More broadly, our method will enable high-throughput profiling of multifactorial genetic combinations that regulate phenotypes of relevance to biomedicine, biotechnology and basic science. PMID:26280411

  8. Field-Scale, Massively Parallel Simulation of Production from Oceanic Gas Hydrate Deposits

    NASA Astrophysics Data System (ADS)

    Reagan, M. T.; Moridis, G. J.; Freeman, C. M.; Pan, L.; Boyle, K. L.; Johnson, J. N.; Husebo, J. A.

    2012-12-01

    The quantity of hydrocarbon gases trapped in natural hydrate accumulations is enormous, leading to significant interest in evaluating their potential as an energy source. It has been shown that large volumes of gas can be readily produced at high rates for long times from some types of methane hydrate accumulations by means of depressurization-induced dissociation, using conventional technologies with horizontal or vertical well configurations. However, these systems are currently assessed using simplified or reduced-scale 3D or even 2D production simulations. In this study, we use the massively parallel TOUGH+HYDRATE code (pT+H) to assess the production potential of a large, deep-ocean hydrate reservoir and to develop strategies for effective production. The simulations model a full 3D system with an extent of over 24 km², examine the productivity of vertical and horizontal wells and of single or multiple wells, and explore variations in reservoir properties. Systems of up to 2.5M gridblocks, running on thousands of supercomputing nodes, are required to simulate such large systems at the highest level of detail. The simulations reveal the challenges inherent in producing from deep, relatively cold systems with extensive water-bearing channels and connectivity to large aquifers, including the difficulty of achieving depressurization, the challenges of high water-removal rates, and the complexity of production design. Also highlighted are new frontiers in large-scale reservoir simulation of coupled flow, transport, thermodynamics, and phase behavior, including the construction of large meshes, the use of parallel numerical solvers and MPI, and large-scale, parallel 3D visualization of results.

  9. Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

    SciTech Connect

    Faulon, Jean-Loup; Wilcox, R.T.; Hobbs, J.D.; Ford, D.M.

    1997-11-01

    An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases is studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon (1840 processors). Compared to current state-of-the-art equilibration methods, this new technique appears to be faster by some orders of magnitude. The main advantage of the technique is that it circumvents the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh three-dimensional space. Thus, the problem of polymer equilibration, described by continuous equations in molecular dynamics, is reduced to a discrete problem whose solutions are approximated by simple algorithms. Practical modeling applications include the construction of butyl rubber and ethylene-propylene-diene-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally in both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. To better understand the diffusion phenomena, 10,000-atom models were generated and equilibrated for butyl and EPDM. The models were then used in massively parallel molecular dynamics simulations to monitor the trajectories of the diffusing species.
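The abstract does not state how diffusion coefficients were extracted from the trajectories. A common approach is the Einstein relation, in which the mean squared displacement in three dimensions grows as 6Dt at long times. The sketch assumes unwrapped positions stored as `(n_frames, n_particles, 3)` and a linear fit over a chosen time window; the function names are hypothetical.

```python
import numpy as np

def msd(traj):
    """Mean squared displacement from frame 0; traj has shape (n_frames, n_particles, 3)."""
    disp = traj - traj[0]
    return (disp ** 2).sum(axis=2).mean(axis=1)

def diffusion_coefficient(traj, dt, fit_start=0):
    """Einstein relation in 3D: MSD(t) ~ 6 D t, so D = slope / 6.
    fit_start skips early frames (ballistic or caged regime) before fitting."""
    t = np.arange(traj.shape[0]) * dt
    slope, _intercept = np.polyfit(t[fit_start:], msd(traj)[fit_start:], 1)
    return slope / 6.0
```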

  10. TOMO3D: 3-D joint refraction and reflection traveltime tomography parallel code for active-source seismic data—synthetic test

    NASA Astrophysics Data System (ADS)

    Meléndez, A.; Korenaga, J.; Sallarès, V.; Miniussi, A.; Ranero, C. R.

    2015-10-01

    We present a new 3-D traveltime tomography code (TOMO3D) for the modelling of active-source seismic data that uses the arrival times of both refracted and reflected seismic phases to derive the velocity distribution and the geometry of reflecting boundaries in the subsurface. This code is based on its popular 2-D version TOMO2D, from which it inherited the methods to solve the forward and inverse problems. The traveltime calculations are done using a hybrid ray-tracing technique combining the graph and bending methods. The LSQR algorithm is used to perform the iterative regularized inversion to improve the initial velocity and depth models. To cope with the increased computational demand due to the incorporation of the third dimension, the forward problem solver, which takes most of the run time (~90 per cent in the test presented here), has been parallelized with a combination of multi-processing and message passing interface standards. This parallelization distributes the ray-tracing and traveltime calculations among the available computational resources. The code's performance is illustrated with a realistic synthetic example, including a checkerboard anomaly and two reflectors, which simulates the geometry of a subduction zone. The code is designed to invert for a single reflector at a time. A data-driven layer-stripping strategy is proposed for cases involving multiple reflectors, and it is tested for the successive inversion of the two reflectors. Layers are bounded by consecutive reflectors, and the initial velocity model for each inversion step incorporates the results from previous steps. This strategy poses simpler inversion problems at each step, allowing the recovery of strong velocity discontinuities that would otherwise be smoothed out.
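The graph method referred to above computes first-arrival traveltimes as shortest paths through a network of nodes. A minimal 2D sketch with Dijkstra's algorithm and 8-neighbor connectivity follows; the grid layout, the edge cost (length times mean endpoint slowness) and the function name are assumptions, and the bending refinement that TOMO3D applies afterwards is not shown.

```python
import heapq
import math

def graph_traveltimes(slowness, h, src):
    """First-arrival traveltimes on a 2D grid via Dijkstra's shortest paths.
    slowness: list of lists (s/km) at nodes, h: node spacing (km), src: (i, j)."""
    ni, nj = len(slowness), len(slowness[0])
    t = [[math.inf] * nj for _ in range(ni)]
    t[src[0]][src[1]] = 0.0
    heap = [(0.0, src)]
    nbrs = [(di, dj) for di in (-1, 0, 1) for dj in (-1, 0, 1) if (di, dj) != (0, 0)]
    while heap:
        tt, (i, j) = heapq.heappop(heap)
        if tt > t[i][j]:
            continue  # stale heap entry
        for di, dj in nbrs:
            a, b = i + di, j + dj
            if 0 <= a < ni and 0 <= b < nj:
                # edge cost = edge length x mean slowness of its two endpoints
                cand = tt + h * math.hypot(di, dj) * 0.5 * (slowness[i][j] + slowness[a][b])
                if cand < t[a][b]:
                    t[a][b] = cand
                    heapq.heappush(heap, (cand, (a, b)))
    return t
```

Because paths are restricted to grid edges, the graph solution slightly overestimates traveltimes in directions between the grid axes and diagonals, which is one reason hybrid schemes refine rays with a bending step.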

  11. The role of MHD in 3D aspects of massive gas injection

    NASA Astrophysics Data System (ADS)

    Izzo, V. A.; Parks, P. B.; Eidietis, N. W.; Shiraki, D.; Hollmann, E. M.; Commaux, N.; Granetz, R. S.; Humphreys, D. A.; Lasnier, C. J.; Moyer, R. A.; Paz-Soldan, C.; Raman, R.; Strait, E. J.

    2015-07-01

    Simulations of massive gas injection for disruption mitigation in DIII-D are carried out to compare the toroidal peaking of radiated power for the cases of one and two gas jets. The radiation toroidal peaking factor (TPF) results from a combination of the distribution of impurities and the distribution of heat flux associated with the n=1 mode. When the effects of the strong unidirectional neutral beam injection and rotation present in the experiment are ignored, the injected impurities are found to spread helically along field lines preferentially toward the high-field side, which is explained in terms of a nozzle equation. Therefore, in the plasma rest frame, reversing the current direction also reverses the toroidal direction of impurity spreading. During the pre-thermal quench phase of the disruption, the toroidal peaking of radiated power is reduced in a straightforward manner by increasing from one to two gas jets. However, during the thermal quench phase, a reduction in the TPF is achieved only for a particular arrangement of the two gas valves with respect to the field line pitch. In particular, the relationship between the two valve locations and the phase of the 1/1 mode is critical: gas valve spacing that is coherent with the 1/1 symmetry effectively reduces the TPF.
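The toroidal peaking factor is commonly defined as the peak of the radiated power over toroidal angle divided by its toroidal average. The sketch below assumes that definition and equally spaced toroidal samples; it is an illustration with a hypothetical function name, not the diagnostic analysis used in the paper.

```python
import numpy as np

def toroidal_peaking_factor(p_rad):
    """p_rad: radiated power sampled at equally spaced toroidal angles.
    TPF = max / toroidal mean; a value of 1 means toroidally uniform radiation."""
    p = np.asarray(p_rad, dtype=float)
    return p.max() / p.mean()
```

For example, a single n=1 modulation of relative amplitude 0.5, p = 1 + 0.5 cos(phi), gives a TPF of 1.5.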

  12. Three-dimensional electromagnetic modeling and inversion on massively parallel computers

    SciTech Connect

    Newman, G.A.; Alumbaugh, D.L.

    1996-03-01

    This report demonstrates techniques that can be used to construct solutions to the 3-D electromagnetic inverse problem using full wave equation modeling. To date, great progress has been made in developing an inverse solution using the method of conjugate gradients, which employs a 3-D finite difference solver to construct model sensitivities and predicted data. The forward modeling code has been developed to incorporate absorbing boundary conditions for high-frequency solutions (radar), as well as complex electrical properties, including electrical conductivity, dielectric permittivity and magnetic permeability. In addition, both the forward and inverse codes have been ported to a massively parallel computer architecture, which allows for more realistic solutions than can be achieved with serial machines. While the inversion code has been demonstrated on field data collected at the Richmond field site, techniques for appraising the quality of the reconstructions still need to be developed. It is suggested here that, rather than employing direct matrix inversion to construct the model covariance matrix, which would be impossible because of the size of the problem, one can linearize about the 3-D model obtained from the inversion and use Monte Carlo simulations to construct it. Using these appraisal and construction tools, it is now necessary to demonstrate 3-D inversion for a variety of EM data sets that span the frequency range from induction sounding to radar: below 100 kHz to 100 MHz. Appraised 3-D images of the earth's electrical properties can provide researchers with opportunities to infer the flow paths, flow rates and perhaps the chemistry of fluids in geologic media. They also offer a means to study the frequency-dependent behavior of these properties in situ. This is of significant relevance to the Department of Energy, being paramount to the characterization and monitoring of environmental waste sites and to oil and gas exploration.
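The Monte Carlo appraisal suggested above can be sketched for a small linearized problem: perturb the data with noise, re-solve the linearized, regularized inverse problem for each realization, and take the sample covariance of the resulting models. The Tikhonov weight `lam`, the white-noise model, the dense solve and the function name are assumptions; at field scale each realization would be an iterative solve, and because the realizations are independent they parallelize trivially.

```python
import numpy as np

def mc_model_covariance(J, sigma, lam, n_samples=2000, seed=0):
    """Monte Carlo model covariance for a linearized, Tikhonov-regularized inversion.
    J: sensitivity (Jacobian) matrix about the final model; data noise ~ N(0, sigma^2 I).
    The linearized estimate is m = (J^T J + lam I)^{-1} J^T d, so only the noise
    part of d matters for the covariance."""
    rng = np.random.default_rng(seed)
    n_data, n_model = J.shape
    H = J.T @ J + lam * np.eye(n_model)
    noise = sigma * rng.standard_normal((n_data, n_samples))
    # One column per noise realization; the realizations are independent
    models = np.linalg.solve(H, J.T @ noise)
    return np.cov(models)  # rows are model parameters, columns are samples
```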

  13. Digital relief 3D model of the Khibiny massive (Kola peninsula)

    NASA Astrophysics Data System (ADS)

    Chesalova, Elena; Asavin, Alex

    2015-04-01

    A 3D model of the Khibiny massif was developed on the basis of maps at scales of 1:50,000 and 1:200,000, using ESRI ARC/INFO v10.2 software. The project is intended to provide a basis for a gas-pollution monitoring network. We plan to use the model to estimate local heterogeneities in the composition of the atmosphere caused by the emanation of greenhouse gases in the area, and to construct models of the vertical distribution of trace-gas content in the rock mass. In addition to the digital elevation model, the GIS project contains layers of geological and tectonic maps, which allow us to estimate the outcrop areas of particular petrographic rock groups characterized by different ratios of emitted hydrocarbons (CH4/H2). The model also allows a classification of the faults in the massif. At first glance, there are two groups of faults: ancient faults associated with the sequence of intrusive phases, and young faults due to recent tectonic shifts. The ancient faults form a common semicircular structure of the pluton and cause the overall asymmetry of the Khibiny heights, with a transition to the border area between the Khibiny and Lovozero massifs. Modern tectonics is mainly represented by radial and chord faults, which formed narrow mountain valleys and troughs. It remains an open question which fault system (old or young) is more productive for gas emanations: the old system is characterized by greater depth, while the young faults are more active. Addressing these issues requires further detailed observations. An essential question is whether a constant concentration gradient of these impurities can be maintained in the atmosphere by gas emanations from fracture zones and areas enriched in occluded gases. Simulations of these processes can initially use the following parameters: (1) the flow rate of the gas impurities, (2) the wind flows in closed and open valleys, and (3) thermal diffusion coefficients determined by the temperature gradient.

  14. Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

    SciTech Connect

    Newman, G.A.; Commer, M.

    2009-06-01

    Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover, when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies exploiting computational parallelism and optimal meshing are required. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24-hour period, we were able to image a large-scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.

  15. Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

    NASA Astrophysics Data System (ADS)

    Newman, Gregory A.; Commer, Michael

    2009-07-01

    Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover, when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies exploiting computational parallelism and optimal meshing are required. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24-hour period, we were able to image a large-scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.

  16. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self-consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2½ dimensions on a 128 × 128 grid with periodic boundary conditions. The two-dimensional simulation space is mapped directly onto the processor network, and a fast Fourier transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme; that is, the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic so that simulation results can be compared with satellite observations.
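    The Eulerian particle storage described above can be illustrated with a small serial sketch. In it, a dictionary of per-cell particle lists stands in for the per-processor local memories of the MPP. The function name `push_and_migrate` and the particle fields are hypothetical. The field solve and the relativistic push of the actual code are omitted: particles simply move with constant velocity on a periodic grid.

```python
def push_and_migrate(cells, nx, ny, dt):
    """Advance every particle by v*dt and file it under the cell it now occupies.

    cells: dict mapping (ix, iy) -> list of particles; each particle is a dict
    with keys 'x', 'y', 'vx', 'vy' in units of grid cells. Boundaries are periodic.
    """
    new_cells = {(i, j): [] for i in range(nx) for j in range(ny)}
    for plist in cells.values():
        for p in plist:
            # Periodic wrap of the position into [0, nx) x [0, ny)
            p["x"] = (p["x"] + p["vx"] * dt) % nx
            p["y"] = (p["y"] + p["vy"] * dt) % ny
            # int(...) % n guards against a float modulo landing exactly on n
            ix, iy = int(p["x"]) % nx, int(p["y"]) % ny
            # The particle's data now lives with the cell (memory) that contains it
            new_cells[(ix, iy)].append(p)
    return new_cells
```

With this layout each particle always lives with the cell that contains it. Charge deposition and field interpolation therefore only touch local and neighbouring cells. That is the usual motivation for Eulerian storage on a processor array; it is stated here as a general observation, not as a description of this particular code.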

  17. Massively parallel simulations of multiphase flows using Lattice Boltzmann methods

    NASA Astrophysics Data System (ADS)

    Ahrenholz, Benjamin

    2010-03-01

    In the last two decades the lattice Boltzmann method (LBM) has matured into an alternative and efficient numerical scheme for simulating fluid flows and transport problems. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the LBM is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes so that the macroscopic averaged properties obey the desired macroscopic equations. The LBM is especially suitable for applications involving interfacial dynamics, complex and/or changing boundaries, and complicated constitutive relationships that can be derived from a microscopic picture. In this talk, a modified and optimized version of a Gunstensen color model is presented to describe the dynamics of the fluid/fluid interface, with the flow field based on a multi-relaxation-time model. Validation studies of contact line motion based on this modeling approach are shown. Because the LB method generally needs only nearest-neighbor information, the algorithm is an ideal candidate for parallelization, which makes efficient large-scale simulations in complex geometries possible through massively parallel computation. Results of drainage and imbibition simulations (more than 2 × 10^11 degrees of freedom) in natural porous media obtained from microtomography are presented. Such fully resolved pore-scale simulations are essential for a better understanding of the physical processes in porous media and are therefore important for determining constitutive relationships.
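    The nearest-neighbour locality that makes the LBM easy to parallelize can be seen in a minimal sketch. This is a swapped-in example: a single-phase D1Q3 lattice with a single-relaxation-time (BGK) collision that models pure diffusion. It is not the Gunstensen colour model or the multi-relaxation-time scheme from the talk, and the function name is hypothetical.

```python
def lbm_d1q3_diffusion(rho0, tau, steps):
    """Single-relaxation-time (BGK) D1Q3 lattice Boltzmann scheme for pure diffusion.

    rho0: initial density per lattice node; boundaries are periodic.
    Returns the density field after `steps` collide-and-stream updates.
    """
    n = len(rho0)
    w = (2.0 / 3.0, 1.0 / 6.0, 1.0 / 6.0)  # weights for lattice velocities 0, +1, -1
    c = (0, 1, -1)
    # Start each population at its equilibrium value w_i * rho
    f = [[w[i] * r for r in rho0] for i in range(3)]
    for _ in range(steps):
        rho = [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
        # Collision: relax each population toward equilibrium (purely local per node)
        for i in range(3):
            for x in range(n):
                f[i][x] += (w[i] * rho[x] - f[i][x]) / tau
        # Streaming: population i at node x comes from node x - c_i (nearest neighbour only)
        for i in range(1, 3):
            f[i] = [f[i][(x - c[i]) % n] for x in range(n)]
    return [f[0][x] + f[1][x] + f[2][x] for x in range(n)]
```

Collision is local to each node, and streaming reads only from the immediate left or right neighbour. A domain split across processors therefore needs only one layer of halo nodes per side. For this lattice the diffusivity in lattice units is c_s^2 (tau - 1/2), with c_s^2 = 1/3.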

  18. Efficiently modeling neural networks on massively parallel computers

    NASA Technical Reports Server (NTRS)

    Farber, Robert M.

    1993-01-01

    Neural networks are a very useful tool for analyzing and modeling complex real-world systems. Applying neural network simulations to real-world problems generally involves large amounts of data and massive amounts of computation. To handle the computational requirements of large problems efficiently, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine-grain SIMD computers such as the CM-2 Connection Machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000-processor CM-2 Connection Machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 Connection Machine. Our mapping has virtually no communications overhead, except for the communications required for a global summation across the processors, which has sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks that have many neurons and interconnects, and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent-processor interprocessor communications. This paper considers the simulation of only feed-forward neural networks, although the method is extendable to recurrent networks.
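    The O(log p) global summation mentioned above is a binary-tree reduction. The following serial sketch simulates it, treating each list entry as one processor's partial sum. It counts combining rounds rather than modelling CM-2 communication hardware, and `tree_sum` is a hypothetical name.

```python
def tree_sum(values):
    """Pairwise (binary-tree) reduction over a non-empty list of partial sums.

    Returns (total, rounds). In each round, 'processor' i adds the partial sum held
    by processor i + stride, so p partial sums are combined in ceil(log2(p)) rounds.
    """
    partial = list(values)
    p = len(partial)
    stride, rounds = 1, 0
    while stride < p:
        for i in range(0, p, 2 * stride):
            if i + stride < p:
                partial[i] += partial[i + stride]
        stride *= 2
        rounds += 1
    return partial[0], rounds
```

For example, `tree_sum(list(range(1, 9)))` returns `(36, 3)`: eight partial sums are combined in three rounds.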

  19. Reactor Dosimetry Applications Using RAPTOR-M3G: A New Parallel 3-D Radiation Transport Code

    NASA Astrophysics Data System (ADS)

    Longoni, Gianluca; Anderson, Stanwood L.

    2009-08-01

    The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications because the angular, spatial, and energy domains are discretized concurrently. This paper discusses the development of RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based on domain decomposition algorithms, in which the spatial and angular domains are allocated and processed on multi-processor computer architectures. Compared with traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained; Section 3 addresses the parallel performance of the code; and Section 4 concludes the paper with final remarks and future work.
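    The idea of allocating spatial and angular domains to processors can be sketched as a block partition over a two-dimensional process grid. The abstract does not describe RAPTOR-M3G's actual partitioning, so `decompose` and `block_range` are hypothetical. The transport sweep dependencies between spatial blocks are not modelled.

```python
def block_range(n, parts, k):
    """Half-open index range [lo, hi) of block k when n items are split into `parts` near-equal blocks."""
    base, extra = divmod(n, parts)
    lo = k * base + min(k, extra)
    hi = lo + base + (1 if k < extra else 0)
    return lo, hi

def decompose(n_cells, n_angles, p_space, p_angle):
    """Map each rank on a p_space x p_angle process grid to its (cell range, angle range)."""
    layout = {}
    for rank in range(p_space * p_angle):
        s, a = divmod(rank, p_angle)  # spatial block index, angular block index
        layout[rank] = (block_range(n_cells, p_space, s), block_range(n_angles, p_angle, a))
    return layout
```

With 10 cells, 8 directions and a 2 × 4 process grid, each rank holds 5 × 2 = 10 of the 80 cell-direction pairs. This is the per-processor memory reduction the abstract refers to, in miniature.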

  20. Large-Scale Eigenvalue Calculations for Stability Analysis of Steady Flows on Massively Parallel Computers

    SciTech Connect

    Lehoucq, Richard B.; Salinger, Andrew G.

    1999-08-01

    We present an approach for determining the linear stability of steady states of PDEs on massively parallel computers. Linearizing the transient behavior around a steady state leads to a generalized eigenvalue problem. The eigenvalues with largest real part are calculated using Arnoldi's iteration driven by a novel implementation of the Cayley transformation to recast the problem as an ordinary eigenvalue problem. The Cayley transformation requires the solution of a linear system at each Arnoldi iteration, which must be done iteratively for the algorithm to scale with problem size. A representative model problem of 3D incompressible flow and heat transfer in a rotating disk reactor is used to analyze the effect of algorithmic parameters on the performance of the eigenvalue algorithm. Successful calculations of leading eigenvalues for matrix systems of order up to 4 million were performed, identifying the critical Grashof number for a Hopf bifurcation.
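    For reference, the standard Cayley transformation for a generalized eigenvalue problem can be written as follows. The symbols J (Jacobian), M (mass matrix) and the real shifts σ > μ are notation assumed here; the paper's specific implementation is not reproduced.

```latex
\begin{aligned}
&J x = \lambda M x, \qquad T_C = (J - \sigma M)^{-1}(J - \mu M),\\
&(J - \mu M)\,x = (\lambda - \mu)\,M x, \qquad (J - \sigma M)^{-1} M x = \frac{x}{\lambda - \sigma} \quad (\lambda \neq \sigma),\\
&\Longrightarrow\; T_C\,x = \theta\,x, \qquad \theta = \frac{\lambda - \mu}{\lambda - \sigma},\\
&|\theta| > 1 \iff \operatorname{Re}\lambda > \frac{\sigma + \mu}{2} \qquad (\sigma, \mu \in \mathbb{R},\ \sigma > \mu).
\end{aligned}
```

Eigenvalues to the right of the midpoint (σ + μ)/2 are mapped outside the unit circle, where Arnoldi's iteration tends to find them first. Each application of T_C needs one product with (J - μM) and one linear solve with (J - σM). That solve is the linear system that, according to the abstract, must be handled iteratively.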

  1. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    SciTech Connect

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high-affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single-stranded oligonucleotide affinity reagents isolated by in vitro selection processes, and as a class they have been shown to bind a wide variety of target molecules. High-density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer-mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and with the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem-loop structure as the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high-information-content approach to optimizing aptamer function. The approach also provides a foundation from which to better understand and manipulate this important class of high-affinity biomolecules.

  2. A parallel 3-D staggered grid pseudospectral time domain method for ground-penetrating radar wave simulation

    NASA Astrophysics Data System (ADS)

    Huang, Qinghua; Li, Zhanhui; Wang, Yanbin

    2010-12-01

    We present a parallel 3-D staggered grid pseudospectral time domain (PSTD) method for simulating ground-penetrating radar (GPR) wave propagation. We adopted the staggered grid method to weaken the global effect in PSTD. We also developed a modified fast Fourier transform (FFT) spatial derivative operator to eliminate the wraparound effect caused by the implicit periodic boundary condition of the FFT operator. With these improvements, we implemented parallel PSTD computation based on an overlapping domain decomposition method without any absorbing condition for each subdomain. Compared with other proposed algorithms, this significantly reduces the number of grid points required in each overlapping subdomain. We tested our parallel technique on several numerical models and obtained results consistent with analytical solutions and/or those of the nonparallel PSTD method. These numerical tests showed that our parallel PSTD algorithm is effective in simulating 3-D GPR wave propagation: it saves computation time and offers more flexibility in dealing with complicated models without losing accuracy. The application of our parallel PSTD method in applied geophysics and paleoseismology based on GPR data confirmed the efficiency of our algorithm and its potential applications in various subdisciplines of solid earth geophysics. This study also provides a useful parallel PSTD approach for simulating other geophysical problems on distributed-memory PC clusters.
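    The staggered-grid spectral derivative at the heart of PSTD works as follows. The derivative is computed in Fourier space by multiplying by ik, and a phase factor exp(ik·dx/2) evaluates it half a cell away from the samples. This minimal sketch uses a direct O(N^2) DFT instead of an FFT. It keeps the ordinary periodic assumption, so it does not include the authors' modified operator that removes the wraparound effect. The function name is hypothetical.

```python
import cmath
import math

def staggered_spectral_derivative(f, length):
    """Spectral d/dx of periodic samples f[j] at x_j = j*dx, evaluated at the
    half-shifted points x_j + dx/2 (a staggered-grid derivative).

    Uses an O(N^2) DFT for clarity; a real code would use an FFT. N should be odd
    so that no Nyquist mode needs special handling.
    """
    n = len(f)
    dx = length / n
    # Forward DFT: F[m] = sum_j f[j] * exp(-2*pi*i*m*j/n)
    F = [sum(f[j] * cmath.exp(-2j * math.pi * m * j / n) for j in range(n)) for m in range(n)]
    out = []
    for j in range(n):
        s = 0j
        for m in range(n):
            k_index = m if m <= n // 2 else m - n  # signed wavenumber index
            k = 2 * math.pi * k_index / length
            # Multiply by i*k and evaluate the inverse transform at x_j + dx/2
            s += F[m] * 1j * k * cmath.exp(1j * k * (j * dx + dx / 2))
        out.append((s / n).real)
    return out
```

For a sine wave sampled at 15 points on [0, 2π), the result matches cos(x + dx/2) to round-off. This spectral accuracy is what motivates PSTD over low-order finite differences.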

  3. Outer electrospun polycaprolactone shell induces massive foreign body reaction and impairs axonal regeneration through 3D multichannel chitosan nerve guides.

    PubMed

    Duda, Sven; Dreyer, Lutz; Behrens, Peter; Wienecke, Soenke; Chakradeo, Tanmay; Glasmacher, Birgit; Haastert-Talini, Kirsten

    2014-01-01

    We report on the performance of composite nerve grafts with an inner 3D multichannel porous chitosan core and an outer electrospun polycaprolactone shell. The inner chitosan core provided multiple guidance channels for regrowing axons. To analyze the in vivo properties of the bare chitosan cores, we separately implanted them into an epineural sheath. The effects of both graft types on structural and functional regeneration across a 10 mm rat sciatic nerve gap were compared to autologous nerve transplantation (ANT). The mechanical biomaterial properties and the immunological impact of the grafts were assessed with histological techniques before and after transplantation in vivo. Furthermore, during a 13-week examination period, functional tests and electrophysiological recordings were performed and supplemented by nerve morphometry. Sheathing the chitosan core with a polycaprolactone shell induced a massive foreign body reaction and impaired nerve regeneration. Although the isolated novel chitosan core did allow regeneration of axons in a size distribution similar to that of the ANT, the ANT was superior in terms of functional regeneration. We conclude that an outer polycaprolactone shell should not be used for bioartificial nerve grafting, while 3D multichannel porous chitosan cores could be candidate scaffolds for structured nerve grafts. PMID:24818158

  4. Outer Electrospun Polycaprolactone Shell Induces Massive Foreign Body Reaction and Impairs Axonal Regeneration through 3D Multichannel Chitosan Nerve Guides

    PubMed Central

    Behrens, Peter; Wienecke, Soenke; Chakradeo, Tanmay; Glasmacher, Birgit

    2014-01-01

    We report on the performance of composite nerve grafts with an inner 3D multichannel porous chitosan core and an outer electrospun polycaprolactone shell. The inner chitosan core provided multiple guidance channels for regrowing axons. To analyze the in vivo properties of the bare chitosan cores, we separately implanted them into an epineural sheath. The effects of both graft types on structural and functional regeneration across a 10 mm rat sciatic nerve gap were compared to autologous nerve transplantation (ANT). The mechanical biomaterial properties and the immunological impact of the grafts were assessed with histological techniques before and after transplantation in vivo. Furthermore, during a 13-week examination period, functional tests and electrophysiological recordings were performed and supplemented by nerve morphometry. Sheathing the chitosan core with a polycaprolactone shell induced a massive foreign body reaction and impaired nerve regeneration. Although the isolated novel chitosan core did allow regeneration of axons in a size distribution similar to that of the ANT, the ANT was superior in terms of functional regeneration. We conclude that an outer polycaprolactone shell should not be used for bioartificial nerve grafting, while 3D multichannel porous chitosan cores could be candidate scaffolds for structured nerve grafts. PMID:24818158

  5. Seismic waves modeling with the Fourier pseudo-spectral method on massively parallel machines.

    NASA Astrophysics Data System (ADS)

    Klin, Peter

    2015-04-01

    The Fourier pseudo-spectral method (FPSM) is an approach for the 3D numerical modeling of wave propagation. It is based on discretizing the spatial domain on a structured grid and relies on global spatial differential operators to solve the wave equation. This last feature is advantageous for accuracy but makes it difficult to implement the method efficiently on parallel computers with a distributed-memory architecture. The 1D spatial domain decomposition approach has so far been commonly adopted in parallel implementations of the FPSM. However, it implies intensive data exchange among all the processors involved in the computation, which can degrade performance because of communication latencies. Moreover, the scalability of the 1D domain decomposition is limited, since the number of processors cannot exceed the number of grid points along the directions in which the domain is partitioned. This limitation prevents efficient exploitation of computational environments with a very large number of processors. To overcome the limitations of the 1D domain decomposition, we implemented a parallel version of the FPSM based on a 2D domain decomposition. This achieves a higher degree of parallelism and scalability on massively parallel machines with several thousand processing elements. The parallelization is achieved mainly with MPI, but OpenMP sections are also included to exploit single-processor multithreading capabilities when available. The developed tool is aimed at the numerical simulation of seismic wave propagation and, in particular, is intended for earthquake ground motion research.
We show the scalability tests performed up to 16k processing elements on the IBM Blue Gene/Q computer at CINECA (Italy), as well as the application to the simulation of the earthquake ground motion in the alluvial plain of the Po river (Italy).
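    The scalability limit of 1D versus 2D decomposition follows from simple counting. The sketch below assumes that each rank must own at least one grid plane (1D, slab) or one grid line (2D, pencil) along the partitioned axes. It ignores the additional constraints that the global FFT transposes impose, and the function names are hypothetical.

```python
import math

def max_ranks(nx, ny, nz, dims):
    """Upper bound on ranks for a slab (dims=1) or pencil (dims=2) decomposition of an
    nx x ny x nz grid, partitioning along the first `dims` axes with at least one
    grid point per rank in each partitioned direction."""
    if dims == 1:
        return nx
    if dims == 2:
        return nx * ny
    raise ValueError("dims must be 1 or 2")

def process_grid_2d(nranks):
    """Factor nranks into a near-square p1 x p2 process grid with p1 <= p2."""
    p1 = math.isqrt(nranks)
    while nranks % p1:
        p1 -= 1
    return p1, nranks // p1
```

On a 512³ grid, slabs cap the run at 512 ranks, while pencils allow up to 262,144. A 16,384-rank run maps to a 128 × 128 process grid.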

  6. Characterization of a parallel-beam CCD optical-CT apparatus for 3D radiation dosimetry.

    PubMed

    Krstajić, Nikola; Doran, Simon J

    2007-07-01

    3D measurement of optical attenuation is of interest in a variety of fields of biomedical importance, including spectrophotometry, optical projection tomography (OPT) and analysis of 3D radiation dosimeters. Accurate, precise and economical 3D measurements of optical density (OD) are a crucial step in enabling 3D radiation dosimeters to enter wider use in clinics. Polymer gels and Fricke gels, as well as dosimeters not based around gels, have been characterized for 3D dosimetry over the last two decades. A separate problem is the verification of the best readout method. A number of different imaging modalities (magnetic resonance imaging (MRI), optical CT, x-ray CT and ultrasound) have been suggested for the readout of information from 3D dosimeters. To date only MRI and laser-based optical CT have been characterized in detail. This paper describes some initial steps we have taken in establishing charge coupled device (CCD)-based optical CT as a viable alternative to MRI for readout of 3D radiation dosimeters. The main advantage of CCD-based optical CT over traditional laser-based optical CT is a speed increase of at least an order of magnitude, while the simplicity of its architecture would lend itself to cheaper implementation than both MRI and laser-based optical CT if the camera itself were inexpensive enough. Specifically, we study the following aspects of optical metrology, using high quality test targets: (i) calibration and quality of absorbance measurements and the camera requirements for 3D dosimetry; (ii) the modulation transfer function (MTF) of individual projections; (iii) signal-to-noise ratio (SNR) in the projection and reconstruction domains; (iv) distortion in the projection domain, depth-of-field (DOF) and telecentricity. 
The principal results for our current apparatus are as follows: (i) SNR of optical absorbance in projections is better than 120:1 for uniform phantoms in absorbance range 0.3 to 1.6 (and better than 200:1 for absorbances 1.0 to
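    The absorbance and SNR figures quoted above rest on two simple per-pixel quantities. This hedged sketch assumes that absorbance is defined as log10(I0/I) against a reference (flat-field) image and that SNR is the mean over the standard deviation in a nominally uniform region. The paper's exact definitions and calibration steps may differ, and the function names are hypothetical.

```python
import math
import statistics

def absorbance(i_sample, i_reference):
    """Per-pixel optical density A = log10(I0 / I), with I0 from a reference image (assumed convention)."""
    return [math.log10(i0 / i) for i, i0 in zip(i_sample, i_reference)]

def snr(values):
    """Mean divided by the sample standard deviation over a nominally uniform region."""
    return statistics.mean(values) / statistics.stdev(values)
```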

  7. Characterization of a parallel-beam CCD optical-CT apparatus for 3D radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Krstajic, Nikola; Doran, Simon J.

    2007-07-01

    3D measurement of optical attenuation is of interest in a variety of fields of biomedical importance, including spectrophotometry, optical projection tomography (OPT) and analysis of 3D radiation dosimeters. Accurate, precise and economical 3D measurements of optical density (OD) are a crucial step in enabling 3D radiation dosimeters to enter wider use in clinics. Polymer gels and Fricke gels, as well as dosimeters not based around gels, have been characterized for 3D dosimetry over the last two decades. A separate problem is the verification of the best readout method. A number of different imaging modalities (magnetic resonance imaging (MRI), optical CT, x-ray CT and ultrasound) have been suggested for the readout of information from 3D dosimeters. To date only MRI and laser-based optical CT have been characterized in detail. This paper describes some initial steps we have taken in establishing charge coupled device (CCD)-based optical CT as a viable alternative to MRI for readout of 3D radiation dosimeters. The main advantage of CCD-based optical CT over traditional laser-based optical CT is a speed increase of at least an order of magnitude, while the simplicity of its architecture would lend itself to cheaper implementation than both MRI and laser-based optical CT if the camera itself were inexpensive enough. Specifically, we study the following aspects of optical metrology, using high quality test targets: (i) calibration and quality of absorbance measurements and the camera requirements for 3D dosimetry; (ii) the modulation transfer function (MTF) of individual projections; (iii) signal-to-noise ratio (SNR) in the projection and reconstruction domains; (iv) distortion in the projection domain, depth-of-field (DOF) and telecentricity. 
The principal results for our current apparatus are as follows: (i) SNR of optical absorbance in projections is better than 120:1 for uniform phantoms in absorbance range 0.3 to 1.6 (and better than 200:1 for absorbances 1.0 to

  8. Analysis of composite ablators using massively parallel computation

    NASA Technical Reports Server (NTRS)

    Shia, David

    1995-01-01

    In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations, developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions, and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: it incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small time steps to maintain stability. However, this kind of solution scheme has two disadvantages: complex physics cannot be easily incorporated into the algorithm, and the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. 
The gas storage term is included in the explicit pressure calculation of both
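    The hybrid idea of choosing the scheme from a critical time scale can be illustrated on a much simpler problem. This sketch swaps in the linear 1D heat equation u_t = α u_xx. Its explicit (FTCS) stability limit, dt ≤ dx²/(2α), plays the role of the critical time scale. The ablator equations in the report are nonlinear and coupled, and all function names here are hypothetical.

```python
def critical_dt(alpha, dx):
    """Largest stable time step for the explicit FTCS scheme on u_t = alpha * u_xx."""
    return dx * dx / (2.0 * alpha)

def explicit_step(u, alpha, dx, dt):
    """One forward-Euler / central-difference step; endpoints held fixed (Dirichlet)."""
    r = alpha * dt / (dx * dx)
    interior = [u[i] + r * (u[i - 1] - 2 * u[i] + u[i + 1]) for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]

def implicit_step(u, alpha, dx, dt):
    """One backward-Euler step solved with the Thomas (tridiagonal) algorithm; Dirichlet ends."""
    n = len(u)
    r = alpha * dt / (dx * dx)
    m = n - 2  # interior unknowns: -r*u[i-1] + (1+2r)*u[i] - r*u[i+1] = u_old[i]
    a, b, c = [-r] * m, [1 + 2 * r] * m, [-r] * m
    d = [u[i] for i in range(1, n - 1)]
    d[0] += r * u[0]    # known left boundary value moved to the right-hand side
    d[-1] += r * u[-1]  # known right boundary value moved to the right-hand side
    for i in range(1, m):  # forward elimination
        w = a[i] / b[i - 1]
        b[i] -= w * c[i - 1]
        d[i] -= w * d[i - 1]
    x = [0.0] * m
    x[-1] = d[-1] / b[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        x[i] = (d[i] - c[i] * x[i + 1]) / b[i]
    return [u[0]] + x + [u[-1]]

def hybrid_step(u, alpha, dx, dt):
    """Use the simple explicit update below the critical time scale, the implicit one above it."""
    if dt <= critical_dt(alpha, dx):
        return explicit_step(u, alpha, dx, dt), "explicit"
    return implicit_step(u, alpha, dx, dt), "implicit"
```

The explicit update is trivially parallel, since each point reads only its neighbours. The implicit update needs a tridiagonal solve. This mirrors the parallelization trade-off the abstract describes.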

  9. gEMfitter: a highly parallel FFT-based 3D density fitting tool with GPU texture memory acceleration.

    PubMed

    Hoang, Thai V; Cavin, Xavier; Ritchie, David W

    2013-11-01

    Fitting high-resolution protein structures into low-resolution cryo-electron microscopy (cryo-EM) density maps is an important technique for modeling the atomic structures of very large macromolecular assemblies. This article presents "gEMfitter", a highly parallel fast Fourier transform (FFT) EM density fitting program that can exploit the special hardware properties of modern graphics processor units (GPUs) to accelerate both the translational and rotational parts of the correlation search. In particular, by using the GPU's special texture memory hardware to rotate 3D voxel grids, the cost of rotating large 3D density maps is almost completely eliminated. Compared to performing 3D correlations on one core of a contemporary central processor unit (CPU), running gEMfitter on a modern GPU gives up to a 26-fold speed-up. Furthermore, using our parallel processing framework, this speed-up increases linearly with the number of CPUs or GPUs used. Thus, it is now possible to routinely use more robust but more expensive 3D correlation techniques. When tested on low-resolution experimental cryo-EM data for the GroEL-GroES complex, we demonstrate that satisfactory fitting results can be achieved by using a locally normalised cross-correlation with a Laplacian pre-filter, while still being up to three orders of magnitude faster than the well-known COLORES program. PMID:24060989
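    The FFT-based translational search rests on the correlation theorem: the cross-correlation for every shift is the inverse transform of one spectrum times the conjugate of the other. The 1D sketch below assumes real-valued, periodic data and uses a direct O(N^2) DFT to stay self-contained. gEMfitter itself works on 3D grids on the GPU and adds rotational search and local normalisation, none of which is shown. Function names are hypothetical.

```python
import cmath
import math

def dft(x, sign=-1):
    """Direct DFT (sign=-1) or unnormalised inverse DFT (sign=+1)."""
    n = len(x)
    return [sum(x[j] * cmath.exp(sign * 2j * math.pi * k * j / n) for j in range(n)) for k in range(n)]

def correlation_via_dft(density, probe):
    """Circular cross-correlation c[s] = sum_j density[j + s] * probe[j] for every shift s,
    computed as IDFT(DFT(density) * conj(DFT(probe))) for real inputs."""
    n = len(density)
    D, P = dft(density), dft(probe)
    prod = [D[k] * P[k].conjugate() for k in range(n)]
    return [(v / n).real for v in dft(prod, sign=+1)]

def best_shift(density, probe):
    """Translation that maximises the correlation score."""
    c = correlation_via_dft(density, probe)
    return max(range(len(c)), key=lambda s: c[s])
```

One transform pair scores all N shifts at once, which becomes O(N log N) with an FFT. This is why FFT-based fitting is much cheaper than evaluating each translation separately.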

  10. Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

    NASA Technical Reports Server (NTRS)

    Morgan, Philip E.

    2004-01-01

    This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers" and "Develop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications." The discussion of Scalable High Performance Computing reports on three objectives. The first is to validate, assess the scalability of, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows. The second is to develop and validate a high-order parallel solver for Direct Numerical Simulation (DNS) and Large Eddy Simulation (LES) problems. The third is to investigate and develop a high-order Reynolds-averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhance an electromagnetics code (CHARGE) so that it can effectively model antenna problems; apply lessons learned in the high-order/spectral solution of swirling 3D jets to the electromagnetics project; transition a high-order fluids code, FDL3DI, so that it can solve Maxwell's equations using compact differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend the high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

  11. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up vehicle development cycles and reduce lead time. Fast tooling development is one of the key areas supporting fast, short vehicle development programs (VDP). In the past ten years, stamping simulation has become the most effective validation tool for predicting and resolving all potential formability and quality problems before the dies are physically made. Stamping simulation and formability analysis have become a critical business segment in GM's math-based die engineering process. Simulation is now one of the major production tools in the engineering factory, so simulation speed and accuracy have become two of the most important measures of stamping simulation technology. The speed and time-in-system of forming analysis are even more critical in supporting a fast VDP and tooling readiness. Since 1997, the General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of the simulation software for mass production analysis applications. By 2001, this technology had matured into distributed memory processing (DMP) of draw die simulations in a networked distributed-memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated springback) running on a dedicated computing environment. The evolution of this technology, the insight gained through the implementation of DMP/MPP technology, and performance benchmarks are discussed in this publication.

  12. Cloud identification using genetic algorithms and massively parallel computation

    NASA Technical Reports Server (NTRS)

    Buckles, Bill P.; Petry, Frederick E.

    1996-01-01

    As a Guest Computational Investigator under the NASA-administered component of the High Performance Computing and Communication Program, we implemented a massively parallel genetic algorithm on the MasPar SIMD computer. Experiments were conducted using Earth Science data in the domains of meteorology and oceanography. Results obtained in these domains are competitive with, and in most cases better than, those for similar problems solved using other methods. In the meteorological domain, we chose to identify clouds using AVHRR spectral data. Four cloud speciations were used, although most researchers settle for three. Results were remarkably consistent across all tests (91% accuracy). Refinements of this method may lead to more timely and complete information for the Global Circulation Models (GCMs) that are prevalent in weather forecasting and global environment studies. In the oceanographic domain, we chose to identify ocean currents from a spectrometer with characteristics similar to AVHRR. Here the results were mixed (60% to 80% accuracy). If one is willing to run the experiment several times (say, 10), it is acceptable to claim the higher accuracy rating. This problem has never been successfully automated; therefore, these results are encouraging even though less impressive than the cloud experiment. Successful completion of an automated ocean current detection system would impact coastal fishing, naval tactics, and the study of micro-climates. Finally, we contributed to the basic knowledge of GA (genetic algorithm) behavior in parallel environments. We developed better knowledge of the use of subpopulations in the context of shared breeding pools and the migration of individuals. Rigorous experiments were conducted based on quantifiable performance criteria. While much of the work confirmed current wisdom, for the first time we were able to submit conclusive evidence. The software developed under this grant was placed in the public domain. An extensive user
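    The subpopulation-and-migration scheme mentioned above is commonly called an island-model GA. The serial sketch below is a swapped-in illustration on the toy OneMax problem (maximise the number of 1 bits). It is not the MasPar implementation or the AVHRR cloud classifier. The parameters, the ring migration topology and the function names are all assumptions.

```python
import random

def onemax(bits):
    """Toy fitness: number of 1 bits."""
    return sum(bits)

def evolve_island(pop, rng, p_mut):
    """One generation on one island: binary tournament, one-point crossover, bit-flip mutation."""
    n_bits = len(pop[0])

    def pick():
        a, b = rng.choice(pop), rng.choice(pop)
        return a if onemax(a) >= onemax(b) else b

    new_pop = []
    for _ in range(len(pop)):
        p1, p2 = pick(), pick()
        cut = rng.randrange(1, n_bits)
        child = [1 - g if rng.random() < p_mut else g for g in p1[:cut] + p2[cut:]]
        new_pop.append(child)
    return new_pop

def island_ga(n_islands=4, pop_size=20, n_bits=32, generations=60, migrate_every=5, seed=1):
    """Evolve subpopulations independently, with periodic ring migration of each island's best."""
    rng = random.Random(seed)
    islands = [[[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
               for _ in range(n_islands)]
    for gen in range(1, generations + 1):
        islands = [evolve_island(pop, rng, 1.0 / n_bits) for pop in islands]
        if gen % migrate_every == 0:
            bests = [max(pop, key=onemax) for pop in islands]
            for i, pop in enumerate(islands):
                # The previous island's best (index -1 wraps to the last) replaces this island's worst
                worst = min(range(pop_size), key=lambda k: onemax(pop[k]))
                pop[worst] = list(bests[i - 1])
    return max((max(pop, key=onemax) for pop in islands), key=onemax)
```

Islands evolve independently between migrations, so on a parallel machine each island can map to its own group of processors. Only the occasional exchange of migrants requires communication.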

  13. Comparing current cluster, massively parallel, and accelerated systems

    SciTech Connect

    Barker, Kevin J; Davis, Kei; Hoisie, Adolfy; Kerbyson, Darren J; Pakin, Scott; Lang, Mike; Sancho Pitarch, Jose C

    2010-01-01

    Currently there is large architectural diversity in high performance computing systems. These include 'commodity' cluster systems that optimize per-node performance for small jobs, massively parallel processors (MPPs) that optimize aggregate performance for large jobs, and accelerated systems that optimize both per-node and aggregate performance but only for applications custom-designed to take advantage of such systems. Because of these dissimilarities, meaningful comparisons of achievable performance are not straightforward. In this work we use a methodology that combines empirical analysis and performance modeling to compare clusters (represented by a 4,352-core IB cluster), MPPs (represented by a 147,456-core BG/P), and accelerated systems (represented by the 129,600-core Roadrunner) across a workload of four applications. Strengths of our approach include the ability to compare architectures (as opposed to specific implementations of an architecture), to attribute each application's performance bottlenecks to characteristics unique to each system, and to explore performance scenarios in advance of their availability for measurement. Our analysis illustrates that application performance is essentially unrelated to relative peak performance but that it can be both predicted and explained using modeling.

  14. Investigation of reflective notching with massively parallel simulation

    NASA Astrophysics Data System (ADS)

    Tadros, Karim H.; Neureuther, Andrew R.; Gamelin, John K.; Guerrieri, Roberto

    1990-06-01

    A massively parallel simulation program, TEMPEST, is used to investigate the role of topography in generating reflective notching and to study the possibility of reducing these effects through the introduction of special properties of resists and antireflection coating materials. The emphasis is on examining physical scattering mechanisms such as focused specular reflections, resist thickness interference effects, reflections from substrate grains, and focusing of incident light by the resist curvature. Specular reflection from topography can focus incident radiation, causing a 10-fold increase in effective exposure. Further complications, such as dimples in the surface of positive resist features, can result from a second reflection of focused energy by the resist/air interface. Variations in line-edge exposure due to substrate grain structure are primarily specular in nature and can become significant for grains larger than λ_resist. Local exposure variations due to vertical standing waves and changes in energy coupling due to changes in resist thickness are displaced laterally and are significant effects, even though they are slightly less severe than vertical wave propagation theory suggests. Focusing effects due to refraction by the curved surface of the resist produce only minor changes in exposure. Increased resist contrast and resist absorption offer some improvement in reducing notching effects, though minimizing substrate reflectivity is more effective. CPU time using 32 virtual nodes to simulate a 4 µm by 2 µm isolated domain with 13 bleaching steps was 30 minutes.

  15. A high-plex PCR approach for massively parallel sequencing.

    PubMed

    Nguyen-Dumont, Tú; Pope, Bernard J; Hammet, Fleur; Southey, Melissa C; Park, Daniel J

    2013-08-01

    Current methods for targeted massively parallel sequencing (MPS) have several drawbacks, including limited design flexibility, expense, and protocol complexity, which restrict their application to settings involving modest target size and requiring low cost and high throughput. To address this, we have developed Hi-Plex, a PCR-MPS strategy intended for high-throughput screening of multiple genomic target regions that integrates simple, automated primer design software to control product size. Featuring permissive thermocycling conditions and clamp bias reduction, our protocol is simple, cost- and time-effective, uses readily available reagents, does not require expensive instrumentation, and requires minimal optimization. In a 60-plex assay targeting the breast cancer predisposition genes PALB2 and XRCC2, we applied Hi-Plex to 100 ng LCL-derived DNA, and 100 ng and 25 ng FFPE tumor-derived DNA. Altogether, at least 86.94% of the human genome-mapped reads were on target, and 100% of targeted amplicons were represented within 25-fold of the mean. Using 25 ng FFPE-derived DNA, 95.14% of mapped reads were on-target and relative representation ranged from 10.1-fold lower to 5.8-fold higher than the mean. These results were obtained using only the initial automatically-designed primers present in equal concentration. Hi-Plex represents a powerful new approach for screening panels of genomic target regions. PMID:23931594
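    The uniformity figures quoted above (the share of amplicons within 25-fold of the mean, and the fold range below and above the mean) are ratios of per-amplicon read counts to their mean. A minimal sketch of that arithmetic, assuming per-amplicon on-target read counts are already available and every amplicon has at least one read; the function names and example counts are hypothetical, not part of the Hi-Plex software.

```python
from statistics import mean


def on_target_fraction(on_target_reads, mapped_reads):
    """Fraction of genome-mapped reads that fall on targeted amplicons."""
    return on_target_reads / mapped_reads


def fold_from_mean_summary(amplicon_counts, fold):
    """Summarize representation relative to the mean read count per amplicon.

    Returns (fraction of amplicons within `fold`-fold of the mean,
             how many fold below the mean the weakest amplicon is,
             how many fold above the mean the strongest amplicon is).
    Assumes every count is positive.
    """
    m = mean(amplicon_counts)
    within = sum(1 for c in amplicon_counts if m / fold <= c <= m * fold)
    return (within / len(amplicon_counts),
            m / min(amplicon_counts),
            max(amplicon_counts) / m)
```

    For hypothetical counts `[10, 20, 40, 50]` the mean is 30, so with `fold=2` three of four amplicons lie within 15 to 60 reads, the weakest is 3-fold below the mean, and the strongest about 1.67-fold above it.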

  16. Wavelet-Based DFT calculations on Massively Parallel Hybrid Architectures

    NASA Astrophysics Data System (ADS)

    Genovese, Luigi

    2011-03-01

    In this contribution, we present an implementation of a full DFT code that can run on massively parallel hybrid CPU-GPU clusters. Our implementation is based on modern GPU architectures that support double-precision floating-point numbers. This DFT code, named BigDFT, is distributed under the GNU GPL license, either as a stand-alone version or integrated in the ABINIT software package. Hybrid BigDFT routines were initially ported with NVIDIA's CUDA language, and more functionality has recently been added with new routines written in Khronos' OpenCL standard. The formalism of this code is based on Daubechies wavelets, which form a systematic real-space basis set. As we will see in the presentation, the properties of this basis set are well suited for an extension to a GPU-accelerated environment. In addition to focusing on the implementation of the operators of the BigDFT code, this presentation also addresses the use of GPU resources in a complex code with different kinds of operations. The present and expected performance of hybrid architectures for electronic structure calculations is also discussed.

  17. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

    1999-08-24

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.

  18. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, Robert J.; Brooks, III, Eugene D.; Haigh, Ronald E.; DeGroot, Anthony J.

    1999-01-01

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.

  19. Massively parallel support for a case-based planning system

    NASA Technical Reports Server (NTRS)

    Kettler, Brian P.; Hendler, James A.; Anderson, William A.

    1993-01-01

    Case-based planning (CBP), a kind of case-based reasoning, is a technique in which previously generated plans (cases) are stored in memory and can be reused to solve similar planning problems in the future. CBP can save considerable time over generative planning, in which a new plan is produced from scratch. CBP thus offers a potential (heuristic) mechanism for handling intractable problems. One drawback of CBP systems has been the need for a highly structured memory to reduce retrieval times. This approach requires significant domain engineering and complex memory indexing schemes to make these planners efficient. In contrast, our CBP system, CaPER, uses a massively parallel frame-based AI language (PARKA) and can do extremely fast retrieval of complex cases from a large, unindexed memory. The ability to do fast, frequent retrievals has many advantages: indexing is unnecessary; very large case bases can be used; memory can be probed in numerous alternate ways; and queries can be made at several levels, allowing more specific retrieval of stored plans that better fit the target problem with less adaptation. In this paper we describe CaPER's case retrieval techniques and some experimental results showing its good performance, even on large case bases.

  20. Parallel Adaptive Computation of Blood Flow in a 3D ``Whole'' Body Model

    NASA Astrophysics Data System (ADS)

    Zhou, M.; Figueroa, C. A.; Taylor, C. A.; Sahni, O.; Jansen, K. E.

    2008-11-01

    Accurate numerical simulations of vascular trauma require the consideration of a larger portion of the vasculature than previously considered, due to the systemic nature of the human body's response. A patient-specific 3D model composed of 78 connected arterial branches extending from the neck to the lower legs is constructed to effectively represent the entire body. Recently developed outflow boundary conditions that appropriately represent the downstream vasculature bed which is not included in the 3D computational domain are applied at 78 outlets. In this work, the pulsatile blood flow simulations are started on a fairly uniform, unstructured mesh that is subsequently adapted using a solution-based approach to efficiently resolve the flow features. The adapted mesh contains non-uniform, anisotropic elements resulting in resolution that conforms with the physical length scales present in the problem. The effects of the mesh resolution on the flow field are studied, specifically on relevant quantities of pressure, velocity and wall shear stress.

  1. Parallel load balancing strategy for Volume-of-Fluid methods on 3-D unstructured meshes

    NASA Astrophysics Data System (ADS)

    Jofre, Lluís; Borrell, Ricard; Lehmkuhl, Oriol; Oliva, Assensi

    2015-02-01

    Volume-of-Fluid (VOF) is one of the methods of choice for reproducing interface motion in simulations of multi-fluid flows. One of its main strengths is its accuracy in capturing sharp interface geometries, although this requires a number of geometric calculations. Under these circumstances, achieving parallel performance on current supercomputers is a must. The main obstacle to parallelization is that the computing costs are concentrated only in the discrete elements that lie on the interface between fluids. Consequently, if the interface is not homogeneously distributed throughout the domain, standard domain decomposition (DD) strategies lead to imbalanced workload distributions. In this paper, we present a new parallelization strategy for general unstructured VOF solvers, based on a dynamic load balancing process complementary to the underlying DD. Its parallel efficiency has been analyzed and compared with that of DD using up to 1024 CPU cores on an Intel Sandy Bridge-based supercomputer. The results obtained on several artificially generated test cases show a speedup of up to ∼12× with respect to standard DD, depending on the interface size, the initial distribution, and the number of parallel processes engaged. Moreover, the new parallelization strategy is general-purpose; therefore, it could be used to parallelize any VOF solver without requiring changes to the coupled flow solver. Finally, although designed for the VOF method, our approach could easily be adapted to other interface-capturing methods, such as Level-Set, which may present similar workload imbalances.
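    The imbalance described above arises because only interface cells carry the geometric VOF work, so a rank's VOF load is its interface-cell count rather than its total cell count. The sketch below is not the authors' algorithm; it assumes a simple greedy scheme in which overloaded ranks ship surplus interface-cell work to underloaded ranks (any rank may help any other), while ownership in the underlying domain decomposition stays untouched. The function name is hypothetical.

```python
def balance_interface_work(interface_cells):
    """Plan transfers of interface-cell work so every rank ends within one cell of the mean.

    interface_cells: entry r is the number of interface cells owned by rank r.
    Returns (transfers, new_loads); each transfer is (src_rank, dst_rank, n_cells).
    """
    n = len(interface_cells)
    total = sum(interface_cells)
    # Targets: floor(total / n) per rank, with the remainder given to the first ranks.
    base, extra = divmod(total, n)
    targets = [base + (1 if r < extra else 0) for r in range(n)]

    surplus = [(r, interface_cells[r] - targets[r]) for r in range(n) if interface_cells[r] > targets[r]]
    deficit = [(r, targets[r] - interface_cells[r]) for r in range(n) if interface_cells[r] < targets[r]]

    # Greedily match surplus ranks with deficit ranks; total surplus equals total deficit.
    transfers = []
    i = j = 0
    while i < len(surplus) and j < len(deficit):
        src, have = surplus[i]
        dst, need = deficit[j]
        moved = min(have, need)
        transfers.append((src, dst, moved))
        surplus[i] = (src, have - moved)
        deficit[j] = (dst, need - moved)
        if surplus[i][1] == 0:
            i += 1
        if deficit[j][1] == 0:
            j += 1

    new_loads = list(interface_cells)
    for src, dst, moved in transfers:
        new_loads[src] -= moved
        new_loads[dst] += moved
    return transfers, new_loads
```

    For hypothetical counts `[0, 120, 0, 40]`, rank 1 sends 40 cells of work to rank 0 and 40 to rank 2, leaving every rank with 40. Keeping the DD intact and moving only the VOF work is what would let such a scheme sit beside an unchanged flow solver.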

  2. Implementation of a 3D mixing layer code on parallel computers

    NASA Technical Reports Server (NTRS)

    Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.

    1995-01-01

    This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. It was then converted for use on parallel computers using the conventional message-passing technique; however, we have not been able to compile the code with the present version of HPF compilers.

  3. Focusing optics of a parallel beam CCD optical tomography apparatus for 3D radiation gel dosimetry.

    PubMed

    Krstajić, Nikola; Doran, Simon J

    2006-04-21

    Optical tomography of gel dosimeters is a promising and cost-effective avenue for quality control of radiotherapy treatments such as intensity-modulated radiotherapy (IMRT). Systems based on a laser coupled to a photodiode have so far shown the best results within the context of optical scanning of radiosensitive gels, but are very slow (approximately 9 min per slice) and poorly suited to measurements that require many slices. Here, we describe a fast, three-dimensional (3D) optical computed tomography (optical-CT) apparatus, based on a broad, collimated beam, obtained from a high-power LED and detected by a charge-coupled device (CCD). The main advantages of such a system are (i) an acquisition speed approximately two orders of magnitude higher than a laser-based system when 3D data are required, and (ii) a greater simplicity of design. This paper advances our previous work by introducing a new design of focusing optics, which take information from a suitably positioned focal plane and project an image onto the CCD. An analysis of the ray optics is presented, which explains the roles of telecentricity, focusing, acceptance angle and depth-of-field (DOF) in the formation of projections. A discussion of the approximation involved in measuring the line integrals required for filtered backprojection reconstruction is given. Experimental results demonstrate (i) the effect on projections of changing the position of the focal plane of the apparatus, (ii) how to measure the acceptance angle of the optics, and (iii) the ability of the new scanner to image both absorbing and scattering gel phantoms. The quality of reconstructed images is very promising and suggests that the new apparatus may be useful in a clinical setting for fast and accurate 3D dosimetry. PMID:16585845

  4. Parallel 3-D particle-in-cell modelling of charged ultrarelativistic beam dynamics

    NASA Astrophysics Data System (ADS)

    Boronina, Marina A.; Vshivkov, Vitaly A.

    2015-12-01

    … in supercolliders. We use the 3-D set of Maxwell's equations for the electromagnetic fields and the Vlasov equation for the distribution function of the beam particles. The model automatically incorporates longitudinal effects, which can play a significant role in cases of super-high densities. We present numerical results for the dynamics of two focused ultrarelativistic beams with a size ratio of 10:1:100. The results demonstrate the high efficiency of the proposed computational methods and algorithms, which are applicable to a variety of problems in relativistic plasma physics.

  5. Characterization of a parallel beam CCD optical-CT apparatus for 3D radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Krstajić, Nikola; Doran, Simon J.

    2006-12-01

    This paper describes the initial steps we have taken in establishing CCD-based optical-CT as a viable alternative for 3-D radiation dosimetry. First, we compare the optical density (OD) measurements from a high-quality test target and a variable neutral density filter (VNDF). A modulation transfer function (MTF) of individual projections is derived for three positions of the sinusoidal test target within the scanning tank. Our CCD is then characterized in terms of its signal-to-noise ratio (SNR). Finally, a sample reconstruction of a scan of a PRESAGE™ (registered trademark of Heuris Pharma, Skillman, NJ, USA) dosimeter is given, demonstrating the capabilities of the apparatus.
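    For a sinusoidal test target, the MTF at the target's spatial frequency is conventionally the ratio of the modulation measured in the image to the modulation of the object, with modulation defined as (I_max - I_min) / (I_max + I_min). A minimal sketch of that ratio, assuming a clean intensity profile; the profile values are hypothetical, and the paper's exact procedure (for example, working from OD values or fitting a sinusoid to noisy data) is not reproduced here.

```python
def modulation(profile):
    """Michelson modulation (I_max - I_min) / (I_max + I_min) of an intensity profile."""
    hi, lo = max(profile), min(profile)
    return (hi - lo) / (hi + lo)


def mtf_at_frequency(image_profile, object_modulation):
    """MTF at the target frequency: measured image modulation divided by the known object modulation."""
    return modulation(image_profile) / object_modulation


# Hypothetical profile across a few periods of the imaged sinusoidal target.
profile = [0.6, 0.8, 1.0, 0.8, 0.6, 0.8, 1.0]
```

    Here `modulation(profile)` is 0.4 / 1.6 = 0.25, so a target whose true modulation is 0.5 would give an MTF of 0.5 at that frequency. With real, noisy projections a least-squares sinusoid fit is usually more robust than taking the raw maximum and minimum.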

  6. Simulations of implosions with a 3D, parallel, unstructured-grid, radiation-hydrodynamics code

    SciTech Connect

    Kaiser, T B; Milovich, J L; Prasad, M K; Rathkopf, J; Shestakov, A I

    1998-12-28

    An unstructured-grid, radiation-hydrodynamics code is used to simulate implosions. Although most of the problems are spherically symmetric, they are run on 3D, unstructured grids in order to test the code's ability to maintain spherical symmetry of the converging waves. Three problems, of increasing complexity, are presented. In the first, a cold, spherical, ideal gas bubble is imploded by an enclosing high pressure source. For the second, we add non-linear heat conduction and drive the implosion with twelve laser beams centered on the vertices of an icosahedron. In the third problem, a NIF capsule is driven with a Planckian radiation source.
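    Twelve beams centered on the vertices of an icosahedron illuminate the target symmetrically because all 12 vertices are equivalent under the icosahedron's symmetry group. The beam directions can be generated from the standard vertex set, the cyclic permutations of (0, ±1, ±φ) with φ the golden ratio, normalized to unit length. This is a sketch of the geometry only, not code taken from the radiation-hydrodynamics code described above.

```python
import math
from itertools import product

PHI = (1 + math.sqrt(5)) / 2  # golden ratio


def icosahedron_beam_directions():
    """Unit vectors pointing at the 12 vertices of a regular icosahedron centered at the origin."""
    verts = []
    for s1, s2 in product((1, -1), repeat=2):
        a, b = s1 * 1.0, s2 * PHI
        # The three cyclic permutations of (0, a, b).
        verts += [(0.0, a, b), (a, b, 0.0), (b, 0.0, a)]
    norm = math.sqrt(1 + PHI ** 2)
    return [tuple(c / norm for c in v) for v in verts]
```

    Each direction has exactly five nearest neighbors, all separated from it by arccos(1/√5), about 63.4°, and the twelve vectors sum to zero because they come in antipodal pairs.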

  7. 3D multi-scale analysis of coupled heat and moisture transport and its parallel implementation

    NASA Astrophysics Data System (ADS)

    Kruis, Jaroslav

    2016-06-01

    A parallel implementation of a two-scale model of coupled heat and moisture transport is described. The coupled heat and moisture transport is based on the Künzel model. The motivation for the two-scale analysis comes from the requirement to describe the distribution of relative humidity and temperature in historical masonry structures.

  8. PFLOTRAN: Recent Developments Facilitating Massively-Parallel Reactive Biogeochemical Transport

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.

    2015-12-01

    With the recent shift towards modeling carbon and nitrogen cycling in support of climate-related initiatives, emphasis has been placed on incorporating increasingly mechanistic biogeochemistry within Earth system models to more accurately predict the response of terrestrial processes to natural and anthropogenic climate cycles. PFLOTRAN is an open-source subsurface code that is specialized for simulating multiphase flow and multicomponent biogeochemical transport on supercomputers. The object-oriented code was designed with modularity in mind and has been coupled with several third-party simulators (e.g. CLM to simulate land surface processes and E4D for coupled hydrogeophysical inversion). Central to PFLOTRAN's capabilities is its ability to simulate tightly-coupled reactive transport processes. This presentation focuses on recent enhancements to the code that enable the solution of large parameterized biogeochemical reaction networks with numerous chemical species. PFLOTRAN's "reaction sandbox" is described, which facilitates the implementation of user-defined reaction networks without the need for a comprehensive understanding of PFLOTRAN software infrastructure. The reaction sandbox is written in modern Fortran (2003-2008) and leverages encapsulation, inheritance, and polymorphism to provide the researcher with a flexible workspace for prototyping reactions within a massively parallel flow and transport simulation framework. As these prototypical reactions mature into well-accepted implementations, they can be incorporated into PFLOTRAN as native biogeochemistry capability. Users of the reaction sandbox are encouraged to upload their source code to PFLOTRAN's main source code repository, including the addition of simple regression tests to better ensure the long-term code compatibility and validity of simulation results.

  9. A new method to combine 3D reconstruction volumes for multiple parallel circular cone beam orbits

    PubMed Central

    Baek, Jongduk; Pelc, Norbert J.

    2010-01-01

    Purpose: This article presents a new reconstruction method for 3D imaging using a multiple 360° circular orbit cone beam CT system, specifically a way to combine 3D volumes reconstructed with each orbit. The main goal is to improve the noise performance in the combined image while avoiding cone beam artifacts. Methods: The cone beam projection data of each orbit are reconstructed using the FDK algorithm. When at least a portion of the total volume can be reconstructed by more than one source, the proposed combination method combines these overlap regions using weighted averaging in frequency space. The local exactness and the noise performance of the combination method were tested with computer simulations of a Defrise phantom, a FORBILD head phantom, and uniform noise in the raw data. Results: A noiseless simulation showed that the local exactness of the reconstructed volume from the source with the smallest tilt angle was preserved in the combined image. A noise simulation demonstrated that the combination method improved the noise performance compared to a single orbit reconstruction. Conclusions: In CT systems which have overlap volumes that can be reconstructed with data from more than one orbit and in which the spatial frequency content of each reconstruction can be calculated, the proposed method offers improved noise performance while keeping the local exactness of data from the source with the smallest tilt angle. PMID:21089770
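    The combination step can be pictured as a per-frequency weighted average: in an overlap region, each spatial frequency of the combined result is a convex combination of the corresponding frequencies of the input reconstructions. The sketch below shows this in 1-D with a naive discrete Fourier transform from the standard library. The weighting function is a hypothetical, caller-supplied choice; the paper's actual weights, which depend on the frequency content each orbit's reconstruction supplies, are not reproduced here.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(X):
    """Inverse of dft()."""
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n for t in range(n)]


def combine_in_frequency(vol_a, vol_b, weight_a):
    """Combine two overlapping 1-D reconstructions frequency by frequency.

    weight_a(k, n) is the weight (0..1) given to vol_a at DFT index k; vol_b gets 1 - weight_a.
    Because the weights sum to 1 at every frequency, identical inputs are returned unchanged.
    weight_a should satisfy weight_a(k, n) == weight_a(n - k, n) so that the result is real.
    """
    n = len(vol_a)
    A, B = dft(vol_a), dft(vol_b)
    C = [weight_a(k, n) * A[k] + (1 - weight_a(k, n)) * B[k] for k in range(n)]
    return [c.real for c in idft(C)]


def prefer_a_at_low_frequencies(k, n, cutoff=2):
    """Hypothetical weight: trust vol_a fully at low frequencies and average the rest."""
    return 1.0 if min(k, n - k) <= cutoff else 0.5
```

    Averaging where both inputs are trustworthy is what reduces noise, while a weight of 1 keeps one source's data intact in the frequency bands where only that source is exact.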

  10. Simulation of the 3D viscoelastic free surface flow by a parallel corrected particle scheme

    NASA Astrophysics Data System (ADS)

    Jin-Lian, Ren; Tao, Jiang

    2016-02-01

    In this work, the behavior of three-dimensional (3D) jet coiling based on the viscoelastic Oldroyd-B model is investigated with a corrected particle scheme, named the smoothed particle hydrodynamics with corrected symmetric kernel gradient and shifting particle technique (SPH_CS_SP) method. The accuracy and stability of the SPH_CS_SP method are first tested by solving Poiseuille flow and Taylor-Green flow. The capacity of the SPH_CS_SP method to solve viscoelastic fluid flow is then verified with polymer flow through a periodic array of cylinders. The convergence of the SPH_CS_SP method is also investigated. Finally, the proposed method is applied to the 3D viscoelastic jet coiling problem, and the influences of macroscopic parameters on the jet coiling are discussed. The numerical results show that the SPH_CS_SP method has higher accuracy and better stability than the traditional SPH method and other corrected SPH methods, and can mitigate the tensile instability. Project supported by the Natural Science Foundation of Jiangsu Province, China (Grant Nos. BK20130436 and BK20150436) and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 15KJB110025).

  11. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    SciTech Connect

    Earth Sciences Division; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-05-27

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one-, two-, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on TOUGH2 Version 1.4 with the EOS3, EOS9, and T2R3D modules, software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4.
This report provides a quick-start guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version of the TOUGH2 code. The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, and the mathematical and numerical methods used.

  12. Parallel microfluidic synthesis of size-tunable polymeric nanoparticles using 3D flow focusing towards in vivo study

    PubMed Central

    Lim, Jong-Min; Bertrand, Nicolas; Valencia, Pedro M.; Rhee, Minsoung; Langer, Robert; Jon, Sangyong; Farokhzad, Omid C.; Karnik, Rohit

    2014-01-01

    Microfluidic synthesis of nanoparticles (NPs) can enhance the controllability and reproducibility in physicochemical properties of NPs compared to bulk synthesis methods. However, applications of microfluidic synthesis are typically limited to in vitro studies due to low production rates. Herein, we report the parallelization of NP synthesis by 3D hydrodynamic flow focusing (HFF) using a multilayer microfluidic system to enhance the production rate without losing the advantages of reproducibility, controllability, and robustness. Using parallel 3D HFF, polymeric poly(lactide-co-glycolide)-b-polyethyleneglycol (PLGA-PEG) NPs with sizes tunable in the range of 13–150 nm could be synthesized reproducibly with high production rate. As a proof of concept, we used this system to perform in vivo pharmacokinetic and biodistribution study of small (20 nm diameter) PLGA-PEG NPs that are otherwise difficult to synthesize. Microfluidic parallelization thus enables synthesis of NPs with tunable properties with production rates suitable for both in vitro and in vivo studies. PMID:23969105

  13. Massively parallel computational fluid dynamics calculations for aerodynamics and aerothermodynamics applications

    SciTech Connect

    Payne, J.L.; Hassan, B.

    1998-09-01

    Massively parallel computers have enabled the analyst to solve complicated flow fields (turbulent, chemically reacting) that were previously intractable. Calculations are presented using a massively parallel CFD code called SACCARA (Sandia Advanced Code for Compressible Aerothermodynamics Research and Analysis) currently under development at Sandia National Laboratories as part of the Department of Energy (DOE) Accelerated Strategic Computing Initiative (ASCI). Computations were made on a generic reentry vehicle in a hypersonic flowfield utilizing three different distributed parallel computers to assess the parallel efficiency of the code with increasing numbers of processors. The parallel efficiencies for the SACCARA code will be presented for cases using 1, 150, 100 and 500 processors. Computations were also made on a subsonic/transonic vehicle using both 236 and 521 processors on a grid containing approximately 14.7 million grid points. Ongoing and future plans to implement a parallel overset grid capability and couple SACCARA with other mechanics codes in a massively parallel environment are discussed.
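    Parallel efficiency in strong-scaling studies like this one is commonly defined relative to a baseline run: with T_1 the time on one processor and T_p the time on p processors, the speedup is S_p = T_1 / T_p and the efficiency is E_p = S_p / p. When the baseline uses p_0 > 1 processors, the usual generalization is E_p = (p_0 T_p0) / (p T_p). A minimal sketch of these definitions; the timings in the usage note are made up and are not SACCARA results.

```python
def speedup(t_base, t_p):
    """Speedup of a run taking t_p seconds relative to a baseline taking t_base seconds."""
    return t_base / t_p


def parallel_efficiency(t_base, p, t_p, p_base=1):
    """Strong-scaling efficiency relative to a baseline run on p_base processors.

    Equals 1.0 for perfect scaling; values below 1.0 typically reflect overheads
    such as communication and load imbalance.
    """
    return (p_base * t_base) / (p * t_p)
```

    For example, a hypothetical job taking 100 s on 1 processor and 50 s on 4 has an efficiency of 0.5, while one taking 40 s on 100 processors and 25 s on 200 has an efficiency of 0.8 relative to the 100-processor baseline.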

  14. Large-Scale Parallel Unstructured Mesh Computations for 3D High-Lift Analysis

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Pirzadeh, S.

    1999-01-01

    A complete "geometry to drag-polar" analysis capability for three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries which arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.

  15. Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis

    NASA Technical Reports Server (NTRS)

    Mavriplis, Dimitri J.; Pirzadeh, S.

    1999-01-01

    A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.

  16. Large-Scale Parallel Unstructured Mesh Computations for 3D High-Lift Analysis

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Pirzadeh, S.

    1999-01-01

    A complete "geometry to drag-polar" analysis capability for three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries which arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.

  17. A parallel dynamic load balancing algorithm for 3-D adaptive unstructured grids

    NASA Technical Reports Server (NTRS)

    Vidwans, A.; Kallinderis, Y.; Venkatakrishnan, V.

    1993-01-01

    Adaptive local grid refinement and coarsening results in unequal distribution of workload among the processors of a parallel system. A novel method for balancing the load in cases of dynamically changing tetrahedral grids is developed. The approach employs local exchange of cells among processors in order to redistribute the load equally. An important part of the load balancing algorithm is the method employed by a processor to determine which cells within its subdomain are to be exchanged. Two such methods are presented and compared. The strategy for load balancing is based on the Divide-and-Conquer approach which leads to an efficient parallel algorithm. This method is implemented on a distributed-memory MIMD system.
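    The divide-and-conquer idea can be illustrated in a deliberately simplified setting: processors arranged in a line, each owning some number of cells. Bisecting the processor set recursively, the net number of cells that must cross each split equals the current load to the left of the split minus that side's share of the total. This 1-D sketch is an assumption-laden simplification; the paper works with tetrahedral grids and its own methods for choosing which cells to exchange, neither of which is reproduced, and the function names are hypothetical.

```python
def plan_flows(loads):
    """Return flows[i] = net cells sent from processor i to processor i + 1
    (negative means cells move from i + 1 to i), found by recursive bisection."""
    n = len(loads)
    total = sum(loads)
    prefix = [0]
    for load in loads:
        prefix.append(prefix[-1] + load)

    def target_prefix(k):
        # Cells the first k processors should own after balancing (remainder spread evenly).
        return (total * k) // n

    flows = [0] * (n - 1)

    def bisect(lo, hi):
        if hi - lo < 2:
            return
        mid = (lo + hi) // 2
        # The surplus of everything left of the split must cross boundary (mid - 1 | mid).
        flows[mid - 1] = prefix[mid] - target_prefix(mid)
        bisect(lo, mid)
        bisect(mid, hi)

    bisect(0, n)
    return flows


def apply_flows(loads, flows):
    """Apply all net transfers at once; loads end up differing by at most one cell."""
    new = list(loads)
    for i, f in enumerate(flows):
        new[i] -= f
        new[i + 1] += f
    return new
```

    For loads `[10, 0, 0, 2]` the flows are `[7, 4, 1]` and every processor ends with 3 cells. Flows are net quantities applied together, so a processor with too few cells of its own forwards cells it receives from its neighbor, consistent with exchanging cells only between adjacent processors.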

  18. High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation

    SciTech Connect

    Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn

    2014-11-14

    Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization, but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbor points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.
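
    To make the neighbor-exchange step concrete, the sketch below uses a deliberately simplified rule: space is cut into slabs along x, and each slab receives every point owned by another slab that lies within a fixed ghost distance of its boundaries. The paper's contribution is determining the exchanged points automatically, so treat the fixed `ghost` width and the function name as assumptions; each block would then run a serial tessellation library on its owned plus ghost points.

```python
import numpy as np

def ghost_exchange(points, n_blocks, ghost):
    """For each slab block along x, return (owned indices, ghost indices).

    points: (N, 3) array whose x-coordinates lie in [0, 1).
    ghost: width of the layer beyond each block boundary whose points,
           owned by other blocks, must be sent to this block.
    """
    x = points[:, 0]
    width = 1.0 / n_blocks
    owner = np.minimum((x / width).astype(int), n_blocks - 1)
    result = []
    for b in range(n_blocks):
        lo, hi = b * width, (b + 1) * width
        owned = np.flatnonzero(owner == b)
        # Points of other blocks falling in [lo - ghost, hi + ghost).
        near = (x >= lo - ghost) & (x < hi + ghost) & (owner != b)
        result.append((owned, np.flatnonzero(near)))
    return result
```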

  19. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    DOE PAGES Beta

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
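
    A genetic search over task orderings can be sketched as follows, assuming a hypothetical setting: tasks exchange messages with listed partner tasks, the allocated nodes have 3D torus coordinates, and the cost of an ordering is the total hop distance between communicating tasks. Order crossover, swap mutation, tournament selection and the population parameters are illustrative choices, not those reported for S3D or LAMMPS on Titan.

```python
import random

def torus_hops(a, b, dims):
    """Minimal hop count between coordinates a and b on a torus of size dims."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))

def cost(perm, pairs, coords, dims):
    """Total hops over communicating task pairs; perm[t] is the node of task t."""
    return sum(torus_hops(coords[perm[i]], coords[perm[j]], dims) for i, j in pairs)

def order_crossover(p1, p2, rng):
    """OX crossover: keep a slice of p1, fill the remaining slots in p2's order."""
    n = len(p1)
    a, b = sorted(rng.sample(range(n), 2))
    child = [None] * n
    child[a:b] = p1[a:b]
    kept = set(p1[a:b])
    rest = iter(g for g in p2 if g not in kept)
    return [g if g is not None else next(rest) for g in child]

def ga_reorder(pairs, coords, dims, pop_size=40, generations=100,
               mutation_rate=0.3, seed=0):
    """Search for a task -> node permutation with fewer communication hops."""
    rng = random.Random(seed)
    n = len(coords)
    fitness = lambda p: cost(p, pairs, coords, dims)
    # Random orderings plus the scheduler's default (identity) ordering.
    pop = [rng.sample(range(n), n) for _ in range(pop_size - 1)] + [list(range(n))]
    for _ in range(generations):
        pop.sort(key=fitness)
        nxt = pop[:2]  # elitism: the two best orderings survive unchanged
        while len(nxt) < pop_size:
            # Tournament selection: best of three random individuals.
            p1 = min(rng.sample(pop, 3), key=fitness)
            p2 = min(rng.sample(pop, 3), key=fitness)
            child = order_crossover(p1, p2, rng)
            if rng.random() < mutation_rate:  # swap mutation
                i, j = rng.sample(range(n), 2)
                child[i], child[j] = child[j], child[i]
            nxt.append(child)
        pop = nxt
    return min(pop, key=fitness)
```

    Seeding the population with the default ordering and keeping the best individuals (elitism) guarantees that, under this cost model, the returned ordering is never worse than the default.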

  20. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    SciTech Connect

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.

  1. Climate system modeling on massively parallel systems: LDRD Project 95-ERP-47 final report

    SciTech Connect

    Mirin, A.A.; Dannevik, W.P.; Chan, B.; Duffy, P.B.; Eltgroth, P.G.; Wehner, M.F.

    1996-12-01

    Global warming, acid rain, ozone depletion, and biodiversity loss are some of the major climate-related issues presently being addressed by climate and environmental scientists. Because unexpected changes in the climate could have significant effects on our economy, it is vitally important to improve the scientific basis for understanding and predicting the earth's climate. The impracticality of modeling the earth experimentally in the laboratory, together with the fact that the model equations are highly nonlinear, has created a unique and vital role for computer-based climate experiments. However, today's computer models, when run at the desired spatial and temporal resolution and physical complexity, severely overtax the capabilities of our most powerful computers. Parallel processing offers significant potential for attaining increased performance and making tractable simulations that cannot be performed today. The principal goals of this project have been to develop and demonstrate the capability to perform large-scale climate simulations on high-performance computing systems (using methodology that scales to the systems of tomorrow), and to carry out leading-edge scientific calculations using parallelized models. The demonstration platform for these studies has been the 256-processor Cray-T3D located at Lawrence Livermore National Laboratory. Our plan was to undertake an ambitious program in optimization, proof-of-principle and scientific study. These goals have been met. We are now regularly using massively parallel processors for scientific study of the ocean and atmosphere, and preliminary parallel coupled ocean/atmosphere calculations are being carried out as well. Furthermore, our work suggests that it should be possible to develop an advanced comprehensive climate system model with performance scalable to the teraflops range. 9 refs., 3 figs.

  2. Task-parallel implementation of 3D shortest path raytracing for geophysical applications

    NASA Astrophysics Data System (ADS)

    Giroux, Bernard; Larouche, Benoît

    2013-04-01

    This paper discusses two variants of the shortest path method and their parallel implementation on a shared-memory system. One variant is designed to perform raytracing in models with stepwise distributions of interval velocity, while the other is better suited for continuous velocity models. Both rely on a discretization scheme where primary nodes are located at the corners of cuboid cells and where secondary nodes are found on the edges and sides of the cells. The parallel implementations allow raytracing concurrently for different sources, providing an attractive framework for ray-based tomography. The accuracy and performance of the implementations were measured by comparison with the analytic solution for a layered model and for a vertical gradient model. A mean relative error below 0.2% was obtained with 5 secondary nodes for the layered model and with 9 secondary nodes for the gradient model. Parallel performance depends on the level of discretization refinement, on the number of threads, and on the problem size, the most decisive variable being the level of discretization refinement (number of secondary nodes). The results indicate that a good trade-off between speed and accuracy is achieved with the number of secondary nodes equal to 5. The programs are written in C++ and rely on the Standard Template Library and OpenMP.
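
    The stepwise-velocity variant can be sketched in 2D, assuming square cells of constant slowness, primary nodes at cell corners and `n_sec` secondary nodes on each cell edge. Every pair of nodes on a cell's boundary is linked with a traveltime equal to the segment length times the cell slowness, and Dijkstra's algorithm gives first-arrival times. A thread pool stands in for the paper's OpenMP parallelism over sources (in CPython it illustrates the structure rather than delivering speedup); all names are hypothetical.

```python
import heapq
import itertools
import math
from concurrent.futures import ThreadPoolExecutor

def build_graph(slowness, h, n_sec):
    """Shortest-path-method graph for a 2D grid of square, constant-slowness cells.

    slowness[r][c]: slowness of cell (r, c) in s/m; h: cell size in m;
    n_sec: secondary nodes per cell edge. Returns (coords, adj), where
    adj[node] lists (neighbour, traveltime) pairs.
    """
    ids, coords, adj = {}, [], {}

    def node(x, y):
        key = (round(x, 9), round(y, 9))  # merge nodes shared by adjacent cells
        if key not in ids:
            ids[key] = len(coords)
            coords.append(key)
        return ids[key]

    fracs = [k / (n_sec + 1) for k in range(n_sec + 1)]
    for r, row in enumerate(slowness):
        for c, s in enumerate(row):
            x0, y0 = c * h, r * h
            boundary = set()
            for t in fracs:  # corner (t = 0) plus secondary nodes on each edge
                for x, y in [(x0 + t * h, y0), (x0 + h, y0 + t * h),
                             (x0 + h - t * h, y0 + h), (x0, y0 + h - t * h)]:
                    boundary.add(node(x, y))
            # Straight ray segments between any two boundary nodes of this cell;
            # along a shared edge both cells add a link and Dijkstra keeps the faster.
            for a, b in itertools.combinations(boundary, 2):
                (xa, ya), (xb, yb) = coords[a], coords[b]
                tt = s * math.hypot(xa - xb, ya - yb)
                adj.setdefault(a, []).append((b, tt))
                adj.setdefault(b, []).append((a, tt))
    return coords, adj

def traveltimes(adj, n_nodes, src):
    """Dijkstra's algorithm: first-arrival traveltime from src to every node."""
    t = [math.inf] * n_nodes
    t[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > t[u]:
            continue  # stale heap entry
        for v, w in adj.get(u, []):
            if d + w < t[v]:
                t[v] = d + w
                heapq.heappush(heap, (t[v], v))
    return t

def traveltimes_many(adj, n_nodes, sources, workers=4):
    """Sources are independent, so each Dijkstra run can be its own task."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda s: traveltimes(adj, n_nodes, s), sources))
```

    In a homogeneous model, a ray whose straight path happens to pass through graph nodes, such as the one from (0, 0) to (4, 1) with three secondary nodes per edge, is recovered exactly; elsewhere the graph path overestimates the traveltime, and the error shrinks as `n_sec` grows.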

  3. A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

    SciTech Connect

    Stankovski, Z.

    1995-12-31

    The collision probability method in neutron transport, as applied to 2D geometries, consumes a great amount of computer time: for a typical 2D assembly calculation, about 90% of the computing time is consumed in the collision probability evaluations. Consequently, RZ or 3D calculations become prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message-passing host/node programming model. Parallelization was applied to the energy group treatment. This approach permits parallelization of the existing code with only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for an industrial code. Sequential performance is also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128-processor T3D computer, a 16-processor IBM SP1 and a network of workstations, using the public-domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with the IBM SP1. Because of the heterogeneity of the workstation network, the author did not expect high performance for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors.
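
    The host/node pattern over energy groups can be sketched as below. A thread pool stands in for PVM message passing, and the per-group kernel is a hypothetical toy (the probability of a collision along a single chord, 1 - exp(-sigma_t * chord)), not the Apollo-II collision probability evaluation; the point is only that groups are independent, so the host can farm them out and gather the results in order.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def group_collision_probability(sigma_t, chord):
    """Hypothetical per-group kernel: probability of at least one collision
    along a straight chord (cm) in a medium with total cross section sigma_t (1/cm)."""
    return 1.0 - math.exp(-sigma_t * chord)

def host(sigma_by_group, chord, n_nodes=4):
    """Host/node pattern: farm independent energy groups out to worker 'nodes'
    and gather the results back in group order."""
    with ThreadPoolExecutor(max_workers=n_nodes) as pool:
        return list(pool.map(lambda s: group_collision_probability(s, chord),
                             sigma_by_group))
```

    For example, `host([0.1 * (g + 1) for g in range(99)], 2.0)` processes a 99-group set, matching the group count of the library used in the tests; because each group is computed independently, the result is identical to a serial loop.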

  4. 3D parallel-detection microwave tomography for clinical breast imaging

    SciTech Connect

    Epstein, N. R.; Meaney, P. M.; Paulsen, K. D.

    2014-12-15

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate
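
    For scale, the quoted levels convert to linear units with the standard definitions of dBm and dB (power ratios):

$$
P\,[\mathrm{W}] = 10^{(P\,[\mathrm{dBm}] - 30)/10}
\;\Rightarrow\;
-130\ \mathrm{dBm} = 10^{-16}\ \mathrm{W} = 0.1\ \mathrm{fW},
\qquad
110\ \mathrm{dB} \;\Leftrightarrow\; \frac{P_{\max}}{P_{\min}} = 10^{110/10} = 10^{11}.
$$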

  5. 3D parallel-detection microwave tomography for clinical breast imaging

    NASA Astrophysics Data System (ADS)

    Epstein, N. R.; Meaney, P. M.; Paulsen, K. D.

    2014-12-01

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to -130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500-2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate recovery

  6. 3D parallel-detection microwave tomography for clinical breast imaging.

    PubMed

    Epstein, N R; Meaney, P M; Paulsen, K D

    2014-12-01

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to -130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500-2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate recovery

  7. 3D parallel-detection microwave tomography for clinical breast imaging

    PubMed Central

    Meaney, P. M.; Paulsen, K. D.

    2014-01-01

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate

  8. Parallel phase-shifting digital holography and its application to high-speed 3D imaging of dynamic object

    NASA Astrophysics Data System (ADS)

    Awatsuji, Yasuhiro; Xia, Peng; Wang, Yexin; Matoba, Osamu

    2016-03-01

    Digital holography is a technique for 3D measurement of objects. The technique uses an image sensor to record the interference fringe image containing the complex amplitude of the object, and numerically reconstructs the complex amplitude by computer. Parallel phase-shifting digital holography is capable of accurate 3D measurement of dynamic objects, because it can reconstruct the complex amplitude of the object, without the undesired images superimposed on it, from a single hologram. The undesired images are the non-diffraction wave and the conjugate image associated with holography. In parallel phase-shifting digital holography, a hologram in which the phase of the reference wave is spatially and periodically shifted every other pixel is recorded, so that the complex amplitude of the object is obtained by single-shot exposure. The recorded hologram is decomposed into the multiple holograms required for phase-shifting digital holography, and the complex amplitude of the object, free from the undesired images, is reconstructed from these multiple holograms. To validate parallel phase-shifting digital holography, a high-speed parallel phase-shifting digital holography system was constructed. The system consists of a Mach-Zehnder interferometer, a continuous-wave laser, and a high-speed polarization imaging camera. A phase motion picture of dynamic air flow sprayed from a nozzle was recorded by the system at 180,000 frames per second (FPS). A phase motion picture of dynamic air induced by a discharge between two electrodes, when a high voltage was applied between them, was also recorded at 1,000,000 FPS.
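
    The single-shot decomposition can be sketched with NumPy, assuming a hypothetical layout in which the reference phase takes the values 0, π/2, π and 3π/2 over each 2×2 block of pixels and the reference amplitude R is real. Subtracting opposite-phase sub-holograms cancels the non-diffraction terms |O|² + |R|², and combining the two differences isolates O·R* without the conjugate image, since (I0 - I180) + i(I90 - I270) = 4·O·R*.

```python
import numpy as np

def record_hologram(obj, ref_amp=1.0):
    """Simulate a single-shot hologram whose reference phase cycles through
    0, pi/2, pi, 3pi/2 over each 2x2 block of pixels (assumed layout)."""
    m, n = np.indices(obj.shape)
    delta = (np.pi / 2) * (2 * (m % 2) + (n % 2))
    ref = ref_amp * np.exp(1j * delta)
    return np.abs(obj + ref) ** 2

def reconstruct(holo, ref_amp=1.0):
    """Split the hologram into four phase-shifted sub-holograms and apply the
    four-step phase-shifting formula; the output has half the resolution."""
    i0, i90 = holo[0::2, 0::2], holo[0::2, 1::2]
    i180, i270 = holo[1::2, 0::2], holo[1::2, 1::2]
    return ((i0 - i180) + 1j * (i90 - i270)) / (4 * ref_amp)
```

    This simplest reconstruction uses each 2×2 block as one sample and is exact only when the object field is constant over the block; interpolating the missing samples of each sub-hologram is one possible refinement.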

  9. High-speed 3D imaging using two-wavelength parallel-phase-shift interferometry.

    PubMed

    Safrani, Avner; Abdulhalim, Ibrahim

    2015-10-15

    High-speed three-dimensional imaging based on two-wavelength parallel-phase-shift interferometry is presented. The technique is demonstrated using a high-resolution polarization-based Linnik interferometer operating with three high-speed phase-masked CCD cameras and two quasi-monochromatic modulated light sources. The two light sources allow the wrapped single-source phase to be unwrapped, so that relatively high step profiles, with heights as large as 3.7 μm, can be imaged at video rate with ±2 nm accuracy and repeatability. The technique is validated using a certified very large scale integration (VLSI) step standard, followed by a demonstration from the semiconductor industry showing an integrated chip with 2.75 μm high copper micro pillars at different packing densities. PMID:26469586
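
    The unwrapping role of the second wavelength can be sketched under stated assumptions: noiseless wrapped phases, a double-pass reflection geometry (phase 4πh/λ), and hypothetical wavelengths of 0.60 μm and 0.65 μm, which give a synthetic wavelength Λ = λ1·λ2/|λ1 - λ2| = 7.8 μm and an unambiguous height range of Λ/2 = 3.9 μm. The coarse height from the synthetic phase selects the fringe order of the shorter wavelength, which then supplies the fine height.

```python
import numpy as np

def wrap(phase):
    """Wrap phase to [0, 2*pi)."""
    return np.mod(phase, 2 * np.pi)

def measured_phase(h, lam):
    """Wrapped reflection (double-pass) phase for surface height h."""
    return wrap(4 * np.pi * h / lam)

def two_wavelength_height(phi1, phi2, lam1, lam2):
    """Recover height from two wrapped phases (requires lam1 < lam2).

    Coarse step: the synthetic phase phi1 - phi2 equals 4*pi*h/synth (mod 2*pi).
    Fine step: the coarse height picks the integer fringe order for lam1.
    """
    synth = lam1 * lam2 / abs(lam1 - lam2)
    h_coarse = wrap(phi1 - phi2) * synth / (4 * np.pi)
    order = np.round((4 * np.pi * h_coarse / lam1 - phi1) / (2 * np.pi))
    return (phi1 + 2 * np.pi * order) * lam1 / (4 * np.pi)
```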

  10. Parallel deconvolution of large 3D images obtained by confocal laser scanning microscopy.

    PubMed

    Pawliczek, Piotr; Romanowska-Pawliczek, Anna; Soltys, Zbigniew

    2010-03-01

    Various deconvolution algorithms are often used for restoration of digital images. Image deconvolution is especially needed for the correction of three-dimensional images obtained by confocal laser scanning microscopy. Such images suffer from distortions, particularly in the Z dimension. As a result, reliable automatic segmentation of these images may be difficult or even impossible. Effective deconvolution algorithms are memory-intensive and time-consuming. In this work, we propose a parallel version of the well-known Richardson-Lucy deconvolution algorithm developed for a system with distributed memory and implemented with the use of Message Passing Interface (MPI). It enables significantly more rapid deconvolution of two-dimensional and three-dimensional images by efficiently splitting the computation across multiple computers. The implementation of this algorithm can be used on professional clusters provided by computing centers as well as on simple networks of ordinary PC machines. PMID:19725070
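
    The Richardson-Lucy update itself is short: u_{k+1} = u_k · [(d / (u_k ⊛ p)) ⊛ p̃], where d is the observed image, p the PSF and p̃ the mirrored PSF. The sketch below is a serial NumPy version for 2D or 3D arrays using FFT-based circular convolution, with the PSF assumed normalized and centred at index 0. It does not reproduce the authors' MPI decomposition; splitting the volume into blocks with overlapping borders as wide as the PSF would be one plausible way to distribute it, which is an assumption here.

```python
import numpy as np

def fft_convolve(img, psf_ft):
    """Circular convolution via FFT (psf_ft is the FFT of the zero-centred PSF)."""
    return np.real(np.fft.ifftn(np.fft.fftn(img) * psf_ft))

def richardson_lucy(observed, psf, iterations=50, eps=1e-12):
    """Richardson-Lucy deconvolution for 2D or 3D arrays.

    psf must have the same shape as `observed`, be centred at index 0
    (apply np.fft.ifftshift to a centred kernel) and sum to 1.
    """
    psf_ft = np.fft.fftn(psf)
    psf_flip_ft = np.conj(psf_ft)  # FFT of the mirrored (real-valued) PSF
    estimate = np.full(observed.shape, observed.mean(), dtype=float)
    for _ in range(iterations):
        blurred = fft_convolve(estimate, psf_ft)
        ratio = observed / np.maximum(blurred, eps)
        estimate *= fft_convolve(ratio, psf_flip_ft)  # multiplicative update
    return estimate
```

    With a normalized PSF, each multiplicative update preserves the total intensity of the observed image, which is a convenient sanity check.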

  11. Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges is refined in a highly localized region, because almost all mesh adaption is then confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
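
    Dividing each reported speedup S by the processor count p = 64 gives the parallel efficiency E = S/p, which puts the three cases on a common scale:

$$
E_{\text{random}} = \frac{47.0}{64} \approx 0.73, \qquad
E_{\text{localized}} = \frac{7.7}{64} \approx 0.12, \qquad
E_{\text{repartitioned}} = \frac{43.6}{64} \approx 0.68.
$$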

  12. Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2

    NASA Technical Reports Server (NTRS)

    Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges is refined in a highly localized region, because almost all the mesh adaption is then confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

  13. Enhancements, Parallelization and Future Directions of the V3FIT 3-D Equilibrium Reconstruction Code

    NASA Astrophysics Data System (ADS)

    Cianciosa, M. R.; Hanson, J. D.; Maurer, D. A.; Hartwell, G. J.; Archmiller, M. C.; Ma, X.; Herfindal, J.

    2014-10-01

    Three-dimensional equilibrium reconstruction is spreading beyond its original application to stellarators. Three-dimensional effects in nominally axisymmetric systems, including quasi-helical states in reversed field pinches and error fields in tokamaks, are becoming increasingly important. V3FIT is a fully three-dimensional equilibrium reconstruction code in widespread use throughout the fusion community. The code has recently undergone extensive revision to prepare for the next generation of equilibrium reconstruction problems. The most notable changes are the abstraction of the equilibrium model, the propagation of experimental errors to the reconstructed results, support for multicolor soft x-ray emissivity cameras, and recent efforts to add parallelization for efficient computation on multiprocessor systems. The presentation will discuss these new capabilities. We will compare probability distributions of reconstructed parameters with results from whole-shot reconstructions. We will show benchmarking and profiling results of initial performance improvements through the addition of OpenMP and MPI support. We will discuss future directions of the V3FIT code, including steps taken for support of the W-7X stellarator. Work supported by U.S. Department of Energy Grant No. DEFG-0203-ER-54692B.

  14. A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters

    NASA Astrophysics Data System (ADS)

    Pathak, Ashish; Raessi, Mehdi

    2015-11-01

    We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.
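
    For reference, a generic variable-density form of the two-step projection method reads as follows. Which terms the authors treat explicitly, and how surface tension and the fictitious-domain forcing enter, are not stated in the abstract, so collecting them in a single term A(u^n) is an assumption:

$$
\begin{aligned}
\mathbf{u}^{*} &= \mathbf{u}^{n} + \Delta t\,\mathbf{A}(\mathbf{u}^{n}),\\
\nabla\cdot\left(\frac{1}{\rho^{n+1}}\nabla p^{n+1}\right) &= \frac{1}{\Delta t}\,\nabla\cdot\mathbf{u}^{*},\\
\mathbf{u}^{n+1} &= \mathbf{u}^{*} - \frac{\Delta t}{\rho^{n+1}}\,\nabla p^{n+1}.
\end{aligned}
$$

    Taking the divergence of the last line and substituting the second gives ∇·u^{n+1} = 0. The second line is the pressure Poisson equation, the step the abstract describes as ported to GPUs.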

  15. A parallel overset-curvilinear-immersed boundary framework for simulating complex 3D incompressible flows

    PubMed Central

    Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis

    2013-01-01

    We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331

  16. Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

    NASA Astrophysics Data System (ADS)

    Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

    2014-06-01

    We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed a series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with the biconjugate gradient stabilized method. The results have shown that the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, in cases in which other preconditioners succeed in converging to the desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code, by up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures a grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
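
    The structure of an AMG-preconditioned Krylov solve can be sketched on a small model problem, under several stated substitutions: a real, symmetric positive definite 2D Laplacian instead of the EM finite-element system, conjugate gradients instead of BiCGStab, a two-level cycle instead of a full multigrid hierarchy, and a greedy plain-aggregation rule standing in for the wave front grouping algorithm. Damped Jacobi, one of the smoothers the abstract lists, is used for smoothing. All names are hypothetical.

```python
import numpy as np

def poisson2d(n):
    """Dense 5-point Laplacian on an n x n grid with Dirichlet boundaries."""
    t = 2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    return np.kron(np.eye(n), t) + np.kron(t, np.eye(n))

def aggregate(A):
    """Greedy plain aggregation: each unassigned node and its unassigned
    neighbours form one coarse group. Returns the prolongation matrix P."""
    n = A.shape[0]
    group = -np.ones(n, dtype=int)
    n_groups = 0
    for i in range(n):
        if group[i] >= 0:
            continue
        members = [j for j in np.flatnonzero(A[i]) if group[j] < 0]  # includes i
        group[members] = n_groups
        n_groups += 1
    P = np.zeros((n, n_groups))
    P[np.arange(n), group] = 1.0  # piecewise-constant interpolation
    return P

def two_grid_preconditioner(A, omega=2/3, sweeps=2):
    """Return r -> B r: one symmetric two-level cycle with damped Jacobi."""
    P = aggregate(A)
    Ac = P.T @ A @ P  # Galerkin coarse-level operator
    dinv = 1.0 / np.diag(A)

    def smooth(x, b):
        for _ in range(sweeps):
            x = x + omega * dinv * (b - A @ x)
        return x

    def apply(r):
        x = smooth(np.zeros_like(r), r)          # pre-smoothing
        rc = P.T @ (r - A @ x)                   # restrict the residual
        x = x + P @ np.linalg.solve(Ac, rc)      # exact coarse correction
        return smooth(x, r)                      # post-smoothing
    return apply

def pcg(A, b, M=lambda r: r.copy(), tol=1e-8, maxiter=1000):
    """Preconditioned conjugate gradients; returns (x, iterations used)."""
    x = np.zeros_like(b)
    r = b.copy()
    z = M(r)
    p = z.copy()
    rz = r @ z
    for k in range(1, maxiter + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * np.linalg.norm(b):
            return x, k
        z = M(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x, maxiter
```

    On this model problem the preconditioned solve needs noticeably fewer iterations than plain CG; how that gap behaves with conductivity contrast or local mesh refinement is beyond what this toy can show.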

  17. Improved Algorithms and Methods for Solving Strongly Variable-Viscosity 3D Stokes flow and Strongly Variable Permeability 3D D’Arcy flow on a Parallel Computer

    NASA Astrophysics Data System (ADS)

    Morgan, J. P.; Hasenclever, J.; Shi, C.

    2009-12-01

    Computational studies of mantle convection face large challenges in obtaining fast and accurate solutions for variable-viscosity 3D flow. Recently we have been using parallel (MPI-based) MATLAB to explore more thoroughly the possible pitfalls of, and algorithmic improvements to, current ‘best-practice’ variable-viscosity Stokes and D’Arcy flow solvers. Here we focus on finite-element solvers based on a decomposition of the equations for incompressible Stokes flow, $Ku + Gp = f$ and $G^T u = 0$ ($K$: velocity stiffness matrix; $G$: discretized gradient operator; $G^T$: its transpose, the discretized divergence operator), into a single equation for pressure, $Sp = G^T K^{-1} G p = G^T K^{-1} f$, in which the velocity is also updated as part of each pressure iteration. The outer pressure iteration is solved with preconditioned conjugate gradients (CG) (Maday and Patera, 1989), with a multigrid-preconditioned CG solver for the $z = K^{-1}(Gq)$ step of each pressure iteration. One fairly well-known pitfall (Fortin, 1985) is that constant-pressure elements can generate a spurious non-zero flow under a constant body force within non-rectangular geometries. We found a new pitfall when using an iterative method to solve the $Kz = y$ operation in evaluating each $G^T K^{-1} G q$ product: even if the residual of the outer pressure equation converges to zero, the discrete divergence of the velocity field does not correspondingly converge; the error in incompressibility depends roughly on the square of the tolerance used to solve each $Kz = y$ velocity-like subproblem. Our current best recipe is: (1) Use flexible CG (cf. Notay, 2001) to solve the outer pressure problem. This is analogous to GMRES for a symmetric positive definite problem. It allows use of numerically unsymmetric and/or inexact preconditioners with CG. (2) In this outer iteration, use an ‘alpha-bar’ technique to find the appropriate magnitude alpha to change the solution in each search direction. This improvement allows a similar iterative tolerance of
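
    A minimal NumPy sketch of the pressure Schur-complement approach described above, assuming a small dense system: plain CG is run on S p = G^T K^{-1} f, and every product with S requires a velocity-like solve K z = G q. Here the inner solve is done exactly with `np.linalg.solve` rather than with multigrid-preconditioned CG, and the outer solver is plain rather than flexible CG, so the incompressibility pitfall the authors describe will not show up; swapping in a loose-tolerance iterative inner solve is one way to explore it. The function name and the random test matrices are hypothetical.

```python
import numpy as np

def schur_pressure_solve(K, G, f, tol=1e-12, max_iter=200):
    """Solve K u + G p = f, G^T u = 0 by CG on S p = G^T K^{-1} f."""
    def apply_S(q):
        z = np.linalg.solve(K, G @ q)     # inner velocity-like subproblem K z = G q
        return G.T @ z

    rhs = G.T @ np.linalg.solve(K, f)
    p = np.zeros(G.shape[1])
    r = rhs - apply_S(p)
    d = r.copy()
    rr = r @ r
    for _ in range(max_iter):
        if np.sqrt(rr) <= tol * np.linalg.norm(rhs):
            break
        Sd = apply_S(d)
        a = rr / (d @ Sd)
        p += a * d
        r -= a * Sd
        rr_new = r @ r
        d = r + (rr_new / rr) * d
        rr = rr_new
    u = np.linalg.solve(K, f - G @ p)     # recover velocity from the momentum equation
    return u, p
```

    S is symmetric positive definite whenever K is SPD and G has full column rank, which is what makes CG applicable to the outer pressure problem.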

  18. SWAMP+: multiple subsequence alignment using associative massive parallelism

    SciTech Connect

    Steinfadt, Shannon Irene; Baker, Johnnie W

    2010-10-18

    A new parallel algorithm, SWAMP+, implements Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that extends traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the running time becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
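
    The O(m+n) bound reflects the wavefront structure of the Smith-Waterman recurrence: every cell on an anti-diagonal i + j = k depends only on the two preceding anti-diagonals, so with one processor per row a whole anti-diagonal can be updated at once, and there are m + n - 1 anti-diagonals. The sequential Python sketch below fills the score matrix in that order (the inner loop is the part a parallel machine would execute simultaneously). It computes only the single best local-alignment score, with an assumed linear gap penalty and assumed scoring values; it is not SWAMP+'s multiple non-overlapping alignment procedure or its ASC implementation.

```python
def smith_waterman_wavefront(a, b, match=2, mismatch=-1, gap=-1):
    """Best local-alignment score, filling H one anti-diagonal at a time.

    Returns (best_score, number_of_anti_diagonal_steps).
    """
    m, n = len(a), len(b)
    H = [[0] * (n + 1) for _ in range(m + 1)]
    best = 0
    steps = 0
    for k in range(2, m + n + 1):             # anti-diagonal i + j = k
        steps += 1
        # Cells on one anti-diagonal are mutually independent, so this inner
        # loop is what one-processor-per-row hardware would do in parallel.
        for i in range(max(1, k - n), min(m, k - 1) + 1):
            j = k - i
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + s,  # diagonal (k - 2)
                          H[i - 1][j] + gap,    # previous anti-diagonal (k - 1)
                          H[i][j - 1] + gap)    # previous anti-diagonal (k - 1)
            best = max(best, H[i][j])
    return best, steps
```

    For two length-4 sequences the sweep takes 4 + 4 - 1 = 7 steps, matching the m + n - 1 count above.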

  19. High-performance parallel solver for 3D time-dependent Schrodinger equation for large-scale nanosystems

    NASA Astrophysics Data System (ADS)

    Gainullin, I. K.; Sonkin, M. A.

    2015-03-01

    This paper presents a parallelized three-dimensional (3D) time-dependent Schrodinger equation (TDSE) solver for one-electron systems. The TDSE Solver is based on the finite-difference method (FDM) in Cartesian coordinates and uses a simple, explicit leap-frog numerical scheme. The simplicity of the numerical method allows very efficient parallelization and high computational performance on Graphics Processing Units (GPUs). For example, computing 10^6 time steps on a 1000×1000×1000 numerical grid (10^9 points) takes only 16 hours on 16 Tesla M2090 GPUs. The TDSE Solver demonstrates scalability (parallel efficiency) close to 100%, with some limitations on the problem size. The TDSE Solver is validated by calculating the energy eigenstates of the hydrogen atom (13.55 eV) and the affinity level of the H⁻ ion (0.75 eV). Comparison with other TDSE solvers shows that the GPU-based TDSE Solver is 3 times faster for problems of the same size and with the same cost of computational resources. Using a non-regular Cartesian grid or problem-specific non-Cartesian coordinates increases this benefit up to 10 times. The TDSE Solver was applied to the calculation of resonant charge transfer (RCT) in nanosystems, including several related physical problems, such as electron capture during H⁺-H⁰ collisions and electron tunneling between an H⁻ ion and a thin metallic island film.
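
    The leap-frog scheme mentioned above can be illustrated in one dimension. In atomic units the TDSE reads i dψ/dt = Hψ, and leap-frog advances it as ψ^{n+1} = ψ^{n-1} - 2iΔt Hψ^n, with H = -½∇² + V discretized by central differences. The pure-Python sketch below is a 1D stand-in for the 3D GPU solver; the grid, time step, Gaussian wave packet, fixed boundary values and forward-Euler start-up step are all assumptions made for illustration. The scheme is only conditionally stable: with V = 0 the time step must satisfy roughly Δt < Δx²/2.

```python
import cmath
import math

def leapfrog_tdse_1d(psi0, dx, dt, steps, V=None):
    """Explicit leap-frog for i dpsi/dt = H psi, H = -1/2 d2/dx2 + V (atomic units)."""
    n = len(psi0)
    V = V or [0.0] * n

    def apply_H(psi):
        # Boundary entries of H psi are left at 0, so psi's two end values
        # never change (Dirichlet); psi0 should be negligible there.
        out = [0j] * n
        for j in range(1, n - 1):
            lap = (psi[j - 1] - 2 * psi[j] + psi[j + 1]) / dx**2
            out[j] = -0.5 * lap + V[j] * psi[j]
        return out

    # Leap-frog needs two time levels: bootstrap with one forward-Euler step.
    H0 = apply_H(psi0)
    prev = list(psi0)
    cur = [p - 1j * dt * h for p, h in zip(psi0, H0)]
    for _ in range(steps - 1):
        Hc = apply_H(cur)
        nxt = [pm - 2j * dt * h for pm, h in zip(prev, Hc)]
        prev, cur = cur, nxt
    return cur                            # wavefunction at time steps * dt
```

    With Δx = 0.1 the bound is Δt < 0.005, so a step of Δt = 0.001 is comfortably stable; the packet's norm then stays close to 1 and its centre drifts at roughly the group velocity k0.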

  20. Generating synthetic 3D density fluctuation data to verify two-point measurement of parallel correlation length

    NASA Astrophysics Data System (ADS)

    Kim, Jaewook; Ghim, Young-Chul; Nuclear Fusion and Plasma Lab Team

    2014-10-01

    A BES (beam emission spectroscopy) system and an MIR (Microwave Imaging Reflectometer) system installed in KSTAR measure 2D (radial and poloidal) density fluctuations at two different toroidal locations. This makes it possible to measure the parallel correlation length of ion-scale turbulence in KSTAR. Because there are few measurement points in the toroidal direction and the separation between the diagnostics is shorter than the expected parallel correlation length, it is necessary to confirm whether a conventional statistical method, i.e., using a cross-correlation function, is valid for measuring the parallel correlation length. For this reason, we generated synthetic 3D density fluctuation data, following a Gaussian random field in a toroidal coordinate system, that mimic real density fluctuation data. We measure the correlation length of the synthetic data by fitting a Gaussian function to the cross-correlation function. We observe disagreement between the measured and actual correlation lengths, and the degree of disagreement depends on at least the correlation length, correlation time and advection velocity of the synthetic data. We identify the cause of the disagreement and propose an appropriate method to measure the correct correlation length.
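
    Fitting a Gaussian C(Δ) = exp(-Δ²/L²) to correlation values measured at separations Δ can be done very simply: taking logarithms turns it into a one-parameter linear least-squares problem, ln C = -(1/L²)Δ², fitted through the origin. The sketch below uses that log-linear fit as a stand-in for a general nonlinear Gaussian fit; the exp(-Δ²/L²) convention for the correlation length and the function name are assumptions, and the fit does nothing to correct the biases from correlation time and advection that the abstract reports.

```python
import math

def gaussian_correlation_length(separations, correlations):
    """Fit C(d) = exp(-d**2 / L**2) by least squares on ln C = -(1/L**2) * d**2.

    Correlation values must lie in (0, 1]; returns the fitted length L.
    """
    xs = [d * d for d in separations]
    ys = [math.log(c) for c in correlations]
    # Least-squares slope of a line through the origin: sum(x*y) / sum(x*x).
    slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
    return math.sqrt(-1.0 / slope)
```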

  1. Efficient parallel seismic simulations including topography and 3-D material heterogeneities on locally refined composite grids

    NASA Astrophysics Data System (ADS)

    Petersson, Anders; Rodgers, Arthur

    2010-05-01

    conserving, coupling procedure for the elastic wave equation at grid refinement interfaces. When used together with our single grid finite difference scheme, it results in a method which is provably stable, without artificial dissipation, for arbitrary heterogeneous isotropic elastic materials. The new coupling procedure is based on satisfying the summation-by-parts principle across refinement interfaces. From a practical standpoint, an important advantage of the proposed method is the absence of tunable numerical parameters, which are seldom appreciated by application experts. In WPP, the composite grid discretization is combined with a curvilinear grid approach that enables accurate modeling of free surfaces on realistic (non-planar) topography. The overall method satisfies the summation-by-parts principle and is stable under a CFL time step restriction. A feature of great practical importance is that WPP automatically generates the composite grid based on the user-provided topography and the depths of the grid refinement interfaces. The WPP code has been verified extensively, for example using the method of manufactured solutions, by solving Lamb's problem, by solving various layer-over-half-space problems and comparing to semi-analytic (FK) results, and by simulating scenario earthquakes where results from other seismic simulation codes are available. WPP has also been validated against seismographic recordings of moderate earthquakes. WPP performs well on large parallel computers and has been run on up to 32,768 processors using about 26 billion grid points (78 billion DOF) and 41,000 time steps. WPP is an open source code that is available under the GNU General Public License.
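
    The summation-by-parts principle can be shown on the simplest operator of this kind. A difference operator D = H^{-1}Q is SBP when H is a symmetric positive definite norm matrix and Q + Q^T = B = diag(-1, 0, ..., 0, 1); then u^T H (Dv) + (Du)^T H v = u_N v_N - u_0 v_0, a discrete analogue of integration by parts on which energy-stability proofs rely. The NumPy sketch below builds the classical second-order first-derivative operator (central differences inside, one-sided at the ends, H = h·diag(1/2, 1, ..., 1, 1/2)) and checks that identity. It is a textbook example chosen for illustration, not WPP's elastic-wave discretization or its interface coupling.

```python
import numpy as np

def sbp_first_derivative(n, h):
    """Second-order SBP operator D = H^{-1} Q on n grid points with spacing h."""
    Q = np.zeros((n, n))
    for i in range(n - 1):
        Q[i, i + 1] = 0.5                 # skew-symmetric interior part
        Q[i + 1, i] = -0.5
    Q[0, 0] = -0.5                        # boundary closure: Q + Q^T = diag(-1, 0, ..., 0, 1)
    Q[-1, -1] = 0.5
    H = h * np.eye(n)                     # diagonal norm (trapezoidal-rule weights)
    H[0, 0] = H[-1, -1] = 0.5 * h
    D = np.linalg.solve(H, Q)
    return D, H, Q
```

    The first and last rows of D reduce to one-sided differences (u_1 - u_0)/h and (u_N - u_{N-1})/h, and interior rows to the usual central difference.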

  2. Development of a 3D parallel mechanism robot arm with three vertical-axial pneumatic actuators combined with a stereo vision system.

    PubMed

    Chiang, Mao-Hsiung; Lin, Hao-Ting

    2011-01-01

    This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts of the 3D parallel mechanism robot. In the mechanical system, the 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism is first designed and analyzed to realize 3D motion of the robot's end-effector in the X-Y-Z coordinate system. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated using the Denavit-Hartenberg (D-H) notation. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, a Fourier series-based adaptive sliding-mode controller with H(∞) tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators, realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the positions of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measured positions of the three pneumatic actuators by means of the kinematics. However, this calculation cannot account for the manufacturing and assembly tolerances of the joints and the parallel mechanism, so errors exist between the actual and the calculated 3D positions of the end-effector. To improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system, which combines two CCD cameras, serves to measure the actual 3D position of the end-effector and to calibrate the error between the actual and the calculated 3D positions of the end-effector. Furthermore, to
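
    The Denavit-Hartenberg notation assigns each link four parameters (θ, d, a, α) and the homogeneous transform Rot_z(θ)·Trans_z(d)·Trans_x(a)·Rot_x(α); chaining these transforms gives the forward kinematics of one serial chain. The pure-Python sketch below uses the classic D-H convention. The robot's actual link parameters are not given in the abstract, so the example rows are hypothetical. In a parallel mechanism each of the three chains would be treated this way, with closure constraints tying them to the common moving platform (not shown).

```python
import math

def dh_transform(theta, d, a, alpha):
    """Classic D-H link transform Rot_z(theta) Trans_z(d) Trans_x(a) Rot_x(alpha)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [[ct, -st * ca,  st * sa, a * ct],
            [st,  ct * ca, -ct * sa, a * st],
            [0.0,      sa,       ca,      d],
            [0.0,     0.0,      0.0,    1.0]]

def matmul4(A, B):
    """Product of two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def forward_kinematics(dh_rows):
    """Chain (theta, d, a, alpha) link transforms base-to-tip; return tip position."""
    T = [[float(i == j) for j in range(4)] for i in range(4)]   # identity
    for row in dh_rows:
        T = matmul4(T, dh_transform(*row))
    return T[0][3], T[1][3], T[2][3]
```

    For example, a planar two-link arm with unit links and the second joint at 90° places the tip at (1, 1, 0).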

  3. Radiation hydrodynamics using characteristics on adaptive decomposed domains for massively parallel star formation simulations

    NASA Astrophysics Data System (ADS)

    Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.

    2016-02-01

    We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics, which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components, which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin and the optically thick regimes, and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.
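
    The core of a characteristics-based solver is integrating dI/dτ = S - I along each ray. If the source function S and opacity κ are taken as constant within a cell of path length Δs, the exact update across the cell is I_out = I_in·e^{-Δτ} + S(1 - e^{-Δτ}) with Δτ = κΔs, which behaves correctly in both the optically thin limit (I barely changes) and the optically thick limit (I relaxes to S). The sketch below marches one ray through a list of cells under that piecewise-constant assumption; it does not model FLASH's adaptive raytracer or the local/global split of the radiation field.

```python
import math

def integrate_ray(I_in, cells):
    """March intensity along one ray.

    cells is a list of (kappa, S, ds) tuples, with opacity kappa and source
    function S assumed constant inside each cell of path length ds.
    """
    I = I_in
    for kappa, S, ds in cells:
        att = math.exp(-kappa * ds)       # e^{-dtau} across the cell
        I = I * att + S * (1.0 - att)     # exact for constant S within the cell
    return I
```

    Because the update is exact for constant S, splitting a cell into two halves gives the same result as treating it as one cell, which is a convenient consistency check.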

  4. ALEGRA -- A massively parallel h-adaptive code for solid dynamics

    SciTech Connect

    Summers, R.M.; Wong, M.K.; Boucheron, E.A.; Weatherby, J.R.

    1997-12-31

    ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.

  5. QCD on the Massively Parallel Computer AP1000

    NASA Astrophysics Data System (ADS)

    Akemi, K.; Fujisaki, M.; Okuda, M.; Tago, Y.; Hashimoto, T.; Hioki, S.; Miyamura, O.; Takaishi, T.; Nakamura, A.; de Forcrand, Ph.; Hege, C.; Stamatescu, I. O.

    We present the QCD-TARO program of calculations which uses the parallel computer AP1000 of Fujitsu. We discuss the results on scaling, correlation times and hadronic spectrum, some aspects of the implementation and the future prospects.

  6. A parallelized surface extraction algorithm for large binary image data sets based on an adaptive 3D delaunay subdivision strategy.

    PubMed

    Ma, Yingliang; Saetzler, Kurt

    2008-01-01

    In this paper we describe a novel 3D subdivision strategy to extract the surface of binary image data. This iterative approach generates a series of surface meshes that capture different levels of detail of the underlying structure. At the highest level of detail, the surface mesh generated by our approach uses only about 10% of the triangles produced by the marching cubes (MC) algorithm, even in settings where almost no image noise is present. Our approach also eliminates the so-called "staircase effect" which voxel-based algorithms like MC are likely to show, particularly if non-uniformly sampled images are processed. Finally, we show how the presented algorithm can be parallelized by subdividing 3D image space into rectilinear blocks of subimages. As the algorithm scales very well with an increasing number of processors in a multi-threaded setting, this approach is suited to processing large image data sets of several gigabytes. Although the presented work is still computationally more expensive than simple voxel-based algorithms, it produces fewer surface triangles while capturing the same level of detail, is more robust towards image noise and eliminates the above-mentioned "staircase" effect in anisotropic settings. These properties make it particularly useful for biomedical applications, where these conditions are often encountered. PMID:17993710

  7. A development plan for a massively parallel version of the hydrocode CTH

    SciTech Connect

    Robinson, A.C.; Fang, E.; Holdridge, D.; McGlaun, J.M.

    1990-07-01

    Massively parallel computers and computer networks are beginning to appear as an integral part of the scientific computing workplace. This report documents the goals and the corresponding development plan of the massively parallel project of Departments 1530 and 1420. The main goal of the project is to provide a clear understanding of the issues and difficulties involved in bringing the current production hydrocode CTH to the state of being portable to a number of currently available parallel computing architectures. In the process of this research, various working versions of the code will be produced. 6 refs., 6 figs.

  8. A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures

    SciTech Connect

    Lashuk, Ilya; Chandramowlishwaran, Aparna; Langston, Harper; Nguyen, Tuan-Anh; Sampath, Rahul S; Shringarpure, Aashay; Vuduc, Richard; Ying, Lexing; Zorin, Denis; Biros, George

    2012-01-01

    We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech/ORNL), we observed 30x speedup over a single core CPU and 7x speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.

  9. Massively parallel switch-level simulation: A feasibility study

    SciTech Connect

    Kravitz, S.A.

    1989-01-01

    This thesis addresses the feasibility of mapping the COSMOS switch-level simulator onto computers with thousands of simple processors. COSMOS preprocesses transistor networks into equivalent Boolean behavioral models, capturing the switch-level behavior of a circuit in a set of Boolean formulas. The author shows that thousand-fold parallelism exists in the formulas derived by COSMOS for some actual circuits. He exposes this parallelism by eliminating the event list from the simulator, and he demonstrates that this represents an attractive tradeoff given sufficient parallelism in the circuit model. To investigate the feasibility of this approach, he has developed a prototype implementation of the COSMOS simulator on a 32K-processor Connection Machine.

  10. Massively parallel determination and modeling of endonuclease substrate specificity

    PubMed Central

    Thyme, Summer B.; Song, Yifan; Brunette, T. J.; Szeto, Mindy D.; Kusak, Lara; Bradley, Philip; Baker, David

    2014-01-01

    We describe the identification and characterization of novel homing endonucleases using genome database mining to identify putative target sites, followed by high-throughput activity screening in a bacterial selection system. We characterized the substrate specificity and kinetics of these endonucleases by monitoring DNA cleavage events with deep sequencing. The endonuclease specificities revealed by these experiments can be partially recapitulated using 3D structure-based computational models. Analysis of these models together with genome sequence data provides insights into how alternative endonuclease specificities were generated during natural evolution. PMID:25389263

  11. High density packaging and interconnect of massively parallel image processors

    NASA Technical Reports Server (NTRS)

    Carson, John C.; Indin, Ronald J.

    1991-01-01

    This paper presents conceptual designs for high density packaging of parallel processing systems. The systems fall into two categories: global memory systems where many processors are packaged into a stack, and distributed memory systems where a single processor and many memory chips are packaged into a stack. Thermal behavior and performance are discussed.

  12. Molecular simulation of rheological properties using massively parallel supercomputers

    SciTech Connect

    Bhupathiraju, R.K.; Cui, S.T.; Gupta, S.A.; Cummings, P.T.; Cochran, H.D.

    1996-11-01

    Advances in parallel supercomputing now make possible molecular-based engineering and science calculations that will soon revolutionize many technologies, such as those involving polymers and those involving aqueous electrolytes. We have developed a suite of message-passing codes for classical molecular simulation of such complex fluids and amorphous materials and have completed a number of demonstration calculations of problems of scientific and technological importance with each. In this paper, we will focus on the molecular simulation of rheological properties, particularly viscosity, of simple and complex fluids using parallel implementations of non-equilibrium molecular dynamics. Such calculations represent significant challenges computationally because, in order to reduce the thermal noise in the calculated properties within acceptable limits, large systems and/or long simulated times are required.

  13. Casting Pearls Ballistically: Efficient Massively Parallel Simulation of Particle Deposition

    NASA Astrophysics Data System (ADS)

    Lubachevsky, Boris D.; Privman, Vladimir; Roy, Subhas C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps materials scientists study adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process, and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at increasing the efficiency of producing the particle configuration and collecting statistics. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on the MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation.
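
    The sequential model described above is easy to state in code. The sketch below is a two-dimensional analogue, with equal disks falling onto a line rather than spheres onto a plane, chosen for brevity: each disk drops vertically and stops at the highest point at which it first touches the surface or a previously deposited disk. It is the brute-force O(N²) sequential baseline, not the authors' continuous-time parallel reformulation, and the drop positions in the usage are hypothetical.

```python
import math

def deposit_disks(xs, diameter=1.0):
    """Sequential ballistic deposition of equal disks dropped at horizontal positions xs.

    Returns the (x, y) centres of the deposited disks in drop order.
    """
    placed = []
    r = diameter / 2.0
    for x in xs:
        y = r                                  # resting on the flat surface
        for xj, yj in placed:
            dx = abs(x - xj)
            if dx < diameter:                  # falling disk would touch this one
                # Centre height at contact; the falling disk stops at the highest such contact.
                y = max(y, yj + math.sqrt(diameter**2 - dx**2))
        placed.append((x, y))
    return placed
```

    Dropping three unit disks at x = 0, 0 and 0.6 stacks the second directly on the first (centre height 1.5) and lands the third on the second at height 1.5 + 0.8 = 2.3.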

  14. Casting pearls ballistically: Efficient massively parallel simulation of particle deposition

    SciTech Connect

    Lubachevsky, B.D.; Privman, V.; Roy, S.C.

    1996-06-01

    We simulate ballistic particle deposition wherein a large number of spherical particles are "cast" vertically over a planar horizontal surface. Upon first contact (with the surface or with a previously deposited particle) each particle stops. This model helps materials scientists study adsorption and sediment formation. The model is sequential, with particles deposited one by one. We have found an equivalent formulation using a continuous-time random process, and we simulate the latter in parallel using a method similar to the one previously employed for simulating Ising spins. We augment the parallel algorithm for simulating Ising spins with several techniques aimed at increasing the efficiency of producing the particle configuration and collecting statistics. Some of these techniques are similar to earlier ones. We implement the resulting algorithm on a 16K PE MasPar MP-1 and a 4K PE MasPar MP-2. The parallel code runs on the MasPar computers nearly two orders of magnitude faster than an optimized sequential code runs on a fast workstation. 17 refs., 9 figs.

  15. Performance effects of irregular communications patterns on massively parallel multiprocessors

    NASA Technical Reports Server (NTRS)

    Saltz, Joel; Petiton, Serge; Berryman, Harry; Rifkin, Adam

    1991-01-01

    A detailed study of the performance effects of irregular communications patterns on the CM-2 was conducted. The communications capabilities of the CM-2 were characterized under a variety of controlled conditions. In the process of carrying out the performance evaluation, extensive use was made of a parameterized synthetic mesh. In addition, timings were performed with unstructured meshes generated for aerodynamic codes and with a set of sparse matrices with banded patterns of non-zeroes. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.

  16. Massively parallel spatial light modulation-based optical signal processing

    NASA Astrophysics Data System (ADS)

    Li, Yao

    1993-03-01

    A new optical parallel arithmetic processing scheme using a nonholographic optoelectronic content-addressable memory (CAM) was proposed. The design of a four-bit CAM-based optical carry look-ahead adder (CLA) was studied. Compared with existing optoelectronic binary addition approaches, this nonholographic CAM scheme offers a number of practical advantages, such as faster processing speed and ease of optical implementation and alignment. For the addition of numbers longer than four bits, a number of four-bit CLAs can be cascaded by incorporating the previous stage's carry. Experimental results were also demonstrated. One paper was published in Optics Letters.
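
    The carry look-ahead logic itself can be sketched in Boolean form. Each bit position has a generate signal g_i = a_i AND b_i and a propagate signal p_i = a_i XOR b_i, and the recurrence c_{i+1} = g_i OR (p_i AND c_i) is expanded so that every carry depends only on the g's, p's and the carry-in c_0. Wider numbers are then added by cascading 4-bit blocks, each taking the previous block's carry-out as its carry-in. This Python sketch models only the logic, not the optical CAM implementation, and `nbits` is assumed to be a multiple of 4.

```python
def cla4(a, b, c0):
    """4-bit carry look-ahead add of a, b (0..15) with carry-in c0; returns (sum, carry_out)."""
    g = [((a >> i) & 1) & ((b >> i) & 1) for i in range(4)]   # generate
    p = [((a >> i) & 1) ^ ((b >> i) & 1) for i in range(4)]   # propagate
    # Each carry is a flat sum of products of g, p and c0: no ripple inside the block.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    s = sum((p[i] ^ carries[i]) << i for i in range(4))       # sum bit = p XOR carry-in
    return s, c4

def add_cascaded(a, b, nbits):
    """Add nbits-wide numbers by cascading 4-bit CLA blocks; returns (sum, carry_out)."""
    carry, result = 0, 0
    for k in range(0, nbits, 4):
        s, carry = cla4((a >> k) & 0xF, (b >> k) & 0xF, carry)
        result |= s << k
    return result, carry
```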

  17. A sweep algorithm for massively parallel simulation of circuit-switched networks

    NASA Technical Reports Server (NTRS)

    Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

    1992-01-01

    A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks controlled by a randomized-routing policy that includes trunk reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16,384-processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel iPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

  18. Performance of the Wavelet Decomposition on Massively Parallel Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek A.; LeMoigne, Jacqueline; Zukor, Dorothy (Technical Monitor)

    2001-01-01

    Traditionally, Fourier transforms have been used for signal analysis and representation. However, although it is straightforward to reconstruct a signal from its Fourier transform, the Fourier representation contains no local description of the signal. To alleviate this problem, windowed Fourier transforms and then wavelet transforms were introduced, and it has been proven that wavelets give better localization than traditional Fourier transforms, as well as a better division of the time- or space-frequency plane than windowed Fourier transforms. Because of these properties, and after the development of several fast algorithms for computing the wavelet representation of any signal, in particular the Multi-Resolution Analysis (MRA) developed by Mallat, wavelet transforms have increasingly been applied to signal analysis problems, especially real-life problems in which speed is critical. In this paper we present and compare efficient wavelet decomposition algorithms on different parallel architectures. We report and analyze experimental measurements, using NASA remotely sensed images. Results show that our algorithms achieve significant performance gains on current high-performance parallel systems and meet the requirements of scientific and multimedia applications. The extensive performance measurements collected over a number of high-performance computer systems have revealed important architectural characteristics of these systems, in relation to the processing demands of the wavelet decomposition of digital images.
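
    One level of the 2D MRA decomposition can be illustrated with the Haar filter, the simplest choice (the filters used in the paper are not stated here). An averaging/differencing step is applied to every row, then to every column, giving four half-size subbands, and further levels repeat the process on the low-low band. Because each row, and then each column, is transformed independently, these passes are what parallel implementations distribute across processors. The pure-Python sketch assumes even image dimensions and orthonormal scaling; the naming of the two mixed subbands (LH/HL) varies between texts.

```python
import math

def haar_1d(v):
    """One orthonormal Haar analysis step on an even-length list: (approx, detail)."""
    s = 1 / math.sqrt(2)
    half = len(v) // 2
    approx = [(v[2 * i] + v[2 * i + 1]) * s for i in range(half)]
    detail = [(v[2 * i] - v[2 * i + 1]) * s for i in range(half)]
    return approx, detail

def haar_2d_level(img):
    """One MRA level on an image (list of rows) with even dimensions.

    Returns four subbands (LL, LH, HL, HH), each indexed [row][col].
    """
    # Row pass: every row is independent, so rows could be split across processors.
    lo_rows, hi_rows = zip(*(haar_1d(row) for row in img))

    def column_pass(rows):
        cols = [list(c) for c in zip(*rows)]                 # transpose
        a, d = zip(*(haar_1d(c) for c in cols))              # filter each column
        # Transpose back so results are indexed [row][col].
        return [list(r) for r in zip(*a)], [list(r) for r in zip(*d)]

    LL, LH = column_pass(lo_rows)
    HL, HH = column_pass(hi_rows)
    return LL, LH, HL, HH
```

    With orthonormal scaling the total energy (sum of squares) of the four subbands equals that of the input image, and a constant image produces zeros in every band except LL.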

  19. Scientific development of a massively parallel ocean climate model. Final report

    SciTech Connect

    Semtner, A.J.; Chervin, R.M.

    1996-09-01

    Over the last three years, very significant advances have been made in refining the grid resolution of ocean models and in improving the physical and numerical treatments of ocean hydrodynamics. Some of these advances have occurred as a result of the successful transition of ocean models onto massively parallel computers, which has been led by Los Alamos investigators. Major progress has been made in simulating global ocean circulation and in understanding various ocean climatic aspects such as the effect of wind driving on heat and freshwater transports. These steps have demonstrated the capability to conduct realistic decadal to century ocean integrations at high resolution on massively parallel computers.

  20. Signal processing applications of massively parallel charge domain computing devices

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor)

    1999-01-01

    The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array.
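
    The Hartley route to the Fourier transform rests on a standard identity. With the discrete Hartley transform H[k] = Σ_n x[n]·cas(2πnk/N), where cas t = cos t + sin t, a real input x has Re X[k] = (H[k] + H[N-k])/2 and Im X[k] = -(H[k] - H[N-k])/2 (indices taken mod N). So the real and imaginary parts are each one matrix-vector product with a row-permuted combination of the Hartley matrix, and the two products are independent. The pure-Python sketch below mimics that two-array arrangement with ordinary dense MVMs; it illustrates the identity and is not the patented CCD/CID circuit or its exact permutation scheme.

```python
import math

def hartley_matrix(N):
    """Dense DHT matrix with cas(t) = cos(t) + sin(t)."""
    return [[math.cos(2 * math.pi * n * k / N) + math.sin(2 * math.pi * n * k / N)
             for n in range(N)] for k in range(N)]

def matvec(M, x):
    """Dense matrix-vector product on nested lists."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def fourier_two_mvm(x):
    """DFT of a real vector from two independent MVMs built from Hartley rows."""
    N = len(x)
    Hm = hartley_matrix(N)
    # Combine row k with row (-k mod N) to get the cosine and minus-sine operators.
    re_op = [[(Hm[k][n] + Hm[-k % N][n]) / 2 for n in range(N)] for k in range(N)]
    im_op = [[-(Hm[k][n] - Hm[-k % N][n]) / 2 for n in range(N)] for k in range(N)]
    re, im = matvec(re_op, x), matvec(im_op, x)   # independent: could run simultaneously
    return [complex(r, i) for r, i in zip(re, im)]
```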

  1. Factorization of large integers on a massively parallel computer

    SciTech Connect

    Davis, J.A.; Holdridge, D.B.

    1988-01-01

    Our interest in integer factorization at Sandia National Laboratories is motivated by cryptographic applications and in particular the security of the RSA encryption-decryption algorithm. We have implemented our version of the quadratic sieve procedure on the NCUBE computer with 1024 processors (nodes). The new code is significantly different in all important aspects from the program used to factor numbers of order 10^70 on a single-processor CRAY computer. The capabilities of parallel processing and the limitation of small local memory necessitated this entirely new implementation. This effort involved several restarts, as program structures that seemed appealing bogged down due to inter-processor communications. We are presently working with integers of magnitude about 10^70 in tuning this code to the novel hardware. 6 refs., 3 figs.

  2. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace-driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t-th instant, reference x_t is hashed into a set of cache locations, the contents of which are then compared with x_t. If x_t is not present in the cache at the t-th instant, it is said to be a miss, and it is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x_t present for the (t+1)-st instant. The problem of parallel simulation of a subtrace of N references directed to a C-line cache set is considered, with the aim of determining which references are misses and computing related statistics. A simulation method is presented for the Least Recently Used (LRU) policy which, regardless of the set size C, runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference-based line replacement policies is considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that, on any trace of length N directed to a C-line set, runs in O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.
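
    For reference, the quantity the parallel algorithms compute can be obtained sequentially by replaying the trace through one C-line LRU set and recording which references miss. The sketch below does that with Python's `collections.OrderedDict`; it is the straightforward O(N) sequential simulation, not the O(log N) EREW algorithm, and the traces in the usage are hypothetical.

```python
from collections import OrderedDict

def lru_misses(trace, C):
    """Return one miss flag per reference for a single C-line LRU cache set."""
    cache = OrderedDict()              # keys ordered from least to most recently used
    flags = []
    for x in trace:
        if x in cache:
            cache.move_to_end(x)       # hit: x becomes the most recently used line
            flags.append(False)
        else:
            flags.append(True)         # miss: load x, evicting the LRU line if the set is full
            if len(cache) == C:
                cache.popitem(last=False)
            cache[x] = True
    return flags
```

    With C = 3, the trace 1, 2, 3, 1, 4, 2 hits only on the second reference to line 1; the reference to 4 then evicts line 2, so the final reference to 2 misses again.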

  3. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    SciTech Connect

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to compute the results quickly and easily on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than a PC cluster. Due to the GPU’s application-specific architecture, harnessing the GPU’s computational prowess for LSA is a great challenge. We present a parallel LSA implementation on the GPU, using NVIDIA® Compute Unified Device Architecture (CUDA) and Compute Unified Basic Linear Algebra Subprograms (CUBLAS) software. The performance of this implementation is compared to a traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. We found that the GPU version of the algorithm was twice as fast for large matrices (1000 × 1000 and above) whose dimensions were not divisible by 16. For large matrices whose dimensions were divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits of the GPU for matrices divisible by 16. Note that the overall speed of the CPU version did not change appreciably when the matrix dimensions were divisible by 16. Further research is needed to produce a fully implementable version of LSA. With that in mind, the research we present shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.
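The core LSA computation named above, a truncated SVD of the term-document matrix, can be sketched on the CPU with NumPy's LAPACK-backed `numpy.linalg.svd`. This is an illustration of the technique only, not the authors' CUDA/CUBLAS code; the function names and the choice of cosine similarity for comparing documents are assumptions.

```python
import numpy as np

def lsa(term_doc, k):
    """Rank-k latent semantic analysis of a (terms x documents) matrix."""
    U, s, Vt = np.linalg.svd(term_doc, full_matrices=False)  # thin SVD
    return U[:, :k], s[:k], Vt[:k, :]                        # keep the k largest factors

def doc_similarity(s_k, Vt_k):
    """Cosine similarity between documents in the k-dimensional latent space."""
    docs = (np.diag(s_k) @ Vt_k).T                  # one row per document
    norms = np.linalg.norm(docs, axis=1, keepdims=True)
    unit = docs / np.where(norms == 0, 1, norms)    # guard against empty documents
    return unit @ unit.T
```

The rank-k product `U_k @ np.diag(s_k) @ Vt_k` is the best rank-k approximation of the matrix in the Frobenius norm (Eckart-Young), which is why discarding the small singular values preserves most of the term-document structure.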

  4. Massively parallel computation of RCS with finite elements

    NASA Technical Reports Server (NTRS)

    Parker, Jay

    1993-01-01

    One of the promising combinations of finite element approaches for scattering problems uses Whitney edge elements, spherical vector wave-absorbing boundary conditions, and bi-conjugate gradient solution for the frequency-domain near field. Each of these approaches may be criticized. Low-order elements require high mesh density, but also result in fast, reliable iterative convergence. Spherical wave-absorbing boundary conditions require additional space to be meshed beyond the most minimal near-space region, but result in fully sparse, symmetric matrices which keep storage and solution times low. Iterative solution is somewhat unpredictable and unfriendly to multiple right-hand sides, yet we find it to be uniformly fast on large problems to date, given the other two approaches. Implementation of these approaches on a distributed memory, message passing machine yields huge dividends, as full scalability to the largest machines appears assured and iterative solution times are well-behaved for large problems. We present times and solutions for computed RCS for a conducting cube and composite permeability/conducting sphere on the Intel iPSC/860 with up to 16 processors solving over 200,000 unknowns. We estimate problems of approximately 10 million unknowns, encompassing 1000 cubic wavelengths, may be attempted on a currently available 512 processor machine, but would be exceedingly tedious to prepare. The most severe bottlenecks are due to the slow rate of mesh generation on non-parallel machines and the large transfer time from such a machine to the parallel processor. One solution, in progress, is to create and then distribute a coarse mesh among the processors, followed by systematic refinement within each processor. Elimination of redundant node definitions at the mesh-partition surfaces, snap-to-surface post processing of the resulting mesh for good modelling of curved surfaces, and load-balancing redistribution of new elements after the refinement are auxiliary

  5. A massively parallel computational approach to coupled thermoelastic/porous gas flow problems

    NASA Technical Reports Server (NTRS)

    Shia, David; Mcmanus, Hugh L.

    1995-01-01

    A new computational scheme for coupled thermoelastic/porous gas flow problems is presented. Heat transfer, gas flow, and dynamic thermoelastic governing equations are expressed in fully explicit form, and solved on a massively parallel computer. The transpiration cooling problem is used as an example problem. The numerical solutions have been verified by comparison to available analytical solutions. Transient temperature, pressure, and stress distributions have been obtained. Small spatial oscillations in pressure and stress have been observed, which would be impractical to predict with previously available schemes. Comparisons between serial and massively parallel versions of the scheme have also been made. The results indicate that for small scale problems the serial and parallel versions use practically the same amount of CPU time. However, as the problem size increases the parallel version becomes more efficient than the serial version.

  6. A massively parallel adaptive scheme for melt migration in geodynamics computations

    NASA Astrophysics Data System (ADS)

    Dannberg, Juliane; Heister, Timo; Grove, Ryan

    2016-04-01

    Melt generation and migration are important processes for the evolution of the Earth's interior and impact the global convection of the mantle. While they have been the subject of numerous investigations, the typical time and length-scales of melt transport are vastly different from global mantle convection, which determines where melt is generated. This makes it difficult to study mantle convection and melt migration in a unified framework. In addition, modelling magma dynamics poses the challenge of highly non-linear and spatially variable material properties, in particular the viscosity. We describe our extension of the community mantle convection code ASPECT that adds equations describing the behaviour of silicate melt percolating through and interacting with a viscously deforming host rock. We use the original compressible formulation of the McKenzie equations, augmented by an equation for the conservation of energy. This approach includes both melt migration and melt generation with the accompanying latent heat effects, and it incorporates the individual compressibilities of the solid and the fluid phase. For this, we derive an accurate and stable Finite Element scheme that can be combined with adaptive mesh refinement. This is particularly advantageous for this type of problem, as the resolution can be increased in mesh cells where melt is present and viscosity gradients are high, whereas a lower resolution is sufficient in regions without melt. Together with a high-performance, massively parallel implementation, this allows for high resolution, 3d, compressible, global mantle convection simulations coupled with melt migration. Furthermore, scalable iterative linear solvers are required to solve the large linear systems arising from the discretized system. Finally, we present benchmarks and scaling tests of our solver up to tens of thousands of cores, show the effectiveness of adaptive mesh refinement when applied to melt migration and compare the

  7. Solution of large linear systems of equations on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Ida, Nathan; Udawatta, Kapila

    1987-01-01

    The Massively Parallel Processor (MPP) was designed as a special machine for specific applications in image processing. As a parallel machine, with a large number of processors that can be reconfigured in different combinations it is also applicable to other problems that require a large number of processors. The solution of linear systems of equations on the MPP is investigated. The solution times achieved are compared to those obtained with a serial machine and the performance of the MPP is discussed.

  8. Process Simulation of Complex Biological Pathways in Physical Reactive Space and Reformulated for Massively Parallel Computing Platforms.

    PubMed

    Ganesan, Narayan; Li, Jie; Sharma, Vishakha; Jiang, Hanyu; Compagnoni, Adriana

    2016-01-01

    Biological systems encompass complexity that far surpasses many artificial systems. Modeling and simulation of large and complex biochemical pathways is a computationally intensive challenge. Traditional tools, such as ordinary differential equations, partial differential equations, stochastic master equations, and Gillespie-type methods, are all limited by their modeling fidelity, their computational efficiency, or both. In this work, we present a scalable computational framework based on modeling biochemical reactions in explicit 3D space that is suitable for studying the behavior of large and complex biological pathways. The framework is designed to exploit the parallelism and scalability offered by commodity massively parallel processors such as graphics processing units (GPUs) and other parallel computing platforms. Modeling reactions in 3D space is aimed at enhancing the realism of the model compared to traditional modeling tools and frameworks. We introduce the Parallel Select algorithm, which is key to breaking the sequential bottleneck limiting the performance of most other tools designed to study biochemical interactions. The algorithm is designed to be computationally tractable and to handle hundreds of interacting chemical species and millions of independent agents by considering all-particle interactions within the system. We also present an implementation of the framework on popular graphics processing units and apply it to a simulation study of the JAK-STAT Signal Transduction Pathway. The computational framework will offer deeper insight into various biological processes within the cell and help us observe key events as they unfold in space and time. This will advance the current state of the art in simulation studies of large-scale biological systems and also enable realistic simulation studies of macro-biological cultures, where inter-cellular interactions are prevalent. PMID:27045833

  9. Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

    SciTech Connect

    Fijany, A.; Milman, M.; Redding, D.

    1994-12-31

    In this paper, massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optics system (SELENE) are presented. The authors have already shown that the computation of a near-optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although this represents an optimal computation, the large size of the system and the high sampling-rate requirement make implementing this control algorithm a computationally challenging problem, since it demands a sustained computational throughput on the order of 10 GFlops. They develop a novel algorithm, designated the Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.

  10. Robust Adaptive 3-D Segmentation of Vessel Laminae From Fluorescence Confocal Microscope Images and Parallel GPU Implementation

    PubMed Central

    Narayanaswamy, Arunachalam; Dwarakapuram, Saritha; Bjornsson, Christopher S.; Cutler, Barbara M.; Shain, William

    2010-01-01

    This paper presents robust 3-D algorithms to segment vasculature that is imaged by labeling laminae, rather than the lumenal volume. The signal is weak, sparse, noisy, nonuniform, and low-contrast, and it exhibits gaps and spectral artifacts, so adaptive thresholding and Hessian filtering based methods are not effective. The structure deviates from a tubular geometry, so tracing algorithms are not effective. We propose a four-step approach. The first step detects candidate voxels using a robust hypothesis test based on a model that assumes Poisson noise and locally planar geometry. The second step performs an adaptive region growth to extract weakly labeled and fine vessels while rejecting spectral artifacts. The third step constructs an accurate mesh representation using marching tetrahedra, volume-preserving smoothing, and adaptive decimation algorithms, which enables interactive visualization and estimation of features such as statistical confidence, local curvature, local thickness, and local normal. The final step, which enables topological analysis and efficient validation, estimates vessel centerlines using a ray casting and vote accumulation algorithm. Our algorithm lends itself to parallel processing and yielded an 8× speedup on a graphics processor (GPU). On synthetic data, our meshes had average error per face (EPF) values of 0.1–1.6 voxels per mesh face for peak signal-to-noise ratios from 110 dB down to 28 dB. Separately, when the mesh was decimated to less than 1% of its original size, the EPF was less than 1 voxel/face. When validated on real datasets, the average recall and precision values were 94.66% and 94.84%, respectively. PMID:20199906

  11. Large-eddy simulation of the Rayleigh-Taylor instability on a massively parallel computer

    SciTech Connect

    Amala, P.A.K.

    1995-03-01

    A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively parallel computer. Programming models on massively parallel computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively parallel computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data parallel code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data parallel approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community. This is the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.

  12. Parallel optimization of pixel purity index algorithm for massive hyperspectral images in cloud computing environment

    NASA Astrophysics Data System (ADS)

    Chen, Yufeng; Wu, Zebin; Sun, Le; Wei, Zhihui; Li, Yonglong

    2016-04-01

    With the gradual increase in the spatial and spectral resolution of hyperspectral images, the size of image data becomes larger and larger, and the complexity of processing algorithms is growing, which poses a big challenge to efficient massive hyperspectral image processing. Cloud computing technologies distribute computing tasks to a large number of computing resources for handling large data sets without the limitation of memory and computing resource of a single machine. This paper proposes a parallel pixel purity index (PPI) algorithm for unmixing massive hyperspectral images based on a MapReduce programming model for the first time in the literature. According to the characteristics of hyperspectral images, we describe the design principle of the algorithm, illustrate the main cloud unmixing processes of PPI, and analyze the time complexity of serial and parallel algorithms. Experimental results demonstrate that the parallel implementation of the PPI algorithm on the cloud can effectively process big hyperspectral data and accelerate the algorithm.
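The pixel purity index itself is simple to state: project every pixel spectrum onto many random unit vectors ("skewers") and count how often each pixel lands at an extreme of a projection. The sketch below expresses that as a map step over batches of skewers followed by a summing reduce. Partitioning by skewer batch is an assumption made for illustration; the paper's actual MapReduce partitioning is not described here, and the function names and parameters (`n_skewers`, `n_batches`) are hypothetical.

```python
import numpy as np
from functools import reduce

def ppi_map(pixels, skewers):
    """Map step (hypothetical partition): extreme-pixel counts for one batch of skewers."""
    proj = pixels @ skewers.T                    # shape (n_pixels, n_batch)
    counts = np.zeros(pixels.shape[0], dtype=int)
    np.add.at(counts, proj.argmax(axis=0), 1)    # pixel with the largest projection per skewer
    np.add.at(counts, proj.argmin(axis=0), 1)    # pixel with the smallest projection per skewer
    return counts

def pixel_purity_index(pixels, n_skewers=1000, n_batches=4, seed=0):
    """pixels: array of shape (n_pixels, n_bands); returns the PPI count per pixel."""
    rng = np.random.default_rng(seed)
    skewers = rng.standard_normal((n_skewers, pixels.shape[1]))
    skewers /= np.linalg.norm(skewers, axis=1, keepdims=True)   # unit-length skewers
    batches = np.array_split(skewers, n_batches)                # one batch per "mapper"
    partial = [ppi_map(pixels, b) for b in batches]             # map
    return reduce(np.add, partial)                              # reduce: sum the counts
```

Because each skewer contributes exactly one maximum and one minimum, the counts always sum to twice the number of skewers, and pixels that are strict mixtures of others (interior points of the data's convex hull) receive a count of zero. High-count pixels are the endmember candidates used for unmixing.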

  13. Applications of the massively parallel machine, the MasPar MP-1, to Earth sciences

    NASA Technical Reports Server (NTRS)

    Fischer, James R.; Strong, James P.; Dorband, John E.; Tilton, James C.

    1991-01-01

    The computational workload of upcoming NASA science missions, especially the ground data processing for the Earth Observing System, is projected to be quite large (in the 50 to 100 gigaFLOPS range) and correspondingly very expensive to perform using conventional supercomputer systems. High-performance, general-purpose massively parallel computer systems such as the MasPar MP-1 are being investigated by NASA as a more cost-effective alternative. Massively parallel systems are targeted for accelerated development and maturation by NASA's upcoming five-year High Performance Computing and Communications Program. A summary of the broad range of applications currently running on the MP-1 at NASA/Goddard is presented in this paper, along with descriptions of the parallel algorithmic techniques employed in five applications that have bearing on Earth sciences.

  14. Massively parallel implementation of the Penn State/NCAR Mesoscale Model

    SciTech Connect

    Foster, I.; Michalakes, J.

    1992-01-01

    Parallel computing promises significant improvements in both the raw speed and cost performance of mesoscale atmospheric models. On distributed-memory massively parallel computers available today, the performance of a mesoscale model will exceed that of conventional supercomputers; on the teraflops machines expected within the next five years, performance will increase by several orders of magnitude. As a result, scientists will be able to consider larger problems, more complex model processes, and finer resolutions. In this paper, we report on a project at Argonne National Laboratory that will allow scientists to take advantage of parallel computing technology. This Massively Parallel Mesoscale Model (MPMM) will be functionally equivalent to the Penn State/NCAR Mesoscale Model (MM). In a prototype study, we produced a parallel version of MM4 using a static (compile-time) coarse-grained "patch" decomposition. This code achieves one-third the performance of a one-processor CRAY Y-MP on twelve Intel i860 microprocessors. The current version of MPMM is based on MM5 and uses a more fine-grained approach, decomposing the grid as finely as the mesh itself allows so that each horizontal grid cell is a parallel process. This will allow the code to utilize many hundreds of processors. A high-level language for expressing parallel programs is used to implement communication streams between the processes in a way that permits dynamic remapping to the physical processors of a particular parallel computer. This facilitates load balancing, grid nesting, and coupling with graphical systems and other models.

  15. Massively parallel implementation of the Penn State/NCAR Mesoscale Model

    SciTech Connect

    Foster, I.; Michalakes, J.

    1992-12-01

    Parallel computing promises significant improvements in both the raw speed and cost performance of mesoscale atmospheric models. On distributed-memory massively parallel computers available today, the performance of a mesoscale model will exceed that of conventional supercomputers; on the teraflops machines expected within the next five years, performance will increase by several orders of magnitude. As a result, scientists will be able to consider larger problems, more complex model processes, and finer resolutions. In this paper, we report on a project at Argonne National Laboratory that will allow scientists to take advantage of parallel computing technology. This Massively Parallel Mesoscale Model (MPMM) will be functionally equivalent to the Penn State/NCAR Mesoscale Model (MM). In a prototype study, we produced a parallel version of MM4 using a static (compile-time) coarse-grained "patch" decomposition. This code achieves one-third the performance of a one-processor CRAY Y-MP on twelve Intel i860 microprocessors. The current version of MPMM is based on MM5 and uses a more fine-grained approach, decomposing the grid as finely as the mesh itself allows so that each horizontal grid cell is a parallel process. This will allow the code to utilize many hundreds of processors. A high-level language for expressing parallel programs is used to implement communication streams between the processes in a way that permits dynamic remapping to the physical processors of a particular parallel computer. This facilitates load balancing, grid nesting, and coupling with graphical systems and other models.

  16. Massively parallel per-pixel-based zerotree processing architecture for real-time video compression

    NASA Astrophysics Data System (ADS)

    Alagoda, Geoffrey; Rassau, Alexander M.; Eshraghian, Kamran

    2001-11-01

    In the span of a few years, mobile multimedia communication has rapidly become a significant area of research and development constantly challenging boundaries on a variety of technological fronts. Video compression, a fundamental component for most mobile multimedia applications, generally places heavy demands in terms of the required processing capacity. Hardware implementations of typical modern hybrid codecs require realisation of components such as motion compensation, wavelet transform, quantisation, zerotree coding and arithmetic coding in real-time. While the implementation of such codecs using a fast generic processor is possible, undesirable trade-offs in terms of power consumption and speed must generally be made. The improvement in power consumption that is achievable through the use of a slow-clocked massively parallel processing environment, while maintaining real-time processing speeds, should thus not be overlooked. An architecture to realise such a massively parallel solution for a zerotree entropy coder is, therefore, presented in this paper.

  17. Numerical and physical instabilities in massively parallel LES of reacting flows

    NASA Astrophysics Data System (ADS)

    Poinsot, Thierry

    LES of reacting flows is rapidly becoming mature and providing levels of precision that cannot be reached with any RANS (Reynolds-averaged) technique. In addition to the multiple subgrid-scale models required for such LES and to the questions raised by the required numerical accuracy of LES solvers, various issues related to the reliability, mesh independence, and repetitivity of LES must still be addressed, especially when LES is used on massively parallel machines. This talk discusses some of these issues: (1) the existence of non-physical waves (known as 'wiggles' by most LES practitioners) in LES, (2) the effects of mesh size on LES of reacting flows, (3) the growth of rounding errors in LES on massively parallel machines, and more generally (4) the ability to qualify an LES code as 'bug free' and 'accurate'. Examples range from academic cases (a minimum non-reacting turbulent channel) to applied configurations (a sector of a helicopter combustion chamber).

  18. Cross-platform compatibility of Hi-Plex, a streamlined approach for targeted massively parallel sequencing.

    PubMed

    Nguyen-Dumont, Tú; Pope, Bernard J; Hammet, Fleur; Mahmoodi, Maryam; Tsimiklis, Helen; Southey, Melissa C; Park, Daniel J

    2013-11-15

    Although per-base sequencing costs have decreased during recent years, library preparation for targeted massively parallel sequencing remains constrained by high reagent cost, limited design flexibility, and protocol complexity. To address these limitations, we previously developed Hi-Plex, a polymerase chain reaction (PCR) massively parallel sequencing strategy for screening panels of genomic target regions. Here, we demonstrate that Hi-Plex applied with hybrid adapters can generate a library suitable for sequencing with both the Ion Torrent and the TruSeq chemistries and that adjusting primer concentrations improves coverage uniformity. These results expand Hi-Plex capabilities as an accurate, affordable, flexible, and rapid approach for various genetic screening applications. PMID:23933242

  19. Cross-platform compatibility of Hi-Plex, a streamlined approach for targeted massively parallel sequencing

    PubMed Central

    Nguyen-Dumont, Tú; Pope, Bernard J.; Hammet, Fleur; Mahmoodi, Maryam; Tsimiklis, Helen; Southey, Melissa C.; Park, Daniel J.

    2013-01-01

    Although per-base sequencing costs have decreased during recent years, library preparation for targeted massively parallel sequencing remains constrained by high reagent cost, limited design flexibility, and protocol complexity. To address these limitations, we previously developed Hi-Plex, a polymerase chain reaction (PCR) massively parallel sequencing strategy for screening panels of genomic target regions. Here, we demonstrate that Hi-Plex applied with hybrid adapters can generate a library suitable for sequencing with both the Ion Torrent and the TruSeq chemistries and that adjusting primer concentrations improves coverage uniformity. These results expand Hi-Plex capabilities as an accurate, affordable, flexible, and rapid approach for various genetic screening applications. PMID:23933242

  20. A domain decomposition study of massively parallel computing in compressible gas dynamics

    NASA Astrophysics Data System (ADS)

    Wong, C. C.; Blottner, F. G.; Payne, J. L.; Soetrisno, M.

    1995-03-01

    The appropriate utilization of massively parallel computers for solving the Navier-Stokes equations is investigated and determined from an engineering perspective. The issues investigated are: (1) Should strip or patch domain decomposition of the spatial mesh be used to reduce computer time? (2) How many computer nodes should be used for a problem with a given sized mesh to reduce computer time? (3) Is the convergence of the Navier-Stokes solution procedure (LU-SGS) adversely influenced by the domain decomposition approach? The results of the paper show that the present Navier-Stokes solution technique has good performance on a massively parallel computer for transient flow problems. For steady-state problems with a large number of mesh cells, the solution procedure will require significant computer time due to an increased number of iterations to achieve a converged solution. There is an optimum number of computer nodes to use for a problem with a given global mesh size.
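Question (1), strip versus patch decomposition, is often framed in terms of how much boundary data each node must exchange per step. The sketch below counts ghost cells for one interior node under a deliberately simplified model that is our assumption, not the paper's: a square n × n mesh, a one-cell-wide halo, a 5-point stencil (so corner cells are ignored), and mesh sides divisible by the decomposition. It says nothing about the LU-SGS convergence effects the paper also examines, and the function name is hypothetical.

```python
import math

def halo_cells_per_node(n, p, scheme):
    """Ghost cells exchanged per step by one interior node of an n x n mesh
    split over p nodes (simplified model: one-cell halo, no corner cells)."""
    if scheme == "strip":
        # p horizontal strips of n/p rows: only the full rows above and below are exchanged
        return 2 * n
    if scheme == "patch":
        q = math.isqrt(p)
        if q * q != p or n % q:
            raise ValueError("patch model needs p = q*q with q dividing n")
        # q x q square patches of side n/q: all four edges are exchanged
        return 4 * (n // q)
    raise ValueError(f"unknown scheme: {scheme}")

for p in (4, 16, 64, 256):
    print(p, halo_cells_per_node(1024, p, "strip"), halo_cells_per_node(1024, p, "patch"))
```

Under this model a strip node always exchanges 2n cells, while a patch node exchanges 4n/√p, so patches move less data once p > 4 and the two break even at p = 4. Actual run times also depend on message counts, latency, and how the decomposition affects solver convergence, which a pure volume count ignores.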

  1. Chemical network problems solved on NASA/Goddard's massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Cho, Seog Y.; Carmichael, Gregory R.

    1987-01-01

    The single instruction stream, multiple data stream Massively Parallel Processor (MPP) unit consists of 16,384 bit-serial arithmetic processors configured as a 128 x 128 array, whose speed can exceed that of current supercomputers (Cyber 205). The applicability of the MPP for solving reaction network problems is presented and discussed, including the mapping of the calculation to the architecture and CPU timing comparisons.

  2. Progressive Vector Quantization on a massively parallel SIMD machine with application to multispectral image data

    NASA Technical Reports Server (NTRS)

    Manohar, Mareboyana; Tilton, James C.

    1994-01-01

    A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.
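A progressive (multistage) VQ of the kind described can be sketched as repeated full-search quantization of the residual left by the previous level, with the final residual kept exactly so that reconstruction is lossless. Assumptions not taken from the paper: codebooks are trained with plain k-means, codewords are rounded to integers so that integer pixel vectors are reconstructed bit-exactly, and the names and parameters (`levels`, `size`) are hypothetical. A real coder would also entropy-code the indices and the final residual.

```python
import numpy as np

def full_search(vectors, codebook):
    """Index of the nearest codeword (squared Euclidean distance) for every vector."""
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def train_codebook(vectors, size, iters=10, seed=0):
    """Plain k-means with integer-rounded centroids (assumed training method)."""
    rng = np.random.default_rng(seed)
    cb = vectors[rng.choice(len(vectors), size, replace=False)].copy()
    for _ in range(iters):
        idx = full_search(vectors, cb)
        for j in range(size):
            members = vectors[idx == j]
            if len(members):                      # leave empty cells unchanged
                cb[j] = np.rint(members.mean(axis=0)).astype(cb.dtype)
    return cb

def progressive_vq(vectors, levels, size):
    """Quantize, subtract, repeat; the last residual is stored losslessly."""
    residual = vectors.astype(np.int64)
    stages = []
    for level in range(levels):
        cb = train_codebook(residual, size, seed=level)
        idx = full_search(residual, cb)
        stages.append((cb, idx))
        residual = residual - cb[idx]
    return stages, residual

def reconstruct(stages, residual=None):
    """Lossy preview from the given stages; exact once the final residual is added."""
    out = sum(cb[idx] for cb, idx in stages)
    return out if residual is None else out + residual
```

Decoding only the first few stages, for example `reconstruct(stages[:1])`, gives a coarse preview, and each further level refines it; this is the progressive behavior, and the number of levels and the codebook training method are exactly the trade-offs the abstract mentions.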

  3. Parallel contributing area calculation with granularity control on massive grid terrain datasets

    NASA Astrophysics Data System (ADS)

    Jiang, Ling; Tang, Guoan; Liu, Xuejun; Song, Xiaodong; Yang, Jianyi; Liu, Kai

    2013-10-01

    The calculation of contributing areas from digital elevation models (DEMs) is one of the important tasks in digital terrain analysis (DTA). In a real application, the computation usually involves two steps: (1) calculating flow directions via a flow model, and (2) computing the contributing area for each grid cell in the DEM. The traditional algorithm for calculating contributing areas is coded as a sequential program executed on a single processor. As the scope and resolution of DEMs increase, the serial algorithm has become increasingly difficult to run and is often very time-consuming, especially for DEMs of large areas and fine scales. In recent years, advances in computer technology have allowed parallel computing to meet this challenge. However, a parallel implementation with granularity control, an efficient strategy for achieving the best parallel performance and breaking through the limits of computing resources when processing massive grid terrain datasets, has not yet been reported in the DTA research field. This paper develops a message-passing-interface (MPI) parallel approach with granularity control to calculate contributing areas. Following the proposed parallelization strategy, a parallel D8 algorithm with granularity control is designed, as well as a parallel AreaD8 algorithm. Based on the domain decomposition of DEM data, each process can handle multiple partitions decomposed according to a grain size. Through an iterative procedure of reading source data, executing the operator, and writing the resulting data, each process computes its partitions one by one. The experimental results on a multi-node cluster show that the proposed parallel algorithms with granularity control are powerful tools for processing big datasets, and that the parallel D8 algorithm is insensitive to granularity, while the parallel AreaD8 algorithm has an optimal grain size that yields the best parallel performance.
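The two steps described, flow direction followed by contributing area, can be sketched serially as follows. This is a single-process sketch under stated assumptions: D8 sends each cell's flow to its steepest downslope neighbor, with diagonal distances weighted by √2; cells with no strictly lower neighbor are treated as sinks; and area is counted in cells. It does not reproduce the paper's MPI decomposition or granularity control, and the function names are hypothetical.

```python
import math

# Row/column offsets of the eight neighbors of a cell
NEIGHBORS = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def d8_directions(dem):
    """For each cell, the (row, col) of its steepest downslope neighbor, or None for a sink."""
    rows, cols = len(dem), len(dem[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best, best_slope = None, 0.0
            for dr, dc in NEIGHBORS:
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    dist = math.sqrt(2) if dr and dc else 1.0
                    slope = (dem[r][c] - dem[rr][cc]) / dist
                    if slope > best_slope:          # strictly downhill only
                        best, best_slope = (rr, cc), slope
            out[r][c] = best
    return out

def contributing_area(dem):
    """Number of cells (including the cell itself) that drain through each cell."""
    rows, cols = len(dem), len(dem[0])
    dirs = d8_directions(dem)
    area = [[1] * cols for _ in range(rows)]
    # Flow always goes strictly downhill, so visiting cells from highest to lowest
    # guarantees every upstream cell is finished before its downstream neighbor.
    order = sorted(((dem[r][c], r, c) for r in range(rows) for c in range(cols)), reverse=True)
    for _, r, c in order:
        if dirs[r][c] is not None:
            rr, cc = dirs[r][c]
            area[rr][cc] += area[r][c]
    return area
```

For a single row `[[3, 2, 1]]` the contributing areas are `[[1, 2, 3]]`. The elevation-ordered pass is inherently serial across the whole grid, which is one reason a partitioned parallel version has to exchange accumulated values across partition boundaries.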

  4. Massively parallel multifrontal methods for finite element analysis on MIMD computer systems

    SciTech Connect

    Benner, R.E.

    1993-03-01

    The development of highly parallel direct solvers for large, sparse linear systems of equations (e.g., for finite element or finite difference models) is lagging behind progress in parallel direct solvers for dense matrices and iterative methods for sparse matrices. We describe a massively parallel (MP) multifrontal solver for the direct solution of large sparse linear systems, such as those routinely encountered in finite element structural analysis, in an effort to address concerns about the viability of scalable, MP direct methods for sparse systems and enhance the software base for MP applications. Performance results are presented and future directions are outlined for research and development efforts in parallel multifrontal and related solvers. In particular, parallel efficiencies of 25% on 1024 nCUBE 2 nodes and 36% on 64 Intel iPSC/860 nodes have been demonstrated, and parallel efficiencies of 60–85% are expected when a severe load imbalance is overcome by static mapping and dynamic load balance techniques previously developed for other parallel solvers and application codes.

  5. Using CLIPS in the domain of knowledge-based massively parallel programming

    NASA Technical Reports Server (NTRS)

    Dvorak, Jiri J.

    1994-01-01

    The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and makes the exploitation of parallelism completely transparent to the user. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a convenient, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about the application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with the C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering is discussed.

  6. Massively parallel fast elliptic equation solver for three dimensional hydrodynamics and relativity

    SciTech Connect

    Sholl, P.L.; Wilson, J.R.; Mathews, G.J.; Avila, J.H.

    1995-01-01

    Through the work proposed in this document we expect to advance the forefront of large scale computational efforts on massively parallel distributed-memory multiprocessors. We will develop tools for effective conversion of sequential numerical methods used to solve large systems of partial differential equations to a parallel implementation. The research supported by this work will involve conversion of a program which does state of the art modeling of multi-dimensional hydrodynamics, general relativity and particle transport in energetic astrophysical environments. The proposed parallel algorithm development, particularly the study and development of fast elliptic equation solvers, could significantly benefit this program and other applications involving solutions to systems of differential equations. We shall develop a data communication manager for distributed memory computers as an aid in program conversions to a parallel environment and implement it in the three dimensional relativistic hydrodynamics program discussed below, and we shall develop a concurrent system/concurrent subgrid multigrid method. Currently, five systems are approximated sequentially using multigrid successive overrelaxation. Results from an iteration cycle of one multigrid system are used in the iterations of the following multigrid systems. We shall develop a multigrid algorithm for simultaneous computation of the sets of equations. In addition, we shall implement a method for concurrent processing of the subgrids in each of the multigrid computations. The conditions for convergence of the method will be examined. We will compare this technique to other parallel multigrid techniques, such as distributed data/sequential subgrids and the Parallel Superconvergent Multigrid of Frederickson and McBryan. We expect the results of these studies to offer insight and tools both for the selection of new algorithms and for the conversion of existing large codes to massively parallel architectures.

  7. A new 3D parallel high resolution electromagnetic nonlinear inversion based on new global magnetic integral and local differential decomposition (GILD)

    SciTech Connect

    Xie, G.; Li, J.

    1997-05-01

    A new 3D electromagnetic modeling and nonlinear inversion algorithm is presented, based on global integral and local differential equations decomposition (GILD). The GILD parallel nonlinear inversion algorithm consists of five parts: (1) the domain is decomposed into subdomains SI and SII; (2) in the modeling step, a new global magnetic integral equation in SI and the local magnetic differential equations in SII are used together to obtain the magnetic field; (3) in the inversion step, the new global magnetic integral Jacobian equation in SI and the local magnetic differential Jacobian equations in SII are used together to update the electric conductivity and permittivity from the magnetic field data; (4) subdomain SII can be naturally and uniformly decomposed into 2^n smaller sub-cubic domains, and the sparse matrix in each sub-cubic domain can be eliminated separately, in parallel; (5) a new parallel multiple-hierarchy substructure algorithm is used to solve the smaller full matrices in SI, in parallel. The applications of the new 3D parallel GILD EM modeling and nonlinear inversion algorithm and software are: (1) high-resolution controlled-source imaging of electric conductivity and permittivity for interpreting electromagnetic field data acquired from cross-hole, surface-to-borehole, surface-to-surface, single-hole and multiple-hole configurations; and (2) high-resolution magnetotelluric imaging from surface impedance and field data. The new GILD parallel nonlinear inversion is intended to be a powerful 3D/2.5D imaging tool for oil exploration geophysics and for environmental remediation and monitoring.

  8. ASCI Red -- Experiences and lessons learned with a massively parallel teraFLOP supercomputer

    SciTech Connect

    Christon, M.A.; Crawford, D.A.; Hertel, E.S.; Peery, J.S.; Robinson, A.C.

    1997-06-01

    The Accelerated Strategic Computing Initiative (ASCI) program involves Sandia, Los Alamos and Lawrence Livermore National Laboratories. At Sandia National Laboratories, ASCI applications include large deformation transient dynamics, shock propagation, electromechanics, and abnormal thermal environments. In order to resolve important physical phenomena in these problems, it is estimated that meshes ranging from 10^6 to 10^9 grid points will be required. The ASCI program is relying on the use of massively parallel supercomputers initially capable of delivering over 1 TFLOPs to perform such demanding computations. The ASCI Red machine at Sandia National Laboratories consists of over 4,500 computational nodes with a peak computational rate of 1.8 TFLOPs, 567 GBytes of memory, and 2 TBytes of disk storage. Regardless of the peak FLOP rate, there are many issues surrounding the use of massively parallel supercomputers in a production environment. These issues include parallel I/O, mesh generation, visualization, archival storage, high-bandwidth networking and the development of parallel algorithms. In order to illustrate these issues and their solution with respect to ASCI Red, demonstration calculations of time-dependent buoyancy-dominated plumes, electromechanics, and shock propagation will be presented.

  9. Massively parallel Monte Carlo for many-particle simulations on GPUs

    SciTech Connect

    Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.

    2013-12-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.
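    For readers unfamiliar with hard-disk Monte Carlo, the kind of serial run that the parallel method is verified against can be sketched as below. This is an assumed textbook Metropolis scheme, not the authors' GPU algorithm: a randomly chosen disk receives a symmetric random displacement, and the move is accepted only if it creates no overlap, which satisfies detailed balance. All names (`sweep`, `overlaps`, `min_image`) and parameters (diameter `sigma`, periodic box length `box`, maximum step `delta`) are illustrative. The O(N) overlap test and the strictly sequential moves are the parts a massively parallel method has to restructure, for example with a cell decomposition.

```python
import math
import random


def min_image(d, box):
    """Shortest separation along one axis in a periodic box of length `box`."""
    return d - box * round(d / box)


def overlaps(pos, i, trial, sigma, box):
    """True if disk i placed at `trial` would overlap any other disk."""
    for j, (x, y) in enumerate(pos):
        if j == i:
            continue
        dx = min_image(trial[0] - x, box)
        dy = min_image(trial[1] - y, box)
        if dx * dx + dy * dy < sigma * sigma:
            return True
    return False


def sweep(pos, sigma, box, delta, rng):
    """One sweep of len(pos) single-disk trial moves; returns moves accepted."""
    accepted = 0
    for _ in range(len(pos)):
        i = rng.randrange(len(pos))
        x, y = pos[i]
        # Symmetric proposal: uniform displacement in [-delta, delta]^2.
        trial = ((x + rng.uniform(-delta, delta)) % box,
                 (y + rng.uniform(-delta, delta)) % box)
        # For hard disks the Boltzmann factor is 1 (no overlap) or 0 (overlap),
        # so the Metropolis rule reduces to "accept iff no overlap".
        if not overlaps(pos, i, trial, sigma, box):
            pos[i] = trial
            accepted += 1
    return accepted
```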

  10. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

    LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator. It is a code that can be used to model atoms or, as the LAMMPS website puts it, to serve as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-hosted website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one lists the name of the PI and a brief description of the work done or the visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html. The foundation paper for LAMMPS is S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  11. Medical image processing utilizing neural networks trained on a massively parallel computer.

    PubMed

    Kerr, J P; Bartlett, E B

    1995-07-01

    While finding many applications in science, engineering, and medicine, artificial neural networks (ANNs) have typically been limited to small architectures. In this paper, we demonstrate how very large architecture neural networks can be trained for medical image processing utilizing a massively parallel, single-instruction multiple data (SIMD) computer. The two- to three-orders of magnitude improvement in processing time attainable using a parallel computer makes it practical to train very large architecture ANNs. As an example we have trained several ANNs to demonstrate the tomographic reconstruction of 64 x 64 single photon emission computed tomography (SPECT) images from 64 planar views of the images. The potential for these large architecture ANNs lies in the fact that once the neural network is properly trained on the parallel computer the corresponding interconnection weight file can be loaded on a serial computer. Subsequently, relatively fast processing of all novel images can be performed on a PC or workstation. PMID:7497701

  12. A massively parallel adaptive finite element method with dynamic load balancing

    SciTech Connect

    Devine, K.D.; Flaherty, J.E.; Wheat, S.R.; Maccabe, A.B.

    1993-05-01

    We construct massively parallel, adaptive finite element methods for the solution of hyperbolic conservation laws in one and two dimensions. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. The resulting method is of high order and may be parallelized efficiently on MIMD computers. We demonstrate parallel efficiency through computations on a 1024-processor nCUBE/2 hypercube. We also present results using adaptive p-refinement to reduce the computational cost of the method. We describe tiling, a dynamic, element-based data migration system. Tiling dynamically maintains global load balance in the adaptive method by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. We demonstrate the effectiveness of the dynamic load balancing with adaptive p-refinement examples.

  13. A massively parallel adaptive finite element method with dynamic load balancing

    SciTech Connect

    Devine, K.D.; Flaherty, J.E.; Wheat, S.R.; Maccabe, A.B.

    1993-12-31

    The authors construct massively parallel adaptive finite element methods for the solution of hyperbolic conservation laws. Spatial discretization is performed by a discontinuous Galerkin finite element method using a basis of piecewise Legendre polynomials. Temporal discretization utilizes a Runge-Kutta method. Dissipative fluxes and projection limiting prevent oscillations near solution discontinuities. The resulting method is of high order and may be parallelized efficiently on MIMD computers. They demonstrate parallel efficiency through computations on a 1024-processor nCUBE/2 hypercube. They present results using adaptive p-refinement to reduce the computational cost of the method, and tiling, a dynamic, element-based data migration system that maintains global load balance of the adaptive method by overlapping neighborhoods of processors that each perform local balancing.

  14. The use of inexact ODE solver in waveform relaxation methods on a massively parallel computer

    SciTech Connect

    Luk, W.S.; Wing, O.

    1995-12-01

    This paper presents the use of an inexact ordinary differential equation (ODE) solver in waveform relaxation methods for solving initial value problems. Because conventional ODE solvers are inherently sequential, the inexact ODE solver performs time integration using time points from the previous waveform iteration only. As a result, the method is truly massively parallel, since the equation is completely unfolded both in system and in time. Convergence analysis shows that the spectral radius of the iteration equation resulting from the "inexact" solver is the same as that of the standard method, and hence the new method is robust. Parallel implementation issues on the DECmpp 12000/Sx computer are also discussed. Numerical results show that although the inexact method needs more iterations than the exact method, as expected, the computation time is much reduced because of the large-scale parallelism.
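    The idea of taking time points only from the previous waveform can be sketched as a Picard-style waveform iteration. This is an assumed simplification for illustration, not necessarily the paper's exact scheme or its DECmpp implementation: each sweep evaluates the right-hand side f on the entire previous waveform (independent across both system components and time points) and then integrates it with the trapezoidal rule. The function name `waveform_relaxation` and its parameters are hypothetical.

```python
import math


def waveform_relaxation(f, y0, t_end, n_steps, n_iter):
    """Solve y' = f(t, y), y(0) = y0 on [0, t_end] by iterating on whole waveforms.

    f(t, y) takes and returns a list of floats (one entry per component).
    Returns the time grid and the final waveform (list of state vectors).
    """
    h = t_end / n_steps
    t = [k * h for k in range(n_steps + 1)]
    dim = len(y0)
    wave = [list(y0) for _ in t]  # initial guess: constant waveform
    for _ in range(n_iter):
        # Every evaluation uses only the previous waveform, so all time points
        # (and all components) could be evaluated concurrently.
        fv = [f(t[k], wave[k]) for k in range(n_steps + 1)]
        new = [list(y0)]
        for k in range(1, n_steps + 1):
            # Trapezoidal quadrature of the previous derivative; this running
            # sum is a prefix sum, which is itself parallelizable as a scan.
            new.append([new[k - 1][i] + 0.5 * h * (fv[k - 1][i] + fv[k][i])
                        for i in range(dim)])
        wave = new
    return t, wave
```

    For example, for y' = -y with y(0) = 1 on [0, 1], 100 steps and 30 iterations give a final value close to exp(-1). More iterations are needed than for a sequential time-stepper, mirroring the trade-off the abstract reports.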

  15. Design and Performance Analysis of a Massively Parallel Atmospheric General Circulation Model

    NASA Technical Reports Server (NTRS)

    Schaffer, Daniel S.; Suarez, Max J.

    1998-01-01

    In the 1990s, computer manufacturers are increasingly turning to the development of parallel processor machines to meet the high performance needs of their customers. Simultaneously, atmospheric scientists study weather and climate phenomena ranging from hurricanes to El Nino to global warming that require increasingly fine resolution models. Here, implementation of a parallel atmospheric general circulation model (GCM) which exploits the power of massively parallel machines is described. Using the horizontal data domain decomposition methodology, this FORTRAN 90 model is able to integrate a 0.6 deg. longitude by 0.5 deg. latitude problem at a rate of 19 Gigaflops on 512 processors of a Cray T3E 600, corresponding to 280 seconds of wall-clock time per simulated model day. At this resolution, the model has 64 times as many degrees of freedom and performs 400 times as many floating point operations per simulated day as the model it replaces.

  16. Virtual Simulator: An infrastructure for design and performance-prediction of massively parallel codes

    NASA Astrophysics Data System (ADS)

    Perumalla, K.; Fujimoto, R.; Pande, S.; Karimabadi, H.; Driscoll, J.; Omelchenko, Y.

    2005-12-01

    Large parallel/distributed scientific simulations are very complex, and their dynamic behavior is hard to predict. Efficient development of massively parallel codes remains a computational challenge. For example, almost none of the kinetic codes in use in space physics today have dynamic load balancing capability. Here we present a new infrastructure for the design and performance prediction of parallel codes. Performance prediction is useful for analyzing, understanding and experimenting with different partitioning schemes, multiple modeling alternatives and so on, without having to run the application on supercomputers. Instrumentation of the model (with minimal perturbation of performance) is useful for gleaning key metrics and understanding application-level behavior. Unfortunately, traditional approaches to virtual execution and instrumentation are limited by slow execution speed, low resolution, or both. We present a high-resolution framework that provides a virtual CPU abstraction (with a full thread context per CPU), yet scales to thousands of virtual CPUs. The tool, called PDES2, offers different levels of modeling interfaces, from general-purpose parallel simulations to parallel grid-based particle-in-cell (PIC) codes. The tool itself runs on multiple processors, distributing the virtual execution across them to accommodate the high resolution. Validation experiments of PIC models in the framework using a 1-D hybrid shock application show close agreement between results from virtual executions and results from actual supercomputer runs. The utility of this tool is further illustrated through an application to a parallel global hybrid code.

  17. Scalable Parallel Execution of an Event-based Radio Signal Propagation Model for Cluttered 3D Terrains

    SciTech Connect

    Seal, Sudip K; Perumalla, Kalyan S

    2009-01-01

    Radio signal strength estimation is essential in many applications, including the design of military radio communications and industrial wireless installations. While classical approaches such as finite difference methods are well-known, new event-based models of radio signal propagation have been recently shown to deliver such estimates faster (via serial execution) than other methods. For scenarios with large or richly-featured geographical volumes, however, parallel processing is required to meet the memory and computation time demands. Here, we present a scalable and efficient parallel execution of a recently-developed event-based radio signal propagation model. We demonstrate its scalability to thousands of processors, with parallel speedups over 1000x. The speed and scale achieved by our parallel execution enable larger scenarios and faster execution than has ever been reported before.

  18. LDRD final report on massively-parallel linear programming : the parPCx system.

    SciTech Connect

    Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project "Massively-Parallel Linear Programming". We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Integer and Combinatorial Optimizer). We

  19. Overcoming rule-based rigidity and connectionist limitations through massively-parallel case-based reasoning

    NASA Technical Reports Server (NTRS)

    Barnden, John; Srinivas, Kankanahalli

    1990-01-01

    Symbol manipulation as used in traditional Artificial Intelligence has been criticized by neural net researchers for being excessively inflexible and sequential. On the other hand, the application of neural net techniques to the types of high-level cognitive processing studied in traditional artificial intelligence presents major problems as well. A promising way out of this impasse is to build neural net models that accomplish massively parallel case-based reasoning. Case-based reasoning, which has received much attention recently, is essentially the same as analogy-based reasoning, and avoids many of the problems leveled at traditional artificial intelligence. Further problems are avoided by doing many strands of case-based reasoning in parallel, and by implementing the whole system as a neural net. In addition, such a system provides an approach to some aspects of the problems of noise, uncertainty and novelty in reasoning systems. The current neural net system (Conposit), which performs standard rule-based reasoning, is being modified into a massively parallel case-based reasoning version.

  20. A Novel Implementation of Massively Parallel Three Dimensional Monte Carlo Radiation Transport

    NASA Astrophysics Data System (ADS)

    Robinson, P. B.; Peterson, J. D. L.

    2005-12-01

    The goal of our summer project was to implement the difference formulation for radiation transport into Cosmos++, a multidimensional, massively parallel, magnetohydrodynamics code for astrophysical applications (Peter Anninos - AX). The difference formulation is a new method for Symbolic Implicit Monte Carlo thermal transport (Brooks and Szöke - PAT). Previously, simultaneous implementation of fully implicit Monte Carlo radiation transport in multiple dimensions on multiple processors had not been convincingly demonstrated. We found that a combination of the difference formulation and the inherent structure of Cosmos++ makes such an implementation both accurate and straightforward. We developed a "nearly nearest neighbor physics" technique to allow each processor to work independently, even with a fully implicit code. This technique coupled with the increased accuracy of an implicit Monte Carlo solution and the efficiency of parallel computing systems allows us to demonstrate the possibility of massively parallel thermal transport. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48

  1. A cost-effective methodology for the design of massively-parallel VLSI functional units

    NASA Technical Reports Server (NTRS)

    Venkateswaran, N.; Sriram, G.; Desouza, J.

    1993-01-01

    In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique, functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the performance of functional units designed by our method yields promising results.

  2. A Two Colorable Fourth Order Compact Difference Scheme and Parallel Iterative Solution of the 3D Convection Diffusion Equation

    NASA Technical Reports Server (NTRS)

    Zhang, Jun; Ge, Lixin; Kouatchou, Jules

    2000-01-01

    A new fourth order compact difference scheme for the three dimensional convection diffusion equation with variable coefficients is presented. The novelty of this new difference scheme is that it only requires 15 grid points and that it can be decoupled with two colors. The entire computational grid can be updated in two parallel subsweeps with the Gauss-Seidel type iterative method. This is compared with the known 19 point fourth order compact difference scheme, which requires four colors to decouple the computational grid. Numerical results, with multigrid methods implemented on a shared memory parallel computer, are presented to compare the 15 point and the 19 point fourth order compact schemes.
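    The two-color idea can be illustrated with red-black Gauss-Seidel. Note that this sketch uses the standard second-order 7-point Laplacian for -∇²u = f on the unit cube with zero Dirichlet boundaries, not the paper's 15-point fourth-order compact scheme, whose coefficients are not given here; the 7-point stencil is also two-colorable, which is the property being illustrated. The function name `red_black_gs` and the grid layout (n interior points per axis, spacing h = 1/(n+1), arrays padded with one boundary layer on each side) are assumptions.

```python
def red_black_gs(f, n, h, sweeps):
    """Red-black Gauss-Seidel for the 7-point discretization of -Laplace(u) = f.

    f and the returned u are (n+2)^3 nested lists indexed [i][j][k];
    boundary entries of u stay 0 (homogeneous Dirichlet condition).
    """
    size = n + 2
    u = [[[0.0] * size for _ in range(size)] for _ in range(size)]
    for _ in range(sweeps):
        for color in (0, 1):
            # A point of one color has all six stencil neighbours of the other
            # color, so every update in this half-sweep is independent and the
            # whole subsweep could run in parallel.
            for i in range(1, n + 1):
                for j in range(1, n + 1):
                    for k in range(1, n + 1):
                        if (i + j + k) % 2 == color:
                            u[i][j][k] = (u[i - 1][j][k] + u[i + 1][j][k]
                                          + u[i][j - 1][k] + u[i][j + 1][k]
                                          + u[i][j][k - 1] + u[i][j][k + 1]
                                          + h * h * f[i][j][k]) / 6.0
    return u
```

    The abstract's point is that the 15-point compact stencil keeps this two-subsweep structure, whereas the 19-point scheme needs four colors and therefore four subsweeps.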

  3. Scaling and performance of a 3-D radiation hydrodynamics code on message-passing parallel computers: final report

    SciTech Connect

    Hayes, J C; Norman, M

    1999-10-28

    This report details an investigation into the efficacy of two approaches to solving the radiation diffusion equation within a radiation hydrodynamic simulation. Because leading-edge scientific computing platforms have evolved from large single-node vector processors to parallel aggregates containing tens to thousands of individual CPUs, the ability of an algorithm to maintain high compute efficiency when distributed over a large array of nodes is critically important. The viability of an algorithm thus hinges upon the tripartite question of numerical accuracy, total time to solution, and parallel efficiency.

  4. Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

    2006-05-01

    The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has introduced new processing challenges. The price paid for the wealth of spatial and spectral information available from hyperspectral sensors is the enormous amount of data that they generate. In several applications, however, the desired information must be calculated quickly enough for practical use. High computing performance is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed up hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques covering four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure the parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts select parallel hyperspectral algorithms for specific applications.

  5. GICUDA: A parallel program for 3D correlation imaging of large scale gravity and gravity gradiometry data on graphics processing units with CUDA

    NASA Astrophysics Data System (ADS)

    Chen, Zhaoxi; Meng, Xiaohong; Guo, Lianghui; Liu, Guofeng

    2012-09-01

    3D correlation imaging of gravity and gravity gradiometry data provides a rapid approach to the equivalent estimation of target bodies with different density contrasts in the subsurface. The subsurface is divided into a regular 3D grid, and a cross-correlation between the observed data and the theoretical gravity anomaly due to a point mass source is then calculated at each grid node. The resulting correlation coefficients are used to describe the equivalent mass distribution in a quantitative, probabilistic sense. However, when the survey dataset is large, the method is still computationally expensive. With the advent of CUDA, GPUs have opened a new path for parallel computing and have been widely applied in seismic processing, astronomy, molecular dynamics simulation, fluid mechanics and other fields. We move the main time-consuming part of the 3D correlation imaging program onto the GPU device, where it can be executed in parallel. Synthetic and real tests were performed on an NVIDIA GTX 550 to validate the correctness of our code. The precision evaluation and performance speedup comparison of the CPU and GPU implementations are illustrated with gravity datasets of different sizes. When the grid has 1024×1024×1 nodes and the observed dataset has 1024×1024 points, the speedup reaches 81.5 for gravity data and 90.7 for gravity vertical gradient data, respectively, providing a basis for the rapid interpretation of gravity and gravity gradiometry data.
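    The per-node computation can be sketched as follows. This minimal version assumes observation points on the surface z = 0, depth z0 > 0 measured downward, and a normalized cross-correlation coefficient without mean removal; the exact normalization and GPU kernel used in GICUDA may differ. Because the constant G·m cancels in the coefficient, it is omitted. The names `point_mass_gz` and `correlation_image` are hypothetical. Every grid node is handled independently, which is what makes a one-thread-per-node GPU mapping natural.

```python
import math


def point_mass_gz(x, y, x0, y0, z0):
    """Vertical attraction at surface point (x, y), up to the factor G*m,
    of a point mass buried at depth z0 > 0 below (x0, y0)."""
    r2 = (x - x0) ** 2 + (y - y0) ** 2 + z0 ** 2
    return z0 / r2 ** 1.5


def correlation_image(stations, gobs, nodes):
    """Correlation coefficient between observed data and a unit point-mass
    response, evaluated independently at each subsurface grid node."""
    obs_norm2 = sum(g * g for g in gobs)
    image = []
    for (x0, y0, z0) in nodes:  # one independent task (e.g. GPU thread) per node
        b = [point_mass_gz(x, y, x0, y0, z0) for (x, y) in stations]
        num = sum(g * bi for g, bi in zip(gobs, b))
        den = math.sqrt(obs_norm2 * sum(bi * bi for bi in b))
        image.append(num / den)
    return image
```

    By the Cauchy-Schwarz inequality the coefficient lies in [-1, 1] and reaches 1 only where the point-mass response is proportional to the data, which is why high values are read as probable mass locations.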

  6. Stochastic simulation of charged particle transport on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Earl, James A.

    1988-01-01

    Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively Parallel Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods. These simulations have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.

  7. Block iterative restoration of astronomical images with the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Heap, Sara R.; Lindler, Don J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.

  8. Direct methods for banded linear systems on massively parallel processor computers

    SciTech Connect

    Arbenz, P.; Gander, W.

    1995-12-01

    The authors discuss direct methods for solving systems of linear equations Ax = b, A ∈ ℝ^{n×n}, on massively parallel processor (MPP) computers. Here, A is a real banded n × n matrix with lower and upper half-bandwidths r and s, respectively. We assume that the matrix A has a narrow band, meaning r + s << n. Only in this case is it worthwhile to take the zero structure of A into account, i.e., to store the matrix by diagonals and modify the algorithms accordingly.
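    To illustrate why a narrow band matters, here is a sketch of serial Gaussian elimination on diagonal (band) storage, which costs O(n·r·s) operations instead of O(n^3) for a dense matrix. It uses a simple row-oriented layout chosen for readability (not LAPACK's column-oriented band format) and performs no pivoting, so it assumes a matrix for which elimination without pivoting is stable, e.g. a diagonally dominant one. It does not reproduce the parallel MPP algorithms the paper discusses; the name `banded_solve` is hypothetical.

```python
def banded_solve(band, r, s, b):
    """Solve A x = b for a banded A with lower/upper half-bandwidths r and s.

    band[i][d] holds A[i][i + d - r] for d = 0..r+s; entries whose column
    falls outside 0..n-1 are padding and are never read.
    No pivoting: without row swaps, fill-in stays inside the band.
    """
    n = len(b)
    a = [row[:] for row in band]  # work on a copy of the band storage
    x = list(b)

    def get(i, j):
        return a[i][j - i + r]

    def put(i, j, v):
        a[i][j - i + r] = v

    # Forward elimination: only the r rows below the pivot have nonzeros in
    # column k, and only the s columns right of the pivot need updating.
    for k in range(n):
        piv = get(k, k)
        for i in range(k + 1, min(k + r + 1, n)):
            m = get(i, k) / piv
            for j in range(k + 1, min(k + s + 1, n)):
                put(i, j, get(i, j) - m * get(k, j))
            x[i] -= m * x[k]
    # Back substitution using the s superdiagonals of the upper factor.
    for i in range(n - 1, -1, -1):
        acc = x[i]
        for j in range(i + 1, min(i + s + 1, n)):
            acc -= get(i, j) * x[j]
        x[i] = acc / get(i, i)
    return x
```

    For example, the tridiagonal system with 4 on the diagonal and -1 on both off-diagonals (r = s = 1) and right-hand side [2, 4, 6, 8, 16] has the solution [1, 2, 3, 4, 5].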

  9. Scalable load balancing for massively parallel distributed Monte Carlo particle transport

    SciTech Connect

    O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

    2013-07-01

    In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time O(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory. (authors)
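The abstract does not specify the pairing scheme. A classic way to reach a global balance in O(log N) pairwise steps is dimension exchange on a hypercube. The Python sketch below illustrates that idea only and is not the authors' algorithm. It assumes a power-of-two processor count and perfectly divisible work, and `dimension_exchange` is a hypothetical name.

```python
def dimension_exchange(loads):
    """Balance per-processor workloads with log2(N) rounds of pairwise averaging.
    In round d, processor p pairs with partner p XOR 2**d (one hypercube dimension)."""
    n = len(loads)
    assert n > 0 and n & (n - 1) == 0, "sketch assumes a power-of-two processor count"
    loads = [float(w) for w in loads]
    rounds = n.bit_length() - 1          # log2(n)
    for d in range(rounds):
        new = loads[:]
        for p in range(n):
            partner = p ^ (1 << d)
            # Both members of a pair end up with the pair's average workload.
            new[p] = 0.5 * (loads[p] + loads[partner])
        loads = new
    return loads, rounds
```

Each round needs only one pairwise exchange per processor. With the pairs working concurrently, parallel time is therefore O(log N). This serial emulation loops over all N processors in every round.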

  10. Estimating water flow through a hillslope using the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.

    1988-01-01

    A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.

  11. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    A capability was developed for rapidly producing visual representations of large, complex, multidimensional space and earth sciences data sets by implementing computer graphics modeling techniques on the Massively Parallel Processor (MPP), employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for understanding complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the data-independent computer graphics display environment of the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) to provide easy access for users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP; all of them are outlined.

  12. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    DOE PAGESBeta

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  13. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    SciTech Connect

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  14. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    NASA Astrophysics Data System (ADS)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2016-03-01

    This work discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package authored at Oak Ridge National Laboratory. Shift has been developed to scale well from laptops to small computing clusters to advanced supercomputers and includes features such as support for multiple geometry and physics engines, hybrid capabilities for variance reduction methods such as the Consistent Adjoint-Driven Importance Sampling methodology, advanced parallel decompositions, and tally methods optimized for scalability on supercomputing architectures. The scaling studies presented in this paper demonstrate good weak and strong scaling behavior for the implemented algorithms. Shift has also been validated and verified against various reactor physics benchmarks, including the Consortium for Advanced Simulation of Light Water Reactors' Virtual Environment for Reactor Analysis criticality test suite and several Westinghouse AP1000® problems presented in this paper. These benchmark results compare well to those from other contemporary Monte Carlo codes such as MCNP5 and KENO.

  15. Massively Parallel Computation of Soil Surface Roughness Parameters on A Fermi GPU

    NASA Astrophysics Data System (ADS)

    Li, Xiaojie; Song, Changhe

    2016-06-01

    Surface roughness describes the random or irregular micro-topography of a surface. The standard deviation of surface height and the surface correlation length describe the statistical variation of the random component of surface height relative to a reference surface. When the number of data points is large, computing surface roughness parameters is time-consuming. With the advent of Graphics Processing Unit (GPU) architectures, inherently parallel problems can be solved effectively on GPUs. In this paper we propose a GPU-based massively parallel computing method for estimating the roughness of 2D bare soil surfaces. The method was applied to data collected during a field experiment in April 2012 with a surface roughness tester based on the laser triangulation principle. The total number of data points was 52,040. The computation took 47 seconds on a Fermi GTX 590 GPU, whereas the serial CPU version took 5422 seconds, a 115x speedup.
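Here is a serial sketch of the two roughness parameters. It assumes the common convention that the correlation length is the first lag at which the normalized autocorrelation falls below 1/e. For brevity it treats a single 1D height profile, whereas the paper works with 2D data. `roughness_parameters` is a hypothetical name, and this is not the authors' GPU code.

```python
import numpy as np

def roughness_parameters(z, dx):
    """RMS height s and correlation length l of a 1D profile z sampled every dx.
    l is the first lag where the normalized autocorrelation drops below 1/e."""
    z = np.asarray(z, dtype=float)
    dev = z - z.mean()                     # heights relative to a mean reference level
    s = np.sqrt(np.mean(dev ** 2))         # standard deviation of surface height
    n = dev.size
    # Each lag is computed independently of the others, so this loop is the
    # part that maps naturally onto many GPU threads.
    rho = np.array([np.dot(dev[: n - k], dev[k:]) / ((n - k) * s ** 2) for k in range(n)])
    below = np.nonzero(rho < np.exp(-1.0))[0]
    l = below[0] * dx if below.size else float("nan")
    return s, l
```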

  16. Massively-parallel electrical-conductivity imaging of hydrocarbons using the Blue Gene/L supercomputer

    SciTech Connect

    Commer, M.; Newman, G.A.; Carazzone, J.J.; Dickens, T.A.; Green, K.E.; Wahrmund, L.A.; Willen, D.E.; Shiu, J.

    2007-05-16

    Large-scale controlled source electromagnetic (CSEM) three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. To cope with the typically large computational requirements of the 3D CSEM imaging problem, our strategies exploit computational parallelism and optimized finite-difference meshing. We report on an imaging experiment utilizing 32,768 tasks/processors on the IBM Watson Research Blue Gene/L (BG/L) supercomputer. Over a 24-hour period, we were able to image a large-scale marine CSEM field data set that previously required over four months of computing time on distributed clusters utilizing 1024 tasks on an Infiniband fabric. The total initial data misfit could be decreased by 67 percent within 72 completed inversion iterations, indicating an electrically resistive region in the southern survey area at depths greater than 1500 m below the seafloor. The major part of the residual misfit stems from transmitter-parallel receiver components that have an offset from the transmitter sail line (broadside configuration). Modeling confirms that improved broadside data fits can be achieved by considering anisotropic electrical conductivities. While delivering a satisfactory gross-scale image for the depths of interest, the experiment provides important evidence for the necessity of discriminating between horizontal and vertical conductivities for maximally consistent 3D CSEM inversions.

  17. Massively parallel solution of the inverse scattering problem for integrated circuit quality control

    SciTech Connect

    Leland, R.W.; Draper, B.L.; Naqvi, S.; Minhas, B.

    1997-09-01

    The authors developed and implemented a highly parallel computational algorithm for solution of the inverse scattering problem generated when an integrated circuit is illuminated by laser. The method was used as part of a system to measure diffraction grating line widths on specially fabricated test wafers and the results of the computational analysis were compared with more traditional line-width measurement techniques. The authors found they were able to measure the line width of singly periodic and doubly periodic diffraction gratings (i.e. 2D and 3D gratings respectively) with accuracy comparable to the best available experimental techniques. They demonstrated that their parallel code is highly scalable, achieving a scaled parallel efficiency of 90% or more on typical problems running on 1024 processors. They also made substantial improvements to the algorithmics and their original implementation of Rigorous Coupled Waveform Analysis, the underlying computational technique. These resulted in computational speed-ups of two orders of magnitude in some test problems. By combining these algorithmic improvements with parallelism the authors achieve speedups of between a few thousand and hundreds of thousands over the original engineering code. This made the laser diffraction measurement technique practical.

  18. Progress in the Simulation of Steady and Time-Dependent Flows with 3D Parallel Unstructured Cartesian Methods

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature-detection-based refinement parameters. The multigrid method is extended to time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.

  19. Reduction of reconstruction time for time-resolved spiral 3D contrast-enhanced magnetic resonance angiography using parallel computing.

    PubMed

    Kressler, Bryan; Spincemaille, Pascal; Prince, Martin R; Wang, Yi

    2006-09-01

    Time-resolved 3D MRI with high spatial and temporal resolution can be achieved using spiral sampling and sliding-window reconstruction. Image reconstruction is computationally intensive because of the need for data regridding, a large number of temporal phases, and multiple RF receiver coils. Inhomogeneity blurring correction for spiral sampling further increases the computational work load by an order of magnitude, hindering the clinical utility of spiral trajectories. In this work the reconstruction time is reduced by a factor of >40 compared to reconstruction using a single processor. This is achieved by using a cluster of 32 commercial off-the-shelf computers, commodity networking hardware, and readily available software. The reconstruction system is demonstrated for time-resolved spiral contrast-enhanced (CE) peripheral MR angiography (MRA), and a reduction of reconstruction time from 80 min to 1.8 min is achieved. PMID:16892189

  20. Massively Parallel and Scalable Implicit Time Integration Algorithms for Structural Dynamics

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel

    1997-01-01

    Explicit codes are often used to simulate the nonlinear dynamics of large-scale structural systems, even for low frequency response, because the storage and CPU requirements entailed by the repeated factorizations traditionally found in implicit codes rapidly overwhelm the available computing resources. With the advent of parallel processing, this trend is accelerating because of the following additional facts: (a) explicit schemes are easier to parallelize than implicit ones, and (b) explicit schemes induce short range interprocessor communications that are relatively inexpensive, while the factorization methods used in most implicit schemes induce long range interprocessor communications that often ruin the sought-after speed-up. However, the time step restriction imposed by the Courant stability condition on all explicit schemes cannot yet be offset by the speed of the currently available parallel hardware. Therefore, it is essential to develop efficient alternatives to direct methods that are also amenable to massively parallel processing because implicit codes using unconditionally stable time-integration algorithms are computationally more efficient when simulating the low-frequency dynamics of aerospace structures.
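The time step restriction on explicit schemes can be seen on a single undamped oscillator x'' = -ω²x. For the explicit central-difference scheme, the stability limit is ωΔt ≤ 2. In a discretized structure, ω is the highest natural frequency of the mesh, which is why refined meshes force very small steps. The minimal Python check below illustrates this; the function name is hypothetical.

```python
def central_difference_peak(omega, dt, steps=200):
    """Integrate x'' = -omega**2 * x from x(0) = 1, x'(0) = 0 with the explicit
    central-difference scheme and return the largest |x| seen."""
    a2 = (omega * dt) ** 2
    # Start-up step for zero initial velocity: x_1 = (1 - a2 / 2) * x_0.
    x_prev, x = 1.0, 1.0 - 0.5 * a2
    peak = max(abs(x_prev), abs(x))
    for _ in range(steps):
        # x_{n+1} = 2 x_n - x_{n-1} - (omega dt)^2 x_n
        x_prev, x = x, (2.0 - a2) * x - x_prev
        peak = max(peak, abs(x))
    return peak
```

With ω = 10, a step of Δt = 0.19 (ωΔt = 1.9) keeps |x| ≤ 1. A step of Δt = 0.21 (ωΔt = 2.1) makes the solution grow without bound.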

  1. Massively parallel simulation of flow and transport in variably saturated porous and fractured media

    SciTech Connect

    Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten

    2002-01-15

    This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. These modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.

  2. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect

    Hu, Wei; Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method for efficient large-scale Kohn-Sham DFT-based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on the fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10⁻⁴ Hartree/atom in terms of the error of energy and 6.2 × 10⁻⁴ Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3,500 to 14,000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  3. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    SciTech Connect

    Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarría-Miranda, Daniel

    2009-05-29

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.
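Parallel betweenness-centrality codes usually start from Brandes' algorithm as the sequential baseline. It runs a breadth-first search from every source, then makes a reverse sweep that accumulates pair dependencies. The Python sketch below is that textbook algorithm for unweighted graphs. It is not the lock-free parallel variant described in the abstract, and the function name is hypothetical.

```python
from collections import deque

def betweenness_centrality(adj):
    """Brandes' algorithm for unweighted graphs, O(V * E) time.
    adj maps each vertex to an iterable of its neighbours."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}          # number of shortest s-v paths
        dist = {v: -1 for v in adj}
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:              # first time w is reached
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:   # edge (v, w) lies on a shortest path
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                         # pop vertices in non-increasing distance
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    return bc
```

For an undirected graph, each pair is counted from both endpoints, so halve the scores if the undirected convention is wanted. The work for different sources is independent, which is one natural source of parallelism.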

  4. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    SciTech Connect

    Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.

  5. On distributed memory MPI-based parallelization of SPH codes in massive HPC context

    NASA Astrophysics Data System (ADS)

    Oger, G.; Le Touzé, D.; Guibert, D.; de Leffe, M.; Biddiscombe, J.; Soumagne, J.; Piccinali, J.-G.

    2016-03-01

    Most particle methods share the problem of high computational cost, and to satisfy the demands of solvers, currently available hardware technologies must be fully exploited. Two complementary technologies are now accessible. On the one hand, CPUs can be structured into a multi-node framework, allowing massive data exchanges through a high-speed network; in this case, each node usually comprises several cores available to perform multithreaded computations. On the other hand, GPUs, which derive from graphics computing technologies, can perform highly multithreaded calculations with hundreds of independent threads connected through a common shared memory. This paper is primarily dedicated to the distributed-memory parallelization of particle methods, targeting several thousand CPU cores. The experience gained clearly shows that parallelizing a particle-based code on moderate numbers of cores can easily lead to acceptable scalability, whilst a scalable speedup on thousands of cores is much more difficult to obtain. The discussion revolves around speeding up particle methods as a whole in a massive HPC context by making use of the MPI library. We focus on one particular particle method, Smoothed Particle Hydrodynamics (SPH), one of the most widespread today in the literature as well as in engineering.
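The core SPH operation is a kernel-weighted sum over neighbours, for example the density ρ_i = Σ_j m_j W(|x_i - x_j|, h). The serial Python sketch below uses Monaghan's 1D cubic-spline kernel (support radius 2h). It emulates a distributed-memory decomposition by giving each hypothetical rank a slab of particles plus halo copies of every particle within 2h; in an MPI code, those halo copies would be exchanged over the network. This is an illustration only, not the authors' implementation, and all function names are hypothetical.

```python
import numpy as np

def cubic_spline_1d(r, h):
    """Monaghan's cubic-spline kernel in 1D: support radius 2h, normalization 2/(3h)."""
    q = np.abs(r) / h
    w = np.where(q < 1.0, 1.0 - 1.5 * q**2 + 0.75 * q**3,
                 np.where(q < 2.0, 0.25 * (2.0 - q) ** 3, 0.0))
    return (2.0 / (3.0 * h)) * w

def sph_density(x, m, h):
    """rho_i = sum_j m_j W(x_i - x_j, h) over all pairs (O(N^2), for clarity)."""
    d = x[:, None] - x[None, :]
    return (m[None, :] * cubic_spline_1d(d, h)).sum(axis=1)

def decomposed_density(x, m, h, n_ranks):
    """Serial emulation of a distributed-memory split: each hypothetical rank owns a
    contiguous slab of particles and also sees halo copies of all particles within 2h."""
    rho = np.empty_like(x)
    for owned in np.array_split(np.argsort(x), n_ranks):
        lo, hi = x[owned].min(), x[owned].max()
        # Owned particles plus every particle inside their kernel support.
        near = np.nonzero((x >= lo - 2.0 * h) & (x <= hi + 2.0 * h))[0]
        d = x[owned][:, None] - x[near][None, :]
        rho[owned] = (m[near][None, :] * cubic_spline_1d(d, h)).sum(axis=1)
    return rho
```

Because the kernel has compact support, each rank needs data only from a halo of width 2h. This is what keeps communication local as the number of cores grows.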

  6. Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling

    NASA Technical Reports Server (NTRS)

    Rios, Joseph Lucio; Ross, Kevin

    2009-01-01

    Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for planning horizons of (at least) 3 hours.

  7. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    DOE PAGESBeta

    O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; Anderson, Steve; Woodward, Paul; Dietz, Hank

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how application codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.

  8. Massively Parallel Sequencing-Based Clonality Analysis of Synchronous Endometrioid Endometrial and Ovarian Carcinomas.

    PubMed

    Schultheis, Anne M; Ng, Charlotte K Y; De Filippo, Maria R; Piscuoglio, Salvatore; Macedo, Gabriel S; Gatius, Sonia; Perez Mies, Belen; Soslow, Robert A; Lim, Raymond S; Viale, Agnes; Huberman, Kety H; Palacios, Jose C; Reis-Filho, Jorge S; Matias-Guiu, Xavier; Weigelt, Britta

    2016-06-01

    Synchronous early-stage endometrioid endometrial carcinomas (EECs) and endometrioid ovarian carcinomas (EOCs) are associated with a favorable prognosis and have been suggested to represent independent primary tumors rather than metastatic disease. We subjected sporadic synchronous EECs/EOCs from five patients to whole-exome massively parallel sequencing, which revealed that the EEC and EOC of each case displayed strikingly similar repertoires of somatic mutations and gene copy number alterations. Despite the presence of mutations restricted to the EEC or EOC in each case, we observed that the mutational processes that shaped their respective genomes were consistent. High-depth targeted massively parallel sequencing of sporadic synchronous EECs/EOCs from 17 additional patients confirmed that these lesions are clonally related. In an additional Lynch Syndrome case, however, the EEC and EOC were found to constitute independent cancers lacking somatic mutations in common. Taken together, sporadic synchronous EECs/EOCs are clonally related and likely constitute dissemination from one site to the other. PMID:26832770

  9. New strategies and emerging technologies for massively parallel sequencing: applications in medical research.

    PubMed

    Mardis, Elaine R

    2009-01-01

    A variety of techniques that specifically target human gene sequences for differential capture from a genomic sample, coupled with next-generation, massively parallel DNA sequencing instruments, is rapidly supplanting the combination of polymerase chain reaction and capillary sequencing to discover coding variants in medically relevant samples. These studies are most appropriate for the sample numbers necessary to identify both common and rare single nucleotide variants, as well as small insertion or deletion events, which may cause complex inherited diseases. The same massively parallel sequencers are simultaneously being used for whole-genome resequencing and comprehensive, genome-wide variant discovery in studies of somatic diseases such as cancer. Viral and microbial researchers are using next-generation sequences to identify unknown etiologic agents in human diseases, to study the viral and microbial species that occupy surfaces of the human body, and to inform the clinical management of chronic infectious diseases such as human immunodeficiency virus (HIV). Taken together, these approaches are dramatically accelerating the pace of human disease research and are already impacting patient care. PMID:19435481

  10. Transcriptional analysis of endocrine disruption using zebrafish and massively parallel sequencing

    PubMed Central

    Baker, Michael E.; Hardiman, Gary

    2014-01-01

    Endocrine disrupting chemicals (EDCs) including plasticizers, pesticides, detergents and pharmaceuticals, affect a variety of hormone-regulated physiological pathways in humans and wildlife. Many EDCs are lipophilic molecules and bind to hydrophobic pockets in steroid receptors, such as the estrogen receptor and androgen receptor, which are important in vertebrate reproduction and development. Indeed, health effects attributed to EDCs include reproductive dysfunction (e.g., reduced fertility, reproductive tract abnormalities and skewed male/female sex ratios in fish), early puberty, various cancers and obesity. A major concern is the effects of exposure to low concentrations of endocrine disruptors in utero and post partum, which may increase the incidence of cancer and diabetes in adults. EDCs affect transcription of hundreds and even thousands of genes, which has created the need for new tools to monitor the global effects of EDCs. The emergence of massively parallel sequencing for investigating gene transcription provides a sensitive tool for monitoring the effects of EDCs on humans and other vertebrates as well as elucidating the mechanism of action of EDCs. Zebrafish conserve many developmental pathways found in humans, which makes zebrafish a valuable model system for studying EDCs, especially their effects on early organ development, because zebrafish embryos are translucent. In this article we review recent advances in massively parallel sequencing approaches with a focus on zebrafish. We make the case that zebrafish exposed to EDCs at different stages of development can provide important insights into EDC effects on human health. PMID:24850832

  11. A massively parallel semi-Lagrangian algorithm for solving the transport equation

    SciTech Connect

    Manson, Russell; Wang, Dali

    2010-01-01

    The scalar transport equation underpins many models employed in science, engineering, technology and business. Application areas include, but are not restricted to, pollution transport, weather forecasting, video analysis and encoding (the optical flow equation), options and stock pricing (the Black-Scholes equation) and spatially explicit ecological models. Unfortunately, finding numerical solutions to this equation that are fast and accurate is not trivial. Moreover, finding such numerical algorithms that can be implemented efficiently on high performance computer architectures is challenging. In this paper we describe a massively parallel algorithm for solving the advection portion of the transport equation. The approach presented here differs from that used in most transport models, and we have tried and tested it for various scenarios. It employs an intelligent domain decomposition based on the vector field of the system equations and thus automatically partitions the computational domain into algorithmically autonomous regions. The solution of a classic pure advection transport problem is shown to be conservative, monotonic and highly accurate at large time steps. Additionally, we demonstrate that the algorithm is highly efficient on high performance computer architectures and thus offers a route towards massively parallel application.
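The abstract does not give the scheme's details. As a generic illustration of why semi-Lagrangian methods tolerate large time steps, the Python sketch below traces each grid point's characteristic back to its departure point and interpolates there on a periodic 1D grid. It uses plain linear interpolation, which is monotonic but conserves mass only for a uniform velocity, so it is not the authors' conservative scheme. The function name is hypothetical.

```python
import numpy as np

def semi_lagrangian_step(q, u, dt, dx):
    """One semi-Lagrangian step for dq/dt + u dq/dx = 0 on a periodic 1D grid.
    Each point looks back along its characteristic to x - u*dt and linearly
    interpolates q there, so u*dt/dx is not restricted to be <= 1."""
    n = q.size
    # Departure points in grid-index units (u may be a scalar or a per-point array).
    s = np.arange(n) - np.asarray(u) * dt / dx
    i0 = np.floor(s).astype(int)
    frac = s - i0
    # Convex combination of the two neighbouring values: no new extrema appear.
    return (1.0 - frac) * q[i0 % n] + frac * q[(i0 + 1) % n]
```

With u·Δt/Δx = 3, one step reproduces a three-cell shift exactly, even though an explicit Eulerian scheme limited by the CFL condition would need at least three steps.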

  12. MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker

    SciTech Connect

    Cantalupo, Christopher; Borrill, Julian; Jaffe, Andrew; Kisner, Theodore; Stompor, Radoslaw

    2009-06-09

    MADmap is a software application used to produce maximum-likelihood images of the sky from time-ordered data which include correlated noise, such as those gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently on platforms ranging from small workstations to the most massively parallel supercomputers. Map-making is a critical step in the analysis of all CMB data sets, and the maximum-likelihood approach is the most accurate and widely applicable algorithm; however, it is a computationally challenging task. This challenge will only increase with the next generation of ground-based, balloon-borne and satellite CMB polarization experiments. The faintness of the B-mode signal that these experiments seek to measure requires them to gather enormous data sets. MADmap is already being run on up to O(10^11) time samples, O(10^8) pixels and O(10^4) cores, with ongoing work to scale to the next generation of data sets and supercomputers. We describe MADmap's algorithm based around a preconditioned conjugate gradient solver, fast Fourier transforms and sparse matrix operations. We highlight MADmap's ability to address problems typically encountered in the analysis of realistic CMB data sets and describe its application to simulations of the Planck and EBEX experiments. The massively parallel and distributed implementation is detailed and scaling complexities are given for the resources required. MADmap is capable of analysing the largest data sets now being collected on computing resources currently available, and we argue that, given Moore's Law, MADmap will be capable of reducing the most massive projected data sets.
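
    A maximum-likelihood map-maker of this kind solves the symmetric positive-definite normal equations (P^T N^-1 P) m = P^T N^-1 d, where P is the pointing matrix, N the noise covariance, d the time-ordered data and m the map. The sketch below is a generic preconditioned conjugate gradient solver written for a small dense matrix. It is not MADmap's code. The Jacobi (diagonal) preconditioner is an assumption made for illustration, and MADmap's FFT-based noise operator and sparse pointing matrix are not shown.

```python
import math


def pcg(A, b, tol=1e-10, max_iter=1000):
    """Solve A x = b for a symmetric positive-definite matrix A (list of rows)
    using conjugate gradients with a Jacobi (diagonal) preconditioner."""
    n = len(b)
    matvec = lambda v: [sum(A[i][k] * v[k] for k in range(n)) for i in range(n)]
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    inv_diag = [1.0 / A[i][i] for i in range(n)]    # M^-1 = diag(A)^-1

    x = [0.0] * n
    r = list(b)                                     # residual b - A x for x = 0
    z = [m * ri for m, ri in zip(inv_diag, r)]      # preconditioned residual
    p = list(z)                                     # first search direction
    rz = dot(r, z)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_iter):
        if math.sqrt(dot(r, r)) <= tol * b_norm:    # relative residual test
            break
        Ap = matvec(p)
        alpha = rz / dot(p, Ap)                     # step length along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        z = [m * ri for m, ri in zip(inv_diag, r)]
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]  # A-conjugate update
        rz = rz_new
    return x
```

    With white noise, P^T N^-1 P is diagonal (hit counts weighted by inverse variance), and a Jacobi preconditioner would then be exact. With correlated noise it is only an approximation, which is why iterating is needed.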

  13. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains.

    PubMed

    Torre, Emiliano; Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz; Grün, Sonja

    2016-07-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
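
    The intersection matrix described above can be sketched in a few lines. Spike trains are binned, the set of active neurons in each bin is recorded, and entry (i, j) counts the neurons active in both bin i and bin j. This shows only the raw overlap count and is not the ASSET implementation. ASSET may use normalized variants, and its subsequent statistical evaluation and clustering of diagonal structures are not shown. The function names and the dict-of-spike-times input are assumptions.

```python
def active_sets(spike_trains, bin_size, t_stop):
    """Bin spike trains into time bins; return, per bin, the set of active neurons.

    spike_trains: dict mapping a neuron id to a list of spike times.
    """
    n_bins = int(t_stop // bin_size)
    bins = [set() for _ in range(n_bins)]
    for neuron, times in spike_trains.items():
        for t in times:
            k = int(t // bin_size)
            if 0 <= k < n_bins:
                bins[k].add(neuron)
    return bins


def intersection_matrix(bins):
    """Entry (i, j) is the number of neurons active in both bin i and bin j."""
    return [[len(a & b) for b in bins] for a in bins]
```

    If the same sequence "neurons {0, 1, 2}, then neurons {3, 4, 5}" occurs in bins 1-2 and again in bins 6-7, the entries (1, 6) and (2, 7) both equal 3 and form a short diagonal. ASSET is designed to detect structures of this kind.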

  14. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    PubMed Central

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734

  15. MPI/OpenMP Hybrid Parallel Algorithm of Resolution of Identity Second-Order Møller-Plesset Perturbation Calculation for Massively Parallel Multicore Supercomputers.

    PubMed

    Katouda, Michio; Nakajima, Takahito

    2013-12-10

    A new algorithm for massively parallel calculations of electron correlation energy of large molecules based on the resolution of identity second-order Møller-Plesset perturbation (RI-MP2) technique is developed and implemented into the quantum chemistry software NTChem. In this algorithm, a Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) hybrid parallel programming model is applied to attain efficient parallel performance on massively parallel supercomputers. An in-core storage scheme of intermediate data of three-center electron repulsion integrals utilizing the distributed memory is developed to eliminate input/output (I/O) overhead. The parallel performance of the algorithm is tested on massively parallel supercomputers such as the K computer (using up to 45 992 central processing unit (CPU) cores) and a commodity Intel Xeon cluster (using up to 8192 CPU cores). The parallel RI-MP2/cc-pVTZ calculation of two-layer nanographene sheets (C150H30)2 (number of atomic orbitals is 9640) is performed using 8991 nodes and 71 288 CPU cores of the K computer. PMID:26592275
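
    For orientation, the quantity being computed is the standard closed-shell RI-MP2 correlation energy, E = Σ_ijab (ia|jb)[2(ia|jb) − (ib|ja)] / (ε_i + ε_j − ε_a − ε_b). In this expression the four-index integrals are approximated from three-center quantities as (ia|jb) ≈ Σ_P B^P_ia B^P_jb. The pure-Python sketch below evaluates this textbook formula and assumes B is already available (forming it involves the auxiliary-basis metric). It is not NTChem's implementation. Its point is that each occupied pair (i, j) can be computed independently, which is what makes the work distributable across MPI processes and OpenMP threads.

```python
def ri_mp2_energy(B, eps_occ, eps_vir):
    """Closed-shell RI-MP2 correlation energy (textbook form).

    B[P][i][a] holds the fitted three-center quantities, so the two-electron
    integral (ia|jb) is approximated by sum over P of B[P][i][a] * B[P][j][b].
    eps_occ / eps_vir are occupied / virtual orbital energies.
    """
    naux, nocc, nvir = len(B), len(eps_occ), len(eps_vir)
    energy = 0.0
    # Each (i, j) pair is independent: the natural unit of parallel work.
    for i in range(nocc):
        for j in range(nocc):
            # (ia|jb) block for this occupied pair, indexed [a][b]
            iajb = [[sum(B[P][i][a] * B[P][j][b] for P in range(naux))
                     for b in range(nvir)] for a in range(nvir)]
            for a in range(nvir):
                for b in range(nvir):
                    denom = eps_occ[i] + eps_occ[j] - eps_vir[a] - eps_vir[b]
                    # iajb[b][a] equals (ib|ja), the exchange-type integral
                    energy += iajb[a][b] * (2.0 * iajb[a][b] - iajb[b][a]) / denom
    return energy
```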

  16. A Diffusion-Based and Dynamic 3D-Printed Device That Enables Parallel in Vitro Pharmacokinetic Profiling of Molecules.

    PubMed

    Lockwood, Sarah Y; Meisel, Jayda E; Monsma, Frederick J; Spence, Dana M

    2016-02-01

    The process of bringing a drug to market involves many steps, including the preclinical stage, where various properties of the drug candidate molecule are determined. These properties, which include drug absorption, distribution, metabolism, and excretion, are often displayed in a pharmacokinetic (PK) profile. While PK profiles are determined in animal models, in vitro systems that model in vivo processes are available, although each possesses shortcomings. Here, we present a 3D-printed, diffusion-based, and dynamic in vitro PK device. The device contains six flow channels, each with integrated porous membrane-based insert wells. The pores of these membranes enable drugs to freely diffuse back and forth between the flow channels and the inserts, thus enabling both loading and clearance portions of a standard PK curve to be generated. The device is designed to work with 96-well plate technology and consumes single-digit milliliter volumes to generate multiple PK profiles simultaneously. Generation of PK profiles by use of the device was initially performed with fluorescein as a test molecule. Effects of such parameters as flow rate, loading time, volume in the insert well, and initial concentration of the test molecule were investigated. A prediction model was generated from these data, enabling the user to predict the concentration of the test molecule at any point along the PK profile within a coefficient of variation of ∼ 5%. Depletion of the analyte from the well was characterized and was determined to follow first-order rate kinetics, indicated by statistically equivalent (p > 0.05) depletion half-lives that were independent of the starting concentration. A PK curve for an approved antibiotic, levofloxacin, was generated to show utility beyond the fluorescein test molecule. PMID:26727249
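
    The link between first-order kinetics and concentration-independent half-lives follows from C(t) = C0·exp(−k t), which gives t½ = ln 2 / k regardless of C0. The sketch below estimates k by a least-squares fit of ln C against t. It illustrates the reasoning and is not the authors' analysis. The function name and the synthetic data in the test (k = 0.1 per unit time) are assumptions.

```python
import math


def first_order_half_life(times, concentrations):
    """Fit ln C = ln C0 - k t by ordinary least squares.

    Returns (k, half_life), where half_life = ln 2 / k does not depend on C0.
    """
    logs = [math.log(c) for c in concentrations]
    n = len(times)
    t_mean = sum(times) / n
    y_mean = sum(logs) / n
    sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(times, logs))
    sxx = sum((t - t_mean) ** 2 for t in times)
    k = -sxy / sxx                      # slope of ln C versus t is -k
    return k, math.log(2) / k
```

    Fitting synthetic decays that start at 10 and at 50 with the same k returns the same half-life (about 6.93 time units for k = 0.1). This is the signature the authors use to identify first-order depletion.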

  17. A 3D point-kernel multiple scatter model for parallel-beam SPECT based on a gamma-ray buildup factor

    NASA Astrophysics Data System (ADS)

    Marinkovic, Predrag; Ilic, Radovan; Spaic, Rajko

    2007-09-01

    A three-dimensional (3D) point-kernel multiple scatter model for point spread function (PSF) determination in parallel-beam single-photon emission computed tomography (SPECT), based on a dose gamma-ray buildup factor, is proposed. This model embraces nonuniform attenuation in a voxelized object of imaging (patient body) and multiple scattering that is treated as in point-kernel integration gamma-ray shielding problems. First-order Compton scattering is treated using the Klein-Nishina formula, whereas multiple scattering is accounted for by means of a dose buildup factor. An asset of the present model is the possibility of generating a complete two-dimensional (2D) PSF that can be used for 3D SPECT reconstruction by means of iterative algorithms. The proposed model is convenient in situations where more exact techniques are not economical. In test calculations (for a point source in a nonuniform scattering object with parallel-beam collimator geometry), the multiple-order scatter PSFs generated by the proposed model matched well with those from Monte Carlo (MC) simulations. Discrepancies are observed only in the exponential tails, mostly due to the high statistical uncertainty of the MC simulations in this region, rather than to any inadequacy of the model.
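
    The point-kernel idea can be illustrated with the classic shielding formula. The flux at distance r from an isotropic point source of strength S is B(τ)·S·exp(−τ)/(4πr²), where τ = Σ μ_k Δl_k is the optical depth accumulated along the ray. Summing over voxel segments is how nonuniform attenuation enters. The sketch below follows that formula only. The default linear buildup factor B(τ) = 1 + τ is a hypothetical placeholder, not the dose buildup factor used in the paper (real buildup factors come from tabulated fits), and the Klein-Nishina first-order scatter term is not included.

```python
import math


def optical_depth(mu_segments):
    """Sum of attenuation coefficient times path length over (mu, length)
    segments along the ray; this is how voxel-wise nonuniform attenuation enters."""
    return sum(mu * length for mu, length in mu_segments)


def point_kernel_flux(source_strength, distance, mu_segments, buildup=None):
    """Point-kernel flux B(tau) * S * exp(-tau) / (4 pi r^2) from an isotropic point source.

    buildup: callable B(tau). The default 1 + tau is a hypothetical placeholder
    for illustration only, not a fitted dose buildup factor.
    """
    if buildup is None:
        buildup = lambda tau: 1.0 + tau
    tau = optical_depth(mu_segments)
    uncollided = source_strength * math.exp(-tau) / (4.0 * math.pi * distance ** 2)
    return buildup(tau) * uncollided
```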

  18. Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis

    SciTech Connect

    Madduri, Kamesh; Bader, David A.

    2009-02-15

    Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socioeconomic interactions, social networking web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. We present the first study of novel high-performance combinatorial techniques for analyzing large-scale information networks, encapsulating dynamic interaction data on the order of billions of entities. We present new data structures to represent dynamic interaction networks, and discuss algorithms for processing parallel insertions and deletions of edges in small-world networks. With these new approaches, we achieve an average performance rate of 25 million structural updates per second and a parallel speedup of nearly 28 on a 64-way Sun UltraSPARC T2 multicore processor, for insertions and deletions to a small-world network of 33.5 million vertices and 268 million edges. We also design parallel implementations of fundamental dynamic graph kernels related to connectivity and centrality queries. Our implementations are freely distributed as part of the open-source SNAP (Small-world Network Analysis and Partitioning) complex network analysis framework.
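
    The operations described above are edge insertion, edge deletion and connectivity queries on a changing graph. The sketch below is a deliberately plain, sequential illustration of that interface, using hash-set adjacency and a breadth-first search. It is not SNAP's compact representation or its parallel update algorithms. The class and method names are hypothetical.

```python
from collections import deque


class DynamicGraph:
    """Undirected graph supporting edge insertions, deletions and s-t connectivity
    queries. Plain hash-set adjacency; illustrative only, not a compact layout."""

    def __init__(self):
        self.adj = {}

    def insert_edge(self, u, v):
        self.adj.setdefault(u, set()).add(v)
        self.adj.setdefault(v, set()).add(u)

    def delete_edge(self, u, v):
        self.adj.get(u, set()).discard(v)
        self.adj.get(v, set()).discard(u)

    def connected(self, s, t):
        """Breadth-first search from s; True if t is reachable."""
        if s not in self.adj or t not in self.adj:
            return s == t
        seen = {s}
        queue = deque([s])
        while queue:
            x = queue.popleft()
            if x == t:
                return True
            for y in self.adj[x]:
                if y not in seen:
                    seen.add(y)
                    queue.append(y)
        return False
```

    At the scale reported in the abstract (hundreds of millions of edges and tens of millions of updates per second), the per-edge overhead of hash sets is one reason cache-friendly adjacency layouts, like those the paper studies, are needed.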

  19. Microfluidic Reactor Array Device for Massively Parallel In-situ Synthesis of Oligonucleotides

    PubMed Central

    Srivannavit, Onnop; Gulari, Mayurachat; Hua, Zhishan; Gao, Xiaolian; Zhou, Xiaochuan; Hong, Ailing; Zhou, Tiecheng; Gulari, Erdogan

    2009-01-01

    We have designed and fabricated a microfluidic reactor array device for massively parallel in-situ synthesis of oligonucleotides (oDNA). The device is made of glass anodically bonded to silicon and consists of three levels of features: microreactors, microchannels and through inlet/outlet holes. The main challenges in the design of this device include preventing diffusion of photogenerated reagents upon activation and achieving uniform reagent flow through thousands of parallel reactors. The device embodies a simple and effective dynamic isolation mechanism which prevents the intermixing of active reagents between discrete microreactors. Depending on the design parameters, it is possible to achieve uniform flow and synthesis reaction in all of the reactors by proper design of the microreactors and the microchannels. We demonstrated the use of this device on a solution-based, light-directed parallel in-situ oDNA synthesis. We were able to synthesize long oDNA, up to 120-mers, at a stepwise yield of 98%. The quality of our microfluidic oDNA microarray, including sensitivity, signal noise, specificity, spot variation and accuracy, was characterized. Our microfluidic reactor array devices show great potential for genomics and proteomics research. PMID:20161215
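
    Stepwise yield compounds over the length of the oligonucleotide, so a short calculation shows why a 98% stepwise yield still matters for 120-mers. The sketch below assumes each coupling succeeds independently and ignores other error sources. The abstract reports only the stepwise yield, so the full-length fractions below are arithmetic, not measured results.

```python
def full_length_fraction(stepwise_yield, couplings):
    """Expected fraction of full-length product when each of `couplings`
    coupling steps succeeds independently with probability `stepwise_yield`."""
    return stepwise_yield ** couplings
```

    For a 120-mer, 0.98^119 ≈ 0.090 (if the first base requires no coupling) and 0.98^120 ≈ 0.089. Under these assumptions, roughly 9% of the strands in a spot would be full length.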

  20. PMESH: A parallel mesh generator

    SciTech Connect

    Hardin, D.D.

    1994-10-21

    The Parallel Mesh Generation (PMESH) Project is a joint LDRD effort by A Division and Engineering to develop a unique mesh generation system that can construct large calculational meshes (of up to 10^9 elements) on massively parallel computers. Such a capability will remove a critical roadblock to unleashing the power of massively parallel processors (MPPs) for physical analysis. PMESH will support a variety of LLNL 3-D physics codes in the areas of electromagnetics, structural mechanics, thermal analysis, and hydrodynamics.

  1. Implementation of Helioseismic Data Reduction and Diagnostic Techniques on Massively Parallel Architectures

    NASA Technical Reports Server (NTRS)

    Korzennik, Sylvain

    1997-01-01

    Under the direction of Dr. Rhodes, and the technical supervision of Dr. Korzennik, the data assimilation of high spatial resolution solar dopplergrams has been carried out throughout the program on the Intel Delta Touchstone supercomputer. With the help of a research assistant, partially supported by this grant, and under the supervision of Dr. Korzennik, code development was carried out at SAO, using various available resources. To ensure cross-platform portability, PVM was selected as the message passing library. A parallel implementation of power spectra computation for helioseismology data reduction using PVM was successfully completed. It was ported to SMP architectures (e.g., Sun) and to some MPP architectures (e.g., the CM-5). Due to limitations of the PVM implementation on the Cray T3D, the port to that architecture had not been completed at the time.

  2. Massively Parallel Interrogation of the Effects of Gene Expression Levels on Fitness.

    PubMed

    Keren, Leeat; Hausser, Jean; Lotan-Pompan, Maya; Vainberg Slutskin, Ilya; Alisar, Hadas; Kaminski, Sivan; Weinberger, Adina; Alon, Uri; Milo, Ron; Segal, Eran

    2016-08-25

    Data on gene expression levels across individuals, cell types, and disease states are expanding, yet our understanding of how expression levels impact phenotype is limited. Here, we present a massively parallel system for assaying the effect of gene expression levels on fitness in Saccharomyces cerevisiae by systematically altering the expression level of ∼100 genes at ∼100 distinct levels spanning a 500-fold range at high resolution. We show that the relationship between expression levels and growth is gene and environment specific and provides information on the function, stoichiometry, and interactions of genes. Wild-type expression levels in some conditions are not optimal for growth, and genes whose fitness is greatly affected by small changes in expression level tend to exhibit lower cell-to-cell variability in expression. Our study addresses a fundamental gap in understanding the functional significance of gene expression regulation and offers a framework for evaluating the phenotypic effects of expression variation. PMID:27545349

  3. The sensitivity of massively parallel sequencing for detecting candidate infectious agents associated with human tissue.

    PubMed

    Moore, Richard A; Warren, René L; Freeman, J Douglas; Gustavsen, Julia A; Chénard, Caroline; Friedman, Jan M; Suttle, Curtis A; Zhao, Yongjun; Holt, Robert A

    2011-01-01

    Massively parallel sequencing technology now provides the opportunity to sample the transcriptome of a given tissue comprehensively. Transcripts at only a few copies per cell are readily detectable, allowing the discovery of low abundance viral and bacterial transcripts in human tissue samples. Here we describe an approach for mining large sequence data sets for the presence of microbial sequences. Further, we demonstrate the sensitivity of this approach by sequencing human RNA-seq libraries spiked with decreasing amounts of an RNA-virus. At a modest depth of sequencing, viral transcripts can be detected at frequencies less than 1 in 1,000,000. With current sequencing platforms approaching outputs of one billion reads per run, this is a highly sensitive method for detecting putative infectious agents associated with human tissues. PMID:21603639
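
    The sensitivity argument is essentially a sampling calculation. If a fraction f of reads is viral, a run of N reads is expected to contain N·f viral reads. If reads are treated as independent draws, the count is approximately Poisson with that mean. The sketch below assumes this idealized Poisson model, ignoring mapping errors and library biases, and is not the authors' mining pipeline. The function names are hypothetical.

```python
import math


def expected_reads(total_reads, frequency):
    """Expected number of reads from an agent present at the given read frequency."""
    return total_reads * frequency


def prob_at_least(k, mean):
    """P(X >= k) for X ~ Poisson(mean), computed as 1 - P(X <= k - 1)."""
    term = math.exp(-mean)      # P(X = 0)
    cdf = 0.0
    for i in range(k):
        cdf += term
        term *= mean / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - cdf
```

    At a frequency of 1 in 1,000,000, a one-billion-read run is expected to contain about 1,000 viral reads. Even 10 million reads give an expected 10, and the probability of seeing at least one is 1 − e^−10 ≈ 0.99995.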

  4. Demonstration of EDA flow for massively parallel e-beam lithography

    NASA Astrophysics Data System (ADS)

    Brandt, P.; Belledent, J.; Tranquillin, C.; Figueiro, T.; Meunier, S.; Bayle, S.; Fay, A.; Milléquant, M.; Icard, B.; Wieland, M.

    2014-03-01

    Today's soaring complexity in pushing the limits of 193nm immersion lithography drives the development of other technologies. One of these alternatives is maskless massively parallel electron beam lithography (MP-EBL), a promising candidate in which future resolution needs can be fulfilled at competitive cost. MAPPER Lithography's MATRIX MP-EBL platform has entered an advanced stage of development. The first tool in this platform, the FLX 1200, will operate using more than 1,300 beams, each writing a stripe 2.2 μm wide. A 0.2 μm overlap between adjacent stripes is allocated for stitching. Each beam is composed of 49 individual sub-beams that can be blanked independently in order to write pixels onto the wafer in a raster scan.

  5. Simulating massively parallel electron beam inspection for sub-20 nm defects

    NASA Astrophysics Data System (ADS)

    Bunday, Benjamin D.; Mukhtar, Maseeh; Quoi, Kathy; Thiel, Brad; Malloy, Matt

    2015-03-01

    SEMATECH has initiated a program to develop massively parallel electron beam defect inspection (MPEBI). Here we use JMONSEL simulations to generate expected imaging responses of chosen test cases of patterns and defects, with the ability to vary beam energy, spot size, pixel size, and/or defect material and form factor. The patterns are representative of the design rules for an aggressively scaled FinFET-type design. With these simulated images and the resulting shot noise, a signal-to-noise framework is developed that relates to defect detection probabilities. Additionally, with this infrastructure the effects of detection-chain noise and frequency-dependent system response can be assessed, allowing the best recipe parameters to be targeted for MPEBI validation experiments. This ultimately leads to insights into how such parameters will impact MPEBI tool design, including the doses necessary for defect detection and estimates of the scanning speeds needed to achieve high throughput for high-volume manufacturing (HVM).
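
    A minimal version of a shot-noise signal-to-noise framework treats detected electron counts per pixel as Poisson. The contrast-to-noise ratio between defect and background pixels is then |μd − μb| / √(μd + μb). Detection and false-alarm probabilities for a count threshold T follow from a Gaussian approximation with variance equal to the mean. This is a simplified, hypothetical framework for illustration. It does not use JMONSEL output and omits the detection-chain noise and frequency response discussed above.

```python
import math


def shot_noise_snr(mean_defect, mean_background):
    """Contrast-to-noise ratio of Poisson counts in defect vs. background pixels."""
    return abs(mean_defect - mean_background) / math.sqrt(mean_defect + mean_background)


def detection_probabilities(mean_defect, mean_background, threshold):
    """(P_detect, P_false_alarm) for flagging a pixel whose count exceeds `threshold`,
    assuming the defect is brighter than the background and approximating each
    Poisson count by a Gaussian with variance equal to its mean."""
    def tail(mean):
        # P(X > threshold) for X ~ Normal(mean, mean)
        return 0.5 * math.erfc((threshold - mean) / math.sqrt(2.0 * mean))
    return tail(mean_defect), tail(mean_background)
```

    Raising the dose scales both means, so the SNR grows as the square root of dose. This is the dose-versus-scanning-speed trade-off the abstract refers to.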

  6. A Massively Parallel Sparse Eigensolver for Structural Dynamics Finite Element Analysis

    SciTech Connect

    Day, David M.; Reese, G.M.

    1999-05-01

    Eigenanalysis is a critical component of structural dynamics and is essential for determining the vibrational response of systems. This effort addresses the development of numerical algorithms associated with scalable eigensolver techniques suitable for use on massively parallel, distributed memory computers that are capable of solving large scale structural dynamics problems. An iterative Lanczos method was determined to be the best choice for the application. Scalability of the eigenproblem depends on scalability of the underlying linear solver. A multi-level solver (FETI) was selected as most promising for this component. Issues relating to heterogeneous materials, mechanisms and multipoint constraints have been examined, and the linear solver algorithm has been developed to incorporate features that result in a scalable, robust algorithm for practical structural dynamics applications. The resulting tools have been demonstrated on large problems representative of a weapons system.
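
    The core of a Lanczos eigensolver can be sketched briefly. Lanczos reduces a symmetric matrix to a small tridiagonal matrix T, and the eigenvalues of T are then located by Sturm-sequence bisection. This is a textbook sketch for the standard symmetric problem A x = λ x, without reorthogonalization. It is not the authors' solver. Structural dynamics involves the generalized problem K φ = ω² M φ, typically with shift-invert, where each step requires a linear solve (the role played by FETI in the abstract).

```python
import math


def _matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def _dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def lanczos_tridiagonal(A, v0, m):
    """Run up to m Lanczos steps on symmetric A (no reorthogonalization).
    Returns the diagonal (alphas) and off-diagonal (betas) of tridiagonal T."""
    n = len(A)
    norm = math.sqrt(_dot(v0, v0))
    q = [x / norm for x in v0]
    q_prev = [0.0] * n
    alphas, betas = [], []
    beta = 0.0
    for step in range(m):
        w = _matvec(A, q)
        alpha = _dot(w, q)
        w = [wi - alpha * qi - beta * qpi for wi, qi, qpi in zip(w, q, q_prev)]
        alphas.append(alpha)
        beta = math.sqrt(_dot(w, w))
        if step == m - 1 or beta < 1e-12:   # finished, or invariant subspace found
            break
        betas.append(beta)
        q_prev, q = q, [wi / beta for wi in w]
    return alphas, betas


def _count_below(alphas, betas, x):
    """Sturm count: number of eigenvalues of T strictly less than x."""
    count, d = 0, 1.0
    for k, a in enumerate(alphas):
        b2 = betas[k - 1] ** 2 if k > 0 else 0.0
        d = (a - x) - b2 / d                # pivot of the LDL^T factorization of T - xI
        if d == 0.0:
            d = 1e-300                      # avoid division by zero at the next step
        if d < 0.0:
            count += 1
    return count


def tridiagonal_eigenvalue(alphas, betas, k):
    """k-th smallest (0-based) eigenvalue of T by bisection on the Sturm count."""
    radius = max(abs(a) for a in alphas) + 2.0 * max(betas, default=0.0)  # Gershgorin bound
    lo, hi = -radius, radius
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if _count_below(alphas, betas, mid) > k:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)
```

    For large problems only m much smaller than n steps are run, and the extreme eigenvalues of T converge first. In structural dynamics the lowest modes are usually wanted, which is why shift-invert is used in practice.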

  7. Inside the intraterrestrials: The deep biosphere seen through massively parallel sequencing

    NASA Astrophysics Data System (ADS)

    Biddle, J.

    2009-12-01

    Deeply buried marine sediments may house a large amount of the Earth’s microbial population. Initial studies based on 16S rRNA clone libraries suggest that these sediments contain unique phylotypes of microorganisms, particularly from the archaeal domain. Since this environment is so difficult to study, microbiologists are challenged to find ways to examine these populations remotely. A major approach taken to study this environment uses massively parallel sequencing to examine the inner genetic workings of these microorganisms after the sediment has been drilled. Both metagenomics and tagged amplicon sequencing have been employed on deep sediments, and initial results show that different geographic regions can be differentiated through genomics and also minor populations may cause major geochemical changes.

  8. Macro-scale phenomena of arterial coupled cells: a massively parallel simulation

    PubMed Central

    Shaikh, Mohsin Ahmed; Wall, David J. N.; David, Tim

    2012-01-01

    Impaired mass transfer characteristics of blood-borne vasoactive species such as adenosine triphosphate in regions such as an arterial bifurcation have been hypothesized as a prospective mechanism in the aetiology of atherosclerotic lesions. Arterial endothelial cells (ECs) and smooth muscle cells (SMCs) respond differentially to altered local haemodynamics and produce coordinated macro-scale responses via intercellular communication. Using a computationally designed arterial segment comprising large populations of mathematically modelled coupled ECs and SMCs, we investigate their response to spatial gradients of blood-borne agonist concentrations and the effect of micro-scale-driven perturbation on the macro-scale. Altering homocellular (between same cell type) and heterocellular (between different cell types) intercellular coupling, we simulated four cases of normal and pathological arterial segments experiencing an identical gradient in the concentration of the agonist. Results show that the heterocellular calcium (Ca2+) coupling between ECs and SMCs is important in eliciting a rapid response when the vessel segment is stimulated by the agonist gradient. In the absence of heterocellular coupling, homocellular Ca2+ coupling between SMCs is necessary for propagation of Ca2+ waves from downstream to upstream cells axially. Desynchronized intracellular Ca2+ oscillations in coupled SMCs are mandatory for this propagation. Upon decoupling the heterocellular membrane potential, the arterial segment loses the inhibitory effect of ECs on the Ca2+ dynamics of the underlying SMCs. The full system comprises hundreds of thousands of coupled nonlinear ordinary differential equations simulated on the massively parallel Blue Gene architecture. The use of massively parallel computational architectures shows the capability of this approach to address macro-scale phenomena driven by elementary micro-scale components of the system. PMID:21920960
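
    The role of homocellular coupling in propagation can be illustrated with a deliberately simple toy model. The sketch below integrates a linear chain of cells with forward Euler, where each cell relaxes toward a local stimulus and exchanges signal with its neighbors through a coupling term. This is a hypothetical linear stand-in, not the paper's nonlinear EC/SMC Ca2+ models or its Blue Gene parallelization. The decay rate, time step and stimulus are arbitrary illustrative values.

```python
def simulate_chain(n_cells, stimulus, coupling, decay=1.0, dt=0.01, steps=2000):
    """Forward-Euler integration of a toy chain of diffusively coupled cells:
        dc_i/dt = stimulus_i - decay * c_i + coupling * (c_{i-1} - 2 c_i + c_{i+1})
    with no-flux ends. Returns the state after `steps` steps."""
    c = [0.0] * n_cells
    for _ in range(steps):
        new_c = []
        for i in range(n_cells):
            left = c[i - 1] if i > 0 else c[i]              # no-flux boundary
            right = c[i + 1] if i < n_cells - 1 else c[i]
            flux = coupling * (left - 2.0 * c[i] + right)   # intercellular exchange
            new_c.append(c[i] + dt * (stimulus[i] - decay * c[i] + flux))
        c = new_c
    return c
```

    With `coupling=0.0`, a stimulus applied only to the first cell never reaches the far end of the chain. With nonzero coupling, the signal spreads along the chain (and the stimulated cell loses some of its response to its neighbors). This is the qualitative point of the coupling experiments, stripped of all physiological detail.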

  9. Massively parallel cis-regulatory analysis in the mammalian central nervous system

    PubMed Central

    Shen, Susan Q.; Myers, Connie A.; Hughes, Andrew E.O.; Byrne, Leah C.; Flannery, John G.; Corbo, Joseph C.

    2016-01-01

    Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell–derived organoids. PMID:26576614
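
    The readout of an MPRA is a comparison of barcode counts. Reads from transcribed RNA are compared with reads from the input DNA library, aggregated over all barcodes that tag the same construct. The sketch below shows one common way to do this: normalize each library to reads per million, sum per construct, and take log2((RNA + pseudocount) / (DNA + pseudocount)). The abstract does not describe the authors' exact normalization, so the function name, the per-million scaling and the pseudocount of 1 are all assumptions.

```python
import math
from collections import defaultdict


def mpra_activity(barcode_to_construct, dna_counts, rna_counts, pseudocount=1.0):
    """Per-construct activity as log2((RNA + pc) / (DNA + pc)), where each
    library is first scaled to reads per million and barcodes are summed
    per construct."""
    dna_total = sum(dna_counts.values())
    rna_total = sum(rna_counts.values())
    dna = defaultdict(float)
    rna = defaultdict(float)
    for barcode, construct in barcode_to_construct.items():
        dna[construct] += dna_counts.get(barcode, 0) * 1e6 / dna_total
        rna[construct] += rna_counts.get(barcode, 0) * 1e6 / rna_total
    return {c: math.log2((rna[c] + pseudocount) / (dna[c] + pseudocount)) for c in dna}
```

    Normalizing by the DNA counts matters because constructs are not equally represented in the delivered library. A construct can yield many RNA reads simply because it was abundant, not because it is active.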

  10. Architecture for next-generation massively parallel maskless lithography system (MPML2)

    NASA Astrophysics Data System (ADS)

    Su, Ming-Shing; Tsai, Kuen-Yu; Lu, Yi-Chang; Kuo, Yu-Hsuan; Pei, Ting-Hang; Yen, Jia-Yush

    2010-03-01

    Electron-beam lithography is promising for future manufacturing technology because it does not suffer from the wavelength limits set by light sources. Since single electron-beam lithography systems share a common throughput problem, a multi-electron-beam lithography (MEBL) system that exploits massive parallelism should be a feasible alternative. In this paper, we evaluate the advantages and disadvantages of different MEBL system architectures and propose our novel Massively Parallel MaskLess Lithography System, MPML2. The MPML2 system targets cost-effective manufacturing at the 32nm node and beyond. The key structure of the proposed system is its beamlet array cells (BACs). Hundreds of BACs are uniformly arranged over the whole wafer area in the proposed system. Each BAC has a data processor and an array of beamlets, and each beamlet consists of an electron-beam source, a source controller, a set of electron lenses, a blanker, a deflector, and an electron detector. These essential parts of the beamlets are integrated using MEMS technology, which increases the density of beamlets and reduces the system cost. The data processor in each BAC processes layout information coming from off-chamber and dispatches it to the corresponding beamlets to control their ON/OFF status. Maskless lithography systems avoid the high manufacturing cost of masks; however, immense amounts of mask data must be handled and transmitted. Therefore, a data compression technique is applied to reduce the required transmission bandwidth. The compression algorithm is fast and efficient, so the real-time decoder can be implemented on-chip. Consequently, the proposed MPML2 can achieve a throughput of 10 wafers per hour (WPH) for 300mm-wafer systems.
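
    The abstract does not name the compression algorithm. As a hypothetical stand-in, run-length encoding shows why rasterized layout data compress well and why the decoder can be simple enough for on-chip, real-time use. A row of beamlet ON/OFF states with long uniform runs becomes a short list of (value, length) pairs. This is not the MPML2 scheme.

```python
def rle_encode(bits):
    """Encode a row of beamlet ON/OFF states (0/1) as (value, run_length) pairs."""
    runs = []
    for b in bits:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1            # extend the current run
        else:
            runs.append([b, 1])         # start a new run
    return [(value, length) for value, length in runs]


def rle_decode(runs):
    """Expand (value, run_length) pairs back into the ON/OFF row."""
    out = []
    for value, length in runs:
        out.extend([value] * length)
    return out
```

    A 100-pixel row with a single 10-pixel feature encodes to three pairs. Decoding needs only a counter and a comparator, which is the property that makes hardware decoders cheap.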

  11. Massively parallel simulation with DOE's ASCI supercomputers : an overview of the Los Alamos Crestone project

    SciTech Connect

    Weaver, R. P.; Gittings, M. L.

    2004-01-01

    The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and the RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages and is used for problems in which radiation transport of energy is important. The goal of these massively-parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to ~10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [1], [2], [3] and [4]. Currently, the largest simulations we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using ~2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations. Key features of any massively parallel system

  12. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    SciTech Connect

    Qiang, J.; Leitner, D.; Todd, D.S.; Ryne, R.D.

    2005-03-15

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.
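
    A particle-in-cell code accounts for space charge by depositing particle charge onto a grid, solving for the fields, and interpolating the fields back to the particles. The sketch below shows only the first of these steps, using cloud-in-cell (linear) weighting on a periodic 1D grid. IMPACT itself is a 3D parallel code, and its deposition, boundary conditions and field solver are not reproduced here. The function name and the periodic boundaries are assumptions for illustration.

```python
import math


def deposit_cic(positions, charges, n_cells, dx):
    """Cloud-in-cell charge deposition onto a periodic 1D grid.

    Each particle's charge is split between its two nearest grid nodes in
    proportion to proximity; returns the charge density at each node.
    """
    rho = [0.0] * n_cells
    for x, q in zip(positions, charges):
        s = x / dx                      # position in grid units
        j = math.floor(s)
        w = s - j                       # fraction assigned to the right-hand node
        rho[j % n_cells] += q * (1.0 - w) / dx
        rho[(j + 1) % n_cells] += q * w / dx
    return rho
```

    A particle at x = 2.25 (with dx = 1) contributes 0.75 of its charge to node 2 and 0.25 to node 3. The total deposited charge always equals the particle charge, which keeps the space-charge field consistent with the beam.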

  13. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    NASA Astrophysics Data System (ADS)

    Qiang, J.; Leitner, D.; Todd, D. S.; Ryne, R. D.

    2005-03-01

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.
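    Particle-in-cell codes such as IMPACT repeatedly deposit particle charge onto a grid before solving for the space-charge field. Below is a minimal sketch of that deposition step. It assumes a periodic 1D grid with linear (cloud-in-cell) weighting. IMPACT itself is 3D, and this abstract does not describe its actual deposition scheme. The function name `deposit_cic` is hypothetical.

```python
import numpy as np

def deposit_cic(x, q, n_cells, length):
    """Cloud-in-cell charge deposition on a periodic 1D grid (illustrative only).

    Each particle's charge is split between its two nearest grid nodes,
    weighted linearly by its distance from each node.
    """
    dx = length / n_cells
    rho = np.zeros(n_cells)
    s = np.asarray(x, dtype=float) / dx   # particle positions in units of cells
    i = np.floor(s).astype(int)           # index of the node to the left
    frac = s - i                          # fractional offset from that node
    # np.add.at accumulates correctly when several particles share a node.
    np.add.at(rho, i % n_cells, q * (1.0 - frac) / dx)
    np.add.at(rho, (i + 1) % n_cells, q * frac / dx)
    return rho
```

    Each particle's two weights sum to one, so the total deposited charge (`rho.sum() * dx`) equals the total particle charge. This gives a convenient sanity check.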

  14. A practical approach to portability and performance problems on massively parallel supercomputers

    SciTech Connect

    Beazley, D.M.; Lomdahl, P.S.

    1994-12-08

    We present an overview of the tactics we have used to achieve a high level of performance while improving portability for the large-scale molecular dynamics code SPaSM. SPaSM was originally implemented in ANSI C with message passing for the Connection Machine 5 (CM-5). In 1993, SPaSM was selected as one of the winners of the IEEE Gordon Bell Prize competition for sustaining 50 Gflops on the 1024-node CM-5 at Los Alamos National Laboratory. Achieving this performance on the CM-5 required rewriting critical sections of code in CDPEAC assembler language. In addition, the code made extensive use of CM-5 parallel I/O and the CMMD message passing library. Given this highly specialized implementation, we describe how we have ported the code to the Cray T3D and high-performance workstations. In addition, we describe how it has been possible to do this using a single version of source code that runs on all three platforms without sacrificing any performance. Sound too good to be true? We hope to demonstrate that one can realize both code performance and portability without relying on the latest and greatest prepackaged tool or parallelizing compiler.

  15. Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor

    NASA Astrophysics Data System (ADS)

    Kumaki, Takeshi; Ishizaki, Masakatsu; Koide, Tetsushi; Mattausch, Hans Jürgen; Kuroda, Yasuto; Gyohten, Takayuki; Noda, Hideyuki; Dosaka, Katsumi; Arimoto, Kazutami; Saito, Kazunori

    This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be well suited to the repeated arithmetic operations typical of multimedia applications. The proposed architecture additionally exploits CAM technology and therefore enables fast pipelined table-lookup coding operations. Because both arithmetic and table-lookup operations execute extremely fast, the proposed architecture can realize efficient and versatile multimedia data processing. Evaluation of the proposed CAM-enhanced massive-parallel SIMD matrix processor on the frequently used JPEG image-compression application shows that the required number of clock cycles can be reduced by 86% in comparison to a conventional mobile DSP architecture. The performance in Mpixel/mm² is 3.3 and 4.4 times better than that of a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.
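    "Bit-serial, word-parallel" means that all words are processed at once, one bit position per step. The sketch below illustrates the idea in NumPy. Words are stored as bit planes, and addition ripples a carry through the planes. This is a simplification that uses 1 bit per step rather than the 2-bit processing elements of the actual matrix. The function names are hypothetical and do not correspond to the processor's instruction set.

```python
import numpy as np

def to_bit_planes(words, width):
    """Transpose integers into bit planes: plane b holds bit b of every word."""
    words = np.asarray(words, dtype=np.uint64)
    return [((words >> np.uint64(b)) & np.uint64(1)).astype(bool) for b in range(width)]

def from_bit_planes(planes):
    """Reassemble integers from a list of bit planes (least significant first)."""
    out = np.zeros(planes[0].shape, dtype=np.uint64)
    for b, plane in enumerate(planes):
        out |= plane.astype(np.uint64) << np.uint64(b)
    return out

def bit_serial_add(a_planes, b_planes):
    """Add every word pair simultaneously, one bit position per step."""
    carry = np.zeros_like(a_planes[0])
    result = []
    for a, b in zip(a_planes, b_planes):      # one "cycle" per bit position
        result.append(a ^ b ^ carry)          # sum bit for all words at once
        carry = (a & b) | (carry & (a ^ b))   # carry into the next bit position
    result.append(carry)                      # final carry-out is the top bit
    return result
```

    The number of steps grows with the word width rather than with the number of words. That property is what makes a 2,048-wide array attractive for the repetitive arithmetic in multimedia kernels.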

  16. Massive Exploration of Perturbed Conditions of the Blood Coagulation Cascade through GPU Parallelization

    PubMed Central

    Cazzaniga, Paolo; Nobile, Marco S.; Besozzi, Daniela; Bellini, Matteo; Mauri, Giancarlo

    2014-01-01

    The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need to perform large numbers of in silico analyses to study the behavior of biological systems under different conditions, which requires computing power that usually exceeds the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA executes parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then applying the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and two-dimensional parameter sweep analyses, and show that GPU-accelerated parallel simulations of this model can achieve up to a 181× speedup compared to the corresponding sequential simulations. PMID:25025072
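    A one-dimensional parameter sweep of this kind runs the same ODE system many times with different rate constants. The sketch below shows the idea on a toy mass-action reaction A + B → C. It vectorizes over parameter values in NumPy, which loosely mirrors one GPU thread per simulation. Two substitutions are assumptions and not coagSODA's method. First, the toy model replaces the BCC model. Second, a fixed-step RK4 integrator replaces LSODA.

```python
import numpy as np

def rhs(y, rates):
    """Mass-action kinetics for the toy reaction A + B -> C.

    y has shape (3, n_sims); rates has shape (n_sims,), one rate per simulation.
    """
    a, b, _ = y
    flux = rates * a * b
    return np.stack([-flux, -flux, flux])

def sweep(rate_values, y0, t_end, n_steps):
    """Integrate all parameter values at once with classical fixed-step RK4."""
    rates = np.asarray(rate_values, dtype=float)
    y = np.tile(np.asarray(y0, dtype=float)[:, None], (1, rates.size))
    dt = t_end / n_steps
    for _ in range(n_steps):
        k1 = rhs(y, rates)
        k2 = rhs(y + 0.5 * dt * k1, rates)
        k3 = rhs(y + 0.5 * dt * k2, rates)
        k4 = rhs(y + dt * k3, rates)
        y = y + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return y
```

    For example, `sweep([0.1, 1.0, 10.0], [1.0, 1.0, 0.0], 1.0, 1000)` returns a 3×3 array, with one column per rate constant. With equal initial concentrations, the analytic solution is A(t) = 1/(1 + kt), which makes results easy to verify.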

  17. Practical Realization of Massively Parallel Fiber–Free-Space Optical Interconnects

    NASA Astrophysics Data System (ADS)

    Gruber, Matthias; Jahns, Jürgen; El Joudi, El Mehdi; Sinzinger, Stefan

    2001-06-01

    We propose a novel approach to realizing massively parallel optical interconnects based on commercially available multifiber ribbons with MT-type connectors and custom-designed planar-integrated free-space components. It combines the advantages of fiber optics, namely long range and convenient, flexible installation, with those of (planar-integrated) free-space optics, namely a wide range of implementable functions and a high potential for integration and parallelization. For the interface between the fibers and the free-space optical systems, a low-cost practical solution is presented: a metal connector plate manufactured on a computer-controlled milling machine. Channel densities are of the order of 100/mm² between optoelectronic VLSI chips and the free-space optical systems, and 1/mm² between the free-space optical systems and the MT-type fiber connectors. Experiments with specially designed planar-integrated test systems demonstrate that multiple one-to-one and one-to-many interconnects can be established with a uniformity error of no more than 10%.

  18. Measures of effectiveness for BMD mid-course tracking on MIMD massively parallel computers

    SciTech Connect

    VanDyke, J.P.; Tomkins, J.L.; Furnish, M.D.

    1995-05-01

    The TRC code, a mid-course tracking code for ballistic missiles, has previously been implemented on a 1024-processor MIMD (Multiple Instruction, Multiple Data) massively parallel computer. Measures of Effectiveness (MOEs) for this algorithm have been developed for this computing environment. The MOE code runs in parallel with the TRC code. Particularly useful MOEs include the numbers of missed objects (real objects for which the TRC algorithm did not construct a track), ghost tracks (tracks not corresponding to a real object), redundant tracks (multiple tracks corresponding to a single real object), and unresolved objects (multiple objects corresponding to a single track). All of these are expressed as functions of time and tend to peak while real objects are being spawned (multiple reentry vehicles per post-boost vehicle). The track-truth separation can also be measured as a function of time. A set of calculations is presented illustrating these MOEs as functions of time for a case with 99 post-boost vehicles, each of which spawns 9 reentry vehicles.
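    The four track-quality counts defined above can be computed from a track-to-truth association at a single time step. The sketch below assumes that association is already known and is given as a mapping from each track to the set of real objects it corresponds to. The counting conventions are illustrative choices, not necessarily those of the MOE code. Redundant tracks are counted as the extra tracks beyond the first per object. Unresolved objects are counted as every object that shares a track with another object.

```python
from collections import Counter

def measures_of_effectiveness(object_ids, track_assignments):
    """Compute simple tracking MOEs at one time step (illustrative only).

    object_ids: iterable of real object IDs present at this time.
    track_assignments: dict mapping track ID -> set of real object IDs that
        the track corresponds to (an empty set means no real object).
    """
    objects = set(object_ids)
    tracks_per_object = Counter()
    ghosts = 0
    unresolved = 0
    for objs in track_assignments.values():
        real = objs & objects
        if not real:
            ghosts += 1                  # track with no real object behind it
        elif len(real) > 1:
            unresolved += len(real)      # several objects merged into one track
        for obj in real:
            tracks_per_object[obj] += 1
    missed = sum(1 for obj in objects if tracks_per_object[obj] == 0)
    redundant = sum(n - 1 for n in tracks_per_object.values() if n > 1)
    return {"missed": missed, "ghost": ghosts,
            "redundant": redundant, "unresolved": unresolved}
```

    Evaluating this at every time step yields the MOE-versus-time curves described in the abstract.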

  19. Massive parallel IGHV gene sequencing reveals a germinal center pathway in origins of human multiple myeloma

    PubMed Central

    Bryant, Dean; Seckinger, Anja; Hose, Dirk; Zojer, Niklas; Sahota, Surinder S.

    2015-01-01

    Human multiple myeloma (MM) is characterized by accumulation of malignant terminally differentiated plasma cells (PCs) in the bone marrow (BM), raising the question of when, during maturation, neoplastic transformation begins. Immunoglobulin IGHV genes carry imprints of clonal tumor history, delineating somatic hypermutation (SHM) events that generally occur in the germinal center (GC). Here, we examine MM-derived IGHV genes using massive parallel deep sequencing, comparing them with profiles in normal BM PCs. In 4/4 presentation IgG MM, monoclonal tumor-derived IGHV sequences revealed significant evidence for intraclonal variation (ICV) in mutation patterns. IGHV sequences of 2/2 normal PC IgG populations revealed dominant oligoclonal expansions, each expansion also displaying mutational ICV. Clonal expansions in MM and in normal BM PCs reveal common IGHV features. In such MM, the data fit a model of tumor origins in which neoplastic transformation is initiated in a GC B-cell committed to terminal differentiation but still targeted by ongoing SHM. Strikingly, the data parallel IGHV clonal sequences in some cases of monoclonal gammopathy of undetermined significance (MGUS) known to display ongoing SHM imprints. Since MGUS generally precedes MM, these data suggest that MGUS and MM with IGHV gene mutational ICV originate from the same GC B-cell, arising via a distinctive pathway. PMID:25929340
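    Intraclonal variation can be pictured as reads from one clone that share a core set of somatic mutations relative to germline, while individual reads also carry additional mutations. The toy sketch below separates shared from variable mutations. It assumes error-free reads that are already aligned to a germline sequence of the same length. Real deep-sequencing analyses must also handle sequencing errors and indels, and the function names here are hypothetical.

```python
def mutation_set(seq, germline):
    """Set of (position, base) pairs where an aligned read differs from germline."""
    if len(seq) != len(germline):
        raise ValueError("reads must be aligned to the germline length")
    return {(i, base) for i, (base, ref) in enumerate(zip(seq, germline)) if base != ref}

def summarize_clone(reads, germline):
    """Split a clone's mutations into shared (clonal) and variable (ICV) ones."""
    sets = [mutation_set(r, germline) for r in reads]
    shared = set.intersection(*sets)         # present in every read
    variable = set.union(*sets) - shared     # present in only some reads
    n_patterns = len({frozenset(s) for s in sets})
    return shared, variable, n_patterns
```

    For example, `summarize_clone(["ACGTTCGT", "ACCTTCGT", "ACGTTCGA"], "ACGTACGT")` reports one shared mutation and two variable ones. That variable set is the signature of ongoing hypermutation within a clone.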

  20. The divide-expand-consolidate MP2 scheme goes massively parallel

    NASA Astrophysics Data System (ADS)

    Kristensen, Kasper; Kjærgaard, Thomas; Høyvik, Ida-Marie; Ettenhuber, Patrick; Jørgensen, Poul; Jansik, Branislav; Reine, Simen; Jakowski, Jacek

    2013-07-01

    For large molecular systems, conventional implementations of second-order Møller-Plesset (MP2) theory encounter a scaling wall, both in memory and in time. We describe how this scaling wall can be removed. We present a massively parallel algorithm for calculating MP2 energies and densities using the divide-expand-consolidate scheme, in which a calculation on a large system is divided into many small fragment calculations employing local orbital spaces. The resulting algorithm scales linearly with system size, exhibits near-perfect parallel scalability, removes memory bottlenecks, and does not involve any I/O. The algorithm employs three levels of parallelisation combined via a dynamic job distribution scheme. Results on two molecular systems containing 528 and 1056 atoms (4278 and 8556 basis functions) using 47,120 and 94,240 cores are presented. The results demonstrate the scalability of the algorithm both with respect to the number of cores and with respect to system size. The presented algorithm is thus highly suited for large supercomputer architectures and allows MP2 calculations on large molecular systems to be carried out within a few hours; for example, the correlated calculation on the molecular system containing 1056 atoms took 2.37 hours using 94,240 cores.
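    A dynamic job distribution scheme hands fragment calculations to workers as they become free, rather than splitting the work statically. The sketch below simulates that idea with a heap of worker-availability times. Jobs are issued largest-first, which is a common load-balancing heuristic. The actual DEC code distributes jobs over MPI with three levels of parallelism. This single-level simulation and its cost values are assumptions for illustration only.

```python
import heapq

def dynamic_schedule(job_costs, n_workers):
    """Simulate a dynamic job queue in which each idle worker takes the next job.

    Jobs are handed out largest-first. Returns the makespan (time until all
    jobs finish) and the list of job IDs processed by each worker.
    """
    order = sorted(range(len(job_costs)), key=lambda j: job_costs[j], reverse=True)
    free_at = [(0.0, w) for w in range(n_workers)]   # (time worker is free, worker id)
    heapq.heapify(free_at)
    assignment = [[] for _ in range(n_workers)]
    for job in order:
        t, w = heapq.heappop(free_at)                # earliest-available worker
        assignment[w].append(job)
        heapq.heappush(free_at, (t + job_costs[job], w))
    makespan = max(t for t, _ in free_at)
    return makespan, assignment
```

    With fragment costs `[5, 3, 3, 2, 2, 1]` on two workers, this schedule finishes at time 8, which equals the ideal total work divided evenly between the workers.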

  1. Rigid body constraints realized in massively-parallel molecular dynamics on graphics processing units

    NASA Astrophysics Data System (ADS)

    Nguyen, Trung Dac; Phillips, Carolyn L.; Anderson, Joshua A.; Glotzer, Sharon C.

    2011-11-01

    Molecular dynamics (MD) methods compute the trajectory of a system of point particles in response to a potential function by numerically integrating Newton's equations of motion. Extending these basic methods with rigid body constraints enables composite particles with complex shapes such as anisotropic nanoparticles, grains, molecules, and rigid proteins to be modeled. Rigid body constraints have been added to the GPU-accelerated MD package HOOMD-blue, version 0.10.0. The software can now simulate systems of particles, rigid bodies, or mixed systems in microcanonical (NVE), canonical (NVT), and isothermal-isobaric (NPT) ensembles. It can also apply the FIRE energy minimization technique to these systems. In this paper, we detail the massively parallel scheme that implements these algorithms and discuss how our design is tuned for the maximum possible performance. Two case studies, patchy spheres and tethered nanorods, are included to demonstrate the performance attained. In typical cases, HOOMD-blue on a single GTX 480 executes 2.5-3.6 times faster than LAMMPS executing the same simulation on any number of CPU cores in parallel. Simulations with rigid bodies may now be run with larger systems and for longer time scales on a single workstation than was previously possible even on large clusters.
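    A central step in any rigid-body MD scheme is reducing the forces on a body's constituent particles to a net force and a torque about the center of mass. These two quantities then drive the body's translational and rotational motion. The NumPy sketch below shows only that reduction for a single body. It is not HOOMD-blue's GPU implementation, and the function name is hypothetical.

```python
import numpy as np

def rigid_body_force_torque(positions, masses, forces):
    """Net force and torque (about the center of mass) on one rigid body."""
    positions = np.asarray(positions, dtype=float)   # shape (n, 3)
    masses = np.asarray(masses, dtype=float)         # shape (n,)
    forces = np.asarray(forces, dtype=float)         # shape (n, 3)
    com = (masses[:, None] * positions).sum(axis=0) / masses.sum()
    net_force = forces.sum(axis=0)
    # torque = sum over constituents of (r_i - r_com) x f_i
    torque = np.cross(positions - com, forces).sum(axis=0)
    return com, net_force, torque
```

    For example, consider two unit masses at (±1, 0, 0) pushed in opposite y directions. They produce zero net force but a torque of (0, 0, 2), so the body spins without translating.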

  2. A two-phase thermal model for subsurface transport on massively parallel computers

    SciTech Connect

    Martinez, M.J.; Hopkins, P.L.

    1997-12-01

    Many research activities in subsurface transport require the numerical simulation of multiphase flow in porous media. This capability is critical to research in environmental remediation (e.g., contamination by dense, non-aqueous-phase liquids), nuclear waste management, and reservoir engineering, and to the assessment of the future availability of groundwater in many parts of the world. This paper presents an unstructured grid numerical algorithm for subsurface transport in heterogeneous porous media implemented for use on massively parallel (MP) computers. The mathematical model considers nonisothermal two-phase (liquid/gas) flow, including capillary pressure effects, binary diffusion in the gas phase, and conductive, latent, and sensible heat transport. The Galerkin finite element method is used for spatial discretization, and temporal integration is accomplished via a predictor/corrector scheme. Message-passing and domain decomposition techniques are used to implement a scalable algorithm for distributed memory parallel computers. Illustrative applications are shown to demonstrate capabilities and performance, one of which is modeling hydrothermal transport at the Yucca Mountain site for a radioactive waste facility.
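    The abstract does not specify which predictor/corrector pair is used. As a generic illustration of the idea, the sketch below uses an explicit Euler predictor followed by a trapezoidal corrector (Heun's method) on an ODE system dy/dt = f(t, y). This choice of scheme is an assumption. A finite element code would apply the same two-stage pattern to the semi-discrete system produced by the spatial discretization.

```python
import math

def predictor_corrector(f, y0, t0, t_end, n_steps):
    """Explicit Euler predictor with a trapezoidal corrector (Heun's method)."""
    dt = (t_end - t0) / n_steps
    t, y = t0, y0
    for _ in range(n_steps):
        f_n = f(t, y)
        y_pred = y + dt * f_n                            # predictor step
        y = y + 0.5 * dt * (f_n + f(t + dt, y_pred))     # corrector step
        t += dt
    return y
```

    For the decay problem dy/dt = -y with y(0) = 1, 100 steps to t = 1 agree with `math.exp(-1.0)` to within about 1e-5, consistent with the method's second-order accuracy.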

  3. GPAW - massively parallel electronic structure calculations with Python-based software.

    SciTech Connect

    Enkovaara, J.; Romero, N.; Shende, S.; Mortensen, J.

    2011-01-01

    Electronic structure calculations are a widely used tool in materials science and a large consumer of supercomputing resources. Traditionally, the software packages for these kinds of simulations have been implemented in compiled languages, with Fortran in its different versions being the most popular choice. While dynamic, interpreted languages such as Python can increase programmer efficiency, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to retain most of the productivity-enhancing features while achieving good numerical performance. We have used this approach in implementing the electronic structure simulation software GPAW, using a combination of the Python and C programming languages. While the chosen approach works well on standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging, and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that, despite the challenges, it is possible to obtain good numerical performance and good parallel scalability with Python-based software.
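    The underlying principle is to keep high-level control flow in Python and push numerically heavy loops into compiled code. The sketch below illustrates this with NumPy, whose array operations run in compiled C. GPAW itself uses its own C extension modules. NumPy here is only a stand-in to show the pattern, not GPAW's implementation.

```python
import numpy as np

def norm_squared_python(values):
    """Pure-Python loop: flexible, but every iteration runs in the interpreter."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def norm_squared_compiled(values):
    """Same result, with the loop executed inside NumPy's compiled code."""
    arr = np.asarray(values, dtype=float)
    return float(np.dot(arr, arr))
```

    Both functions return the same value. On large arrays, the compiled version is typically much faster, which is the trade-off the combined Python/C design exploits.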

  4. Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

    NASA Astrophysics Data System (ADS)

    Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

    2008-04-01

    This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
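    The work-farm pattern described above has one input stream, a set of parallel workers, and one output stream, all connected by blocking channels. It can be sketched in Python with threads and `queue.Queue`, where the queues stand in for the MPPA's self-synchronizing channels. This is an analogy under that assumption. It does not use the MPPA's structural object programming model, and `work_farm` is a hypothetical name.

```python
import queue
import threading

_STOP = object()  # sentinel telling a worker to shut down

def work_farm(inputs, work, n_workers=4):
    """Apply `work` to every input using a pool of workers, preserving order.

    One input stream fans out to the workers through a blocking queue, and
    results are gathered from a second queue into one output stream.
    """
    in_q, out_q = queue.Queue(), queue.Queue()

    def worker():
        while True:
            item = in_q.get()          # blocks until data is available
            if item is _STOP:
                break
            seq, value = item
            out_q.put((seq, work(value)))

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    count = 0
    for seq, value in enumerate(inputs):
        in_q.put((seq, value))
        count += 1
    for _ in threads:
        in_q.put(_STOP)
    for t in threads:
        t.join()
    # Reorder by sequence number so the output stream matches the input stream.
    results = sorted(out_q.get() for _ in range(count))
    return [r for _, r in results]
```

    For example, `work_farm(range(20), lambda x: x * x, 3)` returns the squares in input order, even though the workers may finish out of order.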

  5. A Lightweight Remote Parallel Visualization Platform for Interactive Massive Time-varying Climate Data Analysis

    NASA Astrophysics Data System (ADS)

    Li, J.; Zhang, T.; Huang, Q.; Liu, Q.

    2014-12-01

    Today's climate datasets are characterized by large volume, a high degree of spatiotemporal complexity, and rapid evolution over time. Because visualizing large, distributed climate datasets is computationally intensive, traditional desktop-based visualization applications cannot handle the computational load. Recently, scientists have developed remote visualization techniques to address this issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver the results to clients over the network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform is built on ParaView, one of the most popular open-source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support its deployment. In this platform, all climate datasets are regular grid data stored in NetCDF format. Three data access methods are supported: accessing remote datasets provided by OPeNDAP servers, accessing datasets hosted on the web visualization server, and accessing local datasets. Regardless of the data access method, all visualization tasks are completed on the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computational limitations of desktop-based visualization applications.

  6. Automation of Molecular-Based Analyses: A Primer on Massively Parallel Sequencing

    PubMed Central

    Nguyen, Lan; Burnett, Leslie

    2014-01-01

    Recent advances in genetics have been enabled by new genetic sequencing techniques called massively parallel sequencing (MPS) or next-generation sequencing. Because hundreds of thousands to millions of DNA fragments can be sequenced in parallel, the cost and time required for sequencing have dramatically decreased. A number of different MPS platforms are currently available and in use in Australia. Although they differ in their underlying technology, their overall processes are very similar: DNA fragmentation, adaptor ligation, immobilisation, amplification, sequencing reaction, and data analysis. MPS is being used in research, translational, and increasingly also clinical settings. Common applications include sequencing of whole genomes, whole exomes, or targeted genes for disease-causing gene discovery, genetic diagnosis, and targeted cancer therapy. Although the MPS revolution is exciting because of its increasing use, improving and emerging technologies, and new applications, significant challenges still exist. Particularly challenging issues are the bioinformatics required for data analysis, the interpretation of results, and the ethical dilemma of ‘incidental findings’. PMID:25336762

  7. A Fast Parallel Simulation Code for Interaction between Proto-Planetary Disk and Embedded Proto-Planets: Implementation for 3D Code

    SciTech Connect

    Li, Shengtai; Li, Hui

    2012-06-14

    the position of the planet, we adopt the corotating frame, which allows the planet to move only in the radial direction if only one planet is present. This code has been extensively tested on a number of problems. For an Earth-mass planet with a constant aspect ratio h = 0.05, the torque calculated using our code agrees well with the 3D linear theory results of Tanaka et al. (2002). The code is fully parallelized via the Message Passing Interface (MPI) and has very high parallel efficiency. Several numerical examples for both fixed and moving planets are provided to demonstrate the efficacy of the numerical method and code.
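    For reference, the 3D linear (Type I) torque of Tanaka, Takeuchi & Ward (2002) is commonly quoted as Γ = -(1.364 + 0.541α)(q/h)² Σ_p r_p⁴ Ω_p². Here q is the planet-to-star mass ratio, h is the disk aspect ratio, and the surface density follows Σ ∝ r^(-α). The helper below simply evaluates that expression. It is a convenience sketch for comparing with simulated torques, not part of the authors' code, and the function name is hypothetical.

```python
def tanaka_torque(q, h, alpha, sigma_p, r_p, omega_p):
    """3D linear Type I torque on a planet (Tanaka, Takeuchi & Ward 2002).

    q: planet-to-star mass ratio; h: disk aspect ratio H/r at the planet;
    alpha: surface-density power-law index (Sigma ~ r**-alpha);
    sigma_p, r_p, omega_p: surface density, orbital radius, and angular
    velocity at the planet, in any consistent (e.g., code) units.
    """
    return -(1.364 + 0.541 * alpha) * (q / h) ** 2 * sigma_p * r_p**4 * omega_p**2
```

    In code units with Σ_p = r_p = Ω_p = 1, an Earth-mass planet around a solar-mass star (q ≈ 3.0e-6) with h = 0.05 gives a small negative torque. The negative sign indicates inward migration.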

  8. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    NASA Astrophysics Data System (ADS)

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.
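    The "wide vectorizable inner loop" in MOC typically runs over energy groups as flux is attenuated along each track segment. The sketch below applies the standard flat-source segment update, Δψ = (ψ_in - Q/Σ_t)(1 - e^(-Σ_t s)), to all groups at once with NumPy. The flat-source approximation and the input layout are assumptions, since the abstract does not specify them. Because tracks are independent, a dynamic scheduler could hand calls like this to threads one track at a time.

```python
import numpy as np

def sweep_track(psi_in, segments):
    """Attenuate angular flux along one track, all energy groups at once.

    psi_in: angular flux entering the track, shape (n_groups,).
    segments: list of (length, sigma_t, q), where sigma_t and q are arrays of
        shape (n_groups,) holding the total cross section and the flat source
        of the region the segment crosses.
    Returns the exiting flux and the per-segment flux changes.
    """
    psi = np.array(psi_in, dtype=float)
    deltas = []
    for length, sigma_t, q in segments:
        tau = sigma_t * length                          # optical thickness per group
        delta = (psi - q / sigma_t) * -np.expm1(-tau)   # -expm1(-tau) = 1 - exp(-tau)
        psi = psi - delta
        deltas.append(delta)                            # would feed scalar-flux tallies
    return psi, deltas
```

    With no source, the exiting flux is simply ψ_in·e^(-τ) in each group. If the incoming flux already equals Q/Σ_t, the flux passes through unchanged.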

  9. 3-D Spherical Mantle Convection Simulations with Billions of Unknowns on the Yin-Yang Grid Using StagYY: Parallelization and Scaling (Invited)

    NASA Astrophysics Data System (ADS)

    Tackley, P. J.

    2013-12-01

    StagYY is a well-established code for modelling mantle convection in 3D spherical geometry (Tackley, PEPI 2008), incorporating several physical complexities such as compressibility, phase transitions, compositional variations, strongly temperature-dependent non-linear rheology, tracers to track composition, continents, partial melting, and melt migration. It uses a finite-volume discretization (primitive variables on a staggered grid) on the yin-yang spherical grid (minimum-overlap version). Geometric multigrid is used for the simultaneous solution of the Stokes and mass conservation equations. Here, parallelization using MPI is discussed, and the performance and scaling of the current StagYY version are demonstrated on up to 4096 cores on grids of up to 768×2304×512×2 cells (1.8 billion cells, corresponding to 7.2 billion unknowns). Complexities related to scaling further to hundreds of thousands or millions of cores are discussed, together with possible solutions and performance projections.
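    The problem size quoted above can be checked directly. The sketch makes two assumptions that the abstract does not state. First, the four factors are three grid dimensions times the two yin-yang blocks. Second, each cell carries four unknowns (three velocity components plus pressure). Under these assumptions, the arithmetic reproduces the quoted figures.

```python
def yin_yang_size(n1, n2, n3, n_blocks=2, unknowns_per_cell=4):
    """Cell and unknown counts for a yin-yang grid.

    unknowns_per_cell=4 assumes three velocity components plus pressure.
    """
    cells = n1 * n2 * n3 * n_blocks
    return cells, cells * unknowns_per_cell

cells, unknowns = yin_yang_size(768, 2304, 512)
print(f"{cells:,} cells, {unknowns:,} unknowns")
# prints "1,811,939,328 cells, 7,247,757,312 unknowns"
```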

  10. Numerical Simulation of 3D Hydraulic Fracturing Based on an Improved Flow-Stress-Damage Model and a Parallel FEM Technique

    NASA Astrophysics Data System (ADS)

    Li, L. C.; Tang, C. A.; Li, G.; Wang, S. Y.; Liang, Z. Z.; Zhang, Y. B.

    2012-09-01

    The failure mechanism of hydraulic fractures in heterogeneous geological materials is an important topic in mining and petroleum engineering. A three-dimensional (3D) finite element model that considers the coupled effects of seepage, damage, and the stress field is introduced. This model is based on a previously developed two-dimensional (2D) version of the model (RFPA2D, Rock Failure Process Analysis). The RFPA3D-Parallel model is developed using a parallel finite element method with a message-passing interface library. The constitutive law of this model considers strength and stiffness degradation, stress-dependent permeability for the pre-peak stage, and deformation-dependent permeability for the post-peak stage. Using this model, 3D modelling of progressive failure and associated fluid flow in rock is conducted and used to investigate the hydro-mechanical response of rock samples at laboratory scale. The responses investigated are the axial stress-axial strain behavior, together with permeability evolution and fracture patterns at various stages of loading. Then, the hydraulic fracturing process inside a rock specimen is numerically simulated. Three coupled processes are considered: (1) mechanical deformation of the solid medium induced by the fluid pressure acting on the fracture surfaces and the rock skeleton, (2) fluid flow within the fracture, and (3) propagation of the fracture. The simulated results show that, when the magnitudes of the far-field stresses differ greatly, fractures from a vertical wellbore propagate in the maximum principal stress direction without branching, turning, or twisting. Otherwise, the fracture initiates in a non-preferred direction and plane, then turns and twists during propagation to become aligned with the preferred direction and plane. This pattern of fracturing is common when the rock formation contains multiple layers with different material properties. In addition, local heterogeneity of the rock

  11. Massively-parallel FDTD simulations to address mask electromagnetic effects in hyper-NA immersion lithography

    NASA Astrophysics Data System (ADS)

    Tirapu Azpiroz, Jaione; Burr, Geoffrey W.; Rosenbluth, Alan E.; Hibbs, Michael

    2008-03-01

    In the hyper-NA immersion lithography regime, the electromagnetic response of the reticle is known to deviate in a complicated manner from the idealized Thin-Mask-like behavior. This is already driving certain RET choices, such as the use of polarized illumination and the customization of reticle film stacks. Unfortunately, full 3-D electromagnetic mask simulations are computationally intensive. And while OPC-compatible mask electromagnetic field (EMF) models can offer a reasonable tradeoff between speed and accuracy for full-chip OPC applications, a full understanding of these complex physical effects demands higher accuracy. Our paper describes recent advances in leveraging high-performance computing as a critical step towards lithographic modeling of the full manufacturing process. In this paper, highly accurate full 3-D electromagnetic simulations of very large mask layouts are conducted in parallel with reasonable turnaround time, using a BlueGene/L supercomputer and a Finite-Difference Time-Domain (FDTD) code developed internally at IBM. A 3-D simulation of a large 2-D layout spanning 5μm×5μm at the wafer plane (and thus 20μm×20μm×0.5μm at the mask) requires roughly 12.5GB of memory (grid size of 10nm at the mask, single-precision computation, about 30 bytes/grid point). FDTD is flexible and easily parallelizable, enabling full simulations of such large layouts in approximately an hour using one BlueGene/L "midplane" containing 512 dual-processor nodes with 256MB of memory per processor. Our scaling studies on BlueGene/L demonstrate that simulations up to 100μm × 100μm at the mask can be computed in a few hours. Finally, we will show that the use of a subcell technique permits accurate simulation of features smaller than the grid discretization, thus improving the tradeoff between computational complexity and simulation accuracy. We demonstrate the correlation of the real and quadrature components that comprise the

  12. MicroRNA transcriptome in the newborn mouse ovaries determined by massive parallel sequencing.

    PubMed

    Ahn, Hyo Won; Morin, Ryan D; Zhao, Han; Harris, Ronald A; Coarfa, Cristian; Chen, Zi-Jiang; Milosavljevic, Aleksandar; Marra, Marco A; Rajkovic, Aleksandar

    2010-07-01

    Small non-coding RNAs, such as microRNAs (miRNAs), are involved in diverse biological processes including organ development and tissue differentiation. Global disruption of miRNA biogenesis in Dicer knockout mice disrupts early embryogenesis and primordial germ cell formation. However, the role of miRNAs in early folliculogenesis is poorly understood. In order to identify the full transcriptome set of small RNAs expressed in the newborn (NB) ovary, we extracted the small RNA fraction from mouse NB ovary tissues and subjected it to massive parallel sequencing using the Illumina Genome Analyzer. Sequencing produced 4,655,992 reads of 33 bp each, representing a total of 154 Mbp of sequence data. The Pash alignment algorithm mapped 50.13% of the reads to the mouse genome. Sequence reads were clustered based on overlapping mapping coordinates and intersected with known miRNAs, small nucleolar RNAs (snoRNAs), piwi-interacting RNA (piRNA) clusters, and repetitive genomic regions; 25.2% of the reads mapped to known miRNAs, 25.5% to genomic repeats, 3.5% to piRNAs, and 0.18% to snoRNAs. Three hundred and ninety-eight known miRNA species were found among the sequenced small RNAs, along with 118 isomiR sequences that are not in the miRBase database. The let-7 family was the most abundantly expressed miRNA family, and the mmu-mir-672, mmu-mir-322, mmu-mir-503, and mmu-mir-465 families were the most abundant X-linked miRNAs detected. The X-linked mmu-mir-503, mmu-mir-672, and mmu-mir-465 families showed preferential expression in testes and ovaries. We also identified four novel miRNAs that are preferentially expressed in gonads. Gonadal selective miRNAs may play important roles in ovarian development, folliculogenesis, and female fertility. PMID:20215419
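    The sequencing yield stated above follows from the read count and read length. The short check below uses only numbers given in the abstract.

```python
reads = 4_655_992
read_length_bp = 33
total_bp = reads * read_length_bp
print(f"{total_bp:,} bp = {total_bp / 1e6:.0f} Mbp")   # prints "153,647,736 bp = 154 Mbp"

mapped_fraction = 0.5013   # fraction of reads mapped to the mouse genome by Pash
mapped_reads = reads * mapped_fraction
print(f"about {mapped_reads:,.0f} reads mapped")
```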

  13. A Parallel 3D Spectral Difference Method for Solutions of Compressible Navier Stokes Equations on Deforming Grids and Simulations of Vortex Induced Vibration

    NASA Astrophysics Data System (ADS)

    DeJong, Andrew

    Numerical models of fluid-structure interaction have grown in importance due to increasing interest in environmental energy harvesting, airfoil-gust interactions, and bio-inspired formation flying. Enabled by increasingly powerful parallel computers, such models seek to explain the fundamental physics behind complex, unsteady fluid-structure phenomena. To this end, a high-fidelity computational model based on the high-order spectral difference method on 3D unstructured, dynamic meshes has been developed. The spectral difference method constructs continuous solution fields within each element, using a Riemann solver to compute the inviscid fluxes at the element interfaces and an averaging mechanism to compute the viscous fluxes. This method has shown promise in the past as a highly accurate yet sufficiently fast method for solving unsteady viscous compressible flows. The solver is monolithically coupled to the equations of motion of an elastically mounted, 3-degree-of-freedom rigid bluff body undergoing flow-induced lift, drag, and torque. The mesh is deformed using four methods: an analytic function, the Laplace equation, the biharmonic equation, and a bi-elliptic equation with variable diffusivity. This single system of equations (fluid and structure) is advanced in time using a 5-stage, 4th-order Runge-Kutta scheme. The Message Passing Interface is used to run the coupled system in parallel on up to 240 processors. The solver is validated against previously published numerical and experimental data for an elastically mounted cylinder. The effect of adding an upstream body and inducing wake galloping is observed.
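    The abstract does not say which 5-stage, 4th-order Runge-Kutta scheme is used. A widely used example is the low-storage (2N-storage) scheme of Carpenter and Kennedy (1994), shown below as an illustration. This scheme is an assumption, not the author's method. It needs only two registers per unknown, which matters for high-order solvers with many degrees of freedom.

```python
import math

# Carpenter & Kennedy (1994) five-stage, fourth-order, 2N-storage coefficients.
RK_A = [0.0,
        -567301805773.0 / 1357537059087.0,
        -2404267990393.0 / 2016746695238.0,
        -3550918686646.0 / 2091501179385.0,
        -1275806237668.0 / 842570457699.0]
RK_B = [1432997174477.0 / 9575080441755.0,
        5161836677717.0 / 13612068292357.0,
        1720146321549.0 / 2090206949498.0,
        3134564353537.0 / 4481467310338.0,
        2277821191437.0 / 14882151754819.0]
RK_C = [0.0,
        1432997174477.0 / 9575080441755.0,
        2526269341429.0 / 6820363962896.0,
        2006345519317.0 / 3224310063776.0,
        2802321613138.0 / 2924317926251.0]

def lsrk45_step(f, t, u, dt):
    """Advance u by one step using only two storage registers (u and k)."""
    k = 0.0
    for a, b, c in zip(RK_A, RK_B, RK_C):
        k = a * k + dt * f(t + c * dt, u)   # update the residual register
        u = u + b * k                       # update the solution register
    return u
```

    Integrating dy/dt = -y from y(0) = 1 with 100 steps of size 0.01 reproduces `math.exp(-1.0)` to high accuracy. The same step works unchanged for NumPy state arrays.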

  14. MICADO: Parallel implementation of a 2D-1D iterative algorithm for the 3D neutron transport problem in prismatic geometries

    SciTech Connect

    Fevotte, F.; Lathuiliere, B.

    2013-07-01

    The large increase in computing power over the past few years now makes it possible to consider developing 3D full-core heterogeneous deterministic neutron transport solvers for reference calculations. Among the approaches presented in the literature, the method first introduced in [1] seems very promising. It consists of iterating over solutions of 2D and 1D method of characteristics (MOC) problems, taking advantage of prismatic geometries without introducing a low-order operator approximation such as diffusion. However, before developing a solver with all industrial options at EDF, several points needed to be clarified. In this work, we first prove the convergence of this iterative process under some assumptions. We then present our high-performance parallel implementation of this algorithm in the MICADO solver. Benchmarking the solver on the Takeda case shows that the 2D-1D coupling algorithm does not seem to affect the spatial convergence order of the MOC solver. As for performance, our study shows that even though the data distribution is suited to the 2D solver part, the efficiency of the 1D part is sufficient to ensure good parallel efficiency of the global algorithm. After this study, the main remaining implementation difficulty is the memory requirement of a vector used for initialization. An efficient acceleration operator will also need to be developed. (authors)

  15. Wideband aperture array using RF channelizers and massively parallel digital 2D IIR filterbank

    NASA Astrophysics Data System (ADS)

    Sengupta, Arindam; Madanayake, Arjuna; Gómez-García, Roberto; Engeberg, Erik D.

    2014-05-01

    Wideband receive-mode beamforming applications in wireless location, electronically scanned antennas for radar, RF sensing, microwave imaging and wireless communications require digital aperture arrays that offer a relatively constant far-field beam over several octaves of bandwidth. Several beamforming schemes, including the well-known true time-delay and phased array beamformers, have been realized using either finite impulse response (FIR) or fast Fourier transform (FFT) digital filter-sum techniques. These beamforming algorithms offer the desired selectivity at the cost of high computational complexity and frequency-dependent far-field array patterns. A novel approach to receive beamforming is the use of massively parallel 2-D infinite impulse response (IIR) fan filterbanks to synthesize relatively frequency-independent RF beams with an order of magnitude lower multiplier complexity than conventional FFT- or FIR-based algorithms. The 2-D IIR filterbanks demand fast digital processing that can support several octaves of RF bandwidth, together with fast analog-to-digital converters (ADCs) for direct RF-to-bits conversion of wideband antenna element signals. Fast digital implementation platforms that can realize the high-precision recursive filter structures needed for real-time beamforming at RF bandwidths are also desired. We propose a novel technique that combines a passive RF channelizer, multichannel ADC technology, and single-phase massively parallel 2-D IIR digital fan filterbanks, realized at low complexity using FPGA and/or ASIC technology. The scheme natively supports a bandwidth larger than the maximum clock frequency of the digital implementation technology. We also strive to achieve More-than-Moore throughput by processing a wideband RF signal of bandwidth B = N·Fclk/2, that is, N times the Nyquist bandwidth Fclk/2 of a digital VLSI platform whose maximum clock frequency is Fclk Hz. Such an increase in bandwidth is

  16. Switching dynamics of thin film ferroelectric devices - a massively parallel phase field study

    NASA Astrophysics Data System (ADS)

    Ashraf, Md. Khalid

    In this thesis, we investigate the switching dynamics in thin film ferroelectrics. Ferroelectric materials are of inherent interest for low-power and multi-functional devices. However, device applications of these materials have been limited by the poorly understood electromagnetic and mechanical response at the nanoscale in arbitrary device structures. The difficulty in understanding switching dynamics arises mainly from the presence of features at multiple length scales and the nonlinearity associated with strongly coupled states. For example, in a ferroelectric material, domain walls are nanometers wide, whereas the domain pattern forms at the micron scale. Switching is determined by coupled chemical, electrostatic, mechanical and thermal interactions. Thus, computationally understanding switching dynamics in thin film ferroelectrics, and comparing the results directly with experiment, poses a significant numerical challenge. We have developed a phase field model that describes the physics of polarization dynamics at the microscopic scale. A number of efficient numerical methods were applied to achieve massive parallelization of all calculation steps. Conformally mapped elements, node-wise assembly and prevention of dynamic loading minimized communication between processors and increased parallel efficiency. With these improvements, we have reached the experimental scale, a significant step forward compared to state-of-the-art thin film ferroelectric switching dynamics models. Using this model, we elucidated the switching dynamics on multiple surfaces of the multiferroic material BFO (BiFeO3). We also calculated the switching energy of scaled BFO islands. Finally, we studied the interaction of domain wall propagation with misfit dislocations in the thin film. We believe that the model will be useful for understanding switching dynamics in many different experimental setups incorporating thin film ferroelectrics.

  17. Diffuse large B-cell lymphoma: sub-classification by massive parallel quantitative RT-PCR.

    PubMed

    Xue, Xuemin; Zeng, Naiyan; Gao, Zifen; Du, Ming-Qing

    2015-01-01

    Diffuse large B-cell lymphoma (DLBCL) is a heterogeneous entity with remarkably variable clinical outcome. Gene expression profiling (GEP) classifies DLBCL into activated B-cell like (ABC), germinal center B-cell like (GCB), and Type-III subtypes, with ABC-DLBCL characterized by poor prognosis and constitutive NF-κB activation. A major challenge for applying this cell-of-origin (COO) classification in routine clinical practice is to establish a robust assay suited to routine formalin-fixed paraffin-embedded (FFPE) diagnostic biopsies. In this study, we investigated the feasibility of COO classification of FFPE tissue RNA samples by massively parallel quantitative reverse transcription PCR (qRT-PCR). We established a protocol for parallel qRT-PCR of FFPE RNA samples with the Fluidigm BioMark HD system, and quantified the expression of the COO classifier genes and of the NF-κB target genes that characterize ABC-DLBCL in 143 cases of DLBCL. We also trained and validated a series of basic machine-learning classifiers and their derived meta-classifiers, and identified SimpleLogistic as the top classifier, giving excellent performance across various GEP data sets derived from fresh-frozen or FFPE tissues on different microarray platforms. Finally, we applied SimpleLogistic to our qRT-PCR data set, and the cases assigned as ABC- and GCB-DLBCL showed their respective characteristic clinical outcomes and NF-κB target gene expression. The methodology established in this study provides a robust approach for DLBCL sub-classification using routine FFPE diagnostic biopsies in the clinical setting. PMID:25418578

  18. Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software

    NASA Technical Reports Server (NTRS)

    Tilton, James C.

    2003-01-01

    A hierarchical set of image segmentations is a set of several segmentations of the same image at different levels of detail, in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton et al. described an approach for producing hierarchical segmentations (called HSEG) and gave a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. Primarily, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g., Horowitz and Pavlidis [3]). In addition, between HSWO region-growing iterations, HSEG optionally interjects merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering), constrained by a threshold derived from the previous HSWO region-growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG's computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG's recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included comparing RHSEG with classic
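    The HSWO step that HSEG builds on can be illustrated by repeatedly merging the most similar pair of adjacent regions and recording every intermediate segmentation, so that each coarser level is obtained from merges of the finer one. The sketch below is a simplified, hypothetical illustration, not the HSEG or RHSEG code: it assumes a 1-D signal (adjacency means neighboring positions, unlike the 2-D spatial adjacency of real imagery), assumes a squared-error-increase dissimilarity (the criterion HSEG actually uses may differ), and omits the constrained spectral-clustering merges between non-adjacent regions. The names `merge_cost` and `hswo_1d` are illustrative only.

```python
def merge_cost(a, b):
    """Increase in total squared error if regions a and b, each (sum, count), are merged."""
    (s1, n1), (s2, n2) = a, b
    return n1 * n2 / (n1 + n2) * (s1 / n1 - s2 / n2) ** 2


def hswo_1d(values):
    """Greedy stepwise merging of adjacent regions in a 1-D signal.

    Returns the hierarchy of segmentations, finest first; each segmentation
    is a list of half-open (start, end) index spans.
    """
    regions = [(float(v), 1) for v in values]          # (sum, sample count) per region
    spans = [(i, i + 1) for i in range(len(values))]   # start with one region per sample
    hierarchy = [list(spans)]
    while len(regions) > 1:
        # Dissimilarity of every adjacent pair; merge the cheapest (first one on ties).
        costs = [merge_cost(regions[i], regions[i + 1]) for i in range(len(regions) - 1)]
        k = min(range(len(costs)), key=costs.__getitem__)
        merged = (regions[k][0] + regions[k + 1][0], regions[k][1] + regions[k + 1][1])
        regions[k:k + 2] = [merged]
        spans[k:k + 2] = [(spans[k][0], spans[k + 1][1])]
        hierarchy.append(list(spans))
    return hierarchy


levels = hswo_1d([1, 1, 1, 9, 9, 9])
print(levels[-2])  # two-region level: [(0, 3), (3, 6)]
```

    Each level has exactly one fewer region than the level before it, which is what makes the output a nested hierarchy; a real implementation would keep a priority queue of pair costs rather than recomputing them all at every step.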

  19. Massively parallel LES of azimuthal thermo-acoustic instabilities in annular gas turbines

    NASA Astrophysics Data System (ADS)

    Wolf, P.; Staffelbach, G.; Roux, A.; Gicquel, L.; Poinsot, T.; Moureau, V.

    2009-06-01

    Increasingly stringent regulations and the need to tackle rising fuel prices have placed great emphasis on the design of aeronautical gas turbines, which are unfortunately more and more prone to combustion instabilities. In annular combustion chambers in particular, these instabilities often take the form of azimuthal modes. Predicting these modes requires computing the full combustion chamber, which remained out of reach until the recent development of massively parallel computers. In this article, full annular Large Eddy Simulations (LES) are performed of two helicopter combustors that differ only in swirler design. In both computations, LES captures self-established rotating azimuthal modes. However, the two cases exhibit different thermo-acoustic responses, and the resulting limit cycles differ. With the first design, a strong self-excited instability develops, leading to pulsating flames and local flashback. In the second case, the flames are much less affected by the azimuthal mode and remain stable, allowing acceptable operation. Hence, this study highlights the potential of LES for discriminating between injection system designs. To cite this article: P. Wolf et al., C. R. Mecanique 337 (2009).

  20. Massively parallel LES of azimuthal thermo-acoustic instabilities in annular gas turbines

    NASA Astrophysics Data System (ADS)

    Wolf, Pierre; Staffelbach, Gabriel; Gicquel, Laurent; Poinsot, Thierry

    2009-07-01

    Most of the energy produced worldwide comes from the combustion of fossil fuels. In the context of global climate change and dramatically decreasing resources, there is a critical need to optimize combustion, especially in gas turbines. Unfortunately, new designs for efficient combustion are prone to destructive thermo-acoustic instabilities. Large Eddy Simulation (LES) is a promising tool for predicting turbulent reacting flows in complex industrial configurations and for exploring the mechanisms that trigger coupling between acoustics and combustion. In annular combustion chambers in particular, these instabilities usually take the form of azimuthal modes. Predicting these modes requires computing the full combustion chamber, including all sectors, which remained out of reach until the recent development of massively parallel computers. A fully compressible, multi-species reactive Navier-Stokes solver is run on up to 4096 BlueGene/P CPUs for two designs of a full annular helicopter chamber. Results show evidence of self-established azimuthal modes in both cases, but with different energy-containing limit cycles. Mesh dependency is checked with grids of 38 and 93 million tetrahedra. The fact that the two grids yield similar flow topologies and limit cycles supports the ability of LES to discriminate between design changes.

  1. Ensuring the safety of vaccine cell substrates by massively parallel sequencing of the transcriptome.

    PubMed

    Onions, D; Côté, C; Love, B; Toms, B; Koduri, S; Armstrong, A; Chang, A; Kolman, J

    2011-09-22

    Massively parallel deep sequencing of the transcriptome, coupled with algorithmic analysis to identify adventitious agents (MP-Seq™), is an important adjunct in ensuring the safety of cells used in vaccine production. Such cells may harbour novel viruses whose sequences are unknown, or latent viruses that are expressed only after the cells are stressed. MP-Seq is an unbiased and comprehensive method for identifying such viruses and other adventitious agents without prior knowledge of their nature. Here we demonstrate its utility as part of an integrated approach to identify and characterise potential contaminants in commonly used virus and vaccine production cell lines. Through this analysis, in combination with more traditional approaches, we excluded the presence of porcine circoviruses in the ATCC Vero cell bank (CCL-81); however, we found that a full-length betaretrovirus related to SRV can be expressed in these cells, a factor that may be important in the production of certain vaccines. Similarly, insect cells are proving valuable for the production of virus-like particles and subunit vaccines, but they can harbour a range of latent viruses. Following MP-Seq of the Trichoplusia ni (High Five cell line) transcriptome, we were able to detect a contaminating latent nodavirus and identify an expressed errantivirus genome. Collectively, these studies reinforce the role of MP-Seq as an integral tool for identifying contaminating agents in vaccine cell substrates. PMID:21651935

  2. GRay: A Massively Parallel GPU-based Code for Ray Tracing in Relativistic Spacetimes

    NASA Astrophysics Data System (ADS)

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOPS (or 1 ns per photon per time step). For a realistic problem, where peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit (CPU)-based ray-tracing codes. This speedup allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects with observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Using this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae for their dependence on black hole spin and observer inclination, which can be used to interpret upcoming Event Horizon Telescope observations of the black holes at the center of the Milky Way and in M87.

  3. Massive Parallel Sequencing for Diagnostic Genetic Testing of BRCA Genes--a Single Center Experience.

    PubMed

    Ermolenko, Natalya A; Boyarskikh, Uljana A; Kechin, Andrey A; Mazitova, Alexandra M; Khrapov, Evgeny A; Petrova, Valentina D; Lazarev, Alexandr F; Kushlinskii, Nikolay E; Filipenko, Maxim L

    2015-01-01

    The aim of this study was to implement massively parallel sequencing (MPS) technology in clinical genetic testing. We developed and tested an amplicon-based method for resequencing the BRCA1 and BRCA2 genes on an Illumina MiSeq to identify disease-causing mutations in patients with hereditary breast or ovarian cancer (HBOC). The coding regions of BRCA1 and BRCA2 were resequenced in 96 HBOC patient DNA samples obtained from different sample types: peripheral blood leukocytes, whole blood drops dried on paper, and buccal wash epithelia. Sixteen randomly selected DNA samples were characterized using standard Sanger sequencing and used to optimize the variant calling process and evaluate the accuracy of the MPS method. The best bioinformatics workflow included filtering variants with GATK using the following cut-offs: variant frequency >14%, coverage >25x, and presence in both forward and reverse reads. The MPS method had 100% sensitivity and 94.4% specificity. Similar accuracy was achieved for DNA obtained from the different sample types. The workflow presented herein requires only a small amount of DNA (170 ng) and is cost-effective because it eliminates the DNA and PCR product normalization steps. PMID:26625824
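    The three cut-offs quoted above act as a simple conjunctive filter on each candidate variant. The sketch below shows that logic in plain Python under stated assumptions: the `VariantCall` record and its field names are hypothetical (they are not GATK output columns or GATK's filtering syntax), "presence in both forward and reverse reads" is modeled as at least one variant-supporting read on each strand, and the example calls are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class VariantCall:
    """One candidate variant; a hypothetical record, not a GATK data structure."""
    name: str
    allele_fraction: float  # fraction of reads supporting the variant, 0.0 to 1.0
    depth: int              # total read coverage at the site
    fwd_alt_reads: int      # variant-supporting reads on the forward strand
    rev_alt_reads: int      # variant-supporting reads on the reverse strand


def passes_filter(v: VariantCall, min_fraction: float = 0.14, min_depth: int = 25) -> bool:
    """Apply the three cut-offs (frequency > 14%, coverage > 25x, both strands)."""
    return (
        v.allele_fraction > min_fraction
        and v.depth > min_depth
        and v.fwd_alt_reads > 0   # supported on the forward strand
        and v.rev_alt_reads > 0   # and on the reverse strand
    )


calls = [
    VariantCall("het_snv", 0.48, 120, 30, 28),
    VariantCall("low_fraction", 0.10, 200, 10, 10),
    VariantCall("low_depth", 0.50, 20, 5, 5),
    VariantCall("one_strand", 0.40, 90, 36, 0),
]
kept = [v.name for v in calls if passes_filter(v)]
print(kept)  # ['het_snv']
```

    The strict inequalities mirror the ">" in the quoted cut-offs; the strand requirement is what rejects calls driven by strand-specific artifacts even when frequency and depth look convincing.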

  4. Resolving genomic disorder–associated breakpoints within segmental DNA duplications using massively parallel sequencing

    PubMed Central

    Nuttle, Xander; Itsara, Andy; Shendure, Jay; Eichler, Evan E.

    2014-01-01

    The most common recurrent copy number variants associated with autism, developmental delay, and epilepsy are flanked by segmental duplications. Complete genetic characterization of these events is challenging because their breakpoints often occur within high-identity, copy number polymorphic paralogous sequences that cannot be specifically assayed using hybridization-based methods. Here, we provide a protocol for breakpoint resolution with sequence-level precision. Massively parallel sequencing is performed on libraries generated from haplotype-resolved chromosomes, genomic DNA, or molecular inversion probe–captured breakpoint-informative regions harboring paralog-distinguishing variants. Quantifying sequencing depth over informative sites enables breakpoint localization, typically to within several kilobases to tens of kilobases. Depending on the approach employed, the sequencing platform, and the accuracy and completeness of the reference genome sequence, this protocol takes from a few days to several months to complete. Once established for a specific genomic disorder, it can process thousands of DNA samples in as little as 3–4 weeks. PMID:24874815

  5. Characterization of the Zoarces viviparus liver transcriptome using massively parallel pyrosequencing

    PubMed Central

    Kristiansson, Erik; Asker, Noomi; Förlin, Lars; Larsson, DG Joakim

    2009-01-01

    Background The teleost Zoarces viviparus (eelpout) lives along the coasts of Northern Europe and has long been an established model organism for marine ecology and environmental monitoring. However, scarce information about this species' genome has restricted the use of efficient molecular-level assays, such as gene expression microarrays. Results In the present study, we present the first comprehensive characterization of the Zoarces viviparus liver transcriptome. From 400,000 reads generated by massively parallel pyrosequencing, more than 50,000 putative transcript fragments were assembled, annotated and functionally classified. The data were estimated to cover roughly 40% of the total transcriptome, and homologues for about half of the genes of Gasterosteus aculeatus (stickleback) were identified. The sequence data were subsequently used to design an oligonucleotide microarray for large-scale gene expression analysis. Conclusion Our results show that one run on a Genome Sequencer FLX from 454 Life Sciences/Roche generates enough genomic information for adequate de novo assembly of a large number of genes in a higher vertebrate. The generated sequence data, including the validated microarray probes, are publicly available to promote genome-wide research in Zoarces viviparus. PMID:19646242

  6. Hybrid selection of discrete genomic intervals on custom-designed microarrays for massively parallel sequencing

    PubMed Central

    Hodges, Emily; Rooks, Michelle; Xuan, Zhenyu; Bhattacharjee, Arindam; Gordon, D Benjamin; Brizuela, Leonardo; McCombie, W Richard; Hannon, Gregory J

    2010-01-01

    Complementary techniques that deepen information content and minimize reagent costs are required to realize the full potential of massively parallel sequencing. Here, we describe a resequencing approach that focuses on genomic regions of high interest by combining hybridization-based purification of multi-megabase regions with sequencing on the Illumina Genome Analyzer (GA). The capture matrix is created on a microarray whose probes can be programmed to target any non-repeat portion of the genome, and the method requires only basic familiarity with microarray hybridization. We present a detailed protocol suitable for 1–2 µg of input genomic DNA and highlight key design tips with which high specificity (>65% of reads stem from enriched exons) and high sensitivity (98% targeted base pair coverage) can be achieved. We have successfully applied this approach to the enrichment of coding regions ranging from 0.5 to 4 Mb in length, in both human and mouse. From genomic DNA library production to base-called sequences, this procedure takes approximately 9–10 days, including array captures and one Illumina flow cell run. PMID:19478811

  7. Massively Parallel Sequencing for Genetic Diagnosis of Hearing Loss: The New Standard of Care

    PubMed Central

    Shearer, A. Eliot; Smith, Richard J.H.

    2016-01-01

    Objective To evaluate the use of new genetic sequencing techniques for comprehensive genetic testing for hearing loss. Data Sources Articles were identified from the PubMed and Google Scholar databases using pertinent search terms. Review Methods The literature search identified 30 candidate studies that met the search criteria. Three studies were excluded and eight studies were found to be case reports. Twenty studies were included in the review analysis, including seven studies that evaluated controls and 16 studies that evaluated patients with unknown causes of hearing loss; three studies evaluated both controls and patients. Conclusions In the 20 studies included in the review analysis, 426 control samples and 603 patients with unknown causes of hearing loss underwent comprehensive genetic diagnosis for hearing loss using massively parallel sequencing. Control analysis showed sensitivity and specificity > 99%, sufficient for clinical use of these tests. The overall diagnostic rate was 41% (range 10% to 83%) and varied with several factors, including inheritance and pre-screening prior to comprehensive testing. Available platforms differed significantly in the number and type of genes included and in whether copy number variations were examined. Based on these results, comprehensive genetic testing should form the cornerstone of a tiered approach to the clinical evaluation of patients with hearing loss, along with history, physical exam, and audiometry, and can determine what further testing, if any, is required. Implications for Practice Comprehensive genetic testing has become the new standard of care for genetic testing of patients with sensorineural hearing loss. PMID:26084827

  8. Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report

    SciTech Connect

    Sejnowski, T.J.; Goldstein, M.

    1990-01-01

    This research sought to produce a massively parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) a corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) we demonstrated that a feedforward network could be trained to categorize vowels from these talkers, with performance comparable to that of nearest-neighbor techniques and of trained humans on the same data; (3) we developed a novel approach to sensory fusion by training a network to transform facial images into short-time spectral amplitude envelopes, information that can be used to increase the signal-to-noise ratio, and hence the performance, of acoustic speech recognition systems in noisy environments; (4) we explored the use of recurrent networks to perform the same mapping for continuous speech. The results of this project demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.

  9. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing.

    PubMed

    Warshauer, David H; Churchill, Jennifer D; Novroski, Nicole; King, Jonathan L; Budowle, Bruce

    2015-08-01

    Massively parallel sequencing (MPS) technology can determine not only the sizes of short tandem repeat (STR) alleles but also their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs, and variations in the pattern of repeat units within a given repeat motif, can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 were variations of documented sequences resulting from intra-repeat SNPs or alternative repeat unit patterns. Despite limited sampling, two of the most frequently observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which no published data were available for comparison. These findings illustrate the great potential of MPS for increasing the resolving power of STR typing and emphasize the need to characterize STR alleles in population samples. PMID:26391384
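    Why sequence resolves alleles that length cannot is easy to see on a toy example: two repeat regions with the same number of 4-bp units receive the same length-based allele call, yet their unit patterns differ. The sketch below is a hypothetical illustration, not STRait Razor or any published nomenclature tool. It assumes the repeat region has already been extracted and starts on a unit boundary, ignores flanking sequence, and uses invented TCTA/TCTG sequences rather than data from the study.

```python
def repeat_structure(seq: str, unit_len: int = 4) -> str:
    """Collapse a repeat region into bracketed run-length notation."""
    units = [seq[i:i + unit_len] for i in range(0, len(seq), unit_len)]
    parts = []
    i = 0
    while i < len(units):
        j = i
        while j < len(units) and units[j] == units[i]:
            j += 1                      # extend the run of identical units
        run = j - i
        parts.append(f"[{units[i]}]{run}" if run > 1 else units[i])
        i = j
    return " ".join(parts)


def length_allele(seq: str, unit_len: int = 4) -> int:
    """Repeat count inferred from length alone, as in size-based STR typing."""
    return len(seq) // unit_len


a = "TCTA" * 10
b = "TCTA" * 5 + "TCTG" + "TCTA" * 4    # same length, one variant unit
print(length_allele(a), length_allele(b))  # 10 10
print(repeat_structure(a))                 # [TCTA]10
print(repeat_structure(b))                 # [TCTA]5 TCTG [TCTA]4
```

    Both regions are "allele 10" by length, but the bracketed sequence strings differ, which is the kind of same-length variation the study reports as intra-repeat SNPs or alternative repeat unit patterns.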

  10. Massively parallel decoding of mammalian regulatory sequences supports a flexible organizational model.

    PubMed

    Smith, Robin P; Taher, Leila; Patwardhan, Rupali P; Kim, Mee J; Inoue, Fumitaka; Shendure, Jay; Ovcharenko, Ivan; Ahituv, Nadav

    2013-09-01

    Despite continual progress in the cataloging of vertebrate regulatory elements, little is known about their organization and regulatory architecture. Here we describe a massively parallel experiment to systematically test the impact of copy number, spacing, combination and order of transcription factor binding sites on gene expression. A complex library of ∼5,000 synthetic regulatory elements containing patterns from 12 liver-specific transcription factor binding sites was assayed in mice and in HepG2 cells. We find that certain transcription factors act as direct drivers of gene expression in homotypic clusters of binding sites, independent of spacing between sites, whereas others function only synergistically. Heterotypic enhancers are stronger than their homotypic analogs and favor specific transcription factor binding site combinations, mimicking putative native enhancers. Exhaustive testing of binding site permutations suggests that there is flexibility in binding site order. Our findings provide quantitative support for a flexible model of regulatory element activity and suggest a framework for the design of synthetic tissue-specific enhancers. PMID:23892608

  11. Photo-patterned free-standing hydrogel microarrays for massively parallel protein analysis

    NASA Astrophysics Data System (ADS)

    Duncombe, Todd A.; Herr, Amy E.

    2015-03-01

    Microfluidic technologies have largely been realized within enclosed microchannels. While powerful, a principal limitation of closed-channel microfluidics is the difficulty of sample extraction and downstream processing. To address this limitation and expand the utility of microfluidic analytical separation tools, we developed an open-channel hydrogel architecture for rapid protein analysis. Designed for compatibility with slab-gel polyacrylamide gel electrophoresis (PAGE) reagents and instruments, we detail the development of free-standing polyacrylamide gel (fsPAG) microstructures supporting electrophoretic performance rivalling that of microfluidic platforms. Owing to its open architecture, the platform can be easily interfaced with automated robotic controllers and downstream processing (e.g., sample spotters, immunological probing, mass spectrometry). The fsPAG devices are directly photopatterned atop, and covalently attached to, planar polymer or glass surfaces. Because the design-prototype-test cycle takes less than 1 h, significantly faster than mold-based fabrication techniques, rapid prototyping of devices with fsPAG microstructures gives researchers a powerful tool for developing custom analytical assays. Leveraging these rapid-prototyping benefits, we scale up from a single separation to an array of 96 concurrent fsPAGE assays with a 10 min run time, driven by one electrode pair. The fsPAGE platform is uniquely well suited for massively parallelized proteomics, a major unrealized goal of bioanalytical technology.

  12. Transcriptomic analysis of the housefly (Musca domestica) larva using massively parallel pyrosequencing.

    PubMed

    Liu, Fengsong; Tang, Ting; Sun, Lingling; Jose Priya, T A

    2012-02-01

    To explore the transcriptome of Musca domestica larvae and to identify unique sequences, we used massively parallel pyrosequencing on the Roche 454-FLX platform to generate a substantial EST dataset for this fly. We obtained a total of 249,555 ESTs with an average read length of 373 bp. These reads were assembled into 13,206 contigs and 20,556 singletons. Using BlastX searches of the Swissprot and Nr databases, we identified 4,814 contigs and 8,166 singletons as unique sequences. The annotated sequences were then subjected to GO analysis, and the results showed that the majority of query sequences could be assigned to gene ontology terms. In addition, functional classification and pathway assignment were performed with KEGG, and 2,164 unique sequences were mapped to 184 KEGG pathways in total. As the first attempt at large-scale RNA sequencing of M. domestica, this general picture of the transcriptome provides a fundamental resource for further research on functional genomics. PMID:21643958

  13. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    PubMed Central

    Warshauer, David H.; Churchill, Jennifer D.; Novroski, Nicole; King, Jonathan L.; Budowle, Bruce

    2015-01-01

    Massively parallel sequencing (MPS) technology can determine both the sizes of short tandem repeat (STR) alleles and their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that had not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite the limited sampling, two of the most frequently observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS for increasing the resolving power of STR typing and emphasize the need to characterize STR allele sequences in sample populations. PMID:26391384
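
    The core idea, that sequencing can split alleles which a length-based call would merge, can be illustrated with a toy example. This is a hypothetical illustration, not STRait Razor's logic: it assumes a tetranucleotide TCTA repeat and two invented alleles of equal length, one carrying an intra-repeat A-to-G SNP.

```python
from typing import List, Tuple

# Hypothetical example: assumes a tetranucleotide repeat unit (TCTA).
MOTIF_LEN = 4


def length_based_call(repeat_region: str) -> int:
    """Repeat count inferred from length alone (what a size-based assay sees)."""
    return len(repeat_region) // MOTIF_LEN


def sequence_based_call(repeat_region: str) -> Tuple[int, str]:
    """Repeat count plus the exact sequence (what MPS can resolve)."""
    return length_based_call(repeat_region), repeat_region


def snp_positions(a: str, b: str) -> List[int]:
    """0-based positions at which two equal-length alleles differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Two invented alleles of identical length (11 repeat units each).
allele_a = "TCTA" * 11
allele_b = "TCTA" * 6 + "TCTG" + "TCTA" * 4  # A->G SNP in the 7th repeat unit

print(length_based_call(allele_a), length_based_call(allele_b))  # 11 11
print(sequence_based_call(allele_a) == sequence_based_call(allele_b))  # False
print(snp_positions(allele_a, allele_b))  # [27]
```

    Both alleles receive the same length-based designation, while the sequence-based designation separates them.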

  14. GRay: A MASSIVELY PARALLEL GPU-BASED CODE FOR RAY TRACING IN RELATIVISTIC SPACETIMES

    SciTech Connect

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on NVIDIA graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOPS (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Using this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae for their dependence on black hole spin and observer inclination, which can be used to interpret upcoming Event Horizon Telescope observations of the black holes at the center of the Milky Way and in M87.

  15. Targeted massively parallel sequencing provides comprehensive genetic diagnosis for patients with disorders of sex development

    PubMed Central

    Arboleda, VA; Lee, H; Sánchez, FJ; Délot, EC; Sandberg, DE; Grody, WW; Nelson, SF; Vilain, E

    2013-01-01

    Disorders of sex development (DSD) are rare disorders in which there is discordance between chromosomal, gonadal, and phenotypic sex. Only a minority of patients clinically diagnosed with DSD obtain a molecular diagnosis, leaving a large gap in our understanding of the prevalence, management, and outcomes in affected patients. We created a novel DSD genetic diagnostic tool in which sex development genes are captured using RNA probes and then undergo massively parallel sequencing. In a pilot group of 14 patients, we determined sex chromosome dosage, copy number variation, and gene mutations. In the patients with a known genetic diagnosis (obtained on either a clinical or research basis), this test identified the molecular cause in 100% (7/7) of patients. In patients in whom no molecular diagnosis had been made, this tool identified a genetic diagnosis in two of seven patients. Targeted sequencing of genes representing a specific spectrum of disorders can yield a higher rate of genetic diagnoses than current diagnostic approaches. Our DSD diagnostic tool provides, for the first time, a comprehensive genetic diagnosis from a single blood test in patients presenting with a wide range of urogenital anomalies. PMID:22435390

  16. Identification of Novel FMR1 Variants by Massively Parallel Sequencing in Developmentally Delayed Males

    PubMed Central

    Collins, Stephen C.; Bray, Steven M.; Suhl, Joshua A.; Cutler, David J.; Coffee, Bradford; Zwick, Michael E.; Warren, Stephen T.

    2010-01-01

    Fragile X syndrome (FXS), the most common inherited form of developmental delay, is typically caused by CGG-repeat expansion in FMR1. However, little attention has been paid to sequence variants in FMR1. Through the use of pooled-template massively parallel sequencing, we identified 130 novel FMR1 sequence variants in a population of 963 developmentally delayed males without CGG-repeat expansion mutations. Among these, we identified a novel missense change, p.R138Q, which alters a conserved residue in the nuclear localization signal of FMRP. We have also identified three promoter mutations in this population, all of which significantly reduce in vitro levels of FMR1 transcription. Additionally, we identified 10 noncoding variants of possible functional significance in the introns and 3’-untranslated region of FMR1, including two predicted splice site mutations. These findings greatly expand the catalogue of known FMR1 sequence variants and suggest that FMR1 sequence variants may represent an important cause of developmental delay. PMID:20799337

  17. Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing

    PubMed Central

    Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther

    2015-01-01

    Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well-vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256

  18. Simulation of hydraulic fracture networks in three dimensions utilizing massively parallel computing platforms

    NASA Astrophysics Data System (ADS)

    Settgast, R. R.; Johnson, S.; Fu, P.; Walsh, S. D.; Ryerson, F. J.; Antoun, T.

    2012-12-01

    Hydraulic fracturing has been an enabling technology for commercially stimulating fracture networks for over half a century. It has become one of the most widespread technologies for engineering subsurface fracture systems. Despite the ubiquity of this technique in the field, understanding and prediction of hydraulically induced fracture network propagation in realistic, heterogeneous reservoirs have been limited. A number of developments in multiscale modeling in recent years have allowed researchers in related fields to tackle the modeling of complex fracture propagation as well as the mechanics of heterogeneous materials. These developments, combined with advances in quantifying solution uncertainties, make it possible for the geologic modeling community to capture both the fracturing behavior and the longer-term permeability evolution of rock masses under hydraulic loading across both dynamic and viscosity-dominated regimes. Here we demonstrate the first phase of this effort through illustrations of fully three-dimensional, tightly coupled hydromechanical simulations of hydraulically induced fracture network propagation run at massively parallel computing scales, and we discuss preliminary results regarding the mechanisms by which fracture interactions and the accompanying changes to the stress field can lead to deleterious or beneficial changes to the fracture network.

  19. Identification of a novel GATA3 mutation in a deaf Taiwanese family by massively parallel sequencing.

    PubMed

    Lin, Yin-Hung; Wu, Chen-Chi; Hsu, Tun-Yen; Chiu, Wei-Yih; Hsu, Chuan-Jen; Chen, Pei-Lung

    2015-01-01

    Recent studies have confirmed the utility of massively parallel sequencing (MPS) in addressing genetically heterogeneous hereditary hearing impairment. By applying an MPS diagnostic panel targeting 129 known deafness genes, we identified a novel frameshift GATA3 mutation, c.149delT (p.Phe51LeufsX144), in a hearing-impaired family compatible with autosomal dominant inheritance. GATA3 haploinsufficiency is thought to be associated with hypoparathyroidism, sensorineural deafness, and renal dysplasia (HDR) syndrome. The pathogenicity of GATA3 c.149delT was supported by its absence from the 5,400 NHLBI exomes, the 1000 Genomes data, and the 100 normal-hearing controls of the present study; by the co-segregation of c.149delT heterozygosity with hearing impairment in 9 affected members of the family; and by nonsense-mediated mRNA decay of the mutant allele in in vitro functional studies. The phenotypes in this family appeared relatively mild, as most affected members, including the proband, presented no signs of hypoparathyroidism or renal abnormalities. To our knowledge, this is the first report of a genetic diagnosis of HDR syndrome preceding the clinical diagnosis. Genetic examination of multiple deafness genes with MPS might help identify certain types of syndromic hearing loss, such as HDR syndrome, contributing to earlier diagnosis and treatment of affected individuals. PMID:25771973

  20. Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing.

    PubMed

    Just, Rebecca S; Irwin, Jodi A; Parson, Walther

    2015-09-01

    Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well-vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10-20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256

  1. Massively parallel sequencing of complete mitochondrial genomes from hair shaft samples.

    PubMed

    Parson, Walther; Huber, Gabriela; Moreno, Lilliana; Madel, Maria-Bernadette; Brandhagen, Michael D; Nagl, Simone; Xavier, Catarina; Eduardoff, Mayra; Callaghan, Thomas C; Irwin, Jodi A

    2015-03-01

    Though shed hairs are one of the most commonly encountered evidence types, they are among the most limited in terms of DNA quantity and quality. As a result, DNA testing has historically focused on recovering only about 600 base pairs of the mitochondrial DNA control region. Here, we describe our success in recovering complete mitochondrial genome (mtGenome) data (∼16,569 bp) from single shed hairs. By employing massively parallel sequencing (MPS), we demonstrate that some hair samples yield DNA sufficient in quantity and quality to produce 2–3 kb mtGenome amplicons, and that entire mtGenome data can be recovered from hair extracts even without PCR enrichment. Most importantly, we describe a small-amplicon multiplex assay composed of sixty-two primer sets that can be routinely applied to the compromised hair samples typically encountered in forensic casework. In all samples tested here, the MPS data recovered using any one of the three methods were consistent with the control Sanger sequence data developed from high-quality known specimens. Given the recently demonstrated value of complete mtGenome data in terms of discrimination power among randomly sampled individuals, the possibility of recovering mtGenome data from the most compromised and limited evidentiary material is likely to vastly increase the utility of mtDNA testing for hair evidence. PMID:25438934

  2. Large-scale massively parallel atomistic simulations of short pulse laser interaction with metals

    NASA Astrophysics Data System (ADS)

    Wu, Chengping; Zhigilei, Leonid; Computational Materials Group Team

    2014-03-01

    Taking advantage of petascale supercomputing architectures, large-scale massively parallel atomistic simulations (10^8 to 10^9 atoms) are performed to study the microscopic mechanisms of short pulse laser interaction with metals. The simulation results reveal a complex picture of highly non-equilibrium processes responsible for material modification and/or ejection. At low laser fluences below the ablation threshold, fast melting and resolidification occur under conditions of extreme heating and cooling rates, resulting in modification of the surface microstructure. At higher laser fluences in the spallation regime, material ejection is driven by the relaxation of laser-induced stresses and proceeds through the nucleation, growth, and percolation of multiple voids in the sub-surface region of the irradiated target. At a fluence of ~2.5 times the spallation threshold, the top part of the target reaches the conditions for explosive decomposition into vapor and small droplets, marking the transition to the phase explosion regime of laser ablation. The dynamics of plume formation and the characteristics of the ablation plume are obtained from the simulations and compared with the results of time-resolved plume imaging experiments. Financial support for this work was provided by NSF (DMR-0907247 and CMMI-1301298) and AFOSR (FA9550-10-1-0541). Computational support was provided by the OLCF (MAT048) and XSEDE (TG-DMR110090).

  3. Frequency of Usher syndrome type 1 in deaf children by massively parallel DNA sequencing.

    PubMed

    Yoshimura, Hidekane; Miyagawa, Maiko; Kumakawa, Kozo; Nishio, Shin-Ya; Usami, Shin-Ichi

    2016-05-01

    Usher syndrome type 1 (USH1) is the most severe of the three USH subtypes due to its profound hearing loss, absent vestibular response and retinitis pigmentosa appearing at a prepubescent age. Six causative genes have been identified for USH1, making early diagnosis and therapy possible through DNA testing. Targeted exon sequencing of selected genes using massively parallel DNA sequencing (MPS) technology enables clinicians to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using MPS along with direct sequence analysis, we screened 227 unrelated non-syndromic deaf children and detected recessive mutations in USH1 causative genes in five patients (2.2%): three patients harbored MYO7A mutations and one each carried CDH23 or PCDH15 mutations. As indicated by an earlier genotype-phenotype correlation study of the CDH23 and PCDH15 genes, we considered the latter two patients to have USH1. Based on clinical findings, it was also highly likely that one patient with MYO7A mutations possessed USH1 due to a late onset age of walking. This first report describing the frequency (1.3-2.2%) of USH1 among non-syndromic deaf children highlights the importance of comprehensive genetic testing for early disease diagnosis. PMID:26791358
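
    The two frequencies reported in this abstract can be reproduced from the patient counts. This sketch rests on an inference not spelled out in the abstract: that 2.2% counts all five patients with USH1 gene mutations, and 1.3% counts only the three judged to have USH1 (the CDH23 and PCDH15 carriers plus one MYO7A carrier).

```python
def percent(count: int, total: int) -> float:
    """Percentage of `total`, rounded to one decimal place as in the abstract."""
    return round(100 * count / total, 1)


SCREENED = 227
ALL_MUTATION_CARRIERS = 3 + 1 + 1  # MYO7A + CDH23 + PCDH15 carriers
# Assumed basis of the lower bound: CDH23 and PCDH15 patients plus one MYO7A patient.
LIKELY_USH1 = 2 + 1

print(percent(LIKELY_USH1, SCREENED), percent(ALL_MUTATION_CARRIERS, SCREENED))  # 1.3 2.2
```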

  4. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing

    SciTech Connect

    Le Crom, Stéphane; Schackwitz, Wendy; Pennacchio, Len; Magnuson, Jon K.; Culley, David E.; Collett, James R.; Martin, Joel X.; Druzhinina, Irina S.; Mathis, Hugues; Monot, Frédéric; Seiboth, Bernhard; Cherry, Barbara; Rey, Michael; Berka, Randy; Kubicek, Christian P.; Baker, Scott E.; Margeot, Antoine

    2009-09-22

    Trichoderma reesei (teleomorph Hypocrea jecorina) is the main industrial source of cellulases and hemicellulases harnessed for the hydrolysis of biomass to simple sugars, which can then be converted to biofuels such as ethanol and other chemicals. The highly productive strains in use today were generated by classical mutagenesis. To learn how cellulase production was improved by these techniques, we performed massively parallel sequencing to identify mutations in the genomes of two hyperproducing strains (NG14, and its direct improved descendant, RUT C30). We detected a surprisingly high number of mutagenic events: 223 single-nucleotide variants, 15 small deletions or insertions, and 18 larger deletions, leading to the loss of more than 100 kb of genomic DNA. From these events, we report previously undocumented non-synonymous mutations in 43 genes that are mainly involved in nuclear transport, mRNA stability, transcription, secretion/vacuolar targeting, and metabolism. This homogeneity of functional categories suggests that multiple changes are necessary to improve cellulase production and not simply a few clear-cut mutagenic events. Phenotype microarrays show that some of these mutations result in strong changes in the carbon assimilation pattern of the two mutants with respect to the wild-type strain QM6a. Our analysis provides the first genome-wide insights into the changes induced by classical mutagenesis in a filamentous fungus, and suggests new areas for the generation of enhanced T. reesei strains for industrial applications such as biofuel production.

  5. Frequency of Usher syndrome type 1 in deaf children by massively parallel DNA sequencing

    PubMed Central

    Yoshimura, Hidekane; Miyagawa, Maiko; Kumakawa, Kozo; Nishio, Shin-ya; Usami, Shin-ichi

    2016-01-01

    Usher syndrome type 1 (USH1) is the most severe of the three USH subtypes due to its profound hearing loss, absent vestibular response and retinitis pigmentosa appearing at a prepubescent age. Six causative genes have been identified for USH1, making early diagnosis and therapy possible through DNA testing. Targeted exon sequencing of selected genes using massively parallel DNA sequencing (MPS) technology enables clinicians to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using MPS along with direct sequence analysis, we screened 227 unrelated non-syndromic deaf children and detected recessive mutations in USH1 causative genes in five patients (2.2%): three patients harbored MYO7A mutations and one each carried CDH23 or PCDH15 mutations. As indicated by an earlier genotype–phenotype correlation study of the CDH23 and PCDH15 genes, we considered the latter two patients to have USH1. Based on clinical findings, it was also highly likely that one patient with MYO7A mutations possessed USH1 due to a late onset age of walking. This first report describing the frequency (1.3–2.2%) of USH1 among non-syndromic deaf children highlights the importance of comprehensive genetic testing for early disease diagnosis. PMID:26791358

  6. Massively parallel sampling of lattice proteins reveals foundations of thermal adaptation.

    PubMed

    Venev, Sergey V; Zeldovich, Konstantin B

    2015-08-01

    Evolution of proteins in bacteria and archaea living in different conditions leads to significant correlations between amino acid usage and environmental temperature. The origins of these correlations are poorly understood, and an important question of protein theory, physics-based prediction of the types of amino acids overrepresented in highly thermostable proteins, remains largely unsolved. Here, we extend the random energy model of protein folding by weighting the interaction energies of amino acids by their frequencies in protein sequences and predict the energy gap of proteins designed to fold well at elevated temperatures. To test the model, we present a novel scalable algorithm for simultaneous energy calculation for many sequences in many structures, targeting massively parallel computing architectures such as graphics processing units. The energy calculation is performed by multiplying two matrices, one representing the complete set of sequences, and the other describing the contact maps of all structural templates. An implementation of the algorithm for the CUDA platform is available at http://www.github.com/kzeldovich/galeprot and calculates protein folding energies over 250 times faster than a single central processing unit. Analysis of amino acid usage in 64-mer cubic lattice proteins designed to fold well at different temperatures demonstrates an excellent agreement between theoretical and simulated values of energy gap. The theoretical predictions of temperature trends of amino acid frequencies are significantly correlated with bioinformatics data on 191 bacteria and archaea, and highlight protein folding constraints as a fundamental selection pressure during thermal adaptation in biological evolution. PMID:26254668
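
    The matrix formulation described in this abstract can be sketched in a few lines. This is a minimal plain-Python sketch, not the CUDA code from the galeprot repository. It assumes a pairwise contact-energy model: each sequence becomes a row of pair energies B(s_i, s_j) over all residue pairs (i, j), each structure becomes a row of 0/1 contact indicators over the same pairs, and the energy of every sequence in every structure is then one matrix product. The two-letter HP energy table and the 4-residue contact maps are hypothetical toy inputs.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Pair = Tuple[int, int]


def residue_pairs(length: int) -> List[Pair]:
    """All residue pairs (i, j) with i < j; these index the shared matrix dimension."""
    return list(combinations(range(length), 2))


def sequence_matrix(seqs: Sequence[str], energy: Dict[str, Dict[str, float]],
                    pairs: List[Pair]) -> List[List[float]]:
    """One row per sequence: the energy each pair would contribute if in contact."""
    return [[energy[s[i]][s[j]] for i, j in pairs] for s in seqs]


def contact_matrix(contact_maps: Sequence[Set[Pair]], pairs: List[Pair]) -> List[List[int]]:
    """One row per structure: 1 where the pair is in contact, else 0."""
    return [[1 if p in cmap else 0 for p in pairs] for cmap in contact_maps]


def energies(seq_rows: List[List[float]], contact_rows: List[List[int]]) -> List[List[float]]:
    """E = S @ C^T: entry [s][k] is the energy of sequence s in structure k."""
    return [[sum(a * c for a, c in zip(srow, crow)) for crow in contact_rows]
            for srow in seq_rows]


# Hypothetical toy inputs: HP-style energies (only H-H contacts are favorable).
HP = {"H": {"H": -1.0, "P": 0.0}, "P": {"H": 0.0, "P": 0.0}}
seqs = ["HHPH", "PHPP"]
structures = [{(0, 3)}, set()]  # "compact" (ends touch) vs "extended" (no contacts)

pairs = residue_pairs(4)
E = energies(sequence_matrix(seqs, HP, pairs), contact_matrix(structures, pairs))
print(E)  # [[-1.0, 0.0], [0.0, 0.0]]
```

    Because all sequences and all structures share the pair dimension, the whole energy table is a single dense matrix product, which is the kind of operation GPUs execute efficiently.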

  7. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing

    PubMed Central

    Le Crom, Stéphane; Schackwitz, Wendy; Pennacchio, Len; Magnuson, Jon K.; Culley, David E.; Collett, James R.; Martin, Joel; Druzhinina, Irina S.; Mathis, Hugues; Monot, Frédéric; Seiboth, Bernhard; Cherry, Barbara; Rey, Michael; Berka, Randy; Kubicek, Christian P.; Baker, Scott E.; Margeot, Antoine

    2009-01-01

    Trichoderma reesei (teleomorph Hypocrea jecorina) is the main industrial source of cellulases and hemicellulases harnessed for the hydrolysis of biomass to simple sugars, which can then be converted to biofuels such as ethanol and other chemicals. The highly productive strains in use today were generated by classical mutagenesis. To learn how cellulase production was improved by these techniques, we performed massively parallel sequencing to identify mutations in the genomes of two hyperproducing strains (NG14, and its direct improved descendant, RUT C30). We detected a surprisingly high number of mutagenic events: 223 single-nucleotide variants, 15 small deletions or insertions, and 18 larger deletions, leading to the loss of more than 100 kb of genomic DNA. From these events, we report previously undocumented non-synonymous mutations in 43 genes that are mainly involved in nuclear transport, mRNA stability, transcription, secretion/vacuolar targeting, and metabolism. This homogeneity of functional categories suggests that multiple changes are necessary to improve cellulase production and not simply a few clear-cut mutagenic events. Phenotype microarrays show that some of these mutations result in strong changes in the carbon assimilation pattern of the two mutants with respect to the wild-type strain QM6a. Our analysis provides genome-wide insights into the changes induced by classical mutagenesis in a filamentous fungus and suggests areas for the generation of enhanced T. reesei strains for industrial applications such as biofuel production. PMID:19805272

  8. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing.

    PubMed

    Le Crom, Stéphane; Schackwitz, Wendy; Pennacchio, Len; Magnuson, Jon K; Culley, David E; Collett, James R; Martin, Joel; Druzhinina, Irina S; Mathis, Hugues; Monot, Frédéric; Seiboth, Bernhard; Cherry, Barbara; Rey, Michael; Berka, Randy; Kubicek, Christian P; Baker, Scott E; Margeot, Antoine

    2009-09-22

    Trichoderma reesei (teleomorph Hypocrea jecorina) is the main industrial source of cellulases and hemicellulases harnessed for the hydrolysis of biomass to simple sugars, which can then be converted to biofuels such as ethanol and other chemicals. The highly productive strains in use today were generated by classical mutagenesis. To learn how cellulase production was improved by these techniques, we performed massively parallel sequencing to identify mutations in the genomes of two hyperproducing strains (NG14, and its direct improved descendant, RUT C30). We detected a surprisingly high number of mutagenic events: 223 single-nucleotide variants, 15 small deletions or insertions, and 18 larger deletions, leading to the loss of more than 100 kb of genomic DNA. From these events, we report previously undocumented non-synonymous mutations in 43 genes that are mainly involved in nuclear transport, mRNA stability, transcription, secretion/vacuolar targeting, and metabolism. This homogeneity of functional categories suggests that multiple changes are necessary to improve cellulase production and not simply a few clear-cut mutagenic events. Phenotype microarrays show that some of these mutations result in strong changes in the carbon assimilation pattern of the two mutants with respect to the wild-type strain QM6a. Our analysis provides genome-wide insights into the changes induced by classical mutagenesis in a filamentous fungus and suggests areas for the generation of enhanced T. reesei strains for industrial applications such as biofuel production. PMID:19805272

  9. SIESTA-PEXSI: Massively parallel method for efficient and accurate ab initio materials simulation

    NASA Astrophysics Data System (ADS)

    Lin, Lin; Huhs, Georg; Garcia, Alberto; Yang, Chao

    2014-03-01

    We describe how to combine the pole expansion and selected inversion (PEXSI) technique with the SIESTA method, which uses numerical atomic orbitals for Kohn-Sham density functional theory (KSDFT) calculations. The PEXSI technique can efficiently exploit the sparsity pattern of the Hamiltonian and overlap matrices generated by codes such as SIESTA, and it solves KSDFT without the cubic-scaling matrix diagonalization procedure. The complexity of PEXSI scales at most quadratically with system size, and its accuracy is comparable to that of full diagonalization. One distinct feature of PEXSI is that it achieves low-order scaling without relying on the near-sightedness property, and it can therefore be applied to metals as well as insulators and semiconductors, at room temperature or even lower temperatures. The PEXSI method is highly scalable, and the recently developed massively parallel PEXSI technique can make efficient use of 10,000 to 100,000 processors on high-performance machines. We demonstrate the performance of the SIESTA-PEXSI method using several examples of large-scale electronic structure calculations, including long DNA chains and graphene-like structures with more than 20,000 atoms. Funded by a Luis Alvarez fellowship at LBNL and by a DOE SciDAC project in partnership with BES.

  10. Massively parallel sequencing-based survey of eukaryotic community structures in Hiroshima Bay and Ishigaki Island.

    PubMed

    Nagai, Satoshi; Hida, Kohsuke; Urusizaki, Shingo; Takano, Yoshihito; Hongo, Yuki; Kameda, Takahiko; Abe, Kazuo

    2016-02-01

    In this study, we compared eukaryote biodiversity between Hiroshima Bay and Ishigaki Island in Japanese coastal waters, using a massively parallel sequencing (MPS)-based technique to collect preliminary data. The relative abundance of Alveolata was highest in both localities, and the second most abundant group was Stramenopiles, Opisthokonta, or Hacrobia, depending on the sample. For microalgal phyla, the relative abundance of operational taxonomic units (OTUs) and the number of MPS reads were highest for Dinophyceae in both localities, followed by Bacillariophyceae in Hiroshima Bay and by Bacillariophyceae or Chlorophyceae in Ishigaki Island. The number of OTUs detected in Hiroshima Bay and Ishigaki Island was 645 and 791, respectively, and 15.3% and 12.5% of the OTUs, respectively, were common to both localities. In the non-metric multidimensional scaling analysis, the samples from the two localities were plotted in different positions. In the dendrogram constructed from similarity indices, the samples clustered into different nodes by locality with high multiscale bootstrap values, reflecting geographic differences in biodiversity. Thus, we demonstrated biodiversity differences between the two localities, although the numbers of MPS reads were not high enough. Correspondence analysis showed a clear seasonal change in the biodiversity of Hiroshima Bay, but not in Ishigaki Island. The MPS-based technique therefore offers the advantage of detecting several hundred OTUs from a single sample, strongly suggesting that it could be effectively applied to routine monitoring programs. PMID:26476293
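
    The abstract gives each locality's OTU total and the shared fraction of each, but not the number of shared OTUs itself. Assuming each percentage is the shared count divided by that locality's own total (an inference, not stated in the abstract), a short search shows that exactly one integer count is consistent with both figures.

```python
def share_percent(shared: int, total: int) -> float:
    """Shared OTUs as a percentage of one locality's OTUs, rounded to one decimal."""
    return round(100 * shared / total, 1)


HIROSHIMA_OTUS = 645
ISHIGAKI_OTUS = 791

# Every shared-OTU count that reproduces both reported percentages (15.3% and 12.5%).
candidates = [n for n in range(min(HIROSHIMA_OTUS, ISHIGAKI_OTUS) + 1)
              if share_percent(n, HIROSHIMA_OTUS) == 15.3
              and share_percent(n, ISHIGAKI_OTUS) == 12.5]
print(candidates)  # [99]
```

    Under that assumption, about 99 OTUs were detected at both sites.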

  11. Massively parallel enzyme kinetics reveals the substrate recognition landscape of the metalloprotease ADAMTS13

    PubMed Central

    Kretz, Colin A.; Dai, Manhong; Soylemez, Onuralp; Yee, Andrew; Desch, Karl C.; Siemieniak, David; Tomberg, Kärt; Kondrashov, Fyodor A.; Meng, Fan; Ginsburg, David

    2015-01-01

    Proteases play important roles in many biologic processes and are key mediators of cancer, inflammation, and thrombosis. However, comprehensive and quantitative techniques to define the substrate specificity profile of proteases are lacking. The metalloprotease ADAMTS13 regulates blood coagulation by cleaving von Willebrand factor (VWF), reducing its procoagulant activity. A mutagenized substrate phage display library based on a 73-amino acid fragment of VWF was constructed, and the ADAMTS13-dependent change in library complexity was evaluated over reaction time points, using high-throughput sequencing. Reaction rate constants (kcat/KM) were calculated for nearly every possible single amino acid substitution within this fragment. This massively parallel enzyme kinetics analysis detailed the specificity of ADAMTS13 and demonstrated the critical importance of the P1-P1′ substrate residues while defining exosite binding domains. These data provided empirical evidence for the propensity for epistasis within VWF and showed strong correlation to conservation across orthologs, highlighting evolutionary selective pressures for VWF. PMID:26170332
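
    How a specificity constant can follow from substrate depletion over several reaction time points can be sketched with the standard pseudo-first-order relation. This assumes [S] << K_M, so that each variant's uncleaved fraction decays as f(t) = exp(-(k_cat/K_M)[E]t); the abstract does not say whether the authors fit exactly this model, and the enzyme concentration, time points, and rate constant below are invented for illustration.

```python
import math
from typing import Sequence


def kcat_over_km(times_s: Sequence[float], fraction_uncleaved: Sequence[float],
                 enzyme_molar: float) -> float:
    """Least-squares fit of ln f = -(kcat/KM) * [E] * t through the origin.

    Returns kcat/KM in M^-1 s^-1, assuming pseudo-first-order conditions ([S] << KM).
    """
    num = sum(t * math.log(f) for t, f in zip(times_s, fraction_uncleaved))
    den = sum(t * t for t in times_s)
    return -num / (enzyme_molar * den)


# Synthetic data generated from an assumed kcat/KM of 1e5 M^-1 s^-1 at 10 nM enzyme.
TRUE_K = 1e5
E0 = 10e-9
times = [60.0, 300.0, 900.0, 1800.0]
fractions = [math.exp(-TRUE_K * E0 * t) for t in times]

print(f"{kcat_over_km(times, fractions, E0):.3g}")  # 1e+05
```

    Fitting across all time points, rather than using a single one, averages out noise in the per-variant fractions, which in a sequencing readout would come from normalized read counts.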

  12. The minimal amount of starting DNA for Agilent's hybrid capture-based targeted massively parallel sequencing.

    PubMed

    Chung, Jongsuk; Son, Dae-Soon; Jeon, Hyo-Jeong; Kim, Kyoung-Mee; Park, Gahee; Ryu, Gyu Ha; Park, Woong-Yang; Park, Donghyun

    2016-01-01

    Targeted capture massively parallel sequencing is increasingly being used in clinical settings, and as costs continue to decline, use of this technology may become routine in health care. However, a limited amount of tissue has often been a challenge in meeting quality requirements. To offer a practical guideline for the minimum amount of input DNA for targeted sequencing, we optimized and evaluated the performance of targeted sequencing as a function of the input DNA amount. First, using various amounts of input DNA, we compared commercially available library construction kits and selected Agilent's SureSelect-XT and KAPA Biosystems' Hyper Prep kits as the kits most compatible with targeted deep sequencing using Agilent's SureSelect custom capture. Then, we optimized the adapter ligation conditions of the Hyper Prep kit to improve library construction efficiency and adapted multiplexed hybrid selection to reduce the cost of sequencing. Finally, we systematically evaluated the performance of the optimized protocol across input DNA amounts ranging from 6.25 to 200 ng and suggest minimal input DNA amounts based on the coverage depths required for specific applications. PMID:27220682

  13. Nanopantography: A new method for massively parallel nanopatterning over large areas

    NASA Astrophysics Data System (ADS)

    Xu, Lin

    Nanopantography, a radically new method for versatile fabrication of sub-20 nm features in a massively parallel fashion, represents a breakthrough in nanotechnology. The concept of this technique is to focus ion "beamlets" in parallel to write identical, arbitrary nano-patterns. Depending on the ion species, nanopatterns can be either etched or deposited by nanopantography. An array of electrostatic lenses and a broad-area, directional, monoenergetic ion beam are required to implement nanopantography. This dissertation is dedicated to extracting an ion beam with desired properties from a plasma source and realizing nanopantography using this beam. A novel ion extraction strategy has been used to extract a nearly monoenergetic and energy-specified ion beam from a capacitively-coupled or an inductively-coupled, pulsed Ar plasma. The electron temperature decayed rapidly in the afterglow, resulting in a uniform plasma potential and minimal energy spread for ions extracted in the afterglow. Ion energy was controlled by a DC bias, or alternatively by a high-voltage pulse, on the ring electrode surrounding the plasma. Langmuir probe measurements indicated that this bias raised the plasma potential without heating the electrons in the afterglow. The energy spread was 3.4 eV (FWHM) for a peak ion beam energy of 102.0 eV. Similar results were obtained in an inductively-coupled pulsed plasma when the acceleration ring was pulsed exclusively during the afterglow. To achieve Ni deposition by nanopantography, higher Ni atom and ion densities are desired in the plasma source. An ionized physical vapor deposition (IPVD) system with a Ni internal RF coil and Ni target was used to introduce Ni atoms, a fraction of which become ionized in the high-density plasma. Optical emission spectroscopy (OES) and optical absorption spectroscopy (OAS), in combination with global models, were used to determine the Ni atom and ion densities. For a pressure of 8-20 mTorr and coil power of 40

  14. A Massive Parallel Variational Multiscale FEM Scheme Applied to Nonhydrostatic Atmospheric Dynamics

    NASA Astrophysics Data System (ADS)

    Vazquez, Mariano; Marras, Simone; Moragues, Margarida; Jorba, Oriol; Houzeaux, Guillaume; Aubry, Romain

    2010-05-01

    The solution of the fully compressible Euler equations of stratified flows is approached from the point of view of Computational Fluid Dynamics techniques. Specifically, the main aim of this contribution is the introduction of a Variational Multiscale Finite Element (CVMS-FE) approach to solve dry atmospheric dynamics effectively on massively parallel architectures with more than 1000 processors. The conservation form of the equations of motion is discretized in all directions with a Galerkin scheme with stabilization given by the compressible counterpart of the variational multiscale technique of Hughes [1] and Houzeaux et al. [2]. The justification of this effort is twofold. First, we seek optimal parallelization characteristics and linear scalability on petascale machines: a numerical algorithm whose local nature keeps inter-processor communication to a minimum is, in fact, a large step towards efficient parallel computing. Second, the rising trend towards global models and models of higher spatial resolution naturally suggests the use of adaptive grids that resolve only the zones of large gradients while keeping the computational mesh appropriately coarse elsewhere (thus keeping the computational cost low). With these two considerations in mind, the finite element scheme presented here is a candidate for the development of the next generation of Numerical Weather Prediction (NWP) codes. This methodology is new both to Computational Fluid Dynamics for compressible flows at low Mach number and to NWP. Nevertheless, we aim to show its ability to maintain stability in the solution of thermal, gravity-driven flows in a stratified environment in the specific context of dry atmospheric dynamics. Standard two-dimensional benchmarks are implemented and compared against the reference literature. In the context of thermal and gravity-driven flows in a neutral atmosphere, we present: (1) the density current

  15. 3-D magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on SMP computers - Part I: forward problem and parameter Jacobians

    NASA Astrophysics Data System (ADS)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    We have developed an algorithm, which we call HexMT, for 3-D simulation and inversion of magnetotelluric (MT) responses using deformable hexahedral finite elements that permit incorporation of topography. Direct solvers parallelized on symmetric multiprocessor (SMP), single-chassis workstations with large RAM are used throughout, including the forward solution, parameter Jacobians and model parameter update. In Part I, the forward simulator and Jacobian calculations are presented. We use first-order edge elements to represent the secondary electric field (E), yielding accuracy O(h) for E and its curl (magnetic field). For very low frequencies or small material admittivities, the E-field requires divergence correction. With the help of Hodge decomposition, the correction may be applied in one step after the forward solution is calculated. This allows accurate E-field solutions in dielectric air. The system matrix factorization and source vector solutions are computed using the MKL PARDISO library, which shows good scalability through 24 processor cores. The factorized matrix is used to calculate the forward response as well as the Jacobians of electromagnetic (EM) field and MT responses using the reciprocity theorem. Comparison with other codes demonstrates accuracy of our forward calculations. We consider a popular conductive/resistive double brick structure, several synthetic topographic models and the natural topography of Mount Erebus in Antarctica. In particular, the ability of finite elements to represent smooth topographic slopes permits accurate simulation of refraction of EM waves normal to the slopes at high frequencies. Run-time tests of the parallelized algorithm indicate that for meshes as large as 176 × 176 × 70 elements, MT forward responses and Jacobians can be calculated in ~1.5 hr per frequency. Together with an efficient inversion parameter step described in Part II, MT inversion problems of 200-300 stations are computable with total run times

  16. The Double Hierarchy Method. A parallel 3D contact method for the interaction of spherical particles with rigid FE boundaries using the DEM

    NASA Astrophysics Data System (ADS)

    Santasusana, Miquel; Irazábal, Joaquín; Oñate, Eugenio; Carbonell, Josep Maria

    2016-07-01

    In this work, we present a new methodology for the treatment of the contact interaction between rigid boundaries and spherical discrete elements (DE). Rigid body parts are present in most large-scale simulations. The surfaces of the rigid parts are commonly meshed with a finite element-like (FE) discretization. Contact detection and calculation between these DEs and the discretized boundaries is not straightforward and has been addressed by different approaches. The algorithm presented in this paper considers the contact of the DEs with the geometric primitives of a FE mesh, i.e. facet, edge or vertex. To do so, the original hierarchical method presented by Horner et al. (J Eng Mech 127(10):1027-1032, 2001) is extended with a new insight, leading to a robust, fast and accurate 3D contact algorithm which is fully parallelizable. The method is designed primarily for triangles and quadrilaterals; if the boundaries are discretized with other geometries, it can easily be extended to higher-order planar convex polyhedra. A detailed description of the procedure followed to treat a wide range of cases is presented. The developed algorithm is validated with several practical examples. The parallelization capabilities and the obtained performance are presented with the study of an industrial application example.
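    The abstract does not give the details of the Double Hierarchy method, but the step it builds on, deciding whether a sphere touches a mesh triangle at a facet, an edge, or a vertex, can be illustrated with the standard closest-point-on-triangle test based on Voronoi regions (as described in Ericson's Real-Time Collision Detection). The sketch below is that generic test, not the authors' algorithm; the function names are hypothetical, and quadrilaterals would be handled by splitting them into triangles or by an analogous test.

```python
import math


def sub(u, v):
    return (u[0] - v[0], u[1] - v[1], u[2] - v[2])


def add(u, v):
    return (u[0] + v[0], u[1] + v[1], u[2] + v[2])


def scale(u, s):
    return (u[0] * s, u[1] * s, u[2] * s)


def dot(u, v):
    return u[0] * v[0] + u[1] * v[1] + u[2] * v[2]


def closest_point_on_triangle(p, a, b, c):
    """Return (closest point to p on triangle abc, feature: 'vertex', 'edge' or 'facet')."""
    ab, ac, ap = sub(b, a), sub(c, a), sub(p, a)
    d1, d2 = dot(ab, ap), dot(ac, ap)
    if d1 <= 0 and d2 <= 0:
        return a, "vertex"  # Voronoi region of vertex A
    bp = sub(p, b)
    d3, d4 = dot(ab, bp), dot(ac, bp)
    if d3 >= 0 and d4 <= d3:
        return b, "vertex"  # vertex B
    vc = d1 * d4 - d3 * d2
    if vc <= 0 and d1 >= 0 and d3 <= 0:
        return add(a, scale(ab, d1 / (d1 - d3))), "edge"  # edge AB
    cp = sub(p, c)
    d5, d6 = dot(ab, cp), dot(ac, cp)
    if d6 >= 0 and d5 <= d6:
        return c, "vertex"  # vertex C
    vb = d5 * d2 - d1 * d6
    if vb <= 0 and d2 >= 0 and d6 <= 0:
        return add(a, scale(ac, d2 / (d2 - d6))), "edge"  # edge AC
    va = d3 * d6 - d5 * d4
    if va <= 0 and d4 - d3 >= 0 and d5 - d6 >= 0:
        w = (d4 - d3) / ((d4 - d3) + (d5 - d6))
        return add(b, scale(sub(c, b), w)), "edge"  # edge BC
    denom = 1.0 / (va + vb + vc)  # interior: use barycentric coordinates
    v, w = vb * denom, vc * denom
    return add(a, add(scale(ab, v), scale(ac, w))), "facet"


def sphere_triangle_contact(center, radius, a, b, c):
    """Return (feature, penetration depth) if the sphere overlaps the triangle, else None."""
    q, feature = closest_point_on_triangle(center, a, b, c)
    gap = math.dist(center, q)
    return (feature, radius - gap) if gap < radius else None


tri = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(sphere_triangle_contact((0.2, 0.2, 0.5), 1.0, *tri))
print(sphere_triangle_contact((0.5, -0.5, 0.0), 1.0, *tri))
```

    Reporting the feature type alongside the penetration depth is what lets a contact scheme avoid counting the same contact twice when a sphere touches a shared edge or vertex of neighboring elements.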

  17. The Double Hierarchy Method. A parallel 3D contact method for the interaction of spherical particles with rigid FE boundaries using the DEM

    NASA Astrophysics Data System (ADS)

    Santasusana, Miquel; Irazábal, Joaquín; Oñate, Eugenio; Carbonell, Josep Maria

    2016-04-01

    In this work, we present a new methodology for the treatment of the contact interaction between rigid boundaries and spherical discrete elements (DE). Rigid body parts are present in most large-scale simulations. The surfaces of the rigid parts are commonly meshed with a finite element-like (FE) discretization. Contact detection and calculation between these DEs and the discretized boundaries is not straightforward and has been addressed by different approaches. The algorithm presented in this paper considers the contact of the DEs with the geometric primitives of a FE mesh, i.e. facet, edge or vertex. To do so, the original hierarchical method presented by Horner et al. (J Eng Mech 127(10):1027-1032, 2001) is extended with a new insight, leading to a robust, fast and accurate 3D contact algorithm which is fully parallelizable. The method is designed primarily for triangles and quadrilaterals; if the boundaries are discretized with other geometries, it can easily be extended to higher-order planar convex polyhedra. A detailed description of the procedure followed to treat a wide range of cases is presented. The developed algorithm is validated with several practical examples. The parallelization capabilities and the obtained performance are presented with the study of an industrial application example.

  18. 3D geological to geophysical modelling and seismic wave propagation simulation: a case study from the Lalor Lake VMS (Volcanogenic Massive Sulphides) mining camp

    NASA Astrophysics Data System (ADS)

    Miah, Khalid; Bellefleur, Gilles

    2014-05-01

    The global demand for base metals, uranium and precious metals has been pushing mineral exploration to greater depths. Seismic techniques and surveys have become essential in finding and extracting mineral-rich ore bodies, especially for deep VMS mining camps. Geophysical parameters collected from borehole logs and laboratory measurements of core samples provide preliminary information about the nature and type of subsurface lithologic units. Alteration halos formed during the hydrothermal alteration process contain ore bodies, which are of primary interest to geologists and the mining industry. Alteration halos are known to be easier to detect than the ore bodies themselves. Many 3D geological models are merely projections of 2D surface geology based on outcrop inspections and geochemical analysis of a small number of core samples collected from the area. Since a large-scale 3D multicomponent seismic survey can be prohibitively expensive, performance analysis of such geological models can help reduce exploration costs. In this abstract, we discuss challenges and constraints encountered in geophysical modelling of ore bodies and surrounding geologic structures from the available coarse 3D geological models of the Lalor Lake mining camp, located in northern Manitoba, Canada. Ore bodies in the Lalor Lake VMS camp are rich in gold, zinc, lead and copper, and total approximately 27 Mt. To better understand the physical parameters of these known ore bodies and of potentially unknown ones at greater depth, we constructed a fine-resolution 3D seismic model with dimensions of 2000 m (width), 2000 m (height), and 1500 m (vertical depth). Seismic properties (P-wave and S-wave velocities, and density) were assigned based on a previous rock properties study of the same mining camp. 3D finite-difference elastic wave propagation simulation was performed in the model using appropriate parameters. The generated synthetic 3D seismic data was then compared to
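    The study ran a 3D elastic finite-difference simulation; as a much-reduced illustration of how such explicit schemes step in time, the sketch below solves the 1D scalar wave equation u_tt = c² u_xx with the standard second-order leapfrog stencil. Everything here is an assumption for illustration: one dimension, a single constant velocity, no separate P and S waves or density terms, and fixed (zero) boundaries instead of absorbing ones. The Courant number c·dt/dx must not exceed 1 for this explicit scheme to be stable.

```python
def simulate_wave_1d(n_points, source_index, courant, n_steps):
    """Explicit leapfrog finite-difference solution of u_tt = c^2 u_xx on a uniform grid.

    courant = c * dt / dx; the scheme is stable only for 0 < courant <= 1 (CFL limit).
    Initial condition: a unit spike at rest at source_index; both ends are held at zero.
    """
    if not 0 < courant <= 1:
        raise ValueError("the explicit scheme requires 0 < c*dt/dx <= 1")
    if not 0 < source_index < n_points - 1 or n_steps < 1:
        raise ValueError("source must be an interior point and n_steps >= 1")
    r2 = courant * courant
    prev = [0.0] * n_points
    prev[source_index] = 1.0
    # First step for zero initial velocity: u^1 = u^0 + (r2 / 2) * (discrete Laplacian of u^0).
    curr = [0.0] * n_points
    for i in range(1, n_points - 1):
        curr[i] = prev[i] + 0.5 * r2 * (prev[i + 1] - 2 * prev[i] + prev[i - 1])
    # Remaining steps: u^{n+1} = 2 u^n - u^{n-1} + r2 * (discrete Laplacian of u^n).
    for _ in range(n_steps - 1):
        nxt = [0.0] * n_points
        for i in range(1, n_points - 1):
            nxt[i] = 2 * curr[i] - prev[i] + r2 * (curr[i + 1] - 2 * curr[i] + curr[i - 1])
        prev, curr = curr, nxt
    return curr


field = simulate_wave_1d(n_points=101, source_index=50, courant=1.0, n_steps=10)
print(field[40], field[50], field[60])
```

    With a Courant number of exactly 1 this 1D scheme reproduces the d'Alembert solution on the grid, so the initial spike splits into two half-amplitude pulses that each travel one cell per time step; that makes it a convenient sanity check before moving to smaller Courant numbers or more dimensions.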

  19. Massively parallel neural circuits for stereoscopic color vision: encoding, decoding and identification.

    PubMed

    Lazar, Aurel A; Slutskiy, Yevgeniy B; Zhou, Yiyin

    2015-03-01

    Past work demonstrated how monochromatic visual stimuli could be faithfully encoded and decoded under Nyquist-type rate conditions. Color visual stimuli were then traditionally encoded and decoded in multiple separate monochromatic channels. The brain, however, appears to mix information about color channels at the earliest stages of the visual system, including the retina itself. If information about color is mixed and encoded by a common pool of neurons, how can colors be demixed and perceived? We present Color Video Time Encoding Machines (Color Video TEMs) for encoding color visual stimuli that take into account a variety of color representations within a single neural circuit. We then derive a Color Video Time Decoding Machine (Color Video TDM) algorithm for color demixing and reconstruction of color visual scenes from spikes produced by a population of visual neurons. In addition, we formulate Color Video Channel Identification Machines (Color Video CIMs) for functionally identifying color visual processing performed by a spiking neural circuit. Furthermore, we derive a duality between TDMs and CIMs that unifies the two and leads to a general theory of neural information representation for stereoscopic color vision. We provide examples demonstrating that a massively parallel color visual neural circuit can be first identified with arbitrary precision and its spike trains can be subsequently used to reconstruct the encoded stimuli. We argue that evaluation of the functional identification methodology can be effectively and intuitively performed in the stimulus space. In this space, a signal reconstructed from spike trains generated by the identified neural circuit can be compared to the original stimulus. PMID:25594573
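    The Color Video TEMs above are built from spiking neurons; the simplest such encoder is the ideal integrate-and-fire (IAF) neuron, which integrates its input plus a bias and fires whenever the integral reaches a threshold. For this neuron the spike times t_k satisfy the t-transform ∫ from t_k to t_(k+1) of (u(s) + b) ds = κδ, which is what makes decoding from spikes possible. The sketch below is only that single-channel building block with forward-Euler integration; the parameter names follow the usual bias/capacitance/threshold notation, and it does not implement the color, video, or decoding machinery of the paper.

```python
def iaf_encode(signal, dt, bias, kappa, delta):
    """Ideal integrate-and-fire time encoding of a sampled signal.

    Integrates (u(t) + bias) / kappa with a forward-Euler step and records a spike
    time whenever the integral reaches delta; the integrator is then reset by
    subtracting delta, so any overshoot carries into the next interval.
    Assumes dt is small enough that at most one spike occurs per sample.
    """
    if any(abs(u) >= bias for u in signal):
        raise ValueError("bias must exceed |u(t)| so that the neuron keeps firing")
    spikes = []
    v = 0.0
    for n, u in enumerate(signal):
        v += dt * (u + bias) / kappa
        if v >= delta:
            spikes.append((n + 1) * dt)
            v -= delta
    return spikes


dt = 0.001
spikes = iaf_encode([0.0] * 1000, dt, bias=1.0, kappa=1.0, delta=0.1)
print(len(spikes), spikes[:3])
```

    For a constant input the interspike interval is κδ / (u + b), so stronger input produces denser spikes; a decoder exploits the varying intervals to recover u(t).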

  20. Detection of inherited mutations for breast and ovarian cancer using genomic capture and massively parallel sequencing

    PubMed Central

    Walsh, Tom; Lee, Ming K.; Casadei, Silvia; Thornton, Anne M.; Stray, Sunday M.; Pennil, Christopher; Nord, Alex S.; Mandell, Jessica B.; Swisher, Elizabeth M.; King, Mary-Claire

    2010-01-01

    Inherited loss-of-function mutations in the tumor suppressor genes BRCA1, BRCA2, and multiple other genes predispose to high risks of breast and/or ovarian cancer. Cancer-associated inherited mutations in these genes are collectively quite common, but individually rare or even private. Genetic testing for BRCA1 and BRCA2 mutations has become an integral part of clinical practice, but testing is generally limited to these two genes and to women with severe family histories of breast or ovarian cancer. To determine whether massively parallel, “next-generation” sequencing would enable accurate, thorough, and cost-effective identification of inherited mutations for breast and ovarian cancer, we developed a genomic assay to capture, sequence, and detect all mutations in 21 genes, including BRCA1 and BRCA2, with inherited mutations that predispose to breast or ovarian cancer. Constitutional genomic DNA from subjects with known inherited mutations, ranging in size from 1 to >100,000 bp, was hybridized to custom oligonucleotides and then sequenced using a genome analyzer. Analysis was carried out blind to the mutation in each sample. Average coverage was >1200 reads per base pair. After filtering sequences for quality and number of reads, all single-nucleotide substitutions, small insertion and deletion mutations, and large genomic duplications and deletions were detected. There were zero false-positive calls of nonsense mutations, frameshift mutations, or genomic rearrangements for any gene in any of the test samples. This approach enables widespread genetic testing and personalized risk assessment for breast and ovarian cancer. PMID:20616022

  1. Massively parallel sampling of lattice proteins reveals foundations of thermal adaptation

    NASA Astrophysics Data System (ADS)

    Venev, Sergey V.; Zeldovich, Konstantin B.

    2015-08-01

    Evolution of proteins in bacteria and archaea living in different conditions leads to significant correlations between amino acid usage and environmental temperature. The origins of these correlations are poorly understood, and an important question of protein theory, physics-based prediction of types of amino acids overrepresented in highly thermostable proteins, remains largely unsolved. Here, we extend the random energy model of protein folding by weighting the interaction energies of amino acids by their frequencies in protein sequences and predict the energy gap of proteins designed to fold well at elevated temperatures. To test the model, we present a novel scalable algorithm for simultaneous energy calculation for many sequences in many structures, targeting massively parallel computing architectures such as graphics processing units. The energy calculation is performed by multiplying two matrices, one representing the complete set of sequences, and the other describing the contact maps of all structural templates. An implementation of the algorithm for the CUDA platform is available at http://www.github.com/kzeldovich/galeprot and calculates protein folding energies over 250 times faster than a single central processing unit. Analysis of amino acid usage in 64-mer cubic lattice proteins designed to fold well at different temperatures demonstrates an excellent agreement between theoretical and simulated values of energy gap. The theoretical predictions of temperature trends of amino acid frequencies are significantly correlated with bioinformatics data on 191 bacteria and archaea, and highlight protein folding constraints as a fundamental selection pressure during thermal adaptation in biological evolution.
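    One way to cast "energies of many sequences in many structures" as a single matrix product, assuming a pairwise contact energy E(s, t) = Σ over contacts (i, j) of t of B[s_i][s_j], is to give each sequence a row holding B[s_i][s_j] for every position pair i < j, and each structure a 0/1 row marking which pairs are in contact; the full energy table is then S·Cᵀ. The sketch below does this in plain Python on a toy two-letter (HP-style) alphabet with hypothetical contact maps. It illustrates the matrix formulation only; the galeprot CUDA code may organize the matrices differently.

```python
from itertools import combinations


def sequence_matrix(sequences, interaction):
    """Row k holds interaction[a_i][a_j] for every position pair i < j of sequence k."""
    length = len(sequences[0])
    pairs = list(combinations(range(length), 2))
    rows = [[interaction[seq[i]][seq[j]] for i, j in pairs] for seq in sequences]
    return rows, pairs


def contact_matrix(structures, pairs):
    """Row m is a 0/1 indicator of which position pairs are in contact in structure m."""
    return [[1 if p in contacts or (p[1], p[0]) in contacts else 0 for p in pairs]
            for contacts in structures]


def energy_table(seq_rows, contact_rows):
    """E[k][m] = sum over pairs p of S[k][p] * C[m][p], i.e. the product S @ C^T."""
    return [[sum(s * c for s, c in zip(srow, crow)) for crow in contact_rows]
            for srow in seq_rows]


# Toy HP-style interaction energies and contact maps (hypothetical values).
B = {"H": {"H": -1.0, "P": 0.0}, "P": {"H": 0.0, "P": 0.0}}
seqs = ["HPHPPH", "HHPPHH"]
structs = [{(0, 3), (2, 5)}, {(0, 5)}]
S, pairs = sequence_matrix(seqs, B)
C = contact_matrix(structs, pairs)
print(energy_table(S, C))
```

    Casting the problem as one dense product is what makes it a good fit for GPUs, where matrix multiplication is among the most heavily optimized operations.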

  2. A SNP panel for identity and kinship testing using massive parallel sequencing.

    PubMed

    Grandell, Ida; Samara, Raed; Tillmar, Andreas O

    2016-07-01

    Within forensic genetics, there is still a need for supplementary DNA marker typing in order to increase the power to solve cases for both identity testing and complex kinship issues. One major disadvantage of current capillary electrophoresis (CE) methods is their limited DNA marker multiplex capability. By utilizing massive parallel sequencing (MPS) technology, this capability can, however, be increased. We have designed a customized GeneRead DNASeq SNP panel (Qiagen) of 140 previously published autosomal forensically relevant identity SNPs for analysis using MPS. A single amplification step was followed by library preparation using the GeneRead Library Prep workflow (Qiagen). The sequencing was performed on a MiSeq System (Illumina), and the bioinformatic analyses were done using the software Biomedical Genomics Workbench (CLC Bio, Qiagen). Forty-nine individuals from a Swedish population were genotyped in order to establish genotype frequencies and to evaluate the performance of the assay. The analyses showed balanced coverage among the included loci, and fewer than 0.5% of the heterozygote balance values were outliers. Analyses of dilution series of the 2800M Control DNA gave reproducible results down to 0.2 ng DNA input. In addition, typing of FTA samples and bone samples was performed with promising results. Further studies and optimizations are, however, required for a more detailed evaluation of the performance on degraded and PCR-inhibited forensic samples. In summary, the assay offers a straightforward sample-to-genotype workflow and could provide useful information in forensic casework, both for identity testing and for solving complex kinship issues. PMID:26932869

  3. Implementation of a Message Passing Interface into a Cloud-Resolving Model for Massively Parallel Computing

    NASA Technical Reports Server (NTRS)

    Juang, Hann-Ming Henry; Tao, Wei-Kuo; Zeng, Xi-Ping; Shie, Chung-Lin; Simpson, Joanne; Lang, Steve

    2004-01-01

    The capability for massively parallel programming (MPP) using a message passing interface (MPI) has been implemented into a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The design for the MPP with MPI uses the concept of maintaining similar code structure between the whole domain as well as the portions after decomposition. Hence the model follows the same integration for single and multiple tasks (CPUs). Also, it requires minimal changes to the original code, so the code is easily modified and managed by model developers and users who have little knowledge of MPP. The entire model domain can be sliced using a one- or two-dimensional decomposition, with a halo regime overlaid on each partial domain. The halo regime requires that no data be fetched across tasks during the computational stage, but it must be updated before the next computational stage through data exchange via MPI. For reproducibility, transposing data among tasks is required for the spectral transform (Fast Fourier Transform, FFT), which is used in the anelastic version of the model for solving the pressure equation. The performance of the MPI-implemented codes (i.e., the compressible and anelastic versions) was tested on three different computing platforms. The major results are: 1) both versions have speedups of about 99% up to 256 tasks but not for 512 tasks; 2) the anelastic version has better speedup and efficiency because it requires more computations than the compressible version; 3) equal or approximately equal numbers of slices in the x- and y-directions provide the fastest integration due to fewer data exchanges; and 4) one-dimensional slices in the x-direction result in the slowest integration due to the need for more memory relocation for computation.
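    The halo idea described above can be shown without MPI at all: each task stores its slice plus one halo cell per side, the halos are refreshed in a communication stage, and the computation stage then reads only local memory. The sketch below simulates that pattern serially for a 1D decomposition and a 3-point stencil, with list copies standing in for MPI messages. The stencil, the halo width of 1, the integer data, and all function names are illustrative assumptions, not the GCE model's code; the check at the end confirms that the decomposed run matches a whole-domain run.

```python
def serial_step(u):
    """Reference update on the whole domain: 3-point sum with zeros beyond both ends."""
    padded = [0] + u + [0]
    return [padded[i - 1] + padded[i] + padded[i + 1] for i in range(1, len(u) + 1)]


def decompose(u, n_tasks):
    """Slice u into n_tasks equal parts, each padded with one halo cell per side."""
    if len(u) % n_tasks:
        raise ValueError("this sketch assumes len(u) is divisible by n_tasks")
    size = len(u) // n_tasks
    return [[0] + u[k * size:(k + 1) * size] + [0] for k in range(n_tasks)]


def exchange_halos(parts):
    """Fill each halo cell with the neighboring task's edge value (zero at domain edges).

    In an MPI code this is the communication stage; here it is a plain list copy.
    Only halo cells are written, so the interior values read here are never stale.
    """
    last = len(parts) - 1
    for k, part in enumerate(parts):
        part[0] = parts[k - 1][-2] if k > 0 else 0
        part[-1] = parts[k + 1][1] if k < last else 0


def local_step(part):
    """Computation stage: update interior cells using only this task's cells and halos."""
    interior = [part[i - 1] + part[i] + part[i + 1] for i in range(1, len(part) - 1)]
    return [0] + interior + [0]  # halos are refreshed by the next exchange


def parallel_run(u, n_tasks, n_steps):
    parts = decompose(u, n_tasks)
    for _ in range(n_steps):
        exchange_halos(parts)
        parts = [local_step(p) for p in parts]
    return [x for p in parts for x in p[1:-1]]


field = list(range(12))
reference = field
for _ in range(3):
    reference = serial_step(reference)
print(parallel_run(field, 3, 3) == reference)
```

    Because each exchange touches only the cells along slice boundaries, the communication volume grows with the boundary length, which is consistent with the abstract's observation that balanced x/y slicing, with fewer exchanged cells per task, integrates fastest.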

  4. Identification of cancer/testis-antigen genes by massively parallel signature sequencing

    PubMed Central

    Chen, Yao-Tseng; Scanlan, Matthew J.; Venditti, Charis A.; Chua, Ramon; Theiler, Gregory; Stevenson, Brian J.; Iseli, Christian; Gure, Ali O.; Vasicek, Tom; Strausberg, Robert L.; Jongeneel, C. Victor; Old, Lloyd J.; Simpson, Andrew J. G.

    2005-01-01

    Massively parallel signature sequencing (MPSS) generates millions of short sequence tags corresponding to transcripts from a single RNA preparation. Most MPSS tags can be unambiguously assigned to genes, thereby generating a comprehensive expression profile of the tissue of origin. From the comparison of MPSS data from 32 normal human tissues, we identified 1,056 genes that are predominantly expressed in the testis. Further evaluation by using MPSS tags from cancer cell lines and EST data from a wide variety of tumors identified 202 of these genes as candidates for encoding cancer/testis (CT) antigens. For a subset of 166 intron-containing genes among these candidates, expression in normal tissues was assessed by RT-PCR, and those with confirmed testis-predominant expression were further evaluated for their expression in 21 cancer cell lines. Thus, 20 CT or CT-like genes were identified, with several exhibiting expression in five or more of the cancer cell lines examined. One of these genes is a member of a CT gene family that we designated as CT45. The CT45 family comprises six highly similar (>98% cDNA identity) genes that are clustered in tandem within a 125-kb region on Xq26.3. CT45 was found to be frequently expressed in both cancer cell lines and lung cancer specimens. Thus, MPSS analysis has resulted in a significant extension of our knowledge of CT antigens, leading to the discovery of a distinctive X-linked CT-antigen gene family. PMID:15905330

  5. Massively parallel computation of lattice associative memory classifiers on multicore processors

    NASA Astrophysics Data System (ADS)

    Ritter, Gerhard X.; Schmalz, Mark S.; Hayden, Eric T.

    2011-09-01

    Over the past quarter century, concepts and theory derived from neural networks (NNs) have featured prominently in the literature of pattern recognition. In implementation, classical NNs based on the linear inner product can present performance challenges due to the use of multiplication operations. In contrast, NNs having nonlinear kernels based on Lattice Associative Memories (LAM) theory tend to concentrate primarily on addition and maximum/minimum operations. More generally, the emergence of LAM-based NNs, with their superior information storage capacity, fast convergence and training due to relatively lower computational cost, as well as noise-tolerant classification, has extended the capabilities of neural networks far beyond the limited application potential of classical NNs. This paper explores theory and algorithmic approaches for the efficient computation of LAM-based neural networks, in particular lattice neural nets and dendritic lattice associative memories. Of particular interest are massively parallel architectures such as multicore CPUs and graphics processing units (GPUs). Originally developed for video gaming applications, GPUs hold the promise of high computational throughput without compromising numerical accuracy. Unfortunately, currently-available GPU architectures tend to have idiosyncratic memory hierarchies that can produce unacceptably high data movement latencies for relatively simple operations, unless careful design of theory and algorithms is employed. Advantageously, some GPUs (e.g., the Nvidia Fermi GPU) are optimized for efficient streaming computation (e.g., concurrent multiply and add operations). As a result, the linear or nonlinear inner product structures of NNs are inherently suited to multicore GPU computational capabilities. In this paper, the authors' recent research in lattice associative memories and their implementation on multicores is overviewed, with results that show utility for a wide variety of pattern
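    To make the "addition and maximum/minimum" point concrete, the sketch below implements the basic lattice (morphological) auto-associative memory: training sets W[i][j] to the minimum over stored patterns of x_i - x_j, and recall applies the max-plus product y_i = max_j (W[i][j] + x_j). Because W[i][j] + x_j ≤ x_i for every stored x and W[i][i] = 0, each stored pattern is recalled exactly. This is the textbook construction only, written in plain Python with hypothetical toy patterns; it is not the dendritic variant or the multicore/GPU implementation discussed in the paper.

```python
def train_min_memory(patterns):
    """W[i][j] = min over stored patterns of (x_i - x_j); uses only subtraction and min."""
    n = len(patterns[0])
    return [[min(x[i] - x[j] for x in patterns) for j in range(n)] for i in range(n)]


def recall_max_plus(weights, x):
    """Max-plus product: y_i = max_j (W[i][j] + x_j); uses only addition and max."""
    return [max(w + xj for w, xj in zip(row, x)) for row in weights]


# Toy integer patterns (hypothetical).
patterns = [[0, 3, 1, 4], [2, 2, 5, 0], [1, 0, 0, 3]]
W = train_min_memory(patterns)
for x in patterns:
    print(x, "->", recall_max_plus(W, x))
```

    Each output element is an independent max over a row, so the recall step maps naturally onto many parallel threads, one per output element.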

  6. Identifying Children With Poor Cochlear Implantation Outcomes Using Massively Parallel Sequencing

    PubMed Central

    Wu, Chen-Chi; Lin, Yin-Hung; Liu, Tien-Chen; Lin, Kai-Nan; Yang, Wei-Shiung; Hsu, Chuan-Jen; Chen, Pei-Lung; Wu, Che-Ming

    2015-01-01

    Cochlear implantation is currently the treatment of choice for children with severe to profound hearing impairment. However, the outcomes with cochlear implants (CIs) vary significantly among recipients. The purpose of the present study is to identify the genetic determinants of poor CI outcomes. Twelve children with poor CI outcomes (the “cases”) and 30 “matched controls” with good CI outcomes were subjected to comprehensive genetic analyses using massively parallel sequencing, which targeted 129 known deafness genes. Audiological features, imaging findings, and auditory/speech performance with CIs were then correlated to the genetic diagnoses. We identified genetic variants which are associated with poor CI outcomes in 7 (58%) of the 12 cases; 4 cases had bi-allelic PCDH15 pathogenic mutations and 3 cases were homozygous for the DFNB59 p.G292R variant. Mutations in the WFS1, GJB3, ESRRB, LRTOMT, MYO3A, and POU3F4 genes were detected in 7 (23%) of the 30 matched controls. The allele frequencies of PCDH15 and DFNB59 variants were significantly higher in the cases than in the matched controls (both P < 0.001). In the 7 CI recipients with PCDH15 or DFNB59 variants, otoacoustic emissions were absent in both ears, and imaging findings were normal in all 7 implanted ears. PCDH15 or DFNB59 variants are associated with poor CI performance, yet children with PCDH15 or DFNB59 variants might show clinical features indistinguishable from those of other typical pediatric CI recipients. Accordingly, genetic examination is indicated in all CI candidates before operation. PMID:26166082

  7. Application of Massively Parallel Sequencing to Genetic Diagnosis in Multiplex Families with Idiopathic Sensorineural Hearing Impairment

    PubMed Central

    Wu, Chen-Chi; Lin, Yin-Hung; Lu, Ying-Chang; Chen, Pei-Jer; Yang, Wei-Shiung; Hsu, Chuan-Jen; Chen, Pei-Lung

    2013-01-01

    Despite the clinical utility of genetic diagnosis to address idiopathic sensorineural hearing impairment (SNHI), the current strategy of screening mutations via Sanger sequencing is limited in that only a small number of DNA fragments associated with common deafness mutations can be genotyped. Consequently, a definitive genetic diagnosis cannot be achieved in many families with a discernible family history. To investigate the diagnostic utility of massively parallel sequencing (MPS), we applied the MPS technique to 12 multiplex families with idiopathic SNHI in which common deafness mutations had previously been ruled out. A NimbleGen sequence capture array was designed to target all protein coding sequences (CDSs) and 100 bp of the flanking sequence of 80 common deafness genes. We performed MPS on the Illumina HiSeq2000, and applied BWA, SAMtools, Picard, GATK, Variant Tools, ANNOVAR, and IGV for bioinformatics analyses. Initial data filtering with allele frequencies (<5% in the 1000 Genomes Project and 5400 NHLBI exomes) and PolyPhen2/SIFT scores (>0.95) prioritized 5 indels (insertions/deletions) and 36 missense variants in the 12 multiplex families. After further validation by Sanger sequencing, segregation pattern, and evolutionary conservation of amino acid residues, we identified 4 variants in 4 different genes, which might lead to SNHI in 4 families compatible with autosomal dominant inheritance. These included GJB2 p.R75Q, MYO7A p.T381M, KCNQ4 p.S680F, and MYH9 p.E1256K. Among them, KCNQ4 p.S680F and MYH9 p.E1256K were novel. In conclusion, MPS allows genetic diagnosis in multiplex families with idiopathic SNHI by detecting mutations in relatively uncommon deafness genes. PMID:23451214
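    The initial filtering step described above (allele frequency below 5% in both population databases, plus a high pathogenicity score) can be sketched as a simple predicate over annotated variant records. The record fields and variant names below are hypothetical stand-ins, not ANNOVAR's output columns. Only a PolyPhen2 score is used, because the abstract does not state how its SIFT score was scaled. Keeping variants that have no score (such as indels) when they are rare is also an assumption, made because the study retained indels.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Variant:
    name: str
    af_1000g: float             # allele frequency in the 1000 Genomes Project
    af_nhlbi: float             # allele frequency in the NHLBI exomes
    polyphen2: Optional[float]  # None for variants without a score (e.g. indels)


def prioritize(variants, max_af=0.05, min_polyphen=0.95):
    """Keep rare variants whose PolyPhen2 score (when present) exceeds min_polyphen."""
    kept = []
    for v in variants:
        rare = v.af_1000g < max_af and v.af_nhlbi < max_af
        damaging = v.polyphen2 is None or v.polyphen2 > min_polyphen  # assumption for indels
        if rare and damaging:
            kept.append(v.name)
    return kept


# Hypothetical records, not variants from the study.
candidates = [
    Variant("var_a", 0.001, 0.002, 0.99),  # rare, high score: kept
    Variant("var_b", 0.20, 0.18, 0.99),    # common: removed
    Variant("var_c", 0.0, 0.0, 0.40),      # rare but low score: removed
    Variant("var_d", 0.0, 0.001, None),    # rare, no score (indel): kept by assumption
]
print(prioritize(candidates))
```

    In the study this filter only produced a candidate list; Sanger validation, segregation, and conservation analysis then narrowed it to the reported variants.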

  8. Transcriptional analysis of the Arabidopsis ovule by massively parallel signature sequencing

    PubMed Central

    Sánchez-León, Nidia; Arteaga-Vázquez, Mario; Alvarez-Mejía, César; Mendiola-Soto, Javier; Durán-Figueroa, Noé; Rodríguez-Leal, Daniel; Rodríguez-Arévalo, Isaac; García-Campayo, Vicenta; García-Aguilar, Marcelina; Olmedo-Monfil, Vianey; Arteaga-Sánchez, Mario; Martínez de la Vega, Octavio; Nobuta, Kan; Vemaraju, Kalyan; Meyers, Blake C.; Vielle-Calzada, Jean-Philippe

    2012-01-01

    The life cycle of flowering plants alternates between a predominant sporophytic (diploid) and an ephemeral gametophytic (haploid) generation that only occurs in reproductive organs. In Arabidopsis thaliana, the female gametophyte is deeply embedded within the ovule, complicating the study of the genetic and molecular interactions involved in the sporophytic to gametophytic transition. Massively parallel signature sequencing (MPSS) was used to conduct a quantitative large-scale transcriptional analysis of the fully differentiated Arabidopsis ovule prior to fertilization. The expression of 9775 genes was quantified in wild-type ovules, additionally detecting >2200 new transcripts mapping to antisense or intergenic regions. A quantitative comparison of global expression in wild-type and sporocyteless (spl) individuals resulted in 1301 genes showing 25-fold reduced or null activity in ovules lacking a female gametophyte, including those encoding 92 signalling proteins, 75 transcription factors, and 72 RNA-binding proteins not reported in previous studies based on microarray profiling. A combination of independent genetic and molecular strategies confirmed the differential expression of 28 of them, showing that they are either preferentially active in the female gametophyte, or dependent on the presence of a female gametophyte to be expressed in sporophytic cells of the ovule. Among 18 genes encoding pentatricopeptide-repeat proteins (PPRs) that show transcriptional activity in wild-type but not spl ovules, CIHUATEOTL (At4g38150) is specifically expressed in the female gametophyte and necessary for female gametogenesis. These results expand the nature of the transcriptional universe present in the ovule of Arabidopsis, and offer a large-scale quantitative reference of global expression for future genomic and developmental studies. PMID:22442422

  9. Systematic comparison of three genomic enrichment methods for massively parallel DNA sequencing.

    PubMed

    Teer, Jamie K; Bonnycastle, Lori L; Chines, Peter S; Hansen, Nancy F; Aoyama, Natsuyo; Swift, Amy J; Abaan, Hatice Ozel; Albert, Thomas J; Margulies, Elliott H; Green, Eric D; Collins, Francis S; Mullikin, James C; Biesecker, Leslie G

    2010-10-01

    Massively parallel DNA sequencing technologies have greatly increased our ability to generate large amounts of sequencing data at a rapid pace. Several methods have been developed to enrich for genomic regions of interest for targeted sequencing. We have compared three of these methods: Molecular Inversion Probes (MIP), Solution Hybrid Selection (SHS), and Microarray-based Genomic Selection (MGS). Using HapMap DNA samples, we compared each of these methods with respect to their ability to capture an identical set of exons and evolutionarily conserved regions associated with 528 genes (2.61 Mb). For sequence analysis, we developed and used a novel Bayesian genotype-assigning algorithm, Most Probable Genotype (MPG). All three capture methods were effective, but sensitivities (percentage of targeted bases associated with high-quality genotypes) varied for an equivalent amount of pass-filtered sequence: for example, 70% (MIP), 84% (SHS), and 91% (MGS) for 400 Mb. In contrast, all methods yielded similar accuracies of >99.84% when compared to Infinium 1M SNP BeadChip-derived genotypes and >99.998% when compared to 30-fold coverage whole-genome shotgun sequencing data. We also observed a low false-positive rate with all three methods; of the heterozygous positions identified by each of the capture methods, >99.57% agreed with 1M SNP BeadChip, and >98.840% agreed with the whole-genome shotgun data. In addition, we successfully piloted the genomic enrichment of a set of 12 pooled samples via the MGS method using molecular bar codes. We find that these three genomic enrichment methods are highly accurate and practical, with sensitivities comparable to that of 30-fold coverage whole-genome shotgun data. PMID:20810667

  10. Water mass-specificity of bacterial communities in the North Atlantic revealed by massively parallel sequencing

    PubMed Central

    Agogué, Hélène; Lamy, Dominique; Neal, Phillip R.; Sogin, Mitchell L.; Herndl, Gerhard J.

    2011-01-01

    Bacterial assemblages from subsurface (100 m depth), meso- (200–1000 m depth) and bathy-pelagic (below 1000 m depth) zones at 10 stations along a North Atlantic Ocean transect from 60°N to 5°S were characterized using massively parallel pyrotag sequencing of the V6 region of the 16S rRNA gene (V6 pyrotags). In a dataset of more than 830,000 pyrotags we identified 10,780 OTUs, of which 52% were singletons. The singletons accounted for less than 2% of the OTU abundance, while the 100 and 1,000 most abundant OTUs represented 80% and 96%, respectively, of all recovered OTUs. Non-metric Multi-Dimensional Scaling and Canonical Correspondence Analysis of all the OTUs excluding the singletons revealed a clear clustering of the bacterial communities according to the water masses. More than 80% of the 1,000 most abundant OTUs corresponded to Proteobacteria, of which 55% were Alphaproteobacteria, mostly composed of the SAR11 cluster. Gammaproteobacteria increased with depth and included a relatively large number of OTUs belonging to Alteromonadales and Oceanospirillales. The bathypelagic zone showed higher taxonomic evenness than the overlying waters, although bacterial diversity was remarkably variable. Both abundant and low-abundance OTUs were responsible for the distinct bacterial communities characterizing the major deep-water masses. Taken together, our results reveal that deep-water masses act as bio-oceanographic islands for bacterioplankton, leading to water mass-specific bacterial communities in the deep waters of the Atlantic. PMID:21143328

  11. The complete genome of an individual by massively parallel DNA sequencing.

    PubMed

    Wheeler, David A; Srinivasan, Maithreyan; Egholm, Michael; Shen, Yufeng; Chen, Lei; McGuire, Amy; He, Wen; Chen, Yi-Ju; Makhijani, Vinod; Roth, G Thomas; Gomes, Xavier; Tartaro, Karrie; Niazi, Faheem; Turcotte, Cynthia L; Irzyk, Gerard P; Lupski, James R; Chinault, Craig; Song, Xing-zhi; Liu, Yue; Yuan, Ye; Nazareth, Lynne; Qin, Xiang; Muzny, Donna M; Margulies, Marcel; Weinstock, George M; Gibbs, Richard A; Rothberg, Jonathan M

    2008-04-17

    The association of genetic variation with disease and drug response, and improvements in nucleic acid technologies, have given great optimism for the impact of 'genomic medicine'. However, the formidable size of the diploid human genome, approximately 6 gigabases, has prevented the routine application of sequencing methods to deciphering complete individual human genomes. To realize the full potential of genomics for human health, this limitation must be overcome. Here we report the DNA sequence of a diploid genome of a single individual, James D. Watson, sequenced to 7.4-fold redundancy using massively parallel sequencing in picolitre-size reaction vessels. This sequence was completed in two months at approximately one-hundredth of the cost of traditional capillary electrophoresis methods. Comparison of the sequence to the reference genome led to the identification of 3.3 million single nucleotide polymorphisms, of which 10,654 cause amino-acid substitution within the coding sequence. In addition, we accurately identified small-scale (2-40,000 base pair (bp)) insertion and deletion polymorphism as well as copy number variation resulting in the large-scale gain and loss of chromosomal segments ranging from 26,000 to 1.5 million base pairs. Overall, these results agree well with recent results of sequencing of a single individual by traditional methods. However, in addition to being faster and significantly less expensive, this sequencing technology avoids the arbitrary loss of genomic sequences inherent in random shotgun sequencing by bacterial cloning because it amplifies DNA in a cell-free system. As a result, we further demonstrate the acquisition of novel human sequence, including novel genes not previously identified by traditional genomic sequencing. This is the first genome sequenced by next-generation technologies. Therefore it is a pilot for the future challenges of 'personalized genome sequencing'. PMID:18421352

  12. Investigations on the usefulness of the Massively Parallel Processor for study of electronic properties of atomic and condensed matter systems

    NASA Technical Reports Server (NTRS)

    Das, T. P.

    1988-01-01

    The usefulness of the Massively Parallel Processor (MPP) for investigation of electronic structures and hyperfine properties of atomic and condensed matter systems was explored. The major effort was directed towards the preparation of algorithms for parallelization of the computational procedure being used on serial computers for electronic structure calculations in condensed matter systems. Detailed descriptions of investigations and results are reported, including MPP adaptation of self-consistent charge extended Hueckel (SCCEH) procedure, MPP adaptation of the first-principles Hartree-Fock cluster procedure for electronic structures of large molecules and solid state systems, and MPP adaptation of the many-body procedure for atomic systems.

  13. A Serious Game for Massive Training and Assessment of French Soldiers Involved in Forward Combat Casualty Care (3D-SC1): Development and Deployment

    PubMed Central

    Mérat, Stéphane; Malgras, Brice; Petit, Ludovic; Queran, Xavier; Bay, Christian; Boutonnet, Mathieu; Jault, Patrick; Ausset, Sylvain; Auroy, Yves; Perez, Jean Paul; Tesnière, Antoine; Pons, François; Mignon, Alexandre

    2016-01-01

    Background The French Military Health Service has standardized its military prehospital care policy in a "Sauvetage au Combat" (SC) program (Forward Combat Casualty Care). A major part of the SC training program relies on simulations, which are challenging and costly when dealing with more than 80,000 soldiers. In 2014, the French Military Health Service decided to develop and deploy 3D-SC1, a serious game (SG) intended to train and assess soldiers managing the early steps of SC. Objectives The purpose of this paper is to describe the creation and production of 3D-SC1 and to present its deployment. Methods A group of 10 experts and the Paris Descartes University Medical Simulation Department spin-off, Medusims, coproduced 3D-SC1. Medusims are virtual medical experiences using 3D real-time videogame technology (creation of an environment and avatars in different scenarios) designed for educational purposes (training and assessment) to simulate medical situations. These virtual situations have been created based on real cases and tested on mannequins by experts. Trainees are asked to manage specific situations according to best practices recommended by SC, and receive a score and personalized feedback regarding their performance. Results The scenario simulated in the SG is an attack on a patrol of 3 soldiers by an improvised explosive device, in which one soldier dies, one soldier is slightly stunned, and the third soldier suffers a leg amputation and other injuries. This scenario was first tested with mannequins in military simulation centers, before being transformed into a virtual 3D real-time scenario using a multi-support, multi-operating system platform, Unity. Processes of gamification and scoring were applied, with 2 levels of difficulty. A personalized debriefing was integrated at the end of the simulations. The design and production of the SG took 9 months. The deployment, performed in 3 months, has reached 84 of 96

  14. Feasibility of 3-D MRI of Proximal Femur Microarchitecture at 3 T using 26 Receive Elements without and with Parallel Imaging

    PubMed Central

    Chang, Gregory; Deniz, Cem; Honig, Stephen; Rajapakse, Chamith S.; Egol, Kenneth; Regatte, Ravinder R.; Brown, Ryan

    2013-01-01

    Purpose High-resolution imaging of deeper anatomy such as the hip is challenging due to low signal-to-noise ratio (SNR), necessitating long scan times. Multi-element coils can increase SNR and reduce scan time through parallel imaging (PI). We assessed the feasibility of using a 26-element receive coil setup to perform 3 T MRI of proximal femur microarchitecture without and with PI. Materials and Methods This study had institutional review board approval. We scanned thirteen subjects on a 3 T scanner using 26 receive-elements and a 3-D FLASH sequence without and with PI (acceleration factors (AF) 2, 3, 4). We assessed SNR, depiction of individual trabeculae, PI performance (1/g-factor), and image quality with PI (1=non-visualization to 5=excellent). Results SNR maps demonstrate higher SNR for the 26-element setup compared to a 12-element setup for hip MRI. Without PI, individual proximal femur trabeculae were well-depicted, including microarchitectural deterioration in osteoporotic subjects. With PI, 1/g values for the 26-element/12-element receive-setup were 0.71/0.45, 0.56/0.25, and 0.44/0.08 at AF2, AF3, and AF4, respectively. Image quality was: AF1, excellent (4.8±0.4); AF2, good (4.2±1.0); AF3, average (3.3±1.0); AF4, non-visualization (1.4±0.9). Conclusion A 26-element receive-setup permits 3 T MRI of proximal femur microarchitecture with good image quality up to PI AF2. PMID:24711013

  15. Underlying Data for Sequencing the Mitochondrial Genome with the Massively Parallel Sequencing Platform Ion Torrent™ PGM™

    PubMed Central

    2015-01-01

    Background Massively parallel sequencing (MPS) technologies have the capacity to sequence targeted regions or whole genomes of multiple nucleic acid samples with high coverage by sequencing millions of DNA fragments simultaneously. Compared with Sanger sequencing, MPS can also reduce labor and cost on a per-nucleotide basis and indeed on a per-sample basis. In this study, whole genomes of human mitochondria (mtGenome) were sequenced on the Personal Genome Machine (PGM™) (Life Technologies, San Francisco, CA), the output data were assessed, and the results were compared with data previously generated on the MiSeq™ (Illumina, San Diego, CA). The objectives of this paper were to determine the feasibility, accuracy, and reliability of sequence data obtained from the PGM. Results 24 samples were multiplexed (in groups of six) and sequenced on the 314 chip, which has a throughput of at least 10 megabases. The depth of coverage pattern was similar among all 24 samples; however, the coverage across the genome varied. For strand bias, the average ratio of coverage between the forward and reverse strands at each nucleotide position indicated that two-thirds of the positions of the genome had ratios greater than 0.5. A few sites had more extreme strand bias. Another observation was that 156 positions had a false deletion rate greater than 0.15 in one or more individuals. There were 31-98 (SNP) mtGenome variants observed per sample for the 24 samples analyzed. The total 1237 (SNP) variants were concordant between the results from the PGM and MiSeq. The quality scores for haplogroup assignment for all 24 samples ranged between 88.8% and 100%. Conclusions In this study, mtDNA sequence data generated from the PGM were analyzed and the output evaluated. Depth of coverage variation and strand bias were identified but generally were infrequent and did not impact the reliability of variant calls. Multiplexing of samples was demonstrated, which can improve throughput and reduce cost per sample analyzed.
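    The strand-bias summary can be illustrated with a small calculation. This Python sketch rests on an assumption: the abstract does not define the ratio precisely, so it takes the per-position ratio as min(forward, reverse) / max(forward, reverse), where 1.0 means balanced coverage. It averages that ratio across samples and reports the fraction of positions above 0.5. The input layout, the function names, and the depths below are hypothetical.

```python
from typing import List, Tuple

# per_sample[s][p] = (forward_depth, reverse_depth) of sample s at position p
Coverage = List[List[Tuple[int, int]]]


def strand_ratio(fwd: int, rev: int) -> float:
    """Assumed definition: ratio in [0, 1], where 1.0 means balanced strands."""
    hi = max(fwd, rev)
    return 0.0 if hi == 0 else min(fwd, rev) / hi


def fraction_balanced(per_sample: Coverage, threshold: float = 0.5) -> float:
    """Fraction of positions whose sample-averaged strand ratio exceeds threshold."""
    n_pos = len(per_sample[0])
    balanced = 0
    for p in range(n_pos):
        mean_ratio = sum(strand_ratio(*s[p]) for s in per_sample) / len(per_sample)
        if mean_ratio > threshold:
            balanced += 1
    return balanced / n_pos


samples = [
    [(100, 90), (50, 10), (40, 40)],  # ratios 0.9, 0.2, 1.0
    [(80, 100), (5, 50), (10, 40)],   # ratios 0.8, 0.1, 0.25
]
print(round(fraction_balanced(samples), 3))  # → 0.667
```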

  16. Evaluation of Two Highly-Multiplexed Custom Panels for Massively Parallel Semiconductor Sequencing on Paraffin DNA

    PubMed Central

    Kotoula, Vassiliki; Lyberopoulou, Aggeliki; Papadopoulou, Kyriaki; Charalambous, Elpida; Alexopoulou, Zoi; Gakou, Chryssa; Lakis, Sotiris; Tsolaki, Eleftheria; Lilakos, Konstantinos; Fountzilas, George

    2015-01-01

    Background and Aim Massively parallel sequencing (MPS) holds promise for expanding cancer translational research and diagnostics. As yet, it has been applied on paraffin DNA (FFPE) with commercially available highly multiplexed gene panels (100s of DNA targets), while custom panels of low multiplexing are used for re-sequencing. Here, we evaluated the performance of two highly multiplexed custom panels on FFPE DNA. Methods Two custom multiplex amplification panels (B, 373 amplicons; T, 286 amplicons) were coupled with semiconductor sequencing on DNA samples from FFPE breast tumors and matched peripheral blood samples (n samples: 316; n libraries: 332). The two panels shared 37% of their DNA targets (common or shifted amplicons). Panel performance was evaluated in paired sample groups and quartets of libraries, where possible. Results Amplicon read ratios yielded similar patterns per gene with the same panel in FFPE and blood samples; however, performance of common amplicons differed between panels (p<0.001). FFPE genotypes were compared for 1267 coding and non-coding variant replicates, of which 999 (78.8%) were concordant in different paired sample combinations. Variant frequency was highly reproducible (Spearman's rho 0.959). Repeatedly discordant variants were of high coverage / low frequency (p<0.001). Genotype concordance was (a) high, for intra-run duplicates with the same panel (mean±SD: 97.2±4.7, 95%CI: 94.8–99.7, p<0.001); (b) modest, when the same DNA was analyzed with different panels (mean±SD: 81.1±20.3, 95%CI: 66.1–95.1, p = 0.004); and (c) low, when different DNA samples from the same tumor were compared with the same panel (mean±SD: 59.9±24.0; 95%CI: 43.3–76.5; p = 0.282). Low coverage / low frequency variants were validated with Sanger sequencing even in samples with unfavourable DNA quality. Conclusions Custom MPS may yield novel information on genomic alterations, provided that data evaluation is adjusted to tumor tissue FFPE DNA. To this

  17. Performance analysis of three dimensional integral equation computations on a massively parallel computer. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Logan, Terry G.

    1994-01-01

    The purpose of this study is to investigate the performance of integral equation computations using the numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of the computational performance of the MPP CM-5 computer and the conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on the CM-5 with 32, 62, and 128 nodes, along with those on the Cray-YMP with a single processor. The comparison indicates that the parallel CM-FORTRAN code nearly matches or outperforms the equivalent serial FORTRAN code for some cases.

  18. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.

  19. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, Michael S.; Strip, David R.

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
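    The directed-edge idea can be illustrated on a small solid. This is a minimal Python sketch, not the patented system. The face encoding (each face as a consistently oriented cycle of vertex indices), the rule that label k maps to the k-th (face, position) slot, and the name `build_dedge` are hypothetical. A sequential loop stands in for the parallel processors. Each call needs only its own label and the shared model, so no call depends on another, mirroring the "no processor-to-processor intercommunication" property.

```python
from dataclasses import dataclass
from typing import List, Tuple

Vertex = Tuple[float, float, float]


@dataclass(frozen=True)
class DEdge:
    tail: Vertex  # start vertex of the directed edge
    head: Vertex  # end vertex of the directed edge
    face: int     # index of the one face this directed edge belongs to


def build_dedge(label: int, vertices: List[Vertex], faces: List[List[int]]) -> DEdge:
    """Work done by the hypothetical processor with this label, independently of others."""
    # Enumerate (face, position-in-face) slots so each label maps to exactly one d-edge.
    slots = [(f, i) for f, face in enumerate(faces) for i in range(len(face))]
    f, i = slots[label]
    face = faces[f]
    return DEdge(vertices[face[i]], vertices[face[(i + 1) % len(face)]], f)


# Tetrahedron with consistently oriented triangular faces.
verts = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
faces = [[0, 2, 1], [0, 1, 3], [1, 2, 3], [0, 3, 2]]
n_labels = sum(len(f) for f in faces)
dedges = [build_dedge(label, verts, faces) for label in range(n_labels)]
print(len(dedges))  # → 12
```

    With consistent face orientation, each of the 6 geometric edges appears as two d-edges running in opposite directions, one per adjacent face. This pairing is what lets each d-edge relate its edge to exactly one face.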

  20. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    DOEpatents

    Gooding, Thomas Michael; McCarthy, Patrick Joseph

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
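    The grouping strategy can be sketched in a few lines of Python. This sketch is not the patented implementation. The two callables stand in for however a real collector reads an instruction address or a full call-return stack from a node, which the abstract does not describe. Choosing the first node of each subset as its representative is also an assumption. The point it shows is that full tracebacks are fetched once per distinct instruction address, not once per node.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def collect_tracebacks(
    node_ids: List[int],
    get_instruction_address: Callable[[int], int],  # hypothetical partial-data probe
    get_full_traceback: Callable[[int], List[int]],  # hypothetical full-stack probe
) -> Dict[int, dict]:
    # Step 1: retrieve partial data (one instruction address) from every node,
    # grouping nodes that share the same address into one subset.
    groups: Dict[int, List[int]] = defaultdict(list)
    for node in node_ids:
        groups[get_instruction_address(node)].append(node)
    # Step 2: pick one representative per subset and fetch its full traceback.
    result = {}
    for addr, members in groups.items():
        rep = members[0]  # assumption: first member represents the subset
        result[addr] = {"nodes": members, "traceback": get_full_traceback(rep)}
    return result


fetched = []  # records which nodes had their full stack read


def fake_addr(node: int) -> int:
    return 0x400100 if node % 2 == 0 else 0x400200


def fake_stack(node: int) -> List[int]:
    fetched.append(node)
    return [fake_addr(node), 0x400010, 0x400000]


summary = collect_tracebacks(list(range(8)), fake_addr, fake_stack)
print(len(fetched))  # → 2
```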

  1. A Precision Dose Control Circuit for Maskless E-Beam Lithography With Massively Parallel Vertically Aligned Carbon Nanofibers

    SciTech Connect

    Eliza, Sazia A.; Islam, Syed K; Rahman, Touhidur; Bull, Nora D; Blalock, Benjamin; Baylor, Larry R; Ericson, Milton Nance; Gardner, Walter L

    2011-01-01

    This paper describes a highly accurate dose control circuit (DCC) for the emission of a desired number of electrons from vertically aligned carbon nanofibers (VACNFs) in a massively parallel maskless e-beam lithography system. The parasitic components within the VACNF device cause a premature termination of the electron emission, resulting in underexposure of the photoresist. In this paper, we compensate for the effects of the parasitic components and noise while reducing the area of the chip and achieving a precise count of emitted electrons from the VACNFs to obtain the optimum dose for the e-beam lithography.

  2. Performance of the UCAN2 Gyrokinetic Particle In Cell (PIC) Code on Two Massively Parallel Mainframes with Intel "Sandy Bridge" Processors

    NASA Astrophysics Data System (ADS)

    Leboeuf, Jean-Noel; Decyk, Viktor; Newman, David; Sanchez, Raul

    2013-10-01

    The massively parallel, 2D domain-decomposed, nonlinear, 3D, toroidal, electrostatic, gyrokinetic, Particle in Cell (PIC), Cartesian geometry UCAN2 code, with particle ions and adiabatic electrons, has been ported to two emerging mainframes. These two computers, one at NERSC in the US built by Cray named Edison and the other at the Barcelona Supercomputer Center (BSC) in Spain built by IBM named MareNostrum III (MNIII), happen to share the same Intel "Sandy Bridge" processors. The successful port of UCAN2 to MNIII, which came online first, has enabled us to be up and running efficiently in record time on Edison. Overall, the performance of UCAN2 on Edison is superior to that on MNIII, particularly at large numbers of processors (>1024) for the same Intel IFORT compiler. This appears to be due to different MPI modules (OpenMPI on MNIII and MPICH2 on Edison) and different interconnection networks (Infiniband on MNIII and Cray's Aries on Edison) on the two mainframes. Details of these ports and comparative benchmarks are presented. Work supported by OFES, USDOE, under contract no. DE-FG02-04ER54741 with the University of Alaska at Fairbanks.

  3. Efficient Extraction of Regional Subsets from Massive Climate Datasets using Parallel IO

    SciTech Connect

    Daily, Jeffrey A.; Schuchardt, Karen L.; Palmer, Bruce J.

    2010-09-16

    The size of datasets produced by current climate models is increasing rapidly to the scale of petabytes. Handling data at this scale requires parallel analysis tools; however, the majority of climate analysis software remains at the scale of workstations. Further, many climate analysis tools adequately process regularly gridded data but lack sufficient features when handling unstructured grids. This paper presents a data-parallel subsetter capable of correctly handling unstructured grids while scaling to over 2000 cores. The approach is based on the partitioned global address space (PGAS) parallel programming model and one-sided communication. The paper demonstrates that IO remains the single greatest bottleneck for this domain of applications and that parallel analysis of climate data succeeds in practice.

  4. Massively parallel implementation of the multi-reference Brillouin-Wigner CCSD method

    SciTech Connect

    Brabec, Jiri; Krishnamoorthy, Sriram; van Dam, Hubertus JJ; Kowalski, Karol; Pittner, Jiri

    2011-10-06

    This paper reports the parallel implementation of the Brillouin-Wigner Multi-Reference Coupled Cluster method with Single and Double excitations (BW-MRCCSD). Preliminary tests for systems composed of 304 and 440 correlated orbitals demonstrate the performance of our implementation across 1000 cores and clearly indicate the advantages of using improved task scheduling. Possible ways to further improve the parallel performance are also delineated.

  5. 3D constraints on a possible deep > 2.5 km massive sulphide mineralization from 2D crooked-line seismic reflection data in the Kristineberg mining area, northern Sweden

    NASA Astrophysics Data System (ADS)

    Malehmir, Alireza; Schmelzbach, Cedric; Bongajum, Emmanuel; Bellefleur, Gilles; Juhlin, Christopher; Tryggvason, Ari

    2009-12-01

    2D crooked-line seismic reflection surveys in crystalline environments are often considered challenging in their processing and interpretation. These challenges are more evident when complex diffraction signals that can originate from out-of-the-plane and a variety of geological features are present. A seismic profile in the Kristineberg mining area in northern Sweden shows an impressive diffraction package, covering an area larger than 25 km² in the subsurface at depths greater than 2.5 km. We present here a series of scenarios in which each can, to some extent, explain the nature of this extraordinarily large package of diffractions. Cross-dip analysis, diffraction imaging and modeling, as well as 3D processing of the crooked-line data provided constraints on the interpretation of the diffraction package. Overall, the results indicate that the diffraction package can be associated with at least four main short south-dipping diffractors in a depth range of 2.5-4.5 km. Candidate scenarios for the origin of the diffraction package are: (1) a series of massive sulphide deposits, (2) a series of mafic-ultramafic intrusions, (3) a major shear-zone and (4) multiple contact lithologies. We have also investigated the possible contribution of mode-converted scattered energy in the diffraction package using a modified converted-wave 3D prestack depth migration algorithm with the results indicating that a majority of the diffractions are P-wave diffractions. The 3D prestack migration of the data provided improved images of a series of steeply north-dipping mafic-ultramafic sill intrusions to a depth of about 4 km, where the diffractions appear to focus after the migration. The results and associated interpretations presented in this paper have improved our understanding of this conspicuous package of diffractions and may lead to re-evaluation of the 3D geological model of the Kristineberg mining area.

  6. A parallel computing tool for large-scale simulation of massive fluid injection in thermo-poro-mechanical systems

    NASA Astrophysics Data System (ADS)

    Karrech, Ali; Schrank, Christoph; Regenauer-Lieb, Klaus

    2015-10-01

    Massive fluid injections into the earth's upper crust are commonly used to stimulate permeability in geothermal reservoirs, enhance recovery in oil reservoirs, store carbon dioxide and so forth. Currently used models for reservoir simulation are limited to small perturbations and/or hydraulic aspects that are insufficient to describe the complex thermal-hydraulic-mechanical behaviour of natural geomaterials. Comprehensive approaches, which take into account the non-linear mechanical deformations of rock masses, fluid flow in percolating pore spaces, and changes of temperature due to heat transfer, are necessary to predict the behaviour of deep geo-materials subjected to high pressure and temperature changes. In this paper, we introduce a thermodynamically consistent poromechanics formulation which includes coupled thermal, hydraulic and mechanical processes. Moreover, we propose a numerical integration strategy based on massively parallel computing. The proposed formulations and numerical integration are validated using analytical solutions of simple multi-physics problems. As a representative application, we investigate the massive injection of fluids within deep formation to mimic the conditions of reservoir stimulation. The model showed, for instance, the effects of initial pre-existing stress fields on the orientations of stimulation-induced failures.

  7. Ages of Massive Galaxies at 0.5 < z < 2.0 from 3D-HST Rest-frame Optical Spectroscopy

    NASA Astrophysics Data System (ADS)

    Fumagalli, Mattia; Franx, Marijn; van Dokkum, Pieter; Whitaker, Katherine E.; Skelton, Rosalind E.; Brammer, Gabriel; Nelson, Erica; Maseda, Michael; Momcheva, Ivelina; Kriek, Mariska; Labbé, Ivo; Lundgren, Britt; Rix, Hans-Walter

    2016-05-01

    We present low-resolution near-infrared stacked spectra from the 3D-HST survey up to z = 2.0 and fit them with commonly used stellar population synthesis models: BC03, FSPS10 (Flexible Stellar Population Synthesis), and FSPS-C3K. The accuracy of the grism redshifts allows the unambiguous detection of many emission and absorption features and thus a first systematic exploration of the rest-frame optical spectra of galaxies up to z = 2. We select massive galaxies (log(M*/M☉) > 10.8), we divide them into quiescent and star-forming via a rest-frame color–color technique, and we median-stack the samples in three redshift bins between z = 0.5 and z = 2.0. We find that stellar population models fit the observations well at wavelengths below the 6500 Å rest frame, but show systematic residuals at redder wavelengths. The FSPS-C3K model generally provides the best fits (evaluated with reduced χ² statistics) for quiescent galaxies, while BC03 performs the best for star-forming galaxies. The stellar ages of quiescent galaxies implied by the models, assuming solar metallicity, vary from 4 Gyr at z ∼ 0.75 to 1.5 Gyr at z ∼ 1.75, with an uncertainty of a factor of two caused by the unknown metallicity. On average, the stellar ages are half the age of the universe at these redshifts. We show that the inferred evolution of ages of quiescent galaxies is in agreement with fundamental plane measurements, assuming an 8 Gyr age for local galaxies. For star-forming galaxies, the inferred ages depend strongly on the stellar population model and the shape of the assumed star-formation history.

  8. Massively parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-08-31

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented in the TOUGH2 code, are applied to large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using a three-dimensional numerical model with over one million cells, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce higher-resolution results and reveal some flow patterns that cannot be obtained using coarse-grid models.

  9. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
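
    As a point of reference, the sequential simulated-annealing loop whose decision sequence Generalized Speculative Computation is said to reproduce can be sketched in Python as below. This is a minimal sketch, not the authors' code: the random 3-SAT generator, the single-variable flip move, the geometric cooling schedule (t0, alpha) and all function names are assumptions made for illustration. Only the clauses containing the flipped variable are re-evaluated, which keeps each step cheap.

```python
import math
import random

def random_ksat(n_vars, n_clauses, k=3, seed=0):
    """Random k-SAT instance; literal +v means x_v, -v means NOT x_v (v = 1..n_vars)."""
    rng = random.Random(seed)
    clauses = []
    for _ in range(n_clauses):
        chosen = rng.sample(range(1, n_vars + 1), k)   # k distinct variables
        clauses.append([v if rng.random() < 0.5 else -v for v in chosen])
    return clauses

def satisfied(clause, assign):
    """assign[v] holds the truth value of variable v (index 0 is unused)."""
    return any(assign[abs(lit)] == (lit > 0) for lit in clause)

def count_sat(clauses, assign):
    return sum(satisfied(c, assign) for c in clauses)

def anneal(clauses, n_vars, t0=2.0, alpha=0.999, steps=20_000, seed=1):
    """Sequential simulated annealing that maximizes the number of satisfied clauses."""
    rng = random.Random(seed)
    occurs = [[] for _ in range(n_vars + 1)]           # clauses touching each variable
    for ci, clause in enumerate(clauses):
        for lit in clause:
            occurs[abs(lit)].append(ci)
    assign = [False] + [rng.random() < 0.5 for _ in range(n_vars)]
    score = count_sat(clauses, assign)
    best, t = score, t0
    for _ in range(steps):
        v = rng.randint(1, n_vars)
        touched = occurs[v]
        before = sum(satisfied(clauses[ci], assign) for ci in touched)
        assign[v] = not assign[v]                      # propose a single-variable flip
        delta = sum(satisfied(clauses[ci], assign) for ci in touched) - before
        if delta >= 0 or rng.random() < math.exp(delta / t):
            score += delta                             # accept (always, if not worse)
            best = max(best, score)
        else:
            assign[v] = not assign[v]                  # reject: undo the flip
        t *= alpha                                     # geometric cooling
    return best, score, assign
```

    With 100 variables and 425 clauses (the smallest instance size quoted above), the returned `best` is the largest number of satisfied clauses seen during the run.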

  10. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    NASA Astrophysics Data System (ADS)

    Zerr, Robert Joseph

    2011-12-01

    The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iteration (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for an increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with its own transport matrices, coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have been investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. 
However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of …
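
    The red/black ordering behind PGS can be illustrated on a much simpler system. The sketch below applies red/black Gauss-Seidel to a 1D Poisson problem rather than to ITMM sub-domain operators (a deliberate simplification): points of one colour depend only on points of the other colour, so each half-sweep can update all of its points at once, which is what makes the ordering parallel. The function name and test problem are hypothetical.

```python
import numpy as np

def red_black_gs(f, h, iters=2000):
    """Solve -u'' = f on (0, 1) with u(0) = u(1) = 0 by red/black Gauss-Seidel.

    Interior unknowns sit at indices 1..n; odd indices are "red", even are "black".
    Each colour depends only on the other colour, so every half-sweep is a fully
    parallel (here: vectorized) update.
    """
    n = len(f)
    u = np.zeros(n + 2)                      # includes the two Dirichlet boundary nodes
    rhs = h * h * np.asarray(f, dtype=float)
    for _ in range(iters):
        for start in (1, 2):                 # red half-sweep, then black half-sweep
            idx = np.arange(start, n + 1, 2)
            # 3-point stencil: u_i = (u_{i-1} + u_{i+1} + h^2 f_i) / 2
            u[idx] = 0.5 * (u[idx - 1] + u[idx + 1] + rhs[idx - 1])
    return u[1:-1]
```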

  11. A Novel Algorithm for Solving the Multidimensional Neutron Transport Equation on Massively Parallel Architectures

    SciTech Connect

    Azmy, Yousry

    2014-06-10

    We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iteration (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and parallel performance of the developed methods with an increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak scaling comparison with the PARTISN transport code, which uses the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method, even without acceleration or preconditioning, is competitive for optically thick problems as P is increased into the tens of thousands. For the most optically thick cells tested, PGS reduced execution time by approximately a factor of three for problems with more than 130 million computational cells on P = 32,768. Moreover, the SI-DSA execution time generally rises more steeply with increasing P than the PGS execution time. Furthermore, the PGS method outperforms SI for the periodic heterogeneous layers (PHL) configuration problems. The PGS method outperforms SI and SI-DSA on as few as P = 16 for PHL problems and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4,096.
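
    The contrast between repeated sweeps and stored operators can be shown with a toy linear system φ = Kφ + q, where K is a hypothetical non-negative matrix whose row sums c play the role of a scattering ratio. This is an analogy under stated assumptions, not the ITMM itself: source iteration applies K repeatedly and slows down as c approaches 1 (highly scattering problems), whereas forming (I - K)^-1 once lets each new source be handled by a single matrix-vector product.

```python
import numpy as np

def source_iteration(K, q, tol=1e-10, max_iter=10_000):
    """Fixed-point 'sweeps' phi <- K phi + q; returns (phi, iterations used)."""
    phi = np.zeros_like(q)
    for it in range(1, max_iter + 1):
        new = K @ phi + q
        if np.linalg.norm(new - phi, np.inf) < tol:
            return new, it
        phi = new
    return phi, max_iter

def stored_operator(K):
    """Precompute (I - K)^-1 once; each new source then costs one mat-vec."""
    return np.linalg.inv(np.eye(K.shape[0]) - K)

# Hypothetical toy problem: non-negative K with every row summing to c,
# so its spectral radius is exactly c (the "scattering ratio" of the analogy).
rng = np.random.default_rng(0)
n, c = 50, 0.99
K = rng.random((n, n))
K *= c / K.sum(axis=1, keepdims=True)
q = rng.random(n)

phi_si, iters = source_iteration(K, q)     # thousands of sweeps when c is near 1
phi_op = stored_operator(K) @ q            # one application of the stored operator
```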

  12. Massively parallel read mapping on GPUs with the q-group index and PEANUT

    PubMed Central

    Rahmann, Sven

    2014-01-01

    We present the q-group index, a novel data structure for read mapping tailored towards graphics processing units (GPUs) with a small memory footprint and efficient parallel algorithms for querying and building. On top of the q-group index we introduce PEANUT, a highly parallel GPU-based read mapper. PEANUT provides the possibility to output both the best hits or all hits of a read. Our benchmarks show that PEANUT outperforms other state-of-the-art read mappers in terms of speed while maintaining or slightly increasing precision, recall and sensitivity. PMID:25289191
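
    The q-group index itself is not reproduced here. As background, a classic q-gram index, a dictionary from every length-q substring of the reference to its start positions, shows the seed-lookup step that such read mappers accelerate. This CPU sketch is an illustration under that assumption; the function names and the simple diagonal-voting filter are hypothetical.

```python
from collections import defaultdict

def build_qgram_index(ref, q):
    """Map every length-q substring of ref to the list of its start positions."""
    index = defaultdict(list)
    for pos in range(len(ref) - q + 1):
        index[ref[pos:pos + q]].append(pos)
    return index

def candidate_positions(read, index, q):
    """Vote for reference start positions implied by exact q-gram seed hits."""
    votes = defaultdict(int)
    for offset in range(len(read) - q + 1):
        for pos in index.get(read[offset:offset + q], ()):
            votes[pos - offset] += 1          # diagonal = implied start of the read
    return sorted(votes.items(), key=lambda kv: -kv[1])
```

    A read copied from position 8 of the reference yields one vote per q-gram on diagonal 8, so that position ranks first.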

  13. Design of electrostatic microcolumn for nanoscale photoemission source in massively parallel electron-beam lithography

    NASA Astrophysics Data System (ADS)

    Wen, Ye; Du, Zhidong; Pan, Liang

    2015-10-01

    Microcolumns are widely used for parallel electron-beam lithography because of their compactness and their ability to achieve high spatial resolution. A design of an electrostatic microcolumn for our recent nanoscale photoemission sources is presented. We proposed a compact column structure (as short as several microns in length) for ease of microcolumn fabrication and lithography operation. We numerically studied the influence of several design parameters, such as microcolumn diameter, electrode thickness, beam current, working voltages, and working distance, on the optical performance. We also examined the effect of the fringing field between adjacent microcolumns during parallel lithography operations.

  14. Massively Parallel, Three-Dimensional Transport Solutions for the k-Eigenvalue Problem

    SciTech Connect

    Davidson, Gregory G; Evans, Thomas M; Jarrell, Joshua J; Pandya, Tara M; Slaybaugh, R

    2014-01-01

    We have implemented a new multilevel parallel decomposition in the Denovo discrete ordinates radiation transport code. In concert with Krylov subspace iterative solvers, the multilevel decomposition allows concurrency over energy in addition to space-angle, enabling scalability beyond the limits imposed by the traditional KBA space-angle partitioning. Furthermore, a new Arnoldi-based k-eigenvalue solver has been implemented. The added phase-space concurrency combined with the high-performance Krylov and Arnoldi solvers has enabled weak scaling to O(100K) cores on the Jaguar XK6 supercomputer. The multilevel decomposition provides sufficient parallelism to scale to exascale computing and beyond.
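
    Denovo's eigenvalue solver is not described in detail here; a generic Arnoldi iteration conveys the idea. The sketch below builds a Krylov basis with modified Gram-Schmidt and returns the dominant Ritz value of a small dense matrix, which stands in (as an assumption) for the transport-fission operator whose dominant eigenvalue is k. Production solvers apply the operator matrix-free and add restarting, both omitted here; the function names are hypothetical.

```python
import numpy as np

def arnoldi(A, v0, m):
    """m-step Arnoldi factorization A V_m = V_{m+1} H using modified Gram-Schmidt."""
    n = A.shape[0]
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]
        for i in range(j + 1):               # orthogonalize against previous basis vectors
            H[i, j] = V[:, i] @ w
            w = w - H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:              # invariant subspace found: stop early
            return V[:, : j + 1], H[: j + 1, : j + 1]
        V[:, j + 1] = w / H[j + 1, j]
    return V[:, :m], H[:m, :m]

def dominant_ritz_value(A, m=20, seed=0):
    """Largest-magnitude eigenvalue of the small Hessenberg matrix H (a Ritz value)."""
    rng = np.random.default_rng(seed)
    _, H = arnoldi(A, rng.random(A.shape[0]), m)
    ritz = np.linalg.eigvals(H)
    return ritz[np.argmax(np.abs(ritz))].real
```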

  15. Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density

    NASA Astrophysics Data System (ADS)

    Hohl, A.; Delmelle, E. M.; Tang, W.

    2015-07-01

    Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data points that are within the spatial and temporal kernel bandwidths. Then, we quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density achieves a substantial speedup over sequential processing and high levels of workload balance among processors, owing to accurate quantification of computational intensity. Our approach is portable to other space-time analytical tests.
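
    The role of the spatiotemporal buffer can be checked numerically. The sketch below assumes Epanechnikov kernels in space and time and splits the domain into simple x-slabs instead of an adaptive octree; it evaluates an unnormalized space-time kernel density per slab using only the events within one spatial bandwidth of that slab. Because the kernel vanishes beyond the bandwidth, the decomposed result matches the global one. Function names and parameters are hypothetical; an octree split in all three dimensions would also need a temporal buffer of one temporal bandwidth.

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel 0.75 * (1 - u^2), zero for |u| >= 1."""
    return np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u**2), 0.0)

def stkde(voxels, events, hs, ht):
    """Unnormalized space-time kernel density at each (x, y, t) voxel."""
    dx = voxels[:, None, 0] - events[None, :, 0]
    dy = voxels[:, None, 1] - events[None, :, 1]
    dt = voxels[:, None, 2] - events[None, :, 2]
    return (epanechnikov(np.hypot(dx, dy) / hs) * epanechnikov(dt / ht)).sum(axis=1)

def stkde_decomposed(voxels, events, hs, ht, n_parts):
    """Split voxels into x-slabs; each slab sees only events inside an hs-wide buffer."""
    edges = np.linspace(voxels[:, 0].min(), voxels[:, 0].max(), n_parts + 1)
    out = np.empty(len(voxels))
    for k in range(n_parts):
        lo, hi = edges[k], edges[k + 1]
        upper = voxels[:, 0] < hi if k < n_parts - 1 else voxels[:, 0] <= hi
        in_slab = (voxels[:, 0] >= lo) & upper
        buffered = events[(events[:, 0] >= lo - hs) & (events[:, 0] <= hi + hs)]
        out[in_slab] = stkde(voxels[in_slab], buffered, hs, ht)   # independent task per slab
    return out
```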

  16. Massive parallel implementation of JPEG2000 decoding algorithm with multi-GPUs

    NASA Astrophysics Data System (ADS)

    Wu, Xianyun; Li, Yunsong; Liu, Kai; Wang, Keyan; Wang, Li

    2014-05-01

    JPEG2000 is an important technique for image compression that has been successfully used in many fields. Due to the increasing spatial, spectral and temporal resolution of remotely sensed imagery data sets, fast decompression of remotely sensed data is becoming an important and challenging objective. In this paper, we develop an implementation of JPEG2000 decompression on graphics processing units (GPUs) for fast decoding of codeblock-based parallel compression streams. We use one CUDA block to decode one frame. Tier-2 is still decoded serially, while Tier-1 and the IDWT are processed in parallel. Since our encoded stream is block-based parallel, meaning that each code block is independent of the other blocks, we process each code block in Tier-1 with one thread. For the IDWT, we use one CUDA block to process one line and one CUDA thread to process one pixel. We investigate the speedups that can be gained by the GPU implementation relative to CPU-based serial implementations. Experimental results show that our implementation achieves significant speedups over serial implementations.
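
    The per-pixel parallelism of the IDWT follows from the lifting structure of the reversible 5/3 wavelet used in JPEG2000: within each lifting step, every output sample depends only on values from the previous step. The NumPy sketch below shows one level of the 1D forward and inverse 5/3 lifting transform with symmetric boundary extension for even-length signals; vectorized array operations stand in for the one-thread-per-pixel CUDA mapping, and a 2D transform would apply it to rows and then columns. It is an illustration with hypothetical function names, not the authors' GPU code.

```python
import numpy as np

def dwt53_forward(x):
    """One level of the reversible integer 5/3 lifting transform (even-length 1D signal)."""
    x = np.asarray(x, dtype=np.int64)
    even, odd = x[0::2], x[1::2]
    even_next = np.append(even[1:], even[-1])      # symmetric extension at the right edge
    d = odd - (even + even_next) // 2              # predict step -> high-pass coefficients
    d_prev = np.insert(d[:-1], 0, d[0])            # symmetric extension at the left edge
    s = even + (d_prev + d + 2) // 4               # update step -> low-pass coefficients
    return s, d

def dwt53_inverse(s, d):
    """Undo the lifting steps in reverse order; each sample is computed independently."""
    d_prev = np.insert(d[:-1], 0, d[0])
    even = s - (d_prev + d + 2) // 4
    even_next = np.append(even[1:], even[-1])
    odd = d + (even + even_next) // 2
    x = np.empty(2 * len(s), dtype=np.int64)
    x[0::2], x[1::2] = even, odd                   # interleave back into one signal
    return x
```

    Because the same integer floor operations are undone in reverse order, the inverse reconstructs the input exactly, which is what makes the 5/3 path lossless.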

  17. Optical binary de Bruijn networks for massively parallel computing: design methodology and feasibility study

    NASA Astrophysics Data System (ADS)

    Louri, Ahmed; Sung, Hongki

    1995-10-01

    The interconnection network structure can be the deciding and limiting factor in the cost and the performance of parallel computers. One of the most popular point-to-point interconnection networks for parallel computers today is the hypercube. The regularity, logarithmic diameter, symmetry, high connectivity, fault tolerance, simple routing, and reconfigurability (easy embedding of other network topologies) of the hypercube make it a very attractive choice for parallel computers. Unfortunately, the hypercube has a major drawback: the number of links per node increases as the network grows in size. As an alternative to the hypercube, the binary de Bruijn (BdB) network has recently received much attention. The BdB not only provides a logarithmic diameter, fault tolerance, and simple routing but also requires fewer links than the hypercube for the same network size. Additionally, a major advantage of the BdB is that its number of edges per node is independent of the network size. This makes it very desirable for large-scale parallel systems. However, because of its asymmetrical nature and global connectivity, it poses a major challenge for VLSI technology. Optics, owing to its three-dimensional and global-connectivity nature, seems to be very suitable for implementing BdB networks. We present an implementation methodology for optical BdB networks. The distinctive feature of the proposed implementation methodology is partitionability of the network into a few primitive operations that can be implemented efficiently. We further show feasibility of the …
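
    The fixed node degree and simple routing of the binary de Bruijn network can be made concrete. In the graph with 2^n nodes labelled by n-bit strings, node x links to (2x mod 2^n) and (2x + 1 mod 2^n), so every node has two outgoing links however large n grows, while a hypercube node has n. The sketch below routes by shifting in the destination's bits, reusing any overlap between the source's low bits and the destination's high bits; the function names are hypothetical, and this is textbook shift routing, not the optical implementation in the paper.

```python
def debruijn_neighbors(node, n):
    """Out-neighbours of node in the binary de Bruijn graph: shift left, append 0 or 1."""
    mask = (1 << n) - 1
    return [((node << 1) & mask) | b for b in (0, 1)]

def debruijn_route(src, dst, n):
    """Shift routing from src to dst in at most n hops (the network diameter)."""
    mask = (1 << n) - 1
    # Longest k such that the low k bits of src equal the high k bits of dst.
    k = next(k for k in range(n, -1, -1)
             if (src & ((1 << k) - 1)) == (dst >> (n - k)))
    path, node = [src], src
    for i in range(n - k - 1, -1, -1):     # shift in the remaining low bits of dst, MSB first
        node = ((node << 1) & mask) | ((dst >> i) & 1)
        path.append(node)
    return path
```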

  19. Nonlinear structural response using adaptive dynamic relaxation on a massively-parallel-processing system

    NASA Technical Reports Server (NTRS)

    Oakley, David R.; Knight, Norman F., Jr.

    1994-01-01

    A parallel adaptive dynamic relaxation (ADR) algorithm has been developed for nonlinear structural analysis. This algorithm has minimal memory requirements, is easily parallelizable and scalable to many processors, and is generally very reliable and efficient for highly nonlinear problems. Performance evaluations on single-processor computers have shown that the ADR algorithm is reliable and highly vectorizable, and that it is competitive with direct solution methods for the highly nonlinear problems considered. The present algorithm is implemented on the 512-processor Intel Touchstone DELTA system at Caltech, and it is designed to minimize the extent and frequency of interprocessor communication. The algorithm has been used to solve for the nonlinear static response of two and three dimensional hyperelastic systems involving contact. Impressive relative speedups have been achieved and demonstrate the high scalability of the ADR algorithm. For the class of problems addressed, the ADR algorithm represents a very promising approach for parallel-vector processing.
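
    The basic dynamic relaxation iteration can be sketched on a toy problem: a chain of softening springs fixed at one end and loaded at the other. This is a simplified sketch, not the authors' ADR code: the fictitious diagonal mass comes from a Gershgorin bound on the initial stiffness, and a fixed damping coefficient c replaces the adaptive damping estimate that gives ADR its name. Each update uses only a node's own residual and mass, which is why the method needs little memory and parallelizes easily. All names are hypothetical.

```python
import numpy as np

def spring_force(e, k=1.0):
    """Softening spring: tension k*e/(1+|e|); tangent stiffness k/(1+|e|)^2 <= k."""
    return k * e / (1.0 + np.abs(e))

def internal_force(x, k=1.0):
    """Chain wall - node 0 - node 1 - ... - node n-1; returns resisting force per node."""
    e = np.diff(np.concatenate(([0.0], x)))        # elongation of each spring
    s = spring_force(e, k)
    f = s.copy()
    f[:-1] -= s[1:]                                # node i is also pulled by spring i+1
    return f

def dynamic_relaxation(f_ext, k=1.0, h=1.0, c=0.5, tol=1e-10, max_iter=20_000):
    n = len(f_ext)
    # Fictitious mass from a Gershgorin bound on the initial (largest) stiffness, with a
    # 10% safety factor so that omega_max * h < 2 (central-difference stability).
    row_sum = np.full(n, 4.0 * k)
    row_sum[-1] = 2.0 * k
    m = 1.1 * (h * h / 4.0) * row_sum
    x, v = np.zeros(n), np.zeros(n)
    for it in range(max_iter):
        r = f_ext - internal_force(x, k)           # out-of-balance (residual) force
        if np.linalg.norm(r, np.inf) < tol:
            return x, it
        v = ((2.0 - c * h) * v + 2.0 * h * r / m) / (2.0 + c * h)   # damped velocity update
        x = x + h * v
    return x, max_iter
```

    For an end load P = 0.3 with k = 1, every spring carries tension P, so each elongation is P/(1 - P) and the converged displacements grow linearly along the chain.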

  20. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
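
    A minimal sketch of the general idea, collecting timings for several implementations over a range of inputs and then dispatching to the measured winner, is shown below. It assumes a single input dimension (input length) and nearest-size lookup; the function names are hypothetical, and nothing here reflects the specific apparatus claimed in the patent.

```python
import time

def benchmark(impls, sizes, make_input, repeats=3):
    """Collect best-of-`repeats` wall-clock times: perf[size][name] -> seconds."""
    perf = {}
    for size in sizes:
        data = make_input(size)
        perf[size] = {}
        for name, fn in impls.items():
            best = float("inf")
            for _ in range(repeats):
                start = time.perf_counter()
                fn(data)
                best = min(best, time.perf_counter() - start)
            perf[size][name] = best
    return perf

def build_selector(impls, perf):
    """Return a function that calls the implementation measured fastest at the nearest size."""
    sizes = sorted(perf)
    winner = {s: min(perf[s], key=perf[s].get) for s in sizes}

    def dispatch(data):
        nearest = min(sizes, key=lambda s: abs(s - len(data)))
        return impls[winner[nearest]](data)

    dispatch.winner = winner          # expose the selection table for inspection
    return dispatch
```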

  1. Application of Parallel Hybrid Algorithm in Massively Parallel GPGPU—The Improved Effective and Efficient Method for Calculating Coulombic Interactions in Simulations of Many Ions with SIMION

    NASA Astrophysics Data System (ADS)

    Saito, Kenichiro; Koizumi, Eiko; Koizumi, Hideya

    2012-09-01

    In our previous study, we introduced a new hybrid approach to effectively approximate the total force on each ion during a trajectory calculation in mass spectrometry device simulations, and the algorithm worked successfully with SIMION. We took one step further and applied the method in massively parallel general-purpose computing on GPUs (GPGPU) to test its performance in simulations with thousands to over a million ions. We took extra care to minimize barrier synchronization and data transfer between the host (CPU) and device (GPU) memory, and took full advantage of latency hiding. Parallel codes were written in CUDA C++ and integrated into SIMION via a user-defined Lua program. In this study, we tested the parallel hybrid algorithm with a couple of basic models and analyzed its performance by comparing it to that of the original, fully explicit method written in serial code. The Coulomb explosion simulation with 128,000 ions was completed in 309 s, over 700 times faster than the 63 h taken by the original explicit method, in which the two-body Coulomb interaction of each ion with every other ion was evaluated explicitly. The simulation of 1,024,000 ions was completed in 2650 s. In another example, we applied the hybrid method to a simulation of 100,000 ions in a simple quadrupole ion storage model, and it took less than 10 d. Based on our estimate, the same simulation would take 5-7 y with the explicit method in serial code.
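
    For context, the explicit reference method evaluates every ion pair, so its cost grows as N² (the stated timings imply a speedup of 63 h × 3600 s/h ÷ 309 s ≈ 734). The hybrid approximation is not described in enough detail here to reproduce, so the NumPy sketch below shows only the explicit O(N²) Coulomb force sum in SI units; the function name is hypothetical, and its O(N²) memory use limits it to small N.

```python
import numpy as np

K_E = 8.9875517923e9  # Coulomb constant, N m^2 C^-2

def coulomb_forces(pos, q):
    """Explicit O(N^2) Coulomb force on each ion: F_i = k q_i sum_j q_j (r_i - r_j)/|r_i - r_j|^3."""
    diff = pos[:, None, :] - pos[None, :, :]          # r_i - r_j, shape (N, N, 3)
    dist2 = np.einsum("ijk,ijk->ij", diff, diff)
    np.fill_diagonal(dist2, np.inf)                   # exclude self-interaction (1/inf -> 0)
    inv_r3 = dist2 ** -1.5
    return K_E * q[:, None] * np.einsum("ij,ijk->ik", q[None, :] * inv_r3, diff)
```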

  2. Time-Resolved 3D Quantitative Flow MRI of the Major Intracranial Vessels: Initial Experience and Comparative Evaluation at 1.5T and 3.0T in Combination With Parallel Imaging

    PubMed Central

    Bammer, Roland; Hope, Thomas A.; Aksoy, Murat; Alley, Marcus T.

    2012-01-01

    Exact knowledge of blood flow characteristics in the major cerebral vessels is of great relevance for diagnosing cerebrovascular abnormalities. This involves the assessment of hemodynamically critical areas as well as the derivation of biomechanical parameters such as wall shear stress and pressure gradients. A time-resolved, 3D phase-contrast (PC) MRI method using parallel imaging was implemented to measure blood flow in three dimensions at multiple instances over the cardiac cycle. The 4D velocity data obtained from 14 healthy volunteers were used to investigate dynamic blood flow with the use of multiplanar reformatting, 3D streamlines, and 4D particle tracing. In addition, the effects of magnetic field strength, parallel imaging, and temporal resolution on the data were investigated in a comparative evaluation at 1.5T and 3T using three different parallel imaging reduction factors and three different temporal resolutions in eight of the 14 subjects. Studies were consistently performed faster at 3T than at 1.5T because of better parallel imaging performance. A high temporal resolution (65 ms) was required to follow dynamic processes in the intracranial vessels. The 4D flow measurements provided a high degree of vascular conspicuity. Time-resolved streamline analysis provided features that have not been reported previously for the intracranial vasculature. PMID:17195166

  3. Massively parallel 454-sequencing of fungal communities in Quercus spp. ectomycorrhizas indicates seasonal dynamics in urban and rural sites.

    PubMed

    Jumpponen, Ari; Jones, Kenneth L; David Mattox, J; Yaege, Chulee

    2010-03-01

    We analysed two sites within and outside an urban development in a rural background to estimate the fungal richness, diversity and community composition in Quercus spp. ectomycorrhizas using massively parallel 454-sequencing in combination with DNA-tagging. Our analyses indicated that shallow sequencing (approximately 150 sequences) of a large number of samples (192 in total) provided data that allowed identification of seasonal trends within the fungal communities: putative root-associated antagonists and saprobes that were abundant early in the growing season were replaced by common ectomycorrhizal fungi in the course of the growing season. Ordination analyses identified a number of factors that were correlated with the observed communities, including host species as well as soil organic matter, nutrient and heavy metal enrichment. Overall, our application of high-throughput 454 sequencing provided an expedient means for characterization of fungal communities. PMID:20331769

  4. Probing the Nanosecond Dynamics of a Designed Three-Stranded Beta-Sheet with a Massively Parallel Molecular Dynamics Simulation

    PubMed Central

    Voelz, Vincent A.; Luttmann, Edgar; Bowman, Gregory R.; Pande, Vijay S.

    2009-01-01

    A recently published temperature-jump FTIR study of a designed three-stranded β-sheet (DPDP-II) reported a fast relaxation time of ~140 ± 20 ns. We performed massively parallel molecular dynamics simulations in explicit solvent to probe the structural events involved in this relaxation. While our simulations produce similar relaxation rates, the structural ensemble is broad. We observe the formation of turn structure, but only very weak interaction in the strand regions, which is consistent with the lack of strong backbone-backbone NOEs in previous structural NMR studies. These results suggest that either DPDP-II folds at time scales longer than 240 ns, or that DPDP-II is not a well-defined three-stranded β-sheet. This work also provides an opportunity to compare the performance of several popular force-field models against one another. PMID:19399235

  5. High-Throughput Detection of Actionable Genomic Alterations in Clinical Tumor Samples by Targeted, Massively Parallel Sequencing

    PubMed Central

    Wagle, Nikhil; Berger, Michael F.; Davis, Matthew J.; Blumenstiel, Brendan; DeFelice, Matthew; Pochanard, Panisa; Ducar, Matthew; Van Hummelen, Paul; MacConaill, Laura E.; Hahn, William C.; Meyerson, Matthew; Gabriel, Stacey B.; Garraway, Levi A.

    2011-01-01

    Knowledge of “actionable” somatic genomic alterations present in each tumor (e.g., point mutations, small insertions/deletions, and copy number alterations that direct therapeutic options) should facilitate individualized approaches to cancer treatment. However, clinical implementation of systematic genomic profiling has rarely been achieved beyond limited numbers of oncogene point mutations. To address this challenge, we utilized a targeted, massively parallel sequencing approach to detect tumor genomic alterations in formalin-fixed, paraffin-embedded (FFPE) tumor samples. Nearly 400-fold mean sequence coverage was achieved, and single nucleotide sequence variants, small insertions/deletions, and chromosomal copy number alterations were detected simultaneously with high accuracy compared to other methods in clinical use. Putatively actionable genomic alterations, including those that predict sensitivity or resistance to established and experimental therapies, were detected in each tumor sample tested. Thus, targeted deep sequencing of clinical tumor material may enable mutation-driven clinical trials and, ultimately, “personalized” cancer treatment. PMID:22585170

  6. Development of a Massively Parallel Particle-Mesh Algorithm for Simulations of Galaxy Dynamics and Plasmas

    NASA Astrophysics Data System (ADS)

    Wallin, John

    1996-01-01

    Particle-mesh calculations treat forces and potentials as field quantities that are represented approximately on a mesh. A system of particles is mapped onto this mesh as a density distribution of mass or charge. The Fourier transform is used to convolve this distribution with the Green's function of the potential, and a finite difference scheme is used to calculate the forces acting on the particles. The computation time scales as N_g log N_g, where N_g is the size of the computational grid. In contrast, the particle-particle method relies on direct summation, so its computing time for each calculation scales as N_p², where N_p is the number of particles. The particle-mesh method is best suited for simulations with a fixed minimum resolution and for collisionless systems, while hierarchical tree codes have proven to be superior for collisional systems where two-body interactions are important. Particle-mesh methods still dominate in plasma physics, where collisionless systems are modeled. The CM-200 Connection Machine produced by Thinking Machines Corp. is a data parallel system. On this system, the front-end computer controls the timing and execution of the parallel processing units. The programming paradigm is Single-Instruction, Multiple-Data (SIMD). The processors on the CM-200 are connected in an N-dimensional hypercube; the largest number of links a message will ever have to traverse is N. As in all parallel computing, the efficiency of an algorithm is primarily determined by the fraction of time spent communicating compared to that spent computing. Because of the topology of the processors, nearest-neighbor communication is more efficient than general communication.
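
    The four steps described above (deposit, FFT convolution with the Green's function, finite-difference forces, interpolation back to the particles) can be sketched for a periodic 1D domain. This minimal NumPy version assumes cloud-in-cell weights, units in which φ'' = ρ - mean(ρ) (for gravity, 4πG = 1), and drops the k = 0 mode; the function name is hypothetical. The FFT gives the N_g log N_g cost, and using the same weights to deposit and to interpolate makes the total force vanish.

```python
import numpy as np

def pm_accelerations_1d(x, mass, n_grid, length):
    """One particle-mesh force evaluation on a periodic 1D domain.

    Units are chosen so that phi'' = rho - mean(rho) (for gravity: 4*pi*G = 1).
    Returns (acceleration at each particle, mesh density).
    """
    dx = length / n_grid
    # 1) Cloud-in-cell deposit: share each particle between its two nearest grid points.
    s = np.asarray(x) / dx
    i0 = np.floor(s).astype(int) % n_grid
    w1 = s - np.floor(s)
    i1 = (i0 + 1) % n_grid
    rho = np.zeros(n_grid)
    np.add.at(rho, i0, mass * (1.0 - w1) / dx)
    np.add.at(rho, i1, mass * w1 / dx)
    # 2) FFT Poisson solve: -k^2 phi_k = rho_k; the k = 0 (mean) mode is dropped.
    k = 2.0 * np.pi * np.fft.fftfreq(n_grid, d=dx)
    rho_k = np.fft.fft(rho)
    phi_k = np.zeros_like(rho_k)
    phi_k[1:] = -rho_k[1:] / k[1:] ** 2
    phi = np.fft.ifft(phi_k).real
    # 3) Finite-difference field on the mesh: g = -dphi/dx (central difference).
    g = -(np.roll(phi, -1) - np.roll(phi, 1)) / (2.0 * dx)
    # 4) Interpolate back to the particles with the same CIC weights.
    return g[i0] * (1.0 - w1) + g[i1] * w1, rho
```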

  7. Extended computational kernels in a massively parallel implementation of the Trotter-Suzuki approximation

    NASA Astrophysics Data System (ADS)

    Wittek, Peter; Calderaro, Luca

    2015-12-01

    We extended a parallel and distributed implementation of the Trotter-Suzuki algorithm for simulating quantum systems to study a wider range of physical problems and to make the library easier to use. The new release allows periodic boundary conditions, many-body simulations of non-interacting particles, arbitrary stationary potential functions, and imaginary time evolution to approximate the ground state energy. The new release is more resilient to the computational environment: a wider range of compiler chains and more platforms are supported. To ease development, we provide a more extensive command-line interface, an application programming interface, and wrappers from high-level languages.
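
    The core of a Trotter-Suzuki solver can be illustrated with the second-order (Strang) split-step method for the 1D Schrödinger equation, in units with ħ = m = 1 on a periodic grid. This NumPy sketch is not the library's API, and the function names are hypothetical. Passing an imaginary time step and renormalizing after each step performs imaginary time evolution, which here recovers the harmonic-oscillator ground-state energy of 1/2.

```python
import numpy as np

def split_step(psi, V, k, dt):
    """One symmetric Trotter-Suzuki step: exp(-iV dt/2) exp(-iT dt) exp(-iV dt/2), T = k^2/2.

    A complex dt = -1j * tau gives an imaginary-time step exp(-H tau) instead.
    """
    half_v = np.exp(-0.5j * V * dt)                                  # potential half-step (diagonal in x)
    psi = half_v * psi
    psi = np.fft.ifft(np.exp(-0.5j * k**2 * dt) * np.fft.fft(psi))  # kinetic step (diagonal in k)
    return half_v * psi

def energy(psi, V, k):
    """<H> = <T> + <V>, with T evaluated in momentum space (ratios remove normalization)."""
    psi_k = np.fft.fft(psi)
    kinetic = np.sum(0.5 * k**2 * np.abs(psi_k)**2) / np.sum(np.abs(psi_k)**2)
    potential = np.sum(V * np.abs(psi)**2) / np.sum(np.abs(psi)**2)
    return kinetic + potential

def ground_state(V, k, tau=0.01, steps=2000):
    """Imaginary time evolution with renormalization projects onto the ground state."""
    psi = np.ones(V.shape, dtype=complex)
    for _ in range(steps):
        psi = split_step(psi, V, k, -1j * tau)
        psi /= np.linalg.norm(psi)
    return psi
```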

  8. Exposing malaria in-host diversity and estimating population diversity by capture-recapture using massively parallel pyrosequencing

    PubMed Central

    Juliano, Jonathan J.; Porter, Kimberly; Mwapasa, Victor; Sem, Rithy; Rogers, William O.; Ariey, Frédéric; Wongsrichanalai, Chansuda; Read, Andrew; Meshnick, Steven R.

    2010-01-01

    Malaria infections commonly contain multiple genetically distinct variants. Mathematical and animal models suggest that interactions among these variants have a profound impact on the emergence of drug resistance. However, methods currently used for quantifying parasite diversity in individual infections are insensitive to low-abundance variants and are not quantitative for variant population sizes. To more completely describe the in-host complexity and ecology of malaria infections, we used massively parallel pyrosequencing to characterize malaria parasite diversity in the infections of a group of patients. By individually sequencing single strands of DNA in a complex mixture, this technique can quantify uncommon variants in mixed infections. The in-host diversity revealed by this method far exceeded that described by currently recommended genotyping methods, with as many as sixfold more variants per infection. In addition, in paired pre- and posttreatment samples, we show a complex milieu of parasites, including variants likely up-selected and down-selected by drug therapy. As with all surveys of diversity, sampling limitations prevent full discovery and differences in sampling effort can confound comparisons among samples, hosts, and populations. Here, we used ecological approaches of species accumulation curves and capture-recapture to estimate the number of variants we failed to detect in the population, and show that these methods enable comparisons of diversity before and after treatment, as well as between malaria populations. The combination of ecological statistics and massively parallel pyrosequencing provides a powerful tool for studying the evolution of drug resistance and the in-host ecology of malaria infections. PMID:21041629
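
    The abstract does not state which capture-recapture estimator was used. As an assumption for illustration, the two-sample Lincoln-Petersen estimator and its bias-corrected Chapman form are sketched below, treating two independent sequencing runs on the same infection as two "captures" of distinct variants; the function names and the variant labels in the example are hypothetical.

```python
def lincoln_petersen(n1, n2, m):
    """Classic two-sample estimate N = n1 * n2 / m (undefined when m == 0)."""
    if m == 0:
        raise ValueError("no recaptures: estimate is unbounded")
    return n1 * n2 / m

def chapman(n1, n2, m):
    """Bias-corrected variant N = (n1 + 1)(n2 + 1)/(m + 1) - 1, defined even when m == 0."""
    return (n1 + 1) * (n2 + 1) / (m + 1) - 1

def estimate_from_runs(run1, run2):
    """Each run 'captures' a set of distinct variants; shared variants are recaptures."""
    s1, s2 = set(run1), set(run2)
    return chapman(len(s1), len(s2), len(s1 & s2))
```

    For two runs that each detect four variants, two of them shared, the Chapman estimate is 5 × 5 / 3 - 1 ≈ 7.3 variants, more than the six actually observed.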

  9. Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™.

    PubMed

    Eduardoff, M; Santos, C; de la Puente, M; Gross, T E; Fondevila, M; Strobl, C; Sobrino, B; Ballard, D; Schneider, P M; Carracedo, Á; Lareu, M V; Parson, W; Phillips, C

    2015-07-01

    Next generation sequencing (NGS) offers the opportunity to analyse forensic DNA samples and obtain massively parallel coverage of targeted short sequences with the variants they carry. We evaluated the levels of sequence coverage, genotyping precision, sensitivity and mixed DNA patterns of a prototype version of the first commercial forensic NGS kit: the HID-Ion AmpliSeq™ Identity Panel with 169 markers designed for the Ion PGM™ system. Evaluations were made across three laboratories following closely matched Ion PGM™ protocols and a simple validation framework of shared DNA controls. The sequence coverage obtained was extensive for the bulk of SNPs targeted by the HID-Ion AmpliSeq™ Identity Panel. Sensitivity studies showed 90-95% of SNP genotypes could be obtained from 25 to 100 pg of input DNA. Genotyping concordance tests included Coriell cell-line control DNA analyses checked against whole-genome sequencing data from 1000 Genomes and Complete Genomics, indicating a very high concordance rate of 99.8%. Discordant genotypes detected in rs1979255, rs1004357, rs938283, rs2032597 and rs2399332 indicate these loci should be excluded from the panel. Therefore, the HID-Ion AmpliSeq™ Identity Panel and Ion PGM™ system provide a sensitive and accurate forensic SNP genotyping assay. However, low-level DNA produced much more varied sequence coverage, and in forensic use the Ion PGM™ system will require careful calibration of the total samples loaded per chip to preserve the genotyping reliability seen in routine forensic DNA. Furthermore, assessments of mixed DNA indicate the user's control of sequence analysis parameter settings is necessary to ensure mixtures are detected robustly. Given the sensitivity of Ion PGM™, this aspect of forensic genotyping requires further optimisation before massively parallel sequencing is applied to routine casework. PMID:25955683

  10. A massively parallel method of characteristic neutral particle transport code for GPUs

    SciTech Connect

    Boyd, W. R.; Smith, K.; Forget, B.

    2013-07-01

    Over the past 20 years, parallel computing has enabled computers to grow ever larger and more powerful while scientific applications have advanced in sophistication and resolution. This trend is being challenged, however, as the power consumption for conventional parallel computing architectures has risen to unsustainable levels and memory limitations have come to dominate compute performance. Heterogeneous computing platforms, such as Graphics Processing Units (GPUs), are an increasingly popular paradigm for solving these issues. This paper explores the applicability of GPUs for deterministic neutron transport. A 2D method of characteristics (MOC) code, OpenMOC, has been developed with solvers for both shared-memory multi-core platforms and GPUs. The multi-threading and memory locality methodologies for the GPU solver are presented. Performance results for the 2D C5G7 benchmark demonstrate a 25-35× speedup for MOC on the GPU. The lessons learned from this case study will provide the basis for further exploration of MOC on GPUs as well as design decisions for hardware vendors exploring technologies for the next generation of machines for scientific computing. (authors)

  11. Harnessing the killer micros: Applications from LLNL's massively parallel computing initiative

    SciTech Connect

    Belak, J.F.

    1991-07-01

    Recent developments in microprocessor technology have led to performance on scalar applications exceeding that of traditional supercomputers. This suggests that coupling hundreds or even thousands of these "killer micros" (all working on a single physical problem) may lead to performance on vector applications in excess of vector supercomputers. Future-generation killer micros are also expected to have vector floating-point units. The purpose of this paper is to present an overview of the parallel computing environment at Lawrence Livermore National Laboratory. However, the perspective is necessarily quite narrow, and most of the examples are taken from the author's implementation of a large-scale molecular dynamics code on the BBN-TC2000 at LLNL. Parallelism is achieved through a geometric domain decomposition: each processor is assigned a distinct region of space and all atoms contained therein. As the atomic positions evolve, the processors must exchange ownership of specific atoms. This geometric domain decomposition proves to be quite general, and we highlight its application to image processing and hydrodynamics simulations as well. 10 refs., 6 figs.
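    The geometric domain decomposition described above can be illustrated with a minimal Python sketch. It assumes a periodic box cut into equal slabs along one axis (one slab per processor) and performs the ownership transfer serially; the functions `owner_of` and `redistribute` are hypothetical and are not taken from the author's molecular dynamics code, where each transfer would be a message between processors.

```python
def owner_of(x: float, box_length: float, n_domains: int) -> int:
    """Index of the slab that contains coordinate x.

    Assumes a periodic box [0, box_length) split into n_domains equal slabs.
    """
    x = x % box_length  # wrap positions that left the box
    # min() guards against float round-off that maps x onto box_length itself
    return min(int(x / box_length * n_domains), n_domains - 1)


def redistribute(domains, box_length):
    """Move every atom to the slab that now contains it.

    domains: one list of (atom_id, x) pairs per processor.
    Returns (new_domains, number_of_atoms_that_changed_owner).
    """
    n = len(domains)
    new_domains = [[] for _ in range(n)]
    moved = 0
    for rank, atoms in enumerate(domains):
        for atom_id, x in atoms:
            dest = owner_of(x, box_length, n)
            if dest != rank:
                moved += 1  # in a distributed code this would be a send/receive
            new_domains[dest].append((atom_id, x))
    return new_domains, moved


if __name__ == "__main__":
    box = 10.0
    domains = [[], []]
    for atom in [(0, 1.0), (1, 4.9), (2, 7.5)]:
        domains[owner_of(atom[1], box, len(domains))].append(atom)
    domains[0][1] = (1, 5.2)  # atom 1 drifts across the slab boundary at x = 5
    domains, moved = redistribute(domains, box)
    print(moved, domains)  # 1 [[(0, 1.0)], [(1, 5.2), (2, 7.5)]]
```

    Only atoms that cross a slab boundary change owner, which is why this decomposition keeps communication proportional to boundary traffic rather than to the total number of atoms.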

  12. Library Preparation and Multiplex Capture for Massive Parallel Sequencing Applications Made Efficient and Easy

    PubMed Central

    Neiman, Mårten; Sundling, Simon; Grönberg, Henrik; Hall, Per; Czene, Kamila

    2012-01-01

    In recent years, rapid development of sequencing technologies and a competitive market have enabled researchers to perform massive sequencing projects at a reasonable cost. As the price of the actual sequencing reactions drops, enabling more samples to be sequenced, the relative cost of preparing libraries grows and the practical laboratory work becomes complex and tedious. We present a cost-effective strategy for simplified library preparation compatible with both whole-genome and targeted sequencing experiments. An optimized enzyme composition and reaction buffer reduces the number of required clean-up steps and allows the use of bulk enzymes, which makes the whole process cheap, efficient and simple. We also present a two-tagging strategy, which allows for multiplex sequencing of targeted regions. To prove our concept, we have prepared libraries for low-pass sequencing from 100 ng DNA, performed 2-, 4- and 8-plex exome capture and a 96-plex capture of a 500 kb region. In all samples we see a high concordance (>99.4%) of SNP calls when compared with commercially available SNP-chip platforms. PMID:23139805

  13. On Deciding between Conservative and Optimistic Approaches on Massively Parallel Platforms

    SciTech Connect

    Carothers, Christopher D.; Perumalla, Kalyan S.

    2010-01-01

    Over 5000 publications on parallel discrete event simulation (PDES) have appeared in the literature to date. Nevertheless, few articles have focused on empirical studies of PDES performance on large supercomputer-based systems. This gap is bridged here by undertaking a parameterized performance study on thousands of processor cores of a Blue Gene supercomputing system. In contrast to theoretical insights from analytical studies, our study is based on an actual software implementation, incurring the actual messaging and computational overheads for both conservative and optimistic synchronization approaches of PDES. Complex and counter-intuitive effects are uncovered and analyzed, with different event timestamp distributions and available levels of concurrency in the synthetic benchmark models. The results are intended to provide guidance to the PDES community on how the synchronization protocols behave at high processor core counts on a state-of-the-art supercomputing system.

  14. Neptune: An astrophysical smooth particle hydrodynamics code for massively parallel computer architectures

    NASA Astrophysics Data System (ADS)

    Sandalski, Stou

    Smooth particle hydrodynamics is an efficient method for modeling the dynamics of fluids. It is commonly used to simulate astrophysical processes such as binary mergers. We present a newly developed GPU accelerated smooth particle hydrodynamics code for astrophysical simulations. The code is named neptune after the Roman god of water. It is written in OpenMP parallelized C++ and OpenCL and includes octree based hydrodynamic and gravitational acceleration. The design relies on object-oriented methodologies in order to provide a flexible and modular framework that can be easily extended and modified by the user. Several pre-built scenarios for simulating collisions of polytropes and black-hole accretion are provided. The code is released under the MIT Open Source license and publicly available at http://code.google.com/p/neptune-sph/.

  15. The transition to massively parallel computing within a production environment at a DOE access center

    SciTech Connect

    McCoy, M.G.

    1993-04-01

    In contemplating the transition from sequential to massively parallel (MP) computing, the National Energy Research Supercomputer Center (NERSC) is faced with the frictions inherent in the duality of its mission. There have been two goals: the first has been to provide a stable, serviceable, production environment to the user base, and the second to bring the most capable early serial supercomputers to the Center to make possible leading-edge simulations. This seeming conundrum has in reality been a source of strength. The task of meeting both goals was faced before with the CRAY 1 which, as delivered, was all iron; so the problems associated with the advent of parallel computers are not entirely new, but they are serious. Current vector supercomputers, such as the C90, offer mature production environments, including software tools, a large applications base, and generality; these machines can be used to attack the spectrum of scientific applications by a large user base knowledgeable in programming techniques for this architecture. Parallel computers to date have offered less developed, even rudimentary, working environments, a sparse applications base, and forced specialization. They have been specialized in terms of programming models, and specialized in terms of the kinds of applications which would do well on the machines. Given this context, why do many service computer centers feel that now is the time to cease or slow the procurement of traditional vector supercomputers in favor of MP systems? What are some of the issues that NERSC must face to engineer a smooth transition? The answers to these questions are multifaceted and by no means completely clear. However, a route exists as a result of early efforts at the Laboratories combined with research within the HPCC Program. One can begin with an analysis of why the hardware and software appearing shortly should be made available to the mainstream, and then address what would be required in an initial production environment.

  16. Dissecting the target specificity of RNase H recruiting oligonucleotides using massively parallel reporter analysis of short RNA motifs

    PubMed Central

    Rukov, Jakob Lewin; Hagedorn, Peter H.; Høy, Isabel Bro; Feng, Yanping; Lindow, Morten; Vinther, Jeppe

    2015-01-01

    Processing and post-transcriptional regulation of RNA often depend on binding of regulatory molecules to short motifs in RNA. The effects of such interactions are difficult to study, because most regulatory molecules recognize partially degenerate RNA motifs, embedded in a sequence context specific for each RNA. Here, we describe Library Sequencing (LibSeq), an accurate massively parallel reporter method for completely characterizing the regulatory potential of thousands of short RNA sequences in a specific context. By sequencing cDNA derived from a plasmid library expressing identical reporter genes except for a degenerate 7mer subsequence in the 3′UTR, the regulatory effects of each 7mer can be determined. We show that LibSeq identifies regulatory motifs used by RNA-binding proteins and microRNAs. We furthermore apply the method to cells transfected with RNase H recruiting oligonucleotides to obtain quantitative information for >15000 potential target sequences in parallel. These comprehensive datasets provide insights into the specificity requirements of RNase H and allow a specificity measure to be calculated for each tested oligonucleotide. Moreover, we show that inclusion of chemical modifications in the central part of an RNase H recruiting oligonucleotide can increase its sequence-specificity. PMID:26220183

  17. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.

  18. Deep mutational scanning of an antibody against epidermal growth factor receptor using mammalian cell display and massively parallel pyrosequencing

    PubMed Central

    Forsyth, Charles M.; Juan, Veronica; Akamatsu, Yoshiko; DuBridge, Robert B.; Doan, Minhtam; Ivanov, Alexander V.; Ma, Zhiyuan; Polakoff, Dixie; Razo, Jennifer; Wilson, Keith; Powers, David B.

    2013-01-01

    We developed a method for deep mutational scanning of antibody complementarity-determining regions (CDRs) that can determine in parallel the effect of every possible single amino acid CDR substitution on antigen binding. The method uses libraries of full length IgGs containing more than 1000 CDR point mutations displayed on mammalian cells, sorted by flow cytometry into subpopulations based on antigen affinity and analyzed by massively parallel pyrosequencing. Higher, lower and neutral affinity mutations are identified by their enrichment or depletion in the FACS subpopulations. We applied this method to a humanized version of the anti-epidermal growth factor receptor antibody cetuximab, generated a near comprehensive data set for 1060 point mutations that recapitulates previously determined structural and mutational data for these CDRs and identified 67 point mutations that increase affinity. The large-scale, comprehensive sequence-function data sets generated by this method should have broad utility for engineering properties such as antibody affinity and specificity and may advance theoretical understanding of antibody-antigen recognition. PMID:23765106
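    The enrichment-or-depletion idea can be sketched as a log2 ratio of a variant's frequency in a sorted subpopulation versus the unsorted input library. This is a generic scoring scheme offered for illustration, not necessarily the authors' method; the function name, the pseudocount of 0.5, and the variant labels are assumptions.

```python
import math


def log2_enrichment(sorted_counts, input_counts, pseudocount=0.5):
    """log2 ratio of each variant's frequency in a sorted subpopulation
    versus the unsorted input library.

    Positive scores mean enrichment, negative scores mean depletion.
    The pseudocount keeps variants with zero reads from producing log2(0).
    """
    sorted_total = sum(sorted_counts.values())
    input_total = sum(input_counts.values())
    scores = {}
    for variant, n_input in input_counts.items():
        f_sorted = (sorted_counts.get(variant, 0) + pseudocount) / sorted_total
        f_input = (n_input + pseudocount) / input_total
        scores[variant] = math.log2(f_sorted / f_input)
    return scores


if __name__ == "__main__":
    # Hypothetical read counts for two CDR point mutants
    input_counts = {"variant_A": 100, "variant_B": 100}
    high_affinity_gate = {"variant_A": 300, "variant_B": 100}
    for name, score in log2_enrichment(high_affinity_gate, input_counts).items():
        print(f"{name}: {score:+.2f}")
```

    In this toy example variant_A scores above zero (enriched in the high-affinity gate) and variant_B scores about -1 (depleted), mirroring how higher- and lower-affinity mutations separate.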

  19. Quaternary Morphodynamics of Fluvial Dispersal Systems Revealed: The Fly River, PNG, and the Sunda Shelf, SE Asia, simulated with the Massively Parallel GPU-based Model 'GULLEM'

    NASA Astrophysics Data System (ADS)

    Aalto, R. E.; Lauer, J. W.; Darby, S. E.; Best, J.; Dietrich, W. E.

    2015-12-01

    During glacial-marine transgressions vast volumes of sediment are deposited due to the infilling of lowland fluvial systems and shallow shelves, material that is removed during ensuing regressions. Modelling these processes would illuminate system morphodynamics, fluxes, and 'complexity' in response to base level change, yet such problems are computationally formidable. Environmental systems are characterized by strong interconnectivity, yet traditional supercomputers have slow inter-node communication, whereas rapidly advancing Graphics Processing Unit (GPU) technology offers vastly higher (>100×) bandwidths. GULLEM (GpU-accelerated Lowland Landscape Evolution Model) employs massively parallel code to simulate coupled fluvial-landscape evolution for complex lowland river systems over large temporal and spatial scales. GULLEM models the accommodation space carved/infilled by representing a range of geomorphic processes, including river & tributary incision within a multi-directional flow regime, non-linear diffusion, glacial-isostatic flexure, hydraulic geometry, tectonic deformation, sediment production, transport & deposition, and full 3D tracking of all resulting stratigraphy. Model results concur with the Holocene dynamics of the Fly River, PNG -- as documented with dated cores, sonar imaging of floodbasin stratigraphy, and the observations of topographic remnants from LGM conditions. Other supporting research was conducted along the Mekong River, the largest fluvial system of the Sunda Shelf. These and other field data provide tantalizing empirical glimpses into the lowland landscapes of large rivers during glacial-interglacial transitions, observations that can be explored with this powerful numerical model. GULLEM affords estimates for the timing and flux budgets within the Fly and Sunda Systems, illustrating complex internal system responses to the external forcing of sea level and climate. Furthermore, GULLEM can be applied to almost any fluvial system to

  20. Massively Parallel Simulation of Uranium Migration at the Hanford 300 Area

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Lichtner, P. C.

    2009-12-01

    Effectively utilized, high-performance computing can have a significant impact on subsurface science by enabling researchers to employ models with ever increasing sophistication and complexity that provide a more accurate and mechanistic representation of subsurface processes. As part of the U.S. Department of Energy’s SciDAC-2 program, the petascale subsurface reactive multiphase flow and transport code PFLOTRAN has been developed and is currently being employed to simulate uranium migration at the Hanford 300 Area. PFLOTRAN has been run on subsurface problems composed of up to two billion degrees of freedom and utilizing up to 131,072 processor cores on the world’s largest open science supercomputer Jaguar. This presentation focuses on the application of PFLOTRAN to simulate geochemical transport of uranium at Hanford using the Jaguar supercomputer. The Hanford 300 Area presents many challenges with regard to simulating radionuclide transport. Aside from the many conceptual uncertainties in the problem such as the choice of initial conditions, rapid fluctuations in the Columbia River stage, which occur on an hourly basis with several meter variations, can have a dramatic impact on the size of the uranium plume, its migration direction, and the rate at which it migrates to the river. Due to the immense size of the physical domain needed to include the transient river boundary condition, the grid resolution required to preserve accuracy, and the number of chemical components simulated, 3D simulation of the Hanford 300 Area would be unsustainable on a single workstation, and thus high-performance computing is essential.

  1. cuTauLeaping: A GPU-Powered Tau-Leaping Stochastic Simulator for Massive Parallel Analyses of Biological Systems

    PubMed Central

    Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo

    2014-01-01

    Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring substantial reductions in the overall running time. The emerging field of General-Purpose computing on Graphics Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting NVIDIA's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even point that depends directly on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
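    For readers unfamiliar with tau-leaping, here is a minimal serial Python sketch of fixed-step tau-leaping (not cuTauLeaping itself): over each leap of length tau, reaction j fires a Poisson-distributed number of times with mean a_j(x)·tau. The birth-death model, rate constants, step size, and the clamping of negative populations are illustrative assumptions. Because each run depends only on its own seed, the ensemble loop at the bottom is the part that can be executed in parallel, as the abstract describes.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Poisson sample via Knuth's multiplication method (adequate for small lam)."""
    if lam <= 0.0:
        return 0
    limit = math.exp(-lam)
    k, p = 0, rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k


def tau_leap(x0, stoich, propensities, tau, t_end, seed):
    """Fixed-step tau-leaping.

    x0: initial copy numbers, one per species.
    stoich: one state-change vector per reaction.
    propensities: one function a_j(x) per reaction.
    """
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(round(t_end / tau)):
        # Sample every reaction's firing count from the state at the start of the leap
        firings = [poisson(rng, a(x) * tau) for a in propensities]
        for k, change in zip(firings, stoich):
            for i, dx in enumerate(change):
                x[i] += k * dx
        # Naive fix for the negative populations plain tau-leaping can produce
        x = [max(xi, 0) for xi in x]
    return x


if __name__ == "__main__":
    # Hypothetical birth-death model: 0 -> A at rate 10, A -> 0 at rate 0.1 * A
    stoich = [(+1,), (-1,)]
    propensities = [lambda x: 10.0, lambda x: 0.1 * x[0]]
    # Independent runs, one seed each
    finals = [tau_leap((0,), stoich, propensities, 0.05, 100.0, seed)[0]
              for seed in range(20)]
    print(sum(finals) / len(finals))  # should be near the stationary mean 10 / 0.1 = 100
```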

  2. Development and characterization of hollow microprobe array as a potential tool for versatile and massively parallel manipulation of single cells.

    PubMed

    Nagai, Moeto; Oohara, Kiyotaka; Kato, Keita; Kawashima, Takahiro; Shibata, Takayuki

    2015-04-01

    Parallel manipulation of single cells is important for reconstructing in vivo cellular microenvironments and studying cell functions. To manipulate single cells and reconstruct their environments, development of a versatile manipulation tool is necessary. In this study, we developed an array of hollow probes using microelectromechanical systems fabrication technology and demonstrated the manipulation of single cells. We conducted a cell aspiration experiment with a glass pipette and modeled a cell using a standard linear solid model, which provided information for designing hollow stepped probes for minimally invasive single-cell manipulation. We etched a silicon wafer on both sides and formed through-holes with stepped structures. The inner diameters of the holes were reduced by plasma-enhanced chemical vapor deposition of SiO2 to trap cells on the tips. This fabrication process makes it possible to control the wall thickness, inner diameter, and outer diameter of the probes. With the fabricated probes, single cells were manipulated and placed in microwells at a single-cell level in a parallel manner. We studied the capture, release, and survival rates of cells at different suction and release pressures and found that the cell trapping rate was directly proportional to the suction pressure, whereas the release rate and viability decreased with increasing suction pressure. The proposed manipulation system makes it possible to place cells in a well array and observe the adherence, spreading, culture, and death of the cells. This system has potential as a tool for massively parallel manipulation and for three-dimensional heterocellular assays. PMID:25749639

  3. Massively parallel haplotyping on microscopic beads for the high-throughput phase analysis of single molecules.

    PubMed

    Boulanger, Jérôme; Muresan, Leila; Tiemann-Boege, Irene

    2012-01-01

    In spite of the many advances in haplotyping methods, it is still very difficult to characterize rare haplotypes in tissues and different environmental samples or to accurately assess the haplotype diversity in large mixtures. This would require a haplotyping method capable of analyzing the phase of single molecules with an unprecedented throughput. Here we describe such a haplotyping method capable of analyzing in parallel hundreds of thousands of single molecules in one experiment. In this method, multiple PCR reactions amplify different polymorphic regions of a single DNA molecule on a magnetic bead compartmentalized in an emulsion drop. The allelic states of the amplified polymorphisms are identified with fluorescently labeled probes that are then decoded from images taken of the arrayed beads by a microscope. This method can evaluate the phase of up to 3 polymorphisms separated by up to 5 kilobases in hundreds of thousands of single molecules. We tested the sensitivity of the method by measuring the number of mutant haplotypes synthesized by four different commercially available enzymes: Phusion, Platinum Taq, Titanium Taq, and Phire. The digital nature of the method makes it highly sensitive to detecting haplotype ratios of less than 1:10,000. We also accurately quantified chimera formation during the exponential phase of PCR by different DNA polymerases. PMID:22558329

  4. Efficient massively parallel simulation of dynamic channel assignment schemes for wireless cellular communications

    NASA Technical Reports Server (NTRS)

    Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

    1994-01-01

    Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K-processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high-speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.
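    The co-channel separation rule stated above (a channel may be reused concurrently only by cells that are sufficiently far apart) can be sketched as a first-fit check in Python. First-fit channel selection, Euclidean cell coordinates, and the function `pick_channel` are assumptions for illustration; neither the paper's actual assignment scheme nor its parallel simulation method is reproduced here.

```python
import math


def pick_channel(cell, in_use, positions, n_channels, reuse_distance):
    """First-fit channel choice under a co-channel reuse-distance constraint.

    in_use: dict mapping cell -> set of channels currently carrying calls.
    positions: dict mapping cell -> (x, y) coordinates.
    Returns the lowest channel not in use at any cell closer than
    reuse_distance (the requesting cell included), or None to block the call.
    """
    cx, cy = positions[cell]
    forbidden = set()
    for other, channels in in_use.items():
        ox, oy = positions[other]
        if math.hypot(cx - ox, cy - oy) < reuse_distance:
            forbidden |= channels
    for channel in range(n_channels):
        if channel not in forbidden:
            return channel
    return None


if __name__ == "__main__":
    positions = {0: (0.0, 0.0), 1: (1.0, 0.0), 2: (5.0, 0.0)}
    in_use = {0: {0}, 1: set(), 2: {1}}
    ch = pick_channel(1, in_use, positions, n_channels=2, reuse_distance=2.0)
    print(ch)  # 1: channel 0 is busy in nearby cell 0; cell 2 is far enough away
    in_use[1].add(ch)
    print(pick_channel(0, in_use, positions, n_channels=2, reuse_distance=2.0))  # None
```

    The second call is blocked because both channels are now in use within the reuse distance of cell 0, which is exactly the call-blocking outcome the assignment scheme tries to minimize.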

  5. Delta: An object-oriented finite element code architecture for massively parallel computers

    SciTech Connect

    Weatherby, J.R.; Schutt, J.A.; Peery, J.S.; Hogan, R.E.

    1996-02-01

    Delta is an object-oriented code architecture based on the finite element method which enables simulation of a wide range of engineering mechanics problems in a parallel processing environment. Written in C++, Delta is a natural framework for algorithm development and for research involving coupling of mechanics from different Engineering Science disciplines. To enhance flexibility and encourage code reuse, the architecture provides a clean separation of the major aspects of finite element programming. Spatial discretization, temporal discretization, and the solution of linear and nonlinear systems of equations are each implemented separately, independent from the governing field equations. Other attractive features of the Delta architecture include support for constitutive models with internal variables, reusable "matrix-free" equation solvers, and support for region-to-region variations in the governing equations and the active degrees of freedom. A demonstration code built from the Delta architecture has been used in two-dimensional and three-dimensional simulations involving dynamic and quasi-static solid mechanics, transient and steady heat transport, and flow in porous media.

  6. Sassena — X-ray and neutron scattering calculated from molecular dynamics trajectories using massively parallel computers

    NASA Astrophysics Data System (ADS)

    Lindner, Benjamin; Smith, Jeremy C.

    2012-07-01

    Massively parallel computers now permit the molecular dynamics (MD) simulation of multi-million atom systems on time scales up to the microsecond. However, the subsequent analysis of the resulting simulation trajectories has now become a high performance computing problem in itself. Here, we present software for calculating X-ray and neutron scattering intensities from MD simulation data that scales well on massively parallel supercomputers. The calculation and data staging schemes used maximize the degree of parallelism and minimize the IO bandwidth requirements. Strong-scaling tests on the Jaguar Petaflop Cray XT5 at Oak Ridge National Laboratory show virtually linear scaling up to 7000 cores for most benchmark systems. Since both MPI and thread parallelism are supported, the software is flexible enough to cover scaling demands for different types of scattering calculations. The result is a high performance tool capable of unifying large-scale supercomputing and a wide variety of neutron/synchrotron technology.

    Catalogue identifier: AELW_v1_0
    Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AELW_v1_0.html
    Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
    Licensing provisions: GNU General Public License, version 3
    No. of lines in distributed program, including test data, etc.: 1 003 742
    No. of bytes in distributed program, including test data, etc.: 798
    Distribution format: tar.gz
    Programming language: C++, OpenMPI
    Computer: Distributed Memory, Cluster of Computers with high performance network, Supercomputer
    Operating system: UNIX, LINUX, OSX
    Has the code been vectorized or parallelized?: Yes, the code has been parallelized using MPI directives. Tested with up to 7000 processors
    RAM: Up to 1 Gbytes/core
    Classification: 6.5, 8
    External routines: Boost Library, FFTW3, CMAKE, GNU C++ Compiler, OpenMPI, LibXML, LAPACK
    Nature of problem: Recent developments in supercomputing allow molecular dynamics simulations to

  7. Modeling cardiovascular hemodynamics using the lattice Boltzmann method on massively parallel supercomputers

    NASA Astrophysics Data System (ADS)

    Randles, Amanda Elizabeth

    the modeling of fluids in vessels with smaller diameters and a method for introducing the deformational forces exerted on the arterial flows from the movement of the heart by borrowing concepts from cosmodynamics are presented. These additional forces have a great impact on the endothelial shear stress. Third, the fluid model is extended to recover not only Navier-Stokes hydrodynamics but also a wider range of Knudsen numbers, which is especially important in micro- and nano-scale flows. The tradeoffs of many optimization methods, such as the use of deep halo level ghost cells that, alongside hybrid programming models, reduce the impact of such higher-order models and enable efficient modeling of extreme regimes of computational fluid dynamics, are discussed. Fourth, the extension of these models to other research questions, such as clogging in microfluidic devices and determining the severity of coarctation of the aorta, is presented. Through this work, these methods are validated by taking real patient data and the measured pressure value before the narrowing of the aorta and predicting the pressure drop across the coarctation. Comparison with the measured pressure drop in vivo highlights the accuracy and potential impact of such patient-specific simulations. Finally, a method to enable the simulation of longer trajectories in time by discretizing both spatially and temporally is presented. In this method, a serial coarse iterator is used to initialize data at discrete time steps for a fine model that runs in parallel. This coarse solver is based on a larger time step and typically a coarser discretization in space. Iterative refinement enables the compute-intensive fine iterator to be modeled with temporal parallelization. The algorithm consists of a series of predictor-corrector iterations, completing when the results have converged within a certain tolerance. Combined, these developments allow large fluid models to be simulated for longer time durations
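    The coarse/fine predictor-corrector scheme described above appears to follow the pattern of the parareal algorithm. The following Python sketch applies parareal to the scalar test equation dy/dt = rate·y, using one forward-Euler step per slice as the coarse propagator and many Euler steps per slice as the fine one; the test equation, the propagators, and the tolerance are assumptions, and this is not the lattice Boltzmann implementation described in this record.

```python
def euler(y, t0, t1, steps, rate):
    """Forward-Euler integration of dy/dt = rate * y from t0 to t1."""
    h = (t1 - t0) / steps
    for _ in range(steps):
        y = y + h * rate * y
    return y


def parareal(y0, t_end, n_slices, rate, fine_steps=100, tol=1e-10, max_iter=50):
    """Parareal iteration on [0, t_end] split into n_slices time slices.

    Coarse propagator G: one Euler step per slice (cheap, run serially).
    Fine propagator F: fine_steps Euler steps per slice (costly; the slices
    are independent, so they could run in parallel).
    Returns (values at slice boundaries, iterations used).
    """
    ts = [t_end * i / n_slices for i in range(n_slices + 1)]

    def coarse(y, i):
        return euler(y, ts[i], ts[i + 1], 1, rate)

    def fine(y, i):
        return euler(y, ts[i], ts[i + 1], fine_steps, rate)

    # Prediction: a serial coarse sweep initializes every slice boundary
    u = [y0]
    for i in range(n_slices):
        u.append(coarse(u[i], i))

    for k in range(1, max_iter + 1):
        f = [fine(u[i], i) for i in range(n_slices)]  # parallelizable step
        g_old = [coarse(u[i], i) for i in range(n_slices)]
        new = [y0]
        for i in range(n_slices):
            # Correction: new coarse prediction plus the previous fine-minus-coarse gap
            new.append(coarse(new[i], i) + f[i] - g_old[i])
        change = max(abs(a - b) for a, b in zip(new, u))
        u = new
        if change < tol:
            return u, k
    return u, max_iter


if __name__ == "__main__":
    u, iters = parareal(1.0, 2.0, n_slices=10, rate=-1.0)
    serial = euler(1.0, 0.0, 2.0, 10 * 100, -1.0)  # same fine solver, run serially
    print(iters, u[-1], serial)
```

    Parareal reproduces the serial fine solution after at most n_slices corrections; the speedup comes from running the fine solves concurrently and converging in fewer iterations than there are slices.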

  8. Massively Parallel Geostatistical Inversion of Coupled Processes in Heterogeneous Porous Media

    NASA Astrophysics Data System (ADS)

    Ngo, A.; Schwede, R. L.; Li, W.; Bastian, P.; Ippisch, O.; Cirpka, O. A.

    2012-04-01

    another level of parallelization has been added.

  9. Massively-parallel neuromonitoring and neurostimulation rodent headset with nanotextured flexible microelectrodes.

    PubMed

    Bagheri, Arezu; Gabran, S R I; Salam, Muhammad Tariqus; Perez Velazquez, Jose Luis; Mansour, Raafat R; Salama, M M A; Genov, Roman

    2013-10-01

    We present a compact wireless headset for simultaneous multi-site neuromonitoring and neurostimulation in the rodent brain. The system comprises flexible-shaft microelectrodes, neural amplifiers, neurostimulators, a digital time-division multiplexer (TDM), a micro-controller and a ZigBee wireless transceiver. The system is built by parallelizing up to four 0.35 μm CMOS integrated circuits (each having 256 neural amplifiers and 64 neurostimulators) to provide a total maximum of 1024 neural amplifiers and 256 neurostimulators. Each bipolar neural amplifier features 54 dB-72 dB adjustable gain, 1 Hz-5 kHz adjustable bandwidth with an input-referred noise of 7.99 μVrms and dissipates 12.9 μW. Each current-mode bipolar neurostimulator generates programmable arbitrary-waveform biphasic current in the range of 20-250 μA and dissipates 2.6 μW in the stand-by mode. Reconfigurability is provided by stacking a set of dedicated mini-PCBs that share a common signaling bus within a volume as small as 22 × 30 × 15 mm³. The system features a flexible polyimide-based microelectrode array design that is not brittle and increases pad packing density. Pad nanotexturing by electrodeposition reduces the electrode-tissue interface impedance from an average of 2 MΩ to 30 kΩ at 100 Hz. The rodent headset and the microelectrode array have been experimentally validated in vivo in freely moving rats for two months. We demonstrate 92.8 percent seizure rate reduction by responsive neurostimulation in an acute epilepsy rat model. PMID:24144667

  10. High-Throughput Massively Parallel Sequencing for Fetal Aneuploidy Detection from Maternal Plasma

    PubMed Central

    Džakula, Željko; Kim, Sung K.; Mazloom, Amin R.; Zhu, Zhanyang; Tynan, John; Lu, Tim; McLennan, Graham; Palomaki, Glenn E.; Canick, Jacob A.; Oeth, Paul; Deciu, Cosmin; van den Boom, Dirk; Ehrich, Mathias

    2013-01-01

    Background Circulating cell-free (ccf) fetal DNA comprises 3–20% of all the cell-free DNA present in maternal plasma. Numerous research and clinical studies have described the analysis of ccf DNA using next-generation sequencing for the detection of fetal aneuploidies with high sensitivity and specificity. We sought to extend the utility of this approach by assessing semi-automated library preparation, higher sample multiplexing during sequencing, and improved bioinformatic tools to enable a higher-throughput, more efficient assay while maintaining or improving clinical performance. Methods Whole blood (10 mL) was collected from pregnant female donors and plasma was separated by centrifugation. Ccf DNA was extracted using column-based methods. Libraries were prepared using an optimized semi-automated library preparation method and sequenced on an Illumina HiSeq2000 sequencer in a 12-plex format. Z-scores were calculated for affected chromosomes using a robust method after normalization and genomic segment filtering. Classification was based upon a standard normal transformed cutoff value of z = 3 for chromosome 21 and z = 3.95 for chromosomes 18 and 13. Results Two parallel assay development studies using a total of more than 1900 ccf DNA samples were performed to evaluate the technical feasibility of automating library preparation and increasing the sample multiplexing level. These processes were subsequently combined and a study of 1587 samples was completed to verify the stability of the process-optimized assay. Finally, an unblinded clinical evaluation of 1269 euploid and aneuploid samples utilizing this high-throughput assay coupled to improved bioinformatic procedures was performed. We were able to correctly detect all aneuploid cases with extremely low false positive rates of 0.09%, <0.01%, and 0.08% for trisomies 21, 18, and 13, respectively. Conclusions These data suggest that the developed laboratory methods in concert with improved bioinformatic
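    The classification step in this record compares a per-chromosome z-score against a fixed cutoff. The abstract does not specify its robust method, so the sketch below assumes a median/MAD z-score computed against a hypothetical euploid reference set. Only the cutoff values (3 for chromosome 21, 3.95 for chromosomes 18 and 13) come from the abstract; treating a z-score equal to the cutoff as positive is also an assumption.

```python
import statistics

# Cutoffs from the abstract. The median/MAD robust z-score is an ASSUMED
# stand-in, since the abstract only says "a robust method".
CUTOFFS = {21: 3.0, 18: 3.95, 13: 3.95}
MAD_SCALE = 1.4826  # scales the MAD to match a normal standard deviation


def robust_z(value: float, reference: list[float]) -> float:
    """Robust z-score of `value` against hypothetical euploid `reference` values."""
    med = statistics.median(reference)
    mad = statistics.median(abs(x - med) for x in reference)
    if mad == 0:
        raise ValueError("reference set has zero spread")
    return (value - med) / (MAD_SCALE * mad)


def is_aneuploid(z: float, chromosome: int) -> bool:
    """Flag a sample when its z-score reaches the chromosome's cutoff."""
    return z >= CUTOFFS[chromosome]


reference = [1.0, 1.1, 0.9, 1.05, 0.95]  # hypothetical normalized values
z = robust_z(1.5, reference)
print(is_aneuploid(z, 21))  # True
```

    The median and MAD are used because a few outlying reference samples shift them far less than they would shift a mean and standard deviation.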

  11. A parallel domain decomposition-based implicit method for the Cahn-Hilliard-Cook phase-field equation in 3D

    NASA Astrophysics Data System (ADS)

    Zheng, Xiang; Yang, Chao; Cai, Xiao-Chuan; Keyes, David

    2015-03-01

    We present a numerical algorithm for simulating the spinodal decomposition described by the three-dimensional Cahn-Hilliard-Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton-Krylov-Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady-state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of parallel performance, the implicit domain decomposition approach is found to scale well on supercomputers with a large number of processors.
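    For reference, here is a commonly used form of the Cahn-Hilliard-Cook equation, together with a fully implicit (backward Euler) time step. The order parameter u, mobility M, interface parameter ε, double-well free energy F and noise term ξ are assumed notation, and the paper's exact scaling and noise model may differ. The second-order chemical potential inside a divergence of a gradient is what makes the equation fourth order. The implicit step is nonlinear in u^{n+1} through F', which is why a Newton-type solver is needed at every time step.

$$
\frac{\partial u}{\partial t} = \nabla\cdot\Big[M\,\nabla\big(F'(u) - \varepsilon^{2}\Delta u\big)\Big] + \xi,
\qquad F(u) = \tfrac{1}{4}\,(u^{2}-1)^{2},\quad F'(u) = u^{3}-u
$$

$$
\frac{u^{n+1}-u^{n}}{\Delta t_{n}} = \nabla\cdot\Big[M\,\nabla\big(F'(u^{n+1}) - \varepsilon^{2}\Delta u^{n+1}\big)\Big] + \xi^{n}
$$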

  12. A parallel domain decomposition-based implicit method for the Cahn–Hilliard–Cook phase-field equation in 3D

    SciTech Connect

    Zheng, Xiang; Yang, Chao; Cai, Xiao-Chuan; Keyes, David

    2015-03-15

    We present a numerical algorithm for simulating the spinodal decomposition described by the three-dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady-state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of parallel performance, the implicit domain decomposition approach is found to scale well on supercomputers with a large number of processors.

  13. NIF Ignition Target 3D Point Design

    SciTech Connect

    Jones, O; Marinak, M; Milovich, J; Callahan, D

    2008-11-05

    We have developed an input file for running 3D NIF hohlraum calculations that is optimized so that it can be run in 1-2 days on parallel computers. We have incorporated increasing levels of automation into the 3D input file: (1) configuration-controlled input files; (2) a common file for 2D and 3D and for different types of capsules (symcap, etc.); and (3) the ability to obtain target dimensions, laser pulse, and diagnostics settings automatically from the NIF Campaign Management Tool. We are using 3D Hydra calculations to investigate several problems: (1) intrinsic 3D asymmetry; (2) tolerance to nonideal 3D effects (e.g., laser power balance, pointing errors); and (3) synthetic diagnostics.

  14. A Parallel 3D Model for The Multi-Species Low Energy Beam Transport System of the RIA Prototype ECR Ion Source Venus

    SciTech Connect

    Qiang, J.; Leitner, D.; Todd, D.

    2005-05-16

    The driver linac of the proposed Rare Isotope Accelerator (RIA) requires a great variety of high-intensity, high-charge-state ion beams. In order to design and optimize the low-energy beamline optics of the RIA front end, we have developed a new parallel three-dimensional model to simulate the low-energy, multi-species ion beam formation and transport from the ECR ion source extraction region to the focal plane of the analyzing magnet. A multisection overlapped computational domain has been used to break the original transport system into a number of subsystems. Within each subsystem, macro-particle tracking is used to obtain the charge density distribution in that subdomain. The three-dimensional Poisson equation is solved within the subdomain, and particle tracking is repeated until the solution converges. Two new Poisson solvers based on a combination of the spectral method and the multigrid method have been developed to solve the Poisson equation in cylindrical coordinates for the beam extraction region and in Frenet-Serret coordinates for the bending magnet region. Some test examples and initial applications will also be presented.
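    The self-consistent loop in this record alternates between tracking macro-particles, depositing their charge onto the grid, and solving Poisson's equation. The abstract does not name its deposition scheme, so the sketch below assumes linear (cloud-in-cell) weighting on a hypothetical uniform 1D grid. It illustrates the deposition step only, not the spectral/multigrid solvers.

```python
import math


def deposit_cic(positions, charges, n_cells, dx):
    """Deposit point charges onto n_cells + 1 nodes covering [0, n_cells * dx).

    ASSUMED scheme: cloud-in-cell (linear) weighting on a uniform 1D grid.
    Each charge is split between its two neighbouring nodes, so the total
    deposited charge equals sum(rho) * dx.
    """
    rho = [0.0] * (n_cells + 1)
    for x, q in zip(positions, charges):
        s = x / dx
        i = math.floor(s)  # node immediately to the left of the particle
        if i < 0 or i >= n_cells:
            raise ValueError(f"particle at x={x} lies outside the grid")
        frac = s - i  # share of the charge assigned to node i + 1
        rho[i] += q * (1.0 - frac) / dx
        rho[i + 1] += q * frac / dx
    return rho


rho = deposit_cic([0.25, 1.5], [1.0, 2.0], n_cells=4, dx=0.5)
print(rho)  # [1.0, 1.0, 0.0, 4.0, 0.0]
```

    Linear weighting conserves total charge exactly and gives a density that varies smoothly as a particle moves between nodes. This keeps the field from the Poisson solve less noisy than nearest-grid-point assignment would.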

  15. Assessing mutant p53 in primary high-grade serous ovarian cancer using immunohistochemistry and massively parallel sequencing

    PubMed Central

    Cole, Alexander J.; Dwight, Trisha; Gill, Anthony J.; Dickson, Kristie-Ann; Zhu, Ying; Clarkson, Adele; Gard, Gregory B.; Maidens, Jayne; Valmadre, Susan; Clifton-Bligh, Roderick; Marsh, Deborah J.

    2016-01-01

    The tumour suppressor p53 is mutated in cancer, including over 96% of high-grade serous ovarian cancer (HGSOC). Mutations cause loss of wild-type p53 function due to either gain of abnormal function of mutant p53 (mutp53), or absent to low mutp53. Massively parallel sequencing (MPS) enables increased accuracy of detection of somatic variants in heterogeneous tumours. We used MPS and immunohistochemistry (IHC) to characterise HGSOCs for TP53 mutation and p53 expression. TP53 mutation was identified in 94% (68/72) of HGSOCs, 62% of which were missense. Missense mutations demonstrated high p53 by IHC, as did 35% (9/26) of non-missense mutations. Low p53 was seen by IHC in 62% of HGSOC associated with non-missense mutations. Most wild-type TP53 tumours (75%, 6/8) displayed intermediate p53 levels. The overall sensitivity of detecting a TP53 mutation based on classification as ‘Low’, ‘Intermediate’ or ‘High’ for p53 IHC was 99%, with a specificity of 75%. We suggest p53 IHC can be used as a surrogate marker of TP53 mutation in HGSOC; however, this will result in misclassification of a proportion of TP53 wild-type and mutant tumours. Therapeutic targeting of mutp53 will require knowledge of both TP53 mutations and mutp53 expression. PMID:27189670
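    The sensitivity and specificity quoted above are ratios from a confusion matrix, with sequencing as the reference. In the sketch below, the mapping from IHC category to a predicted mutation is an assumption inferred from the abstract: 'Low' and 'High' predict a mutation and 'Intermediate' predicts wild type. The example calls are hypothetical, not study data.

```python
# ASSUMED mapping inferred from the abstract: 'High' (typical of missense) and
# 'Low' (typical of non-missense) predict a TP53 mutation; 'Intermediate'
# predicts wild type.
PREDICTS_MUTATION = {"Low": True, "Intermediate": False, "High": True}


def sensitivity(true_pos: int, false_neg: int) -> float:
    """Fraction of truly mutant tumours that the test calls mutant."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Fraction of truly wild-type tumours that the test calls wild type."""
    return true_neg / (true_neg + false_pos)


def evaluate(calls):
    """calls: iterable of (ihc_category, is_mutant_by_sequencing) pairs."""
    tp = fn = tn = fp = 0
    for category, mutant in calls:
        predicted = PREDICTS_MUTATION[category]
        if mutant and predicted:
            tp += 1
        elif mutant:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    return sensitivity(tp, fn), specificity(tn, fp)


# Hypothetical calls for illustration only.
calls = [("High", True), ("Low", True), ("Intermediate", True),
         ("Intermediate", False), ("High", False)]
print(evaluate(calls))
```

    The sketch divides by zero if the input contains no mutant or no wild-type cases. A real evaluation would guard against that.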

  16. Massively parallel E-beam inspection: enabling next-generation patterned defect inspection for wafer and mask manufacturing

    NASA Astrophysics Data System (ADS)

    Malloy, Matt; Thiel, Brad; Bunday, Benjamin D.; Wurm, Stefan; Mukhtar, Maseeh; Quoi, Kathy; Kemen, Thomas; Zeidler, Dirk; Eberle, Anna Lena; Garbowski, Tomasz; Dellemann, Gregor; Peters, Jan Hendrik

    2015-03-01

    SEMATECH aims to identify and enable disruptive technologies to meet the ever-increasing demands of semiconductor high volume manufacturing (HVM). As such, a program was initiated in 2012 focused on high-speed e-beam defect inspection as a complement, and eventual successor, to bright field optical patterned defect inspection [1]. The primary goal is to enable a new technology to overcome the key gaps limiting modern-day inspection in the fab, primarily throughput and sensitivity to detect ultra-small critical defects. The program specifically targets revolutionary solutions based on massively parallel e-beam technologies, as opposed to incremental improvements to existing e-beam and optical inspection platforms. Wafer inspection is the primary target, but attention is also being paid to next-generation mask inspection. During the first phase of the multi-year program, multiple technologies were reviewed, a down-selection was made to the top candidates, and evaluations began on proof-of-concept systems. A champion technology has been selected, and as of late 2014 the program has begun to move into the core technology maturation phase in order to enable eventual commercialization of an HVM system. Performance data from early proof-of-concept systems will be shown along with roadmaps to achieving HVM performance. SEMATECH's vision for moving from early-stage development to commercialization will be shown, including plans for development with industry-leading technology providers.

  17. Non-CAR resists and advanced materials for Massively Parallel E-Beam Direct Write process integration

    NASA Astrophysics Data System (ADS)

    Pourteau, Marie-Line; Servin, Isabelle; Lepinay, Kévin; Essomba, Cyrille; Dal'Zotto, Bernard; Pradelles, Jonathan; Lattard, Ludovic; Brandt, Pieter; Wieland, Marco

    2016-03-01

    The emerging Massively Parallel Electron Beam Direct Write (MP-EBDW) is an attractive high-resolution, high-throughput lithography technology. As previously shown, Chemically Amplified Resists (CARs) meet process/integration specifications in terms of dose-to-size, resolution, contrast, and energy latitude. However, they are still limited by their line width roughness. To overcome this issue, we tested an alternative advanced non-CAR and showed that it brings a substantial gain in sensitivity compared to CAR. We also implemented and assessed in-line post-lithographic treatments for roughness mitigation. To reduce outgassing, a top-coat layer is added to the total process stack. A new-generation top-coat was tested and showed improved printing performance compared to the previous product, in particular avoiding dark erosion: SEM cross-sections showed a straight pattern profile. A spin-coatable charge dissipation layer based on conductive polyaniline has also been tested for conductivity and lithographic performance, and compatibility experiments revealed that the underlying resist type has to be carefully chosen when using this product. Finally, the Process Of Reference (POR) trilayer stack defined for 5 kV multi-e-beam lithography was successfully etched with well-opened, straight patterns and no lithography-etch bias.

  18. LiNbO3: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects

    NASA Astrophysics Data System (ADS)

    Carrascosa, M.; García-Cabañes, A.; Jubera, M.; Ramiro, J. B.; Agulló-López, F.

    2015-12-01

    The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO3 substrates, for parallel massive trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has often been referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the electrophoretic and/or dielectrophoretic forces involved do not require any electrodes, and large-scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main experimental results reported with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters, such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method, is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations are reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated with domain poling of ferroelectric materials.
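    For reference, here are the textbook point-dipole expressions for the two forces named in this record, for a homogeneous spherical particle of radius r and charge q in a medium of permittivity ε_m. The review may use different approximations, and the symbols here are assumed notation. For a quasi-static photovoltaic field, K is the Clausius-Mossotti factor evaluated in the low-frequency limit. Its sign decides whether neutral particles collect at field maxima (K > 0) or field minima (K < 0).

$$
\mathbf{F}_{\mathrm{EP}} = q\,\mathbf{E},\qquad
\mathbf{F}_{\mathrm{DEP}} = 2\pi\,\varepsilon_m\,r^{3}\,\operatorname{Re}\!\left[K\right]\nabla\left|\mathbf{E}\right|^{2},\qquad
K = \frac{\varepsilon_p^{*}-\varepsilon_m^{*}}{\varepsilon_p^{*}+2\varepsilon_m^{*}}
$$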

  19. LiNbO3: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects

    SciTech Connect

    Carrascosa, M.; García-Cabañes, A.; Jubera, M.; Ramiro, J. B.; Agulló-López, F.

    2015-12-15

    The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO3 substrates, for parallel