3D seismic imaging on massively parallel computers
Womble, D.E.; Ober, C.C.; Oldfield, R.
1997-02-01
The ability to image complex geologies such as salt domes in the Gulf of Mexico and thrusts in mountainous regions is a key to reducing the risk and cost associated with oil and gas exploration. Imaging these structures, however, is computationally expensive. Datasets can be terabytes in size, and the processing time required for the multiple iterations needed to produce a velocity model can take months, even with the massively parallel computers available today. Some algorithms, such as 3D, finite-difference, prestack, depth migration remain beyond the capacity of production seismic processing. Massively parallel processors (MPPs) and algorithms research are the tools that will enable this project to provide new seismic processing capabilities to the oil and gas industry. The goals of this work are to (1) develop finite-difference algorithms for 3D, prestack, depth migration; (2) develop efficient computational approaches for seismic imaging and for processing terabyte datasets on massively parallel computers; and (3) develop a modular, portable, seismic imaging code.
Time efficient 3-D electromagnetic modeling on massively parallel computers
Alumbaugh, D.L.; Newman, G.A.
1995-08-01
A numerical modeling algorithm has been developed to simulate the electromagnetic response of a three dimensional earth to a dipole source for frequencies ranging from 100 to 100MHz. The numerical problem is formulated in terms of a frequency domain--modified vector Helmholtz equation for the scattered electric fields. The resulting differential equation is approximated using a staggered finite difference grid which results in a linear system of equations for which the matrix is sparse and complex symmetric. The system of equations is solved using a preconditioned quasi-minimum-residual method. Dirichlet boundary conditions are employed at the edges of the mesh by setting the tangential electric fields equal to zero. At frequencies less than 1MHz, normal grid stretching is employed to mitigate unwanted reflections off the grid boundaries. For frequencies greater than this, absorbing boundary conditions must be employed by making the stretching parameters of the modified vector Helmholtz equation complex which introduces loss at the boundaries. To allow for faster calculation of realistic models, the original serial version of the code has been modified to run on a massively parallel architecture. This modification involves three distinct tasks; (1) mapping the finite difference stencil to a processor stencil which allows for the necessary information to be exchanged between processors that contain adjacent nodes in the model, (2) determining the most efficient method to input the model which is accomplished by dividing the input into ``global`` and ``local`` data and then reading the two sets in differently, and (3) deciding how to output the data which is an inherently nonparallel process.
Massively parallel regularized 3D inversion of potential fields on CPUs and GPUs
NASA Astrophysics Data System (ADS)
Čuma, Martin; Zhdanov, Michael S.
2014-01-01
We have recently introduced a massively parallel regularized 3D inversion of potential fields data. This program takes as an input gravity or magnetic vector, tensor and Total Magnetic Intensity (TMI) measurements and produces 3D volume of density, susceptibility, or three dimensional magnetization vector, the latest also including magnetic remanence information. The code uses combined MPI and OpenMP approach that maps well onto current multiprocessor multicore clusters and exhibits nearly linear strong and weak parallel scaling. It has been used to invert regional to continental size data sets with up to billion cells of the 3D Earth's volume on large clusters for interpretation of large airborne gravity and magnetics surveys. In this paper we explain the features that made this massive parallelization feasible and extend the code to add GPU support in the form of the OpenACC directives. This implementation resulted in up to a 22x speedup as compared to the scalar multithreaded implementation on a 12 core Intel CPU based computer node. Furthermore, we also introduce a mixed single-double precision approach, which allows us to perform most of the calculation at a single floating point number precision while keeping the result as precise as if the double precision had been used. This approach provides an additional 40% speedup on the GPUs, as compared to the pure double precision implementation. It also has about half of the memory footprint of the fully double precision version.
3-D readout-electronics packaging for high-bandwidth massively paralleled imager
Kwiatkowski, Kris; Lyke, James
2007-12-18
Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells are disposed on a stacked CMOS chip's surface at a 45.degree. angle from light reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections for distributing power and signals to components associated with each stacked CSMO chip layer.
PORTA: A Massively Parallel Code for 3D Non-LTE Polarized Radiative Transfer
NASA Astrophysics Data System (ADS)
Štěpán, J.
2014-10-01
The interpretation of the Stokes profiles of the solar (stellar) spectral line radiation requires solving a non-LTE radiative transfer problem that can be very complex, especially when the main interest lies in modeling the linear polarization signals produced by scattering processes and their modification by the Hanle effect. One of the main difficulties is due to the fact that the plasma of a stellar atmosphere can be highly inhomogeneous and dynamic, which implies the need to solve the non-equilibrium problem of generation and transfer of polarized radiation in realistic three-dimensional stellar atmospheric models. Here we present PORTA, a computer program we have developed for solving, in three-dimensional (3D) models of stellar atmospheres, the problem of the generation and transfer of spectral line polarization taking into account anisotropic radiation pumping and the Hanle and Zeeman effects in multilevel atoms. The numerical method of solution is based on a highly convergent iterative algorithm, whose convergence rate is insensitive to the grid size, and on an accurate short-characteristics formal solver of the Stokes-vector transfer equation which uses monotonic Bezier interpolation. In addition to the iterative method and the 3D formal solver, another important feature of PORTA is a novel parallelization strategy suitable for taking advantage of massively parallel computers. Linear scaling of the solution with the number of processors allows to reduce the solution time by several orders of magnitude. We present useful benchmarks and a few illustrations of applications using a 3D model of the solar chromosphere resulting from MHD simulations. Finally, we present our conclusions with a view to future research. For more details see Štěpán & Trujillo Bueno (2013).
Nguyen, B.T.; Hutchinson, S.A.
1995-07-01
The upwind leapfrog scheme for electromagnetic scattering is briefly described. Its application to the 3D Maxwell`s time domain equations is shown in detail. The scheme`s use of upwind characteristic variables and a narrow stencil result in a smaller demand in communication overhead, making it ideal for implementation on distributed memory parallel computers. The algorithm`s implementation on two message passing computers, a 1024-processor nCUBE 2 and a 1840-processor Intel Paragon, is described. Performance evaluation demonstrates that the scheme performs well with both good scaling qualities and high efficiencies on these machines.
NASA Astrophysics Data System (ADS)
Wang, S.; De Hoop, M. V.; Xia, J.; Li, X.
2011-12-01
We consider the modeling of elastic seismic wave propagation on a rectangular domain via the discretization and solution of the inhomogeneous coupled Helmholtz equation in 3D, by exploiting a parallel multifrontal sparse direct solver equipped with Hierarchically Semi-Separable (HSS) structure to reduce the computational complexity and storage. In particular, we are concerned with solving this equation on a large domain, for a large number of different forcing terms in the context of seismic problems in general, and modeling in particular. We resort to a parsimonious mixed grid finite differences scheme for discretizing the Helmholtz operator and Perfect Matched Layer boundaries, resulting in a non-Hermitian matrix. We make use of a nested dissection based domain decomposition, and introduce an approximate direct solver by developing a parallel HSS matrix compression, factorization, and solution approach. We cast our massive parallelization in the framework of the multifrontal method. The assembly tree is partitioned into local trees and a global tree. The local trees are eliminated independently in each processor, while the global tree is eliminated through massive communication. The solver for the inhomogeneous equation is a parallel hybrid between multifrontal and HSS structure. The computational complexity associated with the factorization is almost linear with the size of the Helmholtz matrix. Our numerical approach can be compared with the spectral element method in 3D seismic applications.
Massively parallel computation of 3D flow and reactions in chemical vapor deposition reactors
Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.; Hennigan, G.L.; Devine, K.D.; Moffat, H.K.
1997-12-01
Computer modeling of Chemical Vapor Deposition (CVD) reactors can greatly aid in the understanding, design, and optimization of these complex systems. Modeling is particularly attractive in these systems since the costs of experimentally evaluating many design alternatives can be prohibitively expensive, time consuming, and even dangerous, when working with toxic chemicals like Arsine (AsH{sub 3}): until now, predictive modeling has not been possible for most systems since the behavior is three-dimensional and governed by complex reaction mechanisms. In addition, CVD reactors often exhibit large thermal gradients, large changes in physical properties over regions of the domain, and significant thermal diffusion for gas mixtures with widely varying molecular weights. As a result, significant simplifications in the models have been made which erode the accuracy of the models` predictions. In this paper, the authors will demonstrate how the vast computational resources of massively parallel computers can be exploited to make possible the analysis of models that include coupled fluid flow and detailed chemistry in three-dimensional domains. For the most part, models have either simplified the reaction mechanisms and concentrated on the fluid flow, or have simplified the fluid flow and concentrated on rigorous reactions. An important CVD research thrust has been in detailed modeling of fluid flow and heat transfer in the reactor vessel, treating transport and reaction of chemical species either very simply or as a totally decoupled problem. Using the analogy between heat transfer and mass transfer, and the fact that deposition is often diffusion limited, much can be learned from these calculations; however, the effects of thermal diffusion, the change in physical properties with composition, and the incorporation of surface reaction mechanisms are not included in this model, nor can transitions to three-dimensional flows be detected.
Chang, H.; Solano, M.; VanDyke, J.P.; McMechan, G.A.; Epili, D.
1998-03-01
Portable, production-scale 3-D prestack Kirchhoff depth migration software capable of full-volume imaging has been successfully implemented and applied to a six-million trace (46.9 Gbyte) marine data set from a salt/subsalt play in the Gulf of Mexico. Velocity model building and updates use an image-driven strategy and were performed in a Sun Sparc environment. Images obtained by 3-D prestack migration after three velocity iterations are substantially better focused and reveal drilling targets that were not visible in images obtained from conventional 3-D poststack time migration. Amplitudes are well preserved, so anomalies associated with known reservoirs conform to the petrophysical predictions. Prototype development was on an 8-node Intel iPSC860 computer; the production version was run on an 1824-node Intel Paragon computer. The code has been successfully ported to CRAY (T3D) and Unix workstation (PVM) environments.
NASA Astrophysics Data System (ADS)
Schultz, A.
2010-12-01
3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We
Massively parallel visualization: Parallel rendering
Hansen, C.D.; Krogh, M.; White, W.
1995-12-01
This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume renderer use a MIMD approach. Implementations for these algorithms are presented for the Thinking Machines Corporation CM-5 MPP.
Massively parallel mathematical sieves
Montry, G.R.
1989-01-01
The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.
Soltz, R; Vranas, P; Blumrich, M; Chen, D; Gara, A; Giampap, M; Heidelberger, P; Salapura, V; Sexton, J; Bhanot, G
2007-04-11
The theory of the strong nuclear force, Quantum Chromodynamics (QCD), can be numerically simulated from first principles on massively-parallel supercomputers using the method of Lattice Gauge Theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures that it suggests. We demonstrate these methods on the BlueGene massively-parallel supercomputer and argue that LQCD and the BlueGene architecture are a natural match. This can be traced to the simple fact that LQCD is a regular lattice discretization of space into lattice sites while the BlueGene supercomputer is a discretization of space into compute nodes, and that both are constrained by requirements of locality. This simple relation is both technologically important and theoretically intriguing. The main result of this paper is the speedup of LQCD using up to 131,072 CPUs on the largest BlueGene/L supercomputer. The speedup is perfect with sustained performance of about 20% of peak. This corresponds to a maximum of 70.5 sustained TFlop/s. At these speeds LQCD and BlueGene are poised to produce the next generation of strong interaction physics theoretical results.
Parallel rendering techniques for massively parallel visualization
Hansen, C.; Krogh, M.; Painter, J.
1995-07-01
As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory. and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP`s abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume render use a MIMD approach. Implementations for these algorithms are presented for the Thinking Ma.chines Corporation CM-5 MPP.
Holographic renormalization of 3D minimal massive gravity
NASA Astrophysics Data System (ADS)
Alishahiha, Mohsen; Qaemmaqami, Mohammad M.; Naseh, Ali; Shirzad, Ahmad
2016-01-01
We study holographic renormalization of 3D minimal massive gravity using the Chern-Simons-like formulation of the model. We explicitly present Gibbons- Hawking term as well as all counterterms needed to make the action finite in terms of dreibein and spin-connection. This can be used to find correlation functions of stress tensor of holographic dual field theory.
Finding evidence for massive neutrinos using 3D weak lensing
Kitching, T. D.; Heavens, A. F.; Verde, L.; Serra, P.; Melchiorri, A.
2008-05-15
In this paper we investigate the potential of 3D cosmic shear to constrain massive neutrino parameters. We find that if the total mass is substantial (near the upper limits from large scale structure, but setting aside the Ly alpha limit for now), then 3D cosmic shear+Planck is very sensitive to neutrino mass and one may expect that a next generation photometric redshift survey could constrain the number of neutrinos N{sub {nu}} and the sum of their masses m{sub {nu}}=im{sub i} to an accuracy of {delta}N{sub {nu}}{approx}0.08 and {delta}m{sub {nu}}{approx}0.03 eV, respectively. If in fact the masses are close to zero, then the errors weaken to {delta}N{sub {nu}}{approx}0.10 and {delta}m{sub {nu}}{approx}0.07 eV. In either case there is a factor 4 improvement over Planck alone. We use a Bayesian evidence method to predict joint expected evidence for N{sub {nu}} and m{sub {nu}}. We find that 3D cosmic shear combined with a Planck prior could provide 'substantial' evidence for massive neutrinos and be able to distinguish 'decisively' between many competing massive neutrino models. This technique should 'decisively' distinguish between models in which there are no massive neutrinos and models in which there are massive neutrinos with |N{sub {nu}}-3| > or approx. 0.35 and m{sub {nu}} > or approx. 0.25 eV. We introduce the notion of marginalized and conditional evidence when considering evidence for individual parameter values within a multiparameter model.
Efficient, massively parallel eigenvalue computation
NASA Technical Reports Server (NTRS)
Huo, Yan; Schreiber, Robert
1993-01-01
In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
Massively parallel MRI detector arrays
NASA Astrophysics Data System (ADS)
Keil, Boris; Wald, Lawrence L.
2013-04-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays.
Massively Parallel MRI Detector Arrays
Keil, Boris; Wald, Lawrence L
2013-01-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
Massively parallel MRI detector arrays.
Keil, Boris; Wald, Lawrence L
2013-04-01
Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas via reception, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called "ultimate" SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758
NASA Astrophysics Data System (ADS)
Popov, Anton; Kaus, Boris
2015-04-01
This software project aims at bringing the 3D lithospheric deformation modeling to a qualitatively different level. Our code LaMEM (Lithosphere and Mantle Evolution Model) is based on the following building blocks: * Massively-parallel data-distributed implementation model based on PETSc library * Light, stable and accurate staggered-grid finite difference spatial discretization * Marker-in-Cell pedictor-corector time discretization with Runge-Kutta 4-th order * Elastic stress rotation algorithm based on the time integration of the vorticity pseudo-vector * Staircase-type internal free surface boundary condition without artificial viscosity contrast * Geodynamically relevant visco-elasto-plastic rheology * Global velocity-pressure-temperature Newton-Raphson nonlinear solver * Local nonlinear solver based on FZERO algorithm * Coupled velocity-pressure geometric multigrid preconditioner with Galerkin coarsening Staggered grid finite difference, being inherently Eulerian and rather complicated discretization method, provides no natural treatment of free surface boundary condition. The solution based on the quasi-viscous sticky-air phase introduces significant viscosity contrasts and spoils the convergence of the iterative solvers. In LaMEM we are currently implementing an approximate stair-case type of the free surface boundary condition which excludes the empty cells and restores the solver convergence. Because of the mutual dependence of the stress and strain-rate tensor components, and their different spatial locations in the grid, there is no straightforward way of implementing the nonlinear rheology. In LaMEM we have developed and implemented an efficient interpolation scheme for the second invariant of the strain-rate tensor, that solves this problem. Scalable efficient linear solvers are the key components of the successful nonlinear problem solution. In LaMEM we have a range of PETSc-based preconditioning techniques that either employ a block factorization of
Seismic imaging on massively parallel computers
Ober, C.C.; Oldfield, R.A.; Womble, D.E.; Mosher, C.C.
1997-07-01
A key to reducing the risks and costs associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Pre-stack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar-wave equation using finite differences. Current industry computational capabilities are insufficient for the application of finite-difference, 3-D, prestack, depth-migration algorithms. High performance computers and state-of-the-art algorithms and software are required to meet this need. As part of an ongoing ACTI project funded by the US Department of Energy, the authors have developed a finite-difference, 3-D prestack, depth-migration code for massively parallel computer systems. The goal of this work is to demonstrate that massively parallel computers (thousands of processors) can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite-difference, prestack, depth migration practical for oil and gas exploration.
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Massively parallel quantum computer simulator
NASA Astrophysics Data System (ADS)
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, a SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.
Parallel 3-D method of characteristics in MPACT
Kochunas, B.; Dovvnar, T. J.; Liu, Z.
2013-07-01
A new parallel 3-D MOC kernel has been developed and implemented in MPACT which makes use of the modular ray tracing technique to reduce computational requirements and to facilitate parallel decomposition. The parallel model makes use of both distributed and shared memory parallelism which are implemented with the MPI and OpenMP standards, respectively. The kernel is capable of parallel decomposition of problems in space, angle, and by characteristic rays up to 0(104) processors. Initial verification of the parallel 3-D MOC kernel was performed using the Takeda 3-D transport benchmark problems. The eigenvalues computed by MPACT are within the statistical uncertainty of the benchmark reference and agree well with the averages of other participants. The MPACT k{sub eff} differs from the benchmark results for rodded and un-rodded cases by 11 and -40 pcm, respectively. The calculations were performed for various numbers of processors and parallel decompositions up to 15625 processors; all producing the same result at convergence. The parallel efficiency of the worst case was 60%, while very good efficiency (>95%) was observed for cases using 500 processors. The overall run time for the 500 processor case was 231 seconds and 19 seconds for the case with 15625 processors. Ongoing work is focused on developing theoretical performance models and the implementation of acceleration techniques to minimize the number of iterations to converge. (authors)
The 3D Death of a Massive Star
NASA Astrophysics Data System (ADS)
Kohler, Susanna
2015-07-01
What happens at the very end of a massive star's life, just before its core's collapse? A group led by Sean Couch (California Institute of Technology and Michigan State University) claim to have carried out the first three-dimensional simulations of these final few minutes — revealing new clues about the factors that can lead a massive star to explode in a catastrophic supernova at the end of its life. A Giant Collapses In dying massive stars, in-falling matter bounces off the of collapsed core, creating a shock wave. If the shock wave loses too much energy as it expands into the star, it can stall out — but further energy input can revive it and result in a successful explosion of the star as a core-collapse supernova. In simulations of this process, however, theorists have trouble getting the stars to consistently explode: the shocks often stall out and fail to revive. Couch and his group suggest that one reason might be that these simulations usually start at core collapse assuming spherical symmetry of the progenitor star. Adding Turbulence Couch and his collaborators suspect that the key is in the final minutes just before the star collapses. Models that assume a spherically-symmetric star can't include the effects of convection as the final shell of silicon is burned around the core — and those effects might have a significant impact! To test this hypothesis, the group ran fully 3D simulations of the final three minutes of the life of a 15 solar-mass star, ending with core collapse, bounce, and shock-revival. The outcome was striking: the 3D modeling introduced powerful turbulent convection (with speeds of several hundred km/s!) in the last few minutes of silicon-shell burning. As a result, the initial structure and motions in the star just before core collapse were very different from those in core-collapse simulations that use spherically-symmetric initial conditions. The turbulence was then further amplified during collapse and formation of the shock
Parallelization of Program to Optimize Simulated Trajectories (POST3D)
NASA Technical Reports Server (NTRS)
Hammond, Dana P.; Korte, John J. (Technical Monitor)
2001-01-01
This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process, dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
Massively parallel femtosecond laser processing.
Hasegawa, Satoshi; Ito, Haruyasu; Toyoda, Haruyoshi; Hayasaki, Yoshio
2016-08-01
Massively parallel femtosecond laser processing with more than 1000 beams was demonstrated. Parallel beams were generated by a computer-generated hologram (CGH) displayed on a spatial light modulator (SLM). The key to this technique is to optimize the CGH in the laser processing system using a scheme called in-system optimization. It was analytically demonstrated that the number of beams is determined by the horizontal number of pixels in the SLM N_{SLM} that is imaged at the pupil plane of an objective lens and a distance parameter p_{d} obtained by dividing the distance between adjacent beams by the diffraction-limited beam diameter. A performance limitation of parallel laser processing in our system was estimated at N_{SLM} of 250 and p_{d} of 7.0. Based on these parameters, the maximum number of beams in a hexagonal close-packed structure was calculated to be 1189 by using an analytical equation. PMID:27505815
Massive fermion model in 3d and higher spin currents
NASA Astrophysics Data System (ADS)
Bonora, L.; Cvitan, M.; Prester, P. Dominis; de Souza, B. Lima; Smolić, I.
2016-05-01
We analyze the 3d free massive fermion theory coupled to external sources. The presence of a mass explicitly breaks parity invariance. We calculate two- and three-point functions of a gauge current and the energy momentum tensor and, for instance, obtain the well-known result that in the IR limit (but also in the UV one) we reconstruct the relevant CS action. We then couple the model to higher spin currents and explicitly work out the spin 3 case. In the UV limit we obtain an effective action which was proposed many years ago as a possible generalization of spin 3 CS action. In the IR limit we derive a different higher spin action. This analysis can evidently be generalized to higher spins. We also discuss the conservation and properties of the correlators we obtain in the intermediate steps of our derivation.
Parallel algorithm for computing 3-D reachable workspaces
NASA Astrophysics Data System (ADS)
Alameldin, Tarek K.; Sobh, Tarek M.
1992-03-01
The problem of computing the 3-D workspace for redundant articulated chains has applications in a variety of fields such as robotics, computer aided design, and computer graphics. The computational complexity of the workspace problem is at least NP-hard. The recent advent of parallel computers has made practical solutions for the workspace problem possible. Parallel algorithms for computing the 3-D workspace for redundant articulated chains with joint limits are presented. The first phase of these algorithms computes workspace points in parallel. The second phase uses workspace points that are computed in the first phase and fits a 3-D surface around the volume that encompasses the workspace points. The second phase also maps the 3- D points into slices, uses region filling to detect the holes and voids in the workspace, extracts the workspace boundary points by testing the neighboring cells, and tiles the consecutive contours with triangles. The proposed algorithms are efficient for computing the 3-D reachable workspace for articulated linkages, not only those with redundant degrees of freedom but also those with joint limits.
Parallelization of ARC3D with Computer-Aided Tools
NASA Technical Reports Server (NTRS)
Jin, Haoqiang; Hribar, Michelle; Yan, Jerry; Saini, Subhash (Technical Monitor)
1998-01-01
A series of efforts have been devoted to investigating methods of porting and parallelizing applications quickly and efficiently for new architectures, such as the SCSI Origin 2000 and Cray T3E. This report presents the parallelization of a CFD application, ARC3D, using the computer-aided tools, Cesspools. Steps of parallelizing this code and requirements of achieving better performance are discussed. The generated parallel version has achieved reasonably well performance, for example, having a speedup of 30 for 36 Cray T3E processors. However, this performance could not be obtained without modification of the original serial code. It is suggested that in many cases improving serial code and performing necessary code transformations are important parts for the automated parallelization process although user intervention in many of these parts are still necessary. Nevertheless, development and improvement of useful software tools, such as Cesspools, can help trim down many tedious parallelization details and improve the processing efficiency.
CALTRANS: A parallel, deterministic, 3D neutronics code
Carson, L.; Ferguson, J.; Rogers, J.
1994-04-01
Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.
Implementation of parallel matrix decomposition for NIKE3D on the KSR1 system
Su, Philip S.; Fulton, R.E.; Zacharia, T.
1995-06-01
New massively parallel computer architecture has revolutionized the design of computer algorithms and promises to have significant influence on algorithms for engineering computations. Realistic engineering problems using finite element analysis typically imply excessively large computational requirements. Parallel supercomputers that have the potential for significantly increasing calculation speeds can meet these computational requirements. This report explores the potential for the parallel Cholesky (U{sup T}DU) matrix decomposition algorithm on NIKE3D through actual computations. The examples of two- and three-dimensional nonlinear dynamic finite element problems are presented on the Kendall Square Research (KSR1) multiprocessor system, with 64 processors, at Oak Ridge National Laboratory. The numerical results indicate that the parallel Cholesky (U{sup T}DU) matrix decomposition algorithm is attractive for NIKE3D under multi-processor system environments.
A parallel algorithm for solving the 3d Schroedinger equation
Strickland, Michael; Yager-Elorriaga, David
2010-08-20
We describe a parallel algorithm for solving the time-independent 3d Schroedinger equation using the finite difference time domain (FDTD) method. We introduce an optimized parallelization scheme that reduces communication overhead between computational nodes. We demonstrate that the compute time, t, scales inversely with the number of computational nodes as t {proportional_to} (N{sub nodes}){sup -0.95} {sup {+-} 0.04}. This makes it possible to solve the 3d Schroedinger equation on extremely large spatial lattices using a small computing cluster. In addition, we present a new method for precisely determining the energy eigenvalues and wavefunctions of quantum states based on a symmetry constraint on the FDTD initial condition. Finally, we discuss the usage of multi-resolution techniques in order to speed up convergence on extremely large lattices.
Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU
Xia, Yong; Wang, Kuanquan; Zhang, Henggui
2015-01-01
Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations. PMID:26581957
Seismic imaging on massively parallel computers
Ober, C.C.; Oldfield, R.; Womble, D.E.; VanDyke, J.; Dosanjh, S.
1996-03-01
Fast, accurate imaging of complex, oil-bearing geologies, such as overthrusts and salt domes, is the key to reducing the costs of domestic oil and gas exploration. Geophysicists say that the known oil reserves in the Gulf of Mexico could be significantly increased if accurate seismic imaging beneath salt domes was possible. A range of techniques exist for imaging these regions, but the highly accurate techniques involve the solution of the wave equation and are characterized by large data sets and large computational demands. Massively parallel computers can provide the computational power for these highly accurate imaging techniques. A brief introduction to seismic processing will be presented, and the implementation of a seismic-imaging code for distributed memory computers will be discussed. The portable code, Salvo, performs a wave equation-based, 3-D, prestack, depth imaging and currently runs on the Intel Paragon and the Cray T3D. It used MPI for portability, and has sustained 22 Mflops/sec/proc (compiled FORTRAN) on the Intel Paragon.
Parallel PAB3D: Experiences with a Prototype in MPI
NASA Technical Reports Server (NTRS)
Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul
1998-01-01
PAB3D is a three-dimensional Navier Stokes solver that has gained acceptance in the research and industrial communities. It takes as computational domain, a set disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We discuss briefly the characteristics of tile code and define a prototype for testing. The principal data structure used for communication is derived from preprocessing "patching". We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be encountered when working on problems of this nature. Last, we identify levels of improvement from the current version and outline future work.
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 106 spatial cells and 1 × 1012 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40:74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.
3D finite-difference seismic migration with parallel computers
Ober, C.C.; Gjertsen, R.; Minkoff, S.; Womble, D.E.
1998-11-01
The ability to image complex geologies such as salt domes in the Gulf of Mexico and thrusts in mountainous regions is essential for reducing the risk associated with oil exploration. Imaging these structures, however, is computationally expensive as datasets can be terabytes in size. Traditional ray-tracing migration methods cannot handle complex velocity variations commonly found near such salt structures. Instead the authors use the full 3D acoustic wave equation, discretized via a finite difference algorithm. They reduce the cost of solving the apraxial wave equation by a number of numerical techniques including the method of fractional steps and pipelining the tridiagonal solves. The imaging code, Salvo, uses both frequency parallelism (generally 90% efficient) and spatial parallelism (65% efficient). Salvo has been tested on synthetic and real data and produces clear images of the subsurface even beneath complicated salt structures.
A parallel algorithm for 3D dislocation dynamics
NASA Astrophysics Data System (ADS)
Wang, Zhiqiang; Ghoniem, Nasr; Swaminarayan, Sriram; LeSar, Richard
2006-12-01
Dislocation dynamics (DD), a discrete dynamic simulation method in which dislocations are the fundamental entities, is a powerful tool for investigation of plasticity, deformation and fracture of materials at the micron length scale. However, severe computational difficulties arising from complex, long-range interactions between these curvilinear line defects limit the application of DD in the study of large-scale plastic deformation. We present here the development of a parallel algorithm for accelerated computer simulations of DD. By representing dislocations as a 3D set of dislocation particles, we show here that the problem of an interacting ensemble of dislocations can be converted to a problem of a particle ensemble, interacting with a long-range force field. A grid using binary space partitioning is constructed to keep track of node connectivity across domains. We demonstrate the computational efficiency of the parallel micro-plasticity code and discuss how O(N) methods map naturally onto the parallel data structure. Finally, we present results from applications of the parallel code to deformation in single crystal fcc metals.
Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes
NASA Technical Reports Server (NTRS)
Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak
2004-01-01
High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mark Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation is demonstrated by the reduction of the the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel
Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju
2015-01-01
SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼102 times faster in serial execution and > 104 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, the limitation of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying degree of chemical mixing), are discussed. PMID:26658477
NASA Astrophysics Data System (ADS)
Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju
2015-12-01
SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼102 times faster in serial execution and > 104 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, the limitation of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying degree of chemical mixing), are discussed.
Multigrid on massively parallel architectures
Falgout, R D; Jones, J E
1999-09-17
The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.
Computational fluid dynamics on a massively parallel computer
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.; Levit, Creon
1989-01-01
A finite difference code was implemented for the compressible Navier-Stokes equations on the Connection Machine, a massively parallel computer. The code is based on the ARC2D/ARC3D program and uses the implicit factored algorithm of Beam and Warming. The codes uses odd-even elimination to solve linear systems. Timings and computation rates are given for the code, and a comparison is made with a Cray XMP.
Warped black holes in 3D general massive gravity
NASA Astrophysics Data System (ADS)
Tonni, Erik
2010-08-01
We study regular spacelike warped black holes in the three dimensional general massive gravity model, which contains both the gravitational Chern-Simons term and the linear combination of curvature squared terms characterizing the new massive gravity besides the Einstein-Hilbert term. The parameters of the metric are found by solving a quartic equation, constrained by an inequality that imposes the absence of closed timelike curves. Explicit expressions for the central charges are suggested by exploiting the fact that these black holes are discrete quotients of spacelike warped AdS 3 and a known formula for the entropy. Previous results obtained separately in topological massive gravity and in new massive gravity are recovered as special cases.
Massively parallel neurocomputing for aerospace applications
NASA Technical Reports Server (NTRS)
Fijany, Amir; Barhen, Jacob; Toomarian, Nikzad
1993-01-01
An innovative hybrid, analog-digital charge-domain technology, for the massively parallel VLSI implementation of certain large scale matrix-vector operations, has recently been introduced. It employs arrays of Charge Coupled/Charge Injection Device cells holding an analog matrix of charge, which process digital vectors in parallel by means of binary, non-destructive charge transfer operations. The impact of this technology on massively parallel processing is discussed. Fundamentally new classes of algorithms, specifically designed for this emerging technology, as applied to signal processing, are derived.
Type D solutions of 3D new massive gravity
Ahmedov, Haji; Aliev, Alikram N.
2011-04-15
In a recent reformulation of three-dimensional new massive gravity, the field equations of the theory consist of a massive (tensorial) Klein-Gordon type equation with a curvature-squared source term and a constraint equation. Using this framework, we present all algebraic type D solutions of new massive gravity with constant and nonconstant scalar curvatures. For constant scalar curvature, they include homogeneous anisotropic solutions which encompass both solutions originating from topologically massive gravity, Bianchi types II, VIII, IX, and those of non-topologically massive gravity origin, Bianchi types VI{sub 0} and VII{sub 0}. For a special relation between the cosmological and mass parameters, {lambda}=m{sup 2}, they also include conformally flat solutions, and, in particular, those being locally isometric to the previously-known Kaluza-Klein type AdS{sub 2}xS{sup 1} or dS{sub 2}xS{sup 1} solutions. For nonconstant scalar curvature, all the solutions are conformally flat and exist only for {lambda}=m{sup 2}. We find two general metrics which possess at least one Killing vector and comprise all such solutions. We also discuss some properties of these solutions, delineating among them black hole type solutions.
Matter coupling in 3D ‘minimal massive gravity’
NASA Astrophysics Data System (ADS)
Arvanitakis, Alex S.; Routh, Alasdair J.; Townsend, Paul K.
2014-12-01
The ‘minimal massive gravity’ model of massive gravity in three spacetime dimensions (which has the same anti-de Sitter (AdS) bulk properties as ‘topologically massive gravity’ but improved boundary properties) is coupled to matter. Consistency requires a particular matter source tensor, which is quadratic in the stress tensor. The consequences are explored for an ideal fluid in the context of asymptotically de Sitter (dS) cosmological solutions, which bounce smoothly from contraction to expansion. Various vacuum solutions are also found, including warped (A)dS, and (for special values of parameters) static black holes and an (A)dS2× {{S}1} vacuum.
Massively Parallel Computing: A Sandia Perspective
Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.
1999-05-06
The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.
A 3D parallel model of Ganymede's exosphere
NASA Astrophysics Data System (ADS)
Leclercq, Ludivine; Turc, Lucile; François, Leblanc; Ronan, Modolo
2013-04-01
Ganymede is a unique object : it is the biggest moon of our solar system, and the only satellite which has its own intrinsic magnetic field. Its surface is covered by water ice and by regolith. Some previous observations suggest that below its surface may exist an ocean of liquid water. The atmosphere of the planet is poorly known but should be composed essentially of water, hydrogen and oxygen (Marconi et al., Icarus, 2007). These atmospheric particles mainly originate from the surface thanks to sublimation of water-ice and sputtering, a process driven by the magnetospheric Jovian particles impacting Ganymede surface and leading to ejection of atoms and molecules into Ganymede atmosphere. We developed a model of Ganymede's atmosphere based on a 3D Monte Carlo description of the fate of the ejected particles from the surface. This model has been parallelized allowing a much better statistical, spatial and temporal description of Ganymede's environment. This model includes the main sources of the neutral atmosphere and is able to calculate all its characteristics. It was successfully compared to the few known observations as well as to previous modeling. In this presentation, we will present the main characteristics of this model and what it tells us on Ganymede's atmosphere, in terms of spatial structure, composition, temporal variability and relations with both magnetosphere and surface.
Kolotilina, L.; Nikishin, A.; Yeremin, A.
1994-12-31
The solution of large systems of linear equations is a crucial bottleneck when performing 3D finite element analysis of structures. Also, in many cases the reliability and robustness of iterative solution strategies, and their efficiency when exploiting hardware resources, fully determine the scope of industrial applications which can be solved on a particular computer platform. This is especially true for modern vector/parallel supercomputers with large vector length and for modern massively parallel supercomputers. Preconditioned iterative methods have been successfully applied to industrial class finite element analysis of structures. The construction and application of high quality preconditioners constitutes a high percentage of the total solution time. Parallel implementation of high quality preconditioners on such architectures is a formidable challenge. Two common types of existing preconditioners are the implicit preconditioners and the explicit preconditioners. The implicit preconditioners (e.g. incomplete factorizations of several types) are generally high quality but require solution of lower and upper triangular systems of equations per iteration which are difficult to parallelize without deteriorating the convergence rate. The explicit type of preconditionings (e.g. polynomial preconditioners or Jacobi-like preconditioners) require sparse matrix-vector multiplications and can be parallelized but their preconditioning qualities are less than desirable. The authors present results of numerical experiments with Factorized Sparse Approximate Inverses (FSAI) for symmetric positive definite linear systems. These are high quality preconditioners that possess a large resource of parallelism by construction without increasing the serial complexity.
Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing
NASA Technical Reports Server (NTRS)
Fricker, David M.
1997-01-01
The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.
Massive parallelism in the future of science
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
Massive parallelism appears in three domains of action of concern to scientists, where it produces collective action that is not possible from any individual agent's behavior. In the domain of data parallelism, computers comprising very large numbers of processing agents, one for each data item in the result will be designed. These agents collectively can solve problems thousands of times faster than current supercomputers. In the domain of distributed parallelism, computations comprising large numbers of resource attached to the world network will be designed. The network will support computations far beyond the power of any one machine. In the domain of people parallelism collaborations among large groups of scientists around the world who participate in projects that endure well past the sojourns of individuals within them will be designed. Computing and telecommunications technology will support the large, long projects that will characterize big science by the turn of the century. Scientists must become masters in these three domains during the coming decade.
Massively parallel sequencing and rare disease
Ng, Sarah B.; Nickerson, Deborah A.; Bamshad, Michael J.; Shendure, Jay
2010-01-01
Massively parallel sequencing has enabled the rapid, systematic identification of variants on a large scale. This has, in turn, accelerated the pace of gene discovery and disease diagnosis on a molecular level and has the potential to revolutionize methods particularly for the analysis of Mendelian disease. Using massively parallel sequencing has enabled investigators to interrogate variants both in the context of linkage intervals and also on a genome-wide scale, in the absence of linkage information entirely. The primary challenge now is to distinguish between background polymorphisms and pathogenic mutations. Recently developed strategies for rare monogenic disorders have met with some early success. These strategies include filtering for potential causal variants based on frequency and function, and also ranking variants based on conservation scores and predicted deleteriousness to protein structure. Here, we review the recent literature in the use of high-throughput sequence data and its analysis in the discovery of causal mutations for rare disorders. PMID:20846941
Associative massively parallel processor for video processing
NASA Astrophysics Data System (ADS)
Krikelis, Argy; Tawiah, T.
1996-03-01
Massively parallel processing architectures have matured primarily through image processing and computer vision application. The similarity of processing requirements between these areas and video processing suggest that they should be very appropriate for video processing applications. This research describes the use of an associative massively parallel processing based system for video compression which includes architectural and system description, discussion of the implementation of compression tasks such as DCT/IDCT, Motion Estimation and Quantization and system evaluation. The core of the processing system is the ASP (Associative String Processor) architecture a modular massively parallel, programmable and inherently fault-tolerant fine-grain SIMD processing architecture incorporating a string of identical APEs (Associative Processing Elements), a reconfigurable inter-processor communication network and a Vector Data Buffer for fully-overlapped data input-output. For video compression applications a prototype system is developed, which is using ASP modules to implement the required compression tasks. This scheme leads to a linear speed up of the computation by simply adding more APEs to the modules.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens; Inglett, Todd Alan
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
Efficient communication in massively parallel computers
Cypher, R.E.
1989-01-01
A fundamental operation in parallel computation is sorting. Sorting is important not only because it is required by many algorithms, but also because it can be used to implement irregular, pointer-based communication. The author studies two algorithms for sorting in massively parallel computers. First, he examines Shellsort. Shellsort is a sorting algorithm that is based on a sequence of parameters called increments. Shellsort can be used to create a parallel sorting device known as a sorting network. Researchers have suggested that if the correct increment sequence is used, an optimal size sorting network can be obtained. All published increment sequences have been monotonically decreasing. He shows that no monotonically decreasing increment sequence will yield an optimal size sorting network. Second, he presents a sorting algorithm called Cubesort. Cubesort is the fastest known sorting algorithm for a variety of parallel computers aver a wide range of parameters. He also presents a paradigm for developing parallel algorithms that have efficient communication. The paradigm, called the data reduction paradigm, consists of using a divide-and-conquer strategy. Both the division and combination phases of the divide-and-conquer algorithm may require irregular, pointer-based communication between processors. However, the problem is divided so as to limit the amount of data that must be communicated. As a result the communication can be performed efficiently. He presents data reduction algorithms for the image component labeling problem, the closest pair problem and four versions of the parallel prefix problem.
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.
Xie, G.; Li, J.; Majer, E.; Zuo, D.
1998-07-01
This paper describes a new 3D parallel GILD electromagnetic (EM) modeling and nonlinear inversion algorithm. The algorithm consists of: (a) a new magnetic integral equation instead of the electric integral equation to solve the electromagnetic forward modeling and inverse problem; (b) a collocation finite element method for solving the magnetic integral and a Galerkin finite element method for the magnetic differential equations; (c) a nonlinear regularizing optimization method to make the inversion stable and of high resolution; and (d) a new parallel 3D modeling and inversion using a global integral and local differential domain decomposition technique (GILD). The new 3D nonlinear electromagnetic inversion has been tested with synthetic data and field data. The authors obtained very good imaging for the synthetic data and reasonable subsurface EM imaging for the field data. The parallel algorithm has high parallel efficiency over 90% and can be a parallel solver for elliptic, parabolic, and hyperbolic modeling and inversion. The parallel GILD algorithm can be extended to develop a high resolution and large scale seismic and hydrology modeling and inversion in the massively parallel computer.
Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA.
Mrozek, Dariusz; Brożek, Miłosz; Małysiak-Mrozek, Bożena
2014-02-01
Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm. PMID:24481593
Massively Parallel Direct Simulation of Multiphase Flow
COOK,BENJAMIN K.; PREECE,DALE S.; WILLIAMS,J.R.
2000-08-10
The authors understanding of multiphase physics and the associated predictive capability for multi-phase systems are severely limited by current continuum modeling methods and experimental approaches. This research will deliver an unprecedented modeling capability to directly simulate three-dimensional multi-phase systems at the particle-scale. The model solves the fully coupled equations of motion governing the fluid phase and the individual particles comprising the solid phase using a newly discovered, highly efficient coupled numerical method based on the discrete-element method and the Lattice-Boltzmann method. A massively parallel implementation will enable the solution of large, physically realistic systems.
Time sharing massively parallel machines. Draft
Gorda, B.; Wolski, R.
1995-03-01
As part of the Massively Parallel Computing Initiative (MPCI) at the Lawrence Livermore National Laboratory, the authors have developed a simple, effective and portable time sharing mechanism by scheduling gangs of processes on tightly coupled parallel machines. By time-sharing the resources, the system interleaves production and interactive jobs. Immediate priority is given to interactive use, maintaining good response time. Production jobs are scheduled during idle periods, making use of the otherwise unused resources. In this paper the authors discuss their experience with gang scheduling over the 3 year life-time of the project. In section 2, they motivate the project and discuss some of its details. Section 3.0 describes the general scheduling problem and how gang scheduling addresses it. In section 4.0, they describe the implementation. Section 8.0 presents results culled over the lifetime of the project. They conclude this paper with some observations and possible future directions.
Parallel adaptive mesh refinement within the PUMAA3D Project
NASA Technical Reports Server (NTRS)
Freitag, Lori; Jones, Mark; Plassmann, Paul
1995-01-01
To enable the solution of large-scale applications on distributed memory architectures, we are designing and implementing parallel algorithms for the fundamental tasks of unstructured mesh computation. In this paper, we discuss efficient algorithms developed for two of these tasks: parallel adaptive mesh refinement and mesh partitioning. The algorithms are discussed in the context of two-dimensional finite element solution on triangular meshes, but are suitable for use with a variety of element types and with h- or p-refinement. Results demonstrating the scalability and efficiency of the refinement algorithm and the quality of the mesh partitioning are presented for several test problems on the Intel DELTA.
Arbitrary and Parallel Nanofabrication of 3D Metal Structures with Polymer Brush Resists.
Chen, Chaojian; Xie, Zhuang; Wei, Xiaoling; Zheng, Zijian
2015-12-01
3D polymer brushes are reported for the first time as ideal resists for the alignment-free nanofabrication of complex 3D metal structures with sub-100 nm lateral resolution and sub-10 nm vertical resolution. Since 3D polymer brushes can be serially fabricated in parallel, this method is effective to generate arbitrary 3D metal structures over a large area at a high throughput. PMID:26439441
Parallel deterministic neutronics with AMR in 3D
Clouse, C.; Ferguson, J.; Hendrickson, C.
1997-12-31
AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR) has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straight forward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
New 3D parallel SGILD modeling and inversion
Xie, G.; Li, J.; Majer, E.
1998-09-01
In this paper, a new parallel modeling and inversion algorithm using a Stochastic Global Integral and Local Differential equation (SGILD) is presented. The authors derived new acoustic integral equations and differential equation for statistical moments of the parameters and field. The new statistical moments integral equation on the boundary and local differential equations in domain will be used together to obtain mean wave field and its moments in the modeling. The new moments global Jacobian volume integral equation and the local Jacobian differential equations in domain will be used together to update the mean parameters and their moments in the inversion. A new parallel multiple hierarchy substructure direct algorithm or direct-iteration hybrid algorithm will be used to solve the sparse matrices and one smaller full matrix from domain to the boundary, in parallel. The SGILD modeling and imaging algorithm has many advantages over the conventional imaging approaches. The SGILD algorithm can be used for the stochastic acoustic, electromagnetic, and flow modeling and inversion, and are important for the prediction of oil, gas, coal, and geothermal energy reservoirs in geophysical exploration.
Massive hybrid parallelism for fully implicit multiphysics
Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.
2013-07-01
As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)
MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS
Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston
2013-05-01
As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided.
Parallel contact detection algorithm for transient solid dynamics simulations using PRONTO3D
Attaway, S.W.; Hendrickson, B.A.; Plimpton, S.J.
1996-09-01
An efficient, scalable, parallel algorithm for treating material surface contacts in solid mechanics finite element programs has been implemented in a modular way for MIMD parallel computers. The serial contact detection algorithm that was developed previously for the transient dynamics finite element code PRONTO3D has been extended for use in parallel computation by devising a dynamic (adaptive) processor load balancing scheme.
Solid modeling on a massively parallel processor
Strip, D. ); Karasick, M. )
1992-01-01
Solid modeling underlies many technologies that are key to modern manufacturing. These range from computer-aided design systems to robot simulators, from finite element analysis to integrated circuit process modeling. The accuracy, and hence the utility, of these models is often constrained by the amount of computer time required to perform the desired operations. This paper presents a family of algorithms for solid modeling operations using the Connection Machine, a massively parallel SIMD processor. The authors describe a data structure for representing solid models and algorithms that use the representation to implement efficiently a variety of solid modeling operations. The authors give a sketch of the algorithm for intersecting solids and present computational experience using these algorithms. The data structure and algorithms are contrasted with those of serial architectures, and execution times are compared.
Parallel processing for efficient 3D slope stability modelling
NASA Astrophysics Data System (ADS)
Marchesini, Ivan; Mergili, Martin; Alvioli, Massimiliano; Metz, Markus; Schneider-Muntau, Barbara; Rossi, Mauro; Guzzetti, Fausto
2014-05-01
We test the performance of the GIS-based, three-dimensional slope stability model r.slope.stability. The model was developed as a C- and python-based raster module of the GRASS GIS software. It considers the three-dimensional geometry of the sliding surface, adopting a modification of the model proposed by Hovland (1977), and revised and extended by Xie and co-workers (2006). Given a terrain elevation map and a set of relevant thematic layers, the model evaluates the stability of slopes for a large number of randomly selected potential slip surfaces, ellipsoidal or truncated in shape. Any single raster cell may be intersected by multiple sliding surfaces, each associated with a value of the factor of safety, FS. For each pixel, the minimum value of FS and the depth of the associated slip surface are stored. This information is used to obtain a spatial overview of the potentially unstable slopes in the study area. We test the model in the Collazzone area, Umbria, central Italy, an area known to be susceptible to landslides of different type and size. Availability of a comprehensive and detailed landslide inventory map allowed for a critical evaluation of the model results. The r.slope.stability code automatically splits the study area into a defined number of tiles, with proper overlap in order to provide the same statistical significance for the entire study area. The tiles are then processed in parallel by a given number of processors, exploiting a multi-purpose computing environment at CNR IRPI, Perugia. The map of the FS is obtained collecting the individual results, taking the minimum values on the overlapping cells. This procedure significantly reduces the processing time. We show how the gain in terms of processing time depends on the tile dimensions and on the number of cores.
MPSim: A Massively Parallel General Simulation Program for Materials
NASA Astrophysics Data System (ADS)
Iotov, Mihail; Gao, Guanghua; Vaidehi, Nagarajan; Cagin, Tahir; Goddard, William A., III
1997-08-01
In this talk, we describe a general purpose Massively Parallel Simulation (MPSim) program used for computational materials science and life sciences. We also will present scaling aspects of the program along with several case studies. The program incorporates highly efficient CMM method to accurately calculate the interactions. For studying bulk materials, the program uses the Reduced CMM to account for infinite range sums. The software embodies various advanced molecular dynamics algorithms, energy and structure optimization techniques with a set of analysis tools suitable for large scale structures. The applications using the program range amorphous polymers, liquid-polymer interfaces, large viruses, million atom clusters, surfaces, gas diffusion in polymers. Program is originally developed on KSR in an object oriented fashion and is ported to SGI-PC, and HP-Examplar. Message Passing version is originally implemented on Intel Paragon using NX, then MPI and later tested on Cray T3D, and IBM SP2 platforms.
Prediction of parallel NIKE3D performance on the KSR1 system
Su, P.S.; Zacharia, T.; Fulton, R.E.
1995-05-01
Finite element method is one of the bases for numerical solutions to engineering problems. Complex engineering problems using finite element analysis typically imply excessively large computational time. Parallel supercomputers have the potential for significantly increasing calculation speeds in order to meet these computational requirements. This paper predicts parallel NIKE3D performance on the Kendall Square Research (KSR1) system. The first part of the prediction is based on the implementation of parallel Cholesky (U{sup T}DU) matrix decomposition algorithm through actual computations on the KSRI multiprocessor system, with 64 processors, at Oak Ridge National Laboratory. The other predictions are based on actual computations for parallel element matrix generation, parallel global stiffness matrix assembly, and parallel forward/backward substitution on the BBN TC2000 multiprocessor system at Lawrence Livermore National Laboratory. The preliminary results indicate that parallel NIKE3D performance can be attractive under local/shared-memory multiprocessor system environments.
A parallel multigrid-based preconditioner for the 3D heterogeneous high-frequency Helmholtz equation
Riyanti, C.D. . E-mail: C.D.Riyanti@tudelft.nl; Kononov, A.; Erlangga, Y.A.; Vuik, C.; Oosterlee, C.W.; Plessix, R.-E.; Mulder, W.A.
2007-05-20
We investigate the parallel performance of an iterative solver for 3D heterogeneous Helmholtz problems related to applications in seismic wave propagation. For large 3D problems, the computation is no longer feasible on a single processor, and the memory requirements increase rapidly. Therefore, parallelization of the solver is needed. We employ a complex shifted-Laplace preconditioner combined with the Bi-CGSTAB iterative method and use a multigrid method to approximate the inverse of the resulting preconditioning operator. A 3D multigrid method with 2D semi-coarsening is employed. We show numerical results for large problems arising in geophysical applications.
Massively parallel neural network intelligent browse
NASA Astrophysics Data System (ADS)
Maxwell, Thomas P.; Zion, Philip M.
1992-04-01
A massively parallel neural network architecture is currently being developed as a potential component of a distributed information system in support of NASA's Earth Observing System. This architecture can be trained, via an iterative learning process, to recognize objects in images based on texture features, allowing scientists to search for all patterns which are similar to a target pattern in a database of images. It may facilitate scientific inquiry by allowing scientists to automatically search for physical features of interest in a database through computer pattern recognition, alleviating the need for exhaustive visual searches through possibly thousands of images. The architecture is implemented on a Connection Machine such that each physical processor contains a simulated 'neuron' which views a feature vector derived from a subregion of the input image. Each of these neurons is trained, via the perceptron rule, to identify the same pattern. The network output gives a probability distribution over the input image of finding the target pattern in a given region. In initial tests the architecture was trained to separate regions containing clouds from clear regions in 512 by 512 pixel AVHRR images. We found that in about 10 minutes we can train a network to perform with high accuracy in recognizing clouds which were texturally similar to a target cloud group. These promising results suggest that this type of architecture may play a significant role in coping with the forthcoming flood of data from the Earth-monitoring missions of the major space-faring nations.
Multiplexed microsatellite recovery using massively parallel sequencing
Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.
2011-01-01
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).
Fault tolerant massively parallel processing architecture
Balasubramanian, V.; Banerjee, P.
1987-08-01
This paper presents two massively parallel processing architectures suitable for solving a wide variety of algorithms of divide-and-conquer type for problems such as the discrete Fourier transform, production systems, design automation, and others. The first architecture, called the Chain-structured Butterfly ARchitecture (CBAR), consists of a two-dimensional array of N-L . (log/sub 2/(L)+1) processing elements (PE) organized as L levels of log/sub 2/(L)+1 stages, and which has the butterfly connection between PEs in consecutive stages with straight-through feedback between PEs in the last and first stages. This connection system has the desirable property of allowing thousands of PEs to be connected with O(N) connection cost, O(log/sub 2/(N/log/sub 2/N)) communication paths, and a small number (=4) of I/O ports per PE. However, this architecture is not fault tolerant. The authors, therefore, propose a second architecture, called the REconfigurable Chain-structured Butterfly ARchitecture (RECBAR), which is a modified version of the CBAR. The RECBAR possesses all the desirable features of the CBAR, with the number of I/O ports per PE increased to six, and uses O(log/sub 2/N)/N) overhead in PEs and approximately 50% overhead in links to achieve single-level fault tolerance. Reliability improvements of the RECBAR over the CBAR are studied. This paper also presents a distributed diagnostic and structuring algorithm for the RECBAR that enables the architecture to detect faults and structure itself accordingly within 2 . log/sub 2/(L)+1 time steps, thus making it a truly fault tolerant architecture.
McGhee, J.M.; Roberts, R.M.; Morel, J.E.
1997-06-01
A spherical harmonics research code (DANTE) has been developed which is compatible with parallel computer architectures. DANTE provides 3-D, multi-material, deterministic, transport capabilities using an arbitrary finite element mesh. The linearized Boltzmann transport equation is solved in a second order self-adjoint form utilizing a Galerkin finite element spatial differencing scheme. The core solver utilizes a preconditioned conjugate gradient algorithm. Other distinguishing features of the code include options for discrete-ordinates and simplified spherical harmonics angular differencing, an exact Marshak boundary treatment for arbitrarily oriented boundary faces, in-line matrix construction techniques to minimize memory consumption, and an effective diffusion based preconditioner for scattering dominated problems. Algorithm efficiency is demonstrated for a massively parallel SIMD architecture (CM-5), and compatibility with MPP multiprocessor platforms or workstation clusters is anticipated.
Linking 1D evolutionary to 3D hydrodynamical simulations of massive stars
NASA Astrophysics Data System (ADS)
Cristini, A.; Meakin, C.; Hirschi, R.; Arnett, D.; Georgy, C.; Viallet, M.
2016-03-01
Stellar evolution models of massive stars are important for many areas of astrophysics, for example nucleosynthesis yields, supernova progenitor models and understanding physics under extreme conditions. Turbulence occurs in stars primarily due to nuclear burning at different mass coordinates within the star. The understanding and correct treatment of turbulence and turbulent mixing at convective boundaries in stellar models has been studied for decades but still lacks a definitive solution. This paper presents initial results of a study on convective boundary mixing (CBM) in massive stars. The ‘stiffness’ of a convective boundary can be quantified using the bulk Richardson number ({{Ri}}{{B}}), the ratio of the potential energy for restoration of the boundary to the kinetic energy of turbulent eddies. A ‘stiff’ boundary ({{Ri}}{{B}}˜ {10}4) will suppress CBM, whereas in the opposite case a ‘soft’ boundary ({{Ri}}{{B}}˜ 10) will be more susceptible to CBM. One of the key results obtained so far is that lower convective boundaries (closer to the centre) of nuclear burning shells are ‘stiffer’ than the corresponding upper boundaries, implying limited CBM at lower shell boundaries. This is in agreement with 3D hydrodynamic simulations carried out by Meakin and Arnett (2007 Astrophys. J. 667 448-75). This result also has implications for new CBM prescriptions in massive stars as well as for nuclear burning flame front propagation in super-asymptotic giant branch stars and also the onset of novae.
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project
NASA Technical Reports Server (NTRS)
Woo, Alex C.; Hill, Kueichien C.
1996-01-01
The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Program Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Visualization on massively parallel computers using CM/AVS
Krogh, M.F.; Hansen, C.D.
1993-09-01
CM/AVS is a visualization environment for the massively parallel CM-5 from Thinking Machines. It provides a backend to the standard commercially available AVS visualization product. At the Advanced Computing Laboratory at Los Alamos National Laboratory, we have been experimenting and utilizing this software within our visualization environment. This paper describes our experiences with CM/AVS. The conclusions reached are applicable to any implimentation of visualization software within a massively parallel computing environment.
Experimental free-space optical network for massively parallel computers
NASA Astrophysics Data System (ADS)
Araki, S.; Kajita, M.; Kasahara, K.; Kubota, K.; Kurihara, K.; Redmond, I.; Schenfeld, E.; Suzaki, T.
1996-03-01
A free-space optical interconnection scheme is described for massively parallel processors based on the interconnection-cached network architecture. The optical network operates in a circuit-switching mode. Combined with a packet-switching operation among the circuit-switched optical channels, a high-bandwidth, low-latency network for massively parallel processing results. The design and assembly of a 64-channel experimental prototype is discussed, and operational results are presented.
Three-dimensional radiative transfer on a massively parallel computer
NASA Astrophysics Data System (ADS)
Vath, H. M.
1994-04-01
We perform 3D radiative transfer calculations in non-local thermodynamic equilibrium (NLTE) in the simple two-level atom approximation on the Mas-Par MP-1, which contains 8192 processors and is a single instruction multiple data (SIMD) machine, an example of the new generation of massively parallel computers. On such a machine, all processors execute the same command at a given time, but on different data. To make radiative transfer calculations efficient, we must re-consider the numerical methods and storage of data. To solve the transfer equation, we adopt the short characteristic method and examine different acceleration methods to obtain the source function. We use the ALI method and test local and non-local operators. Furthermore, we compare the Ng and the orthomin methods of acceleration. We also investigate the use of multi-grid methods to get fast solutions for the NLTE case. In order to test these numerical methods, we apply them to two problems with and without periodic boundary conditions.
3D Modeling of the Massive Binary Wind Interaction Region in Eta Carinae
NASA Astrophysics Data System (ADS)
Madura, Thomas; Gull, T.; Owocki, S.; Okazaki, A.; Russell, C.
2009-01-01
We present recent work on the theoretical modeling of low excitation ([Fe II]) and high excitation ([Fe III]) wind lines observed in Eta Carinae using the HST/STIS. The spatially resolved structures seen in these lines are interpreted as the time-averaged, outer extensions of the wind from the primary star and the wind-wind interaction region of the massive binary system. For most of the orbit, the wind-wind interface can be approximated as a cone with a half-opening angle of 65° whose axis of rotation is aligned with the major axis of the binary orbit and appears to lie in the plane of the Homunculus disk. However, because the orbit is highly elliptical, this approximation breaks down at periastron and so full 3D Smoothed Particle Hydrodynamics (SPH) simulations become necessary. By analyzing the results of these 3D SPH simulations of the binary interactions and comparing them to the spectra obtained with the HST/STIS we place further constraints on the orientation of the binary orbit, and hope to eventually determine how/where UV light is escaping in the system, to search for any direct signatures of the companion star, and to ultimately establish a mass ratio for the system.
RAMA: A file system for massively parallel computers
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1993-01-01
This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated in lo the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
IMPAIR: massively parallel deconvolution on the GPU
NASA Astrophysics Data System (ADS)
Sherry, Michael; Shearer, Andy
2013-02-01
The IMPAIR software is a high throughput image deconvolution tool for processing large out-of-core datasets of images, varying from large images with spatially varying PSFs to large numbers of images with spatially invariant PSFs. IMPAIR implements a parallel version of the tried and tested Richardson-Lucy deconvolution algorithm regularised via a custom wavelet thresholding library. It exploits the inherently parallel nature of the convolution operation to achieve quality results on consumer grade hardware: through the NVIDIA Tesla GPU implementation, the multi-core OpenMP implementation, and the cluster computing MPI implementation of the software. IMPAIR aims to address the problem of parallel processing in both top-down and bottom-up approaches: by managing the input data at the image level, and by managing the execution at the instruction level. These combined techniques will lead to a scalable solution with minimal resource consumption and maximal load balancing. IMPAIR is being developed as both a stand-alone tool for image processing, and as a library which can be embedded into non-parallel code to transparently provide parallel high throughput deconvolution.
EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS
F. PETRINI; W. FENG
1999-09-01
We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of low-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver
NASA Technical Reports Server (NTRS)
Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)
2002-01-01
The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.
An improved parallel SPH approach to solve 3D transient generalized Newtonian free surface flows
NASA Astrophysics Data System (ADS)
Ren, Jinlian; Jiang, Tao; Lu, Weigang; Li, Gang
2016-08-01
In this paper, a corrected parallel smoothed particle hydrodynamics (C-SPH) method is proposed to simulate the 3D generalized Newtonian free surface flows with low Reynolds number, especially the 3D viscous jets buckling problems are investigated. The proposed C-SPH method is achieved by coupling an improved SPH method based on the incompressible condition with the traditional SPH (TSPH), that is, the improved SPH with diffusive term and first-order Kernel gradient correction scheme is used in the interior of the fluid domain, and the TSPH is used near the free surface. Thus the C-SPH method possesses the advantages of two methods. Meanwhile, an effective and convenient boundary treatment is presented to deal with 3D multiple-boundary problem, and the MPI parallelization technique with a dynamic cells neighbor particle searching method is considered to improve the computational efficiency. The validity and the merits of the C-SPH are first verified by solving several benchmarks and compared with other results. Then the viscous jet folding/coiling based on the Cross model is simulated by the C-SPH method and compared with other experimental or numerical results. Specially, the influences of macroscopic parameters on the flow are discussed. All the numerical results agree well with available data, and show that the C-SPH method has higher accuracy and better stability for solving 3D moving free surface flows over other particle methods.
Massively Parallel Sequencing: The Next Big Thing in Genetic Medicine
Tucker, Tracy; Marra, Marco; Friedman, Jan M.
2009-01-01
Massively parallel sequencing has reduced the cost and increased the throughput of genomic sequencing by more than three orders of magnitude, and it seems likely that costs will fall and throughput improve even more in the next few years. Clinical use of massively parallel sequencing will provide a way to identify the cause of many diseases of unknown etiology through simultaneous screening of thousands of loci for pathogenic mutations and by sequencing biological specimens for the genomic signatures of novel infectious agents. In addition to providing these entirely new diagnostic capabilities, massively parallel sequencing may also replace arrays and Sanger sequencing in clinical applications where they are currently being used. Routine clinical use of massively parallel sequencing will require higher accuracy, better ways to select genomic subsets of interest, and improvements in the functionality, speed, and ease of use of data analysis software. In addition, substantial enhancements in laboratory computer infrastructure, data storage, and data transfer capacity will be needed to handle the extremely large data sets produced. Clinicians and laboratory personnel will require training to use the sequence data effectively, and appropriate methods will need to be developed to deal with the incidental discovery of pathogenic mutations and variants of uncertain clinical significance. Massively parallel sequencing has the potential to transform the practice of medical genetics and related fields, but the vast amount of personal genomic data produced will increase the responsibility of geneticists to ensure that the information obtained is used in a medically and socially responsible manner. PMID:19679224
Spatial parallelism of a 3D finite difference, velocity-stress elastic wave propagation code
Minkoff, S.E.
1999-12-01
Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately, finite difference simulations for 3D elastic wave propagation are expensive. The authors model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is most unique in its ability to model a number of different types of seismic sources. The parallel implementation uses the MPI library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions with 3D decompositions generally producing the best parallel speedup. Because I/O is handled largely outside of the time-step loop (the most expensive part of the simulation) the authors have opted for straight-forward broadcast and reduce operations to handle I/O. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as ghost cells. When this communication is balanced against computation by allocating subdomains of reasonable size, they observe excellent scaled speedup. Allocating subdomains of size 25 x 25 x 25 on each node, they achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.
Spatial Parallelism of a 3D Finite Difference, Velocity-Stress Elastic Wave Propagation Code
MINKOFF,SUSAN E.
1999-12-09
Finite difference methods for solving the wave equation more accurately capture the physics of waves propagating through the earth than asymptotic solution methods. Unfortunately. finite difference simulations for 3D elastic wave propagation are expensive. We model waves in a 3D isotropic elastic earth. The wave equation solution consists of three velocity components and six stresses. The partial derivatives are discretized using 2nd-order in time and 4th-order in space staggered finite difference operators. Staggered schemes allow one to obtain additional accuracy (via centered finite differences) without requiring additional storage. The serial code is most unique in its ability to model a number of different types of seismic sources. The parallel implementation uses the MP1 library, thus allowing for portability between platforms. Spatial parallelism provides a highly efficient strategy for parallelizing finite difference simulations. In this implementation, one can decompose the global problem domain into one-, two-, and three-dimensional processor decompositions with 3D decompositions generally producing the best parallel speed up. Because i/o is handled largely outside of the time-step loop (the most expensive part of the simulation) we have opted for straight-forward broadcast and reduce operations to handle i/o. The majority of the communication in the code consists of passing subdomain face information to neighboring processors for use as ''ghost cells''. When this communication is balanced against computation by allocating subdomains of reasonable size, we observe excellent scaled speed up. Allocating subdomains of size 25 x 25 x 25 on each node, we achieve efficiencies of 94% on 128 processors. Numerical examples for both a layered earth model and a homogeneous medium with a high-velocity blocky inclusion illustrate the accuracy of the parallel code.
Massively parallel neural encoding and decoding of visual stimuli.
Lazar, Aurel A; Zhou, Yiyin
2012-08-01
The massively parallel nature of video Time Encoding Machines (TEMs) calls for scalable, massively parallel decoders that are implemented with neural components. The current generation of decoding algorithms is based on computing the pseudo-inverse of a matrix and does not satisfy these requirements. Here we consider video TEMs with an architecture built using Gabor receptive fields and a population of Integrate-and-Fire neurons. We show how to build a scalable architecture for video Time Decoding Machines using recurrent neural networks. Furthermore, we extend our architecture to handle the reconstruction of visual stimuli encoded with massively parallel video TEMs having neurons with random thresholds. Finally, we discuss in detail our algorithms and demonstrate their scalability and performance on a large scale GPU cluster. PMID:22397951
Staging memory for massively parallel processor
NASA Technical Reports Server (NTRS)
Batcher, Kenneth E. (Inventor)
1988-01-01
The invention herein relates to a computer organization capable of rapidly processing extremely large volumes of data. A staging memory is provided having a main stager portion consisting of a large number of memory banks which are accessed in parallel to receive, store, and transfer data words simultaneous with each other. Substager portions interconnect with the main stager portion to match input and output data formats with the data format of the main stager portion. An address generator is coded for accessing the data banks for receiving or transferring the appropriate words. Input and output permutation networks arrange the lineal order of data into and out of the memory banks.
The Challenge of Massively Parallel Computing
WOMBLE,DAVID E.
1999-11-03
Since the mid-1980's, there have been a number of commercially available parallel computers with hundreds or thousands of processors. These machines have provided a new capability to the scientific community, and they been used successfully by scientists and engineers although with varying degrees of success. One of the reasons for the limited success is the difficulty, or perceived difficulty, in developing code for these machines. In this paper we discuss many of the issues and challenges in developing scalable hardware, system software and algorithms for machines comprising hundreds or thousands of processors.
Design and implementation of a massively parallel version of DIRECT
He, J.; Verstak, A.; Watson, L.; Sosonkina, M.
2007-10-24
This paper describes several massively parallel implementations for a global search algorithm DIRECT. Two parallel schemes take different approaches to address DIRECT's design challenges imposed by memory requirements and data dependency. Three design aspects in topology, data structures, and task allocation are compared in detail. The goal is to analytically investigate the strengths and weaknesses of these parallel schemes, identify several key sources of inefficiency, and experimentally evaluate a number of improvements in the latest parallel DIRECT implementation. The performance studies demonstrate improved data structure efficiency and load balancing on a 2200 processor cluster.
Parallel computation of 3-D Navier-Stokes flowfields for supersonic vehicles
NASA Technical Reports Server (NTRS)
Ryan, James S.; Weeratunga, Sisira
1993-01-01
Multidisciplinary design optimization of aircraft will require unprecedented capabilities of both analysis software and computer hardware. The speed and accuracy of the analysis will depend heavily on the computational fluid dynamics (CFD) module which is used. A new CFD module has been developed to combine the robust accuracy of conventional codes with the ability to run on parallel architectures. This is achieved by parallelizing the ARC3D algorithm, a central-differenced Navier-Stokes method, on the Intel iPSC/860. The computed solutions are identical to those from conventional machines. Computational speed on 64 processors is comparable to the rate on one Cray Y-MP processor and will increase as new generations of parallel computers become available.
Description of a parallel, 3D, finite element, hydrodynamics-diffusion code
Milovich, J L; Prasad, M K; Shestakov, A I
1999-04-11
We describe a parallel, 3D, unstructured grid finite element, hydrodynamic diffusion code for inertial confinement fusion (ICF) applications and the ancillary software used to run it. The code system is divided into two entities, a controller and a stand-alone physics code. The code system may reside on different computers; the controller on the user's workstation and the physics code on a supercomputer. The physics code is composed of separate hydrodynamic, equation-of-state, laser energy deposition, heat conduction, and radiation transport packages and is parallelized for distributed memory architectures. For parallelization, a SPMD model is adopted; the domain is decomposed into a disjoint collection of subdomains, one per processing element (PE). The PEs communicate using MPI. The code is used to simulate the hydrodynamic implosion of a spherical bubble.
Massively parallel solution of the assignment problem. Technical report
Wein, J.; Zenios, S.
1990-12-01
In this paper we discuss the design, implementation and effectiveness of massively parallel algorithms for the solution of large-scale assignment problems. In particular, we study the auction algorithms of Bertsekas, an algorithm based on the method of multipliers of Hestenes and Powell, and an algorithm based on the alternating direction method of multipliers of Eckstein. We discuss alternative approaches to the massively parallel implementation of the auction algorithm, including Jacobi, Gauss-Seidel and a hybrid scheme. The hybrid scheme, in particular, exploits two different levels of parallelism and an efficient way of communicating the data between them without the need to perform general router operations across the hypercube network. We then study the performance of massively parallel implementations of two methods of multipliers. Implementations are carried out on the Connection Machine CM-2, and the algorithms are evaluated empirically with the solution of large scale problems. The hybrid scheme significantly outperforms all of the other methods and gives the best computational results to date for a massively parallel solution to this problem.
Shift: A Massively Parallel Monte Carlo Radiation Transport Package
Pandya, Tara M; Johnson, Seth R; Davidson, Gregory G; Evans, Thomas M; Hamilton, Steven P
2015-01-01
This paper discusses the massively-parallel Monte Carlo radiation transport package, Shift, developed at Oak Ridge National Laboratory. It reviews the capabilities, implementation, and parallel performance of this code package. Scaling results demonstrate very good strong and weak scaling behavior of the implemented algorithms. Benchmark results from various reactor problems show that Shift results compare well to other contemporary Monte Carlo codes and experimental results.
Advanced quadratures and periodic boundary conditions in parallel 3D S{sub n} transport
Manalo, K.; Yi, C.; Huang, M.; Sjoden, G.
2013-07-01
Significant updates in numerical quadratures have warranted investigation with 3D Sn discrete ordinates transport. We show new applications of quadrature departing from level symmetric (S{sub 2}o). investigating 3 recently developed quadratures: Even-Odd (EO), Linear-Discontinuous Finite Element - Surface Area (LDFE-SA), and the non-symmetric Icosahedral Quadrature (IC). We discuss implementation changes to 3D Sn codes (applied to Hybrid MOC-Sn TITAN and 3D parallel PENTRAN) that can be performed to accommodate Icosahedral Quadrature, as this quadrature is not 90-degree rotation invariant. In particular, as demonstrated using PENTRAN, the properties of Icosahedral Quadrature are suitable for trivial application using periodic BCs versus that of reflective BCs. In addition to implementing periodic BCs for 3D Sn PENTRAN, we implemented a technique termed 'angular re-sweep' which properly conditions periodic BCs for outer eigenvalue iterative loop convergence. As demonstrated by two simple transport problems (3-group fixed source and 3-group reflected/periodic eigenvalue pin cell), we remark that all of the quadratures we investigated are generally superior to level symmetric quadrature, with Icosahedral Quadrature performing the most efficiently for problems tested. (authors)
Comparison of Parallel MRI Reconstruction Methods for Accelerated 3D Fast Spin-Echo Imaging
Xiao, Zhikui; Hoge, W. Scott; Mulkern, R.V.; Zhao, Lei; Hu, Guangshu; Kyriakos, Walid E.
2014-01-01
Parallel MRI (pMRI) achieves imaging acceleration by partially substituting gradient-encoding steps with spatial information contained in the component coils of the acquisition array. Variable-density subsampling in pMRI was previously shown to yield improved two-dimensional (2D) imaging in comparison to uniform subsampling, but has yet to be used routinely in clinical practice. In an effort to reduce acquisition time for 3D fast spin-echo (3D-FSE) sequences, this work explores a specific nonuniform sampling scheme for 3D imaging, subsampling along two phase-encoding (PE) directions on a rectilinear grid. We use two reconstruction methods—2D-GRAPPA-Operator and 2D-SPACE RIP—and present a comparison between them. We show that high-quality images can be reconstructed using both techniques. To evaluate the proposed sampling method and reconstruction schemes, results via simulation, phantom study, and in vivo 3D human data are shown. We find that fewer artifacts can be seen in the 2D-SPACE RIP reconstructions than in 2D-GRAPPA-Operator reconstructions, with comparable reconstruction times. PMID:18727083
Efficient parallel global garbage collection on massively parallel computers
Kamada, Tomio; Matsuoka, Satoshi; Yonezawa, Akinori
1994-12-31
On distributed-memory high-performance MPPs where processors are interconnected by an asynchronous network, efficient Garbage Collection (GC) becomes difficult due to inter-node references and references within pending, unprocessed messages. The parallel global GC algorithm (1) takes advantage of reference locality, (2) efficiently traverses references over nodes, (3) admits minimum pause time of ongoing computations, and (4) has been shown to scale up to 1024 node MPPs. The algorithm employs a global weight counting scheme to substantially reduce message traffic. The two methods for confirming the arrival of pending messages are used: one counts numbers of messages and the other uses network `bulldozing.` Performance evaluation in actual implementations on a multicomputer with 32-1024 nodes, Fujitsu AP1000, reveals various favorable properties of the algorithm.
Parallel Imaging of 3D Surface Profile with Space-Division Multiplexing
Lee, Hyung Seok; Cho, Soon-Woo; Kim, Gyeong Hun; Jeong, Myung Yung; Won, Young Jae; Kim, Chang-Seok
2016-01-01
We have developed a modified optical frequency domain imaging (OFDI) system that performs parallel imaging of three-dimensional (3D) surface profiles by using the space division multiplexing (SDM) method with dual-area swept sourced beams. We have also demonstrated that 3D surface information for two different areas could be well obtained in a same time with only one camera by our method. In this study, double field of views (FOVs) of 11.16 mm × 5.92 mm were achieved within 0.5 s. Height range for each FOV was 460 µm and axial and transverse resolutions were 3.6 and 5.52 µm, respectively. PMID:26805840
NASA Astrophysics Data System (ADS)
Chang, Yau-Zen; Hou, Jung-Fu; Tsao, Yi Hsiang; Lee, Shih-Tseng
2011-09-01
This paper proposes a scheme for finding the correspondence between uniformly spaced locations on the images of human face captured from different viewpoints at the same instant. The correspondence is dedicated for 3D reconstruction to be used in the registration procedure for neurosurgery where the exposure to projectors must be seriously restricted. The approach utilizes structured light to enhance patterns on the images and is initialized with the scale-invariant feature transform (SIFT). Successive locations are found according to spatial order using a parallel version of the particle swarm optimization algorithm. Furthermore, false locations are singled out for correction by searching for outliers from fitted curves. Case studies show that the scheme is able to correctly generate 456 evenly spaced 3D coordinate points in 23 seconds from a single shot of projected human face using a PC with 2.66 GHz Intel Q9400 CPU and 4GB RAM.
A Parallelized 3D Particle-In-Cell Method With Magnetostatic Field Solver And Its Applications
NASA Astrophysics Data System (ADS)
Hsu, Kuo-Hsien; Chen, Yen-Sen; Wu, Men-Zan Bill; Wu, Jong-Shinn
2008-10-01
A parallelized 3D self-consistent electrostatic particle-in-cell finite element (PIC-FEM) code using an unstructured tetrahedral mesh was developed. For simulating some applications with external permanent magnet set, the distribution of the magnetostatic field usually also need to be considered and determined accurately. In this paper, we will firstly present the development of a 3D magnetostatic field solver with an unstructured mesh for the flexibility of modeling objects with complex geometry. The vector Poisson equation for magnetostatic field is formulated using the Galerkin nodal finite element method and the resulting matrix is solved by parallel conjugate gradient method. A parallel adaptive mesh refinement module is coupled to this solver for better resolution. Completed solver is then verified by simulating a permanent magnet array with results comparable to previous experimental observations and simulations. By taking the advantage of the same unstructured grid format of this solver, the developed PIC-FEM code could directly and easily read the magnetostatic field for particle simulation. In the upcoming conference, magnetron is simulated and presented for demonstrating the capability of this code.
A 3D parallel simulator for crystal growth and solidification in complex alloy systems
NASA Astrophysics Data System (ADS)
Nestler, Britta
2005-02-01
A 3D parallel simulator is developed to numerically solve the evolution equations of a new non-isothermal phase-field model for crystal growth and solidification in complex alloy systems. The new model and the simulator are capable to simultaneously describe the diffusion processes of multiple components, the phase transitions between multiple phases and the development of the temperature field. Weak and facetted formulations of both, surface energy and kinetic anisotropies are incorporated in the phase-field model. Multicomponent bulk diffusion effects including interdiffusion coefficients as well as diffusion in the interfacial region of phase or grain boundaries are considered. We introduce our parallel simulator that is based on a finite difference discretization including effective adaptive strategies and multigrid methods to reduce computation time and memory usage. The parallelization is realized for distributed as well as shared memory computer architectures using MPI libraries and OpenMP concepts. Applying the new computer model, we present a variety of simulated crystal structures such as dendrites, grains, binary and ternary eutectics in 2D and 3D. The influence of anisotropy on the microstructure evolution shows the formation of facets in preferred crystallographic directions. Phase transformations and solidification processes in a real multi-component alloy can be described by incorporating the physical data (e.g. surface tensions, kinetic coefficients, specific heat, heat and mass diffusion coefficients) and the specific phase diagram (in particular latent heats and melting temperatures) into the diffuse interface model via the free energies.
Parallel graph search: application to intraretinal layer segmentation of 3D macular OCT scans
NASA Astrophysics Data System (ADS)
Lee, Kyungmoo; Abràmoff, Michael D.; Garvin, Mona K.; Sonka, Milan
2012-02-01
Image segmentation is of paramount importance for quantitative analysis of medical image data. Recently, a 3-D graph search method which can detect globally optimal interacting surfaces with respect to the cost function of volumetric images has been introduced, and its utility demonstrated in several application areas. Although the method provides excellent segmentation accuracy, its limitation is a slow processing speed when many surfaces are simultaneously segmented in large volumetric datasets. Here, we propose a novel method of parallel graph search, which overcomes the limitation and allows the quick detection of multiple surfaces. To demonstrate the obtained performance with respect to segmentation accuracy and processing speedup, the new approach was applied to retinal optical coherence tomography (OCT) image data and compared with the performance of the former non-parallel method. Our parallel graph search methods for single and double surface detection are approximately 267 and 181 times faster than the original graph search approach in 5 macular OCT volumes (200 x 5 x 1024 voxels) acquired from the right eyes of 5 normal subjects. The resulting segmentation differences were small as demonstrated by the mean unsigned differences between the non-parallel and parallel methods of 0.0 +/- 0.0 voxels (0.0 +/- 0.0 μm) and 0.27 +/- 0.34 voxels (0.53 +/- 0.66 μm) for the single- and dual-surface approaches, respectively.
NASA Astrophysics Data System (ADS)
Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji
2016-03-01
Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.
Parallel goal-oriented adaptive finite element modeling for 3D electromagnetic exploration
NASA Astrophysics Data System (ADS)
Zhang, Y.; Key, K.; Ovall, J.; Holst, M.
2014-12-01
We present a parallel goal-oriented adaptive finite element method for accurate and efficient electromagnetic (EM) modeling of complex 3D structures. An unstructured tetrahedral mesh allows this approach to accommodate arbitrarily complex 3D conductivity variations and a priori known boundaries. The total electric field is approximated by the lowest order linear curl-conforming shape functions and the discretized finite element equations are solved by a sparse LU factorization. Accuracy of the finite element solution is achieved through adaptive mesh refinement that is performed iteratively until the solution converges to the desired accuracy tolerance. Refinement is guided by a goal-oriented error estimator that uses a dual-weighted residual method to optimize the mesh for accurate EM responses at the locations of the EM receivers. As a result, the mesh refinement is highly efficient since it only targets the elements where the inaccuracy of the solution corrupts the response at the possibly distant locations of the EM receivers. We compare the accuracy and efficiency of two approaches for estimating the primary residual error required at the core of this method: one uses local element and inter-element residuals and the other relies on solving a global residual system using a hierarchical basis. For computational efficiency our method follows the Bank-Holst algorithm for parallelization, where solutions are computed in subdomains of the original model. To resolve the load-balancing problem, this approach applies a spectral bisection method to divide the entire model into subdomains that have approximately equal error and the same number of receivers. The finite element solutions are then computed in parallel with each subdomain carrying out goal-oriented adaptive mesh refinement independently. We validate the newly developed algorithm by comparison with controlled-source EM solutions for 1D layered models and with 2D results from our earlier 2D goal oriented
PARALLEL 3-D SPACE CHARGE CALCULATIONS IN THE UNIFIED ACCELERATOR LIBRARY.
D'IMPERIO, N.L.; LUCCIO, A.U.; MALITSKY, N.
2006-06-26
The paper presents the integration of the SIMBAD space charge module in the UAL framework. SIMBAD is a Particle-in-Cell (PIC) code. Its 3-D Parallel approach features an optimized load balancing scheme based on a genetic algorithm. The UAL framework enhances the SIMBAD standalone version with the interactive ROOT-based analysis environment and an open catalog of accelerator algorithms. The composite package addresses complex high intensity beam dynamics and has been developed as part of the FAIR SIS 100 project.
NASA Technical Reports Server (NTRS)
Luke, Edward Allen
1993-01-01
Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.
Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
Martinez, E.; Monasterio, P.R.; Marian, J.
2011-02-20
An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
Parallel 3D Finite Element Numerical Modelling of DC Electron Guns
Prudencio, E.; Candel, A.; Ge, L.; Kabel, A.; Ko, K.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; /SLAC
2008-02-04
In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specially to complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than packages currently available. We present simulation results for the L-band Sheet Beam Klystron DC gun, in which case Gun3P is able to reduce simulation time from days to some hours.
Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems
NASA Astrophysics Data System (ADS)
Martínez, E.; Monasterio, P. R.; Marian, J.
2011-02-01
An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.
Solving unstructured grid problems on massively parallel computers
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1990-01-01
A highly parallel graph mapping technique that enables one to efficiently solve unstructured grid problems on massively parallel computers is presented. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The cost of this communication can negate the high performance promised by massively parallel computing. To eliminate this bottleneck, the graph of the irregular problem is mapped into the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. It is shown that using the heuristic mapping algorithm significantly reduces the communication time compared to a naive assignment of processes to processors.
Parallel 3D Multi-Stage Simulation of a Turbofan Engine
NASA Technical Reports Server (NTRS)
Turner, Mark G.; Topp, David A.
1998-01-01
A 3D multistage simulation of each component of a modern GE Turbofan engine has been made. An axisymmetric view of this engine is presented in the document. This includes a fan, booster rig, high pressure compressor rig, high pressure turbine rig and a low pressure turbine rig. In the near future, all components will be run in a single calculation for a solution of 49 blade rows. The simulation exploits the use of parallel computations by using two levels of parallelism. Each blade row is run in parallel and each blade row grid is decomposed into several domains and run in parallel. 20 processors are used for the 4 blade row analysis. The average passage approach developed by John Adamczyk at NASA Lewis Research Center has been further developed and parallelized. This is APNASA Version A. It is a Navier-Stokes solver using a 4-stage explicit Runge-Kutta time marching scheme with variable time steps and residual smoothing for convergence acceleration. It has an implicit K-E turbulence model which uses an ADI solver to factor the matrix. Between 50 and 100 explicit time steps are solved before a blade row body force is calculated and exchanged with the other blade rows. This outer iteration has been coined a "flip." Efforts have been made to make the solver linearly scaleable with the number of blade rows. Enough flips are run (between 50 and 200) so the solution in the entire machine is not changing. The K-E equations are generally solved every other explicit time step. One of the key requirements in the development of the parallel code was to make the parallel solution exactly (bit for bit) match the serial solution. This has helped isolate many small parallel bugs and guarantee the parallelization was done correctly. The domain decomposition is done only in the axial direction since the number of points axially is much larger than the other two directions. This code uses MPI for message passing. The parallel speed up of the solver portion (no 1/0 or body force
Three-dimensional parallel UNIPIC-3D code for simulations of high-power microwave devices
NASA Astrophysics Data System (ADS)
Wang, Jianguo; Chen, Zaigao; Wang, Yue; Zhang, Dianhui; Liu, Chunliang; Li, Yongdong; Wang, Hongguang; Qiao, Hailiang; Fu, Meiyan; Yuan, Yuan
2010-07-01
This paper introduces a self-developed, three-dimensional parallel fully electromagnetic particle simulation code UNIPIC-3D. In this code, the electromagnetic fields are updated using the second-order, finite-difference time-domain method, and the particles are moved using the relativistic Newton-Lorentz force equation. The electromagnetic field and particles are coupled through the current term in Maxwell's equations. Two numerical examples are used to verify the algorithms adopted in this code, numerical results agree well with theoretical ones. This code can be used to simulate the high-power microwave (HPM) devices, such as the relativistic backward wave oscillator, coaxial vircator, and magnetically insulated line oscillator, etc. UNIPIC-3D is written in the object-oriented C++ language and can be run on a variety of platforms including WINDOWS, LINUX, and UNIX. Users can use the graphical user's interface to create the complex geometric structures of the simulated HPM devices, which can be automatically meshed by UNIPIC-3D code. This code has a powerful postprocessor which can display the electric field, magnetic field, current, voltage, power, spectrum, momentum of particles, etc. For the sake of comparison, the results computed by using the two-and-a-half-dimensional UNIPIC code are also provided for the same parameters of HPM devices, the numerical results computed from these two codes agree well with each other.
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations
Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul
2016-01-01
Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C ++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
3D-radiative transfer in terrestrial atmosphere: An efficient parallel numerical procedure
NASA Astrophysics Data System (ADS)
Bass, L. P.; Germogenova, T. A.; Nikolaeva, O. V.; Kokhanovsky, A. A.; Kuznetsov, V. S.
2003-04-01
Light propagation and scattering in terrestrial atmosphere is usually studied in the framework of the 1D radiative transfer theory [1]. However, in reality particles (e.g., ice crystals, solid and liquid aerosols, cloud droplets) are randomly distributed in 3D space. In particular, their concentrations vary both in vertical and horizontal directions. Therefore, 3D effects influence modern cloud and aerosol retrieval procedures, which are currently based on the 1D radiative transfer theory. It should be pointed out that the standard radiative transfer equation allows to study these more complex situations as well [2]. In recent year the parallel version of the 2D and 3D RADUGA code has been developed. This version is successfully used in gammas and neutrons transport problems [3]. Applications of this code to radiative transfer in atmosphere problems are contained in [4]. Possibilities of code RADUGA are presented in [5]. The RADUGA code system is an universal solver of radiative transfer problems for complicated models, including 2D and 3D aerosol and cloud fields with arbitrary scattering anisotropy, light absorption, inhomogeneous underlying surface and topography. Both delta type and distributed light sources can be accounted for in the framework of the algorithm developed. The accurate numerical procedure is based on the new discrete ordinate SWDD scheme [6]. The algorithm is specifically designed for parallel supercomputers. The version RADUGA 5.1(P) can run on MBC1000M [7] (768 processors with 10 Gb of hard disc memory for each processor). The peak productivity is equal 1 Tfl. Corresponding scalar version RADUGA 5.1 is working on PC. As a first example of application of the algorithm developed, we have studied the shadowing effects of clouds on neighboring cloudless atmosphere, depending on the cloud optical thickness, surface albedo, and illumination conditions. This is of importance for modern satellite aerosol retrieval algorithms development. [1] Sobolev
A Programming Model for Massive Data Parallelism with Data Dependencies
Cui, Xiaohui; Mueller, Frank; Potok, Thomas E; Zhang, Yongpeng
2009-01-01
Accelerating processors can often be more cost and energy effective for a wide range of data-parallel computing problems than general-purpose processors. For graphics processor units (GPUs), this is particularly the case when program development is aided by environments such as NVIDIA s Compute Unified Device Architecture (CUDA), which dramatically reduces the gap between domain-specific architectures and general purpose programming. Nonetheless, general-purpose GPU (GPGPU) programming remains subject to several restrictions. Most significantly, the separation of host (CPU) and accelerator (GPU) address spaces requires explicit management of GPU memory resources, especially for massive data parallelism that well exceeds the memory capacity of GPUs. One solution to this problem is to transfer data between the GPU and host memories frequently. In this work, we investigate another approach. We run massively data-parallel applications on GPU clusters. We further propose a programming model for massive data parallelism with data dependencies for this scenario. Experience from micro benchmarks and real-world applications shows that our model provides not only ease of programming but also significant performance gains.
Design and verification of an ultra-precision 3D-coordinate measuring machine with parallel drives
NASA Astrophysics Data System (ADS)
Bos, Edwin; Moers, Ton; van Riel, Martijn
2015-08-01
An ultra-precision 3D coordinate measuring machine (CMM), the TriNano N100, has been developed. In our design, the workpiece is mounted on a 3D stage, which is driven by three parallel drives that are mutually orthogonal. The linear drives support the 3D stage using vacuum preloaded (VPL) air bearings, whereby each drive determines the position of the 3D stage along one translation direction only. An exactly constrained design results in highly repeatable machine behavior. Furthermore, the machine complies with the Abbé principle over its full measurement range and the application of parallel drives allows for excellent dynamic behavior. The design allows a 3D measurement uncertainty of 100 nanometers in a measurement range of 200 cubic centimeters. Verification measurements using a Gannen XP 3D tactile probing system on a spherical artifact show a standard deviation in single point repeatability of around 2 nm in each direction.
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
Three-Dimensional Radiative Transfer on a Massively Parallel Computer.
NASA Astrophysics Data System (ADS)
Vath, Horst Michael
1994-01-01
We perform three-dimensional radiative transfer calculations on the MasPar MP-1, which contains 8192 processors and is a single instruction multiple data (SIMD) machine, an example of the new generation of massively parallel computers. To make radiative transfer calculations efficient, we must re-consider the numerical methods and methods of storage of data that have been used with serial machines. We developed a numerical code which efficiently calculates images and spectra of astrophysical systems as seen from different viewing directions and at different wavelengths. We use this code to examine a number of different astrophysical systems. First we image the HI distribution of model galaxies. Then we investigate the galaxy NGC 5055, which displays a radial asymmetry in its optical appearance. This can be explained by the presence of dust in the outer HI disk far beyond the optical disk. As the formation of dust is connected to the presence of stars, the existence of dust in outer regions of this galaxy could have consequences for star formation at a time when this galaxy was just forming. Next we use the code for polarized radiative transfer. We first discuss the numerical computation of the required cyclotron opacities and use them to calculate spectra of AM Her systems, binaries containing accreting magnetic white dwarfs. Then we obtain spectra of an extended polar cap. Previous calculations did not consider the three -dimensional extension of the shock. We find that this results in a significant underestimate of the radiation emitted in the shock. Next we calculate the spectrum of the intermediate polar RE 0751+14. For this system we obtain a magnetic field of ~10 MG, which has consequences for the evolution of intermediate polars. Finally we perform 3D radiative transfer in NLTE in the two-level atom approximation. To solve the transfer equation in this case, we adapt the short characteristic method and examine different acceleration methods to obtain the
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Development of massively parallel quantum chemistry program SMASH
Ishimura, Kazuya
2015-12-31
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
Development of massively parallel quantum chemistry program SMASH
NASA Astrophysics Data System (ADS)
Ishimura, Kazuya
2015-12-01
A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C150H30)2 with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.
TSE computers - A means for massively parallel computations
NASA Technical Reports Server (NTRS)
Strong, J. P., III
1976-01-01
A description is presented of hardware concepts for building a massively parallel processing system for two-dimensional data. The processing system is to use logic arrays of 128 x 128 elements which perform over 16 thousand operations simultaneously. Attention is given to image data, logic arrays, basic image logic functions, a prototype negator, an interleaver device, image logic circuits, and an image memory circuit.
MIMD massively parallel methods for engineering and science problems
Camp, W.J.; Plimpton, S.J.
1993-08-01
MIMD massively parallel computers promise unique power and flexibility for engineering and scientific simulations. In this paper we review the development of a number of software methods and algorithms for scientific and engineering problems which are helping to realize that promise. We discuss new domain decomposition, load balancing, data layout and communications methods applicable to simulations in a broad range of technical field including signal processing, multi-dimensional structural and fluid mechanics, materials science, and chemical and biological systems.
Massively parallel Wang Landau sampling on multiple GPUs
Yin, Junqi; Landau, D. P.
2012-01-01
Wang Landau sampling is implemented on the Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA). Performances on three different GPU cards, including the new generation Fermi architecture card, are compared with that on a Central Processing Unit (CPU). The parameters for massively parallel Wang Landau sampling are tuned in order to achieve fast convergence. For simulations of the water cluster systems, we obtain an average of over 50 times speedup for a given workload.
Time-dependent 3-D dterministic transport on parallel architectures using Dantsys/MPI
Baker, R.S.; Alcouffe, R.E.
1996-12-31
In addition to the ability to solve the static transport equation, we have also incorporated time dependence into our parallel 3-D S{sub {ital N}} code DANTSYS/MPI. Using a semi-implicit scheme, DANTSYS/MPI is capable of performing time-dependent calculations for both fissioning and pure source driven problems. We have applied this to various types of problems such as nuclear well logging and prompt fission experiments. This paper describes the form of the time- dependent equations implemented, their solution strategies in DANTSYS/MPI including iteration acceleration, and the strategies used for time-step control. Results are presented for a model nuclear well logging calculation.
Ruh, Dominic; Tränkle, Benjamin; Rohrbach, Alexander
2011-10-24
Multi-dimensional, correlated particle tracking is a key technology to reveal dynamic processes in living and synthetic soft matter systems. In this paper we present a new method for tracking micron-sized beads in parallel and in all three dimensions - faster and more precise than existing techniques. Using an acousto-optic deflector and two quadrant-photo-diodes, we can track numerous optically trapped beads at up to tens of kHz with a precision of a few nanometers by back-focal plane interferometry. By time-multiplexing the laser focus, we can calibrate individually all traps and all tracking signals in a few seconds and in 3D. We show 3D histograms and calibration constants for nine beads in a quadratic arrangement, although trapping and tracking is easily possible for more beads also in arbitrary 2D arrangements. As an application, we investigate the hydrodynamic coupling and diffusion anomalies of spheres trapped in a 3 × 3 arrangement. PMID:22109012
Requirements for supercomputing in energy research: The transition to massively parallel computing
Not Available
1993-02-01
This report discusses: The emergence of a practical path to TeraFlop computing and beyond; requirements of energy research programs at DOE; implementation: supercomputer production computing environment on massively parallel computers; and implementation: user transition to massively parallel computing.
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
NASA Astrophysics Data System (ADS)
Yang, Dikun; Oldenburg, Douglas W.; Haber, Eldad
2014-03-01
Airborne electromagnetic (AEM) methods are highly efficient tools for assessing the Earth's conductivity structures in a large area at low cost. However, the configuration of AEM measurements, which typically have widely distributed transmitter-receiver pairs, makes the rigorous modelling and interpretation extremely time-consuming in 3-D. Excessive overcomputing can occur when working on a large mesh covering the entire survey area and inverting all soundings in the data set. We propose two improvements. The first is to use a locally optimized mesh for each AEM sounding for the forward modelling and calculation of sensitivity. This dedicated local mesh is small with fine cells near the sounding location and coarse cells far away in accordance with EM diffusion and the geometric decay of the signals. Once the forward problem is solved on the local meshes, the sensitivity for the inversion on the global mesh is available through quick interpolation. Using local meshes for AEM forward modelling avoids unnecessary computing on fine cells on a global mesh that are far away from the sounding location. Since local meshes are highly independent, the forward modelling can be efficiently parallelized over an array of processors. The second improvement is random and dynamic down-sampling of the soundings. Each inversion iteration only uses a random subset of the soundings, and the subset is reselected for every iteration. The number of soundings in the random subset, determined by an adaptive algorithm, is tied to the degree of model regularization. This minimizes the overcomputing caused by working with redundant soundings. Our methods are compared against conventional methods and tested with a synthetic example. We also invert a field data set that was previously considered to be too large to be practically inverted in 3-D. These examples show that our methodology can dramatically reduce the processing time of 3-D inversion to a practical level without losing resolution
NASA Astrophysics Data System (ADS)
Rastogi, Richa; Srivastava, Abhishek; Khonde, Kiran; Sirasala, Kirannmayi M.; Londhe, Ashutosh; Chavhan, Hitesh
2015-07-01
This paper presents an efficient parallel 3D Kirchhoff depth migration algorithm suitable for current class of multicore architecture. The fundamental Kirchhoff depth migration algorithm exhibits inherent parallelism however, when it comes to 3D data migration, as the data size increases the resource requirement of the algorithm also increases. This challenges its practical implementation even on current generation high performance computing systems. Therefore a smart parallelization approach is essential to handle 3D data for migration. The most compute intensive part of Kirchhoff depth migration algorithm is the calculation of traveltime tables due to its resource requirements such as memory/storage and I/O. In the current research work, we target this area and develop a competent parallel algorithm for post and prestack 3D Kirchhoff depth migration, using hybrid MPI+OpenMP programming techniques. We introduce a concept of flexi-depth iterations while depth migrating data in parallel imaging space, using optimized traveltime table computations. This concept provides flexibility to the algorithm by migrating data in a number of depth iterations, which depends upon the available node memory and the size of data to be migrated during runtime. Furthermore, it minimizes the requirements of storage, I/O and inter-node communication, thus making it advantageous over the conventional parallelization approaches. The developed parallel algorithm is demonstrated and analysed on Yuva II, a PARAM series of supercomputers. Optimization, performance and scalability experiment results along with the migration outcome show the effectiveness of the parallel algorithm.
In situ patterned micro 3D liver constructs for parallel toxicology testing in a fluidic device.
Skardal, Aleksander; Devarasetty, Mahesh; Soker, Shay; Hall, Adam R
2015-09-01
3D tissue models are increasingly being implemented for drug and toxicology testing. However, the creation of tissue-engineered constructs for this purpose often relies on complex biofabrication techniques that are time consuming, expensive, and difficult to scale up. Here, we describe a strategy for realizing multiple tissue constructs in a parallel microfluidic platform using an approach that is simple and can be easily scaled for high-throughput formats. Liver cells mixed with a UV-crosslinkable hydrogel solution are introduced into parallel channels of a sealed microfluidic device and photopatterned to produce stable tissue constructs in situ. The remaining uncrosslinked material is washed away, leaving the structures in place. By using a hydrogel that specifically mimics the properties of the natural extracellular matrix, we closely emulate native tissue, resulting in constructs that remain stable and functional in the device during a 7-day culture time course under recirculating media flow. As proof of principle for toxicology analysis, we expose the constructs to ethyl alcohol (0-500 mM) and show that the cell viability and the secretion of urea and albumin decrease with increasing alcohol exposure, while markers for cell damage increase. PMID:26355538
Parallel robot for micro assembly with integrated innovative optical 3D-sensor
NASA Astrophysics Data System (ADS)
Hesselbach, Juergen; Ispas, Diana; Pokar, Gero; Soetebier, Sven; Tutsch, Rainer
2002-10-01
Recent advances in the fields of MEMS and MOEMS often require precise assembly of very small parts with an accuracy of a few microns. In order to meet this demand, a new approach using a robot based on parallel mechanisms in combination with a novel 3D-vision system has been chosen. The planar parallel robot structure with 2 DOF provides a high resolution in the XY-plane. It carries two additional serial axes for linear and rotational movement in/about z direction. In order to achieve high precision as well as good dynamic capabilities, the drive concept for the parallel (main) axes incorporates air bearings in combination with a linear electric servo motors. High accuracy position feedback is provided by optical encoders with a resolution of 0.1 μm. To allow for visualization and visual control of assembly processes, a camera module fits into the hollow tool head. It consists of a miniature CCD camera and a light source. In addition a modular gripper support is integrated into the tool head. To increase the accuracy a control loop based on an optoelectronic sensor will be implemented. As a result of an in-depth analysis of different approaches a photogrammetric system using one single camera and special beam-splitting optics was chosen. A pattern of elliptical marks is applied to the surfaces of workpiece and gripper. Using a model-based recognition algorithm the image processing software identifies the gripper and the workpiece and determines their relative position. A deviation vector is calculated and fed into the robot control to guide the gripper.
NASA Astrophysics Data System (ADS)
Commerçon, B.; Hennebelle, P.; Levrier, F.; Launhardt, R.; Henning, Th.
2012-03-01
I will present radiation-magneto-hydrodynamics calculations of low-mass and massive dense core collapse, focusing on the first collapse and the first hydrostatic core (first Larson core) formation. The influence of magnetic field and initial mass on the fragmentation properties will be investigated. In the first part reporting low mass dense core collapse calculations, synthetic observations of spectral energy distributions will be derived, as well as classical observational quantities such as bolometric temperature and luminosity. I will show how the dust continuum can help to target first hydrostatic cores and to state about the nature of VeLLOs. Last, I will present synthetic ALMA observation predictions of first hydrostatic cores which may give an answer, if not definitive, to the fragmentation issue at the early Class 0 stage. In the second part, I will report the results of radiation-magneto-hydrodynamics calculations in the context of high mass star formation, using for the first time a self-consistent model for photon emission (i.e. via thermal emission and in radiative shocks) and with the high resolution necessary to resolve properly magnetic braking effects and radiative shocks on scales <100 AU (Commercon, Hennebelle & Henning ApJL 2011). In this study, we investigate the combined effects of magnetic field, turbulence, and radiative transfer on the early phases of the collapse and the fragmentation of massive dense cores (M=100 M_⊙). We identify a new mechanism that inhibits initial fragmentation of massive dense cores, where magnetic field and radiative transfer interplay. We show that this interplay becomes stronger as the magnetic field strength increases. We speculate that highly magnetized massive dense cores are good candidates for isolated massive star formation, while moderately magnetized massive dense cores are more appropriate to form OB associations or small star clusters. Finally we will also present synthetic observations of these
Comparison of 3-D synthetic aperture phased-array ultrasound imaging and parallel beamforming.
Rasmussen, Morten Fischer; Jensen, Jørgen Arendt
2014-10-01
This paper demonstrates that synthetic aperture imaging (SAI) can be used to achieve real-time 3-D ultrasound phased-array imaging. It investigates whether SAI increases the image quality compared with the parallel beamforming (PB) technique for real-time 3-D imaging. Data are obtained using both simulations and measurements with an ultrasound research scanner and a commercially available 3.5- MHz 1024-element 2-D transducer array. To limit the probe cable thickness, 256 active elements are used in transmit and receive for both techniques. The two imaging techniques were designed for cardiac imaging, which requires sequences designed for imaging down to 15 cm of depth and a frame rate of at least 20 Hz. The imaging quality of the two techniques is investigated through simulations as a function of depth and angle. SAI improved the full-width at half-maximum (FWHM) at low steering angles by 35%, and the 20-dB cystic resolution by up to 62%. The FWHM of the measured line spread function (LSF) at 80 mm depth showed a difference of 20% in favor of SAI. SAI reduced the cyst radius at 60 mm depth by 39% in measurements. SAI improved the contrast-to-noise ratio measured on anechoic cysts embedded in a tissue-mimicking material by 29% at 70 mm depth. The estimated penetration depth on the same tissue-mimicking phantom shows that SAI increased the penetration by 24% compared with PB. Neither SAI nor PB achieved the design goal of 15 cm penetration depth. This is likely due to the limited transducer surface area and a low SNR of the experimental scanner used. PMID:25265174
Routing performance analysis and optimization within a massively parallel computer
Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen
2013-04-16
An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.
The Massively Parallel Processor and its applications. [for environmental monitoring
NASA Technical Reports Server (NTRS)
Strong, J. P.; Schaefer, D. H.; Fischer, J. R.; Wallgren, K. R.; Bracken, P. A.
1979-01-01
A long-term experimental development program conducted at Goddard Space Flight Center to implement an ultrahigh-speed data processing system known as the Massively Parallel Processor (MPP) is described. The MPP is a single instruction multiple data stream computer designed to perform logical, integer, and floating point arithmetic operations on variable word length data. Information is presented on system architecture, the system configuration, the array unit architecture, individual processing units, and expected operating rates for several image processing applications (including the processing of Landsat data).
A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities
O. Kononenko
2015-02-17
ACE3P is a 3D massively parallel simulation suite that developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical study. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R and D. A new frequency domain solver to perform mechanical harmonic response analysis of accelerator components is developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient accurate analysis of the large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator. (auth)
A biconjugate gradient type algorithm on massively parallel architectures
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Hochbruck, Marlis
1991-01-01
The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
3D magnetospheric parallel hybrid multi-grid method applied to planet-plasma interactions
NASA Astrophysics Data System (ADS)
Leclercq, L.; Modolo, R.; Leblanc, F.; Hess, S.; Mancini, M.
2016-03-01
We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet-plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as a inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.
A 3D Parallel Beam Dynamics Code for Modeling High Brightness Beams in Photoinjectors
Qiang, Ji; Lidia, S.; Ryne, R.D.; Limborg, C.; /SLAC
2006-02-13
In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.
A 3d Parallel Beam Dynamics Code for Modeling High BrightnessBeams in Photoinjectors
Qiang, J.; Lidia, S.; Ryne, R.; Limborg, C.
2005-05-16
In this paper we report on IMPACT-T, a 3D beam dynamics code for modeling high brightness beams in photoinjectors and rf linacs. IMPACT-T is one of the few codes used in the photoinjector community that has a parallel implementation, making it very useful for high statistics simulations of beam halos and beam diagnostics. It has a comprehensive set of beamline elements, and furthermore allows arbitrary overlap of their fields. It is unique in its use of space-charge solvers based on an integrated Green function to efficiently and accurately treat beams with large aspect ratio, and a shifted Green function to efficiently treat image charge effects of a cathode. It is also unique in its inclusion of energy binning in the space-charge calculation to model beams with large energy spread. Together, all these features make IMPACT-T a powerful and versatile tool for modeling beams in photoinjectors and other systems. In this paper we describe the code features and present results of IMPACT-T simulations of the LCLS photoinjectors. We also include a comparison of IMPACT-T and PARMELA results.
Numerical computation on massively parallel hypercubes. [Connection machine
McBryan, O.A.
1986-01-01
We describe numerical computations on the Connection Machine, a massively parallel hypercube architecture with 65,536 single-bit processors and 32 Mbytes of memory. A parallel extension of COMMON LISP, provides access to the processors and network. The rich software environment is further enhanced by a powerful virtual processor capability, which extends the degree of fine-grained parallelism beyond 1,000,000. We briefly describe the hardware and indicate the principal features of the parallel programming environment. We then present implementations of SOR, multigrid and pre-conditioned conjugate gradient algorithms for solving partial differential equations on the Connection Machine. Despite the lack of floating point hardware, computation rates above 100 megaflops have been achieved in PDE solution. Virtual processors prove to be a real advantage, easing the effort of software development while improving system performance significantly. The software development effort is also facilitated by the fact that hypercube communications prove to be fast and essentially independent of distance. 29 refs., 4 figs.
The performance realities of massively parallel processors: A case study
Lubeck, O.M.; Simmons, M.L.; Wasserman, H.J.
1992-07-01
This paper presents the results of an architectural comparison of SIMD massive parallelism, as implemented in the Thinking Machines Corp. CM-2 computer, and vector or concurrent-vector processing, as implemented in the Cray Research Inc. Y-MP/8. The comparison is based primarily upon three application codes that represent Los Alamos production computing. Tests were run by porting optimized CM Fortran codes to the Y-MP, so that the same level of optimization was obtained on both machines. The results for fully-configured systems, using measured data rather than scaled data from smaller configurations, show that the Y-MP/8 is faster than the 64k CM-2 for all three codes. A simple model that accounts for the relative characteristic computational speeds of the two machines, and reduction in overall CM-2 performance due to communication or SIMD conditional execution, is included. The model predicts the performance of two codes well, but fails for the third code, because the proportion of communications in this code is very high. Other factors, such as memory bandwidth and compiler effects, are also discussed. Finally, the paper attempts to show the equivalence of the CM-2 and Y-MP programming models, and also comments on selected future massively parallel processor designs.
Comparison of massively parallel hand-print segmenters
Wilkinson, R.A.; Garris, M.D.
1992-09-01
NIST has developed a massively parallel hand-print recognition system that allows components to be interchanged. Using this system, three different character segmentation algorithms have been developed and studied. They are blob coloring, histogramming, and a hybrid of the two. The blob coloring method uses connected components to isolate characters. The histogramming method locates linear spaces, which may be slanted, to segment characters. The hybrid method is an augmented histogramming method that incorporates statistically adaptive rules to decide when a histogrammed item is too large and applies blob coloring to further segment the difficult item. The hardware configuration is a serial host computer with a 1024 processor Single Instruction Multiple Data (SIMD) machine attached to it. The data used in this comparison is 'NIST Special Database 1' which contains 2100 forms from different writers where each form contains 130 digit characters distributed across 28 fields. This gives a potential 273,000 characters to be segmented. Running the massively parallel system across the 2100 forms, blob coloring required 2.1 seconds per form with an accuracy of 97.5%, histogramming required 14.4 seconds with an accuracy of 95.3%, and the hybrid method required 13.2 seconds with an accuracy of 95.4%. The results of this comparison show that the blob coloring method on a SIMD architecture is superior.
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
3D parallel computations of turbofan noise propagation using a spectral element method
NASA Astrophysics Data System (ADS)
Taghaddosi, Farzad
2006-12-01
A three-dimensional code has been developed for the simulation of tone noise generated by turbofan engine inlets using computational aeroacoustics. The governing equations are the linearized Euler equations, which are further simplified to a set of equations in terms of acoustic potential, using the irrotational flow assumption, and subsequently solved in the frequency domain. Due to the special nature of acoustic wave propagation, the spatial discretization is performed using a spectral element method, where a tensor product of the nth-degree polynomials based on Chebyshev orthogonal functions is used to approximate variations within hexahedral elements. Non-reflecting boundary conditions are imposed at the far-field using a damping layer concept. This is done by augmenting the continuity equation with an additional term without modifying the governing equations as in PML methods. Solution of the linear system of equations for the acoustic problem is based on the Schur complement method, which is a nonoverlapping domain decomposition technique. The Schur matrix is first solved using a matrix-free iterative method, whose convergence is accelerated with a novel local preconditioner. The solution in the entire domain is then obtained by finding solutions in smaller subdomains. The 3D code also contains a mean flow solver based on the full potential equation in order to take into account the effects of flow variations around the nacelle on the scattering of the radiated sound field. All aspects of numerical simulations, including building and assembling the coefficient matrices, implementation of the Schur complement method, and solution of the system of equations for both the acoustic and mean flow problems are performed on multiprocessors in parallel using the resources of the CLUMEQ Supercomputer Center. A large number of test cases are presented, ranging in size from 100 000-2 000 000 unknowns for which, depending on the size of the problem, between 8-48 CPU's are
Exact solutions and the consistency of 3D minimal massive gravity
NASA Astrophysics Data System (ADS)
Altas, Emel; Tekin, Bayram
2015-07-01
We show that all algebraic type-O , type-N and type-D and some Kundt-type solutions of topologically massive gravity are inherited by its holographically well-defined deformation, that is, the recently found minimal massive gravity. This construction provides a large class of constant scalar curvature solutions to the theory. We also study the consistency of the field equations both in the source-free and matter-coupled cases. Since the field equations of MMG do not come from a Lagrangian that depends on the metric and its derivatives only, it lacks the Bianchi identity valid for all nonsingular metrics. But it turns out that for the solutions of the equations, the Bianchi identity is satisfied. This is a necessary condition for the consistency of the classical field equations but not a sufficient one, since the rank-two tensor equations are susceptible to double divergence. We show that for the source-free case the double divergence of the field equations vanishes for the solutions. In the matter-coupled case, we show that the double divergences on the left-hand side and the right-hand side are equal to each other for the solutions of the theory. This construction completes the proof of the consistency of the field equations.
Dharmaraj, Christopher D.; Thadikonda, Kishan; Fletcher, Anthony R.; Doan, Phuc N.; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A.; Cook, John A.; Mitchell, James B.; Subramanian, Sankaran; Krishna, Murali C.
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 × 23 × 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time. PMID:19672315
Dharmaraj, Christopher D; Thadikonda, Kishan; Fletcher, Anthony R; Doan, Phuc N; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A; Cook, John A; Mitchell, James B; Subramanian, Sankaran; Krishna, Murali C
2009-01-01
Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 x 23 x 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time. PMID:19672315
Formiconi, A R; Passeri, A; Guelfi, M R; Masoni, M; Pupi, A; Meldolesi, U; Malfetti, P; Calori, L; Guidazzoli, A
1997-11-01
Data from Single Photon Emission Computed Tomography (SPECT) studies are blurred by inevitable physical phenomena occurring during data acquisition. These errors may be compensated by means of reconstruction algorithms which take into account accurate physical models of the data acquisition procedure. Unfortunately, this approach involves high memory requirements as well as a high computational burden which cannot be afforded by the computer systems of SPECT acquisition devices. In this work the possibility of accessing High Performance Computing and Networking (HPCN) resources through a World Wide Web interface for the advanced reconstruction of SPECT data in a clinical environment was investigated. An iterative algorithm with an accurate model of the variable system response was ported on the Multiple Instruction Multiple Data (MIMD) parallel architecture of a Cray T3D massively parallel computer. The system was accessible even from low cost PC-based workstations through standard TCP/IP networking. A speedup factor of 148 was predicted by the benchmarks run on the Cray T3D. A complete brain study of 30 (64 x 64) slices was reconstructed from a set of 90 (64 x 64) projections with ten iterations of the conjugate gradients algorithm in 9 s which corresponds to an actual speed-up factor of 135. The technique was extended to a more accurate 3D modeling of the system response for a true 3D reconstruction of SPECT data; the reconstruction time of the same data set with this more accurate model was 5 min. This work demonstrates the possibility of exploiting remote HPCN resources from hospital sites by means of low cost workstations using standard communication protocols and an user-friendly WWW interface without particular problems for routine use. PMID:9506406
Optimal evaluation of array expressions on massively parallel machines
NASA Technical Reports Server (NTRS)
Chatterjee, Siddhartha; Gilbert, John R.; Schreiber, Robert; Teng, Shang-Hua
1992-01-01
We investigate the problem of evaluating FORTRAN 90 style array expressions on massively parallel distributed-memory machines. On such machines, an elementwise operation can be performed in constant time for arrays whose corresponding elements are in the same processor. If the arrays are not aligned in this manner, the cost of aligning them is part of the cost of evaluating the expression. The choice of where to perform the operation then affects this cost. We present algorithms based on dynamic programming to solve this problem efficiently for a wide variety of interconnection schemes, including multidimensional grids and rings, hypercubes, and fat-trees. We also consider expressions containing operations that change the shape of the arrays, and show that our approach extends naturally to handle this case.
Applications of massively parallel computers in telemetry processing
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek A.; Pritchard, Jim; Knoble, Gordon
1994-01-01
Telemetry processing refers to the reconstruction of full resolution raw instrumentation data with artifacts, of space and ground recording and transmission, removed. Being the first processing phase of satellite data, this process is also referred to as level-zero processing. This study is aimed at investigating the use of massively parallel computing technology in providing level-zero processing to spaceflights that adhere to the recommendations of the Consultative Committee on Space Data Systems (CCSDS). The workload characteristics, of level-zero processing, are used to identify processing requirements in high-performance computing systems. An example of level-zero functions on a SIMD MPP, such as the MasPar, is discussed. The requirements in this paper are based in part on the Earth Observing System (EOS) Data and Operation System (EDOS).
A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer
NASA Technical Reports Server (NTRS)
Jespersen, Dennis C.; Levit, Creon
1989-01-01
The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine ran achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.
Beam dynamics calculations and particle tracking using massively parallel processors
Ryne, R.D.; Habib, S.
1995-12-31
During the past decade massively parallel processors (MPPs) have slowly gained acceptance within the scientific community. At present these machines typically contain a few hundred to one thousand off-the-shelf microprocessors and a total memory of up to 32 GBytes. The potential performance of these machines is illustrated by the fact that a month long job on a high end workstation might require only a few hours on an MPP. The acceptance of MPPs has been slow for a variety of reasons. For example, some algorithms are not easily parallelizable. Also, in the past these machines were difficult to program. But in recent years the development of Fortran-like languages such as CM Fortran and High Performance Fortran have made MPPs much easier to use. In the following we will describe how MPPs can be used for beam dynamics calculations and long term particle tracking.
Integration of IR focal plane arrays with massively parallel processor
NASA Astrophysics Data System (ADS)
Esfandiari, P.; Koskey, P.; Vaccaro, K.; Buchwald, W.; Clark, F.; Krejca, B.; Rekeczky, C.; Zarandy, A.
2008-04-01
The intent of this investigation is to replace the low fill factor visible sensor of a Cellular Neural Network (CNN) processor with an InGaAs Focal Plane Array (FPA) using both bump bonding and epitaxial layer transfer techniques for use in the Ballistic Missile Defense System (BMDS) interceptor seekers. The goal is to fabricate a massively parallel digital processor with a local as well as a global interconnect architecture. Currently, this unique CNN processor is capable of processing a target scene in excess of 10,000 frames per second with its visible sensor. What makes the CNN processor so unique is that each processing element includes memory, local data storage, local and global communication devices and a visible sensor supported by a programmable analog or digital computer program.
Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing
2015-01-01
A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in massive parallel using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982
Development of a massively parallel parachute performance prediction code
Peterson, C.W.; Strickland, J.H.; Wolfe, W.P.; Sundberg, W.D.; McBride, D.D.
1997-04-01
The Department of Energy has given Sandia full responsibility for the complete life cycle (cradle to grave) of all nuclear weapon parachutes. Sandia National Laboratories is initiating development of a complete numerical simulation of parachute performance, beginning with parachute deployment and continuing through inflation and steady state descent. The purpose of the parachute performance code is to predict the performance of stockpile weapon parachutes as these parachutes continue to age well beyond their intended service life. A new massively parallel computer will provide unprecedented speed and memory for solving this complex problem, and new software will be written to treat the coupled fluid, structure and trajectory calculations as part of a single code. Verification and validation experiments have been proposed to provide the necessary confidence in the computations.
Efficient Identification of Assembly Neurons within Massively Parallel Spike Trains
Berger, Denise; Borgelt, Christian; Louis, Sebastien; Morrison, Abigail; Grün, Sonja
2010-01-01
The chance of detecting assembly activity is expected to increase if the spiking activities of large numbers of neurons are recorded simultaneously. Although such massively parallel recordings are now becoming available, methods able to analyze such data for spike correlation are still rare, as a combinatorial explosion often makes it infeasible to extend methods developed for smaller data sets. By evaluating pattern complexity distributions the existence of correlated groups can be detected, but their member neurons cannot be identified. In this contribution, we present approaches to actually identify the individual neurons involved in assemblies. Our results may complement other methods and also provide a way to reduce data sets to the “relevant” neurons, thus allowing us to carry out a refined analysis of the detailed correlation structure due to reduced computation time. PMID:19809521
Massively parallel high-order combinatorial genetics in human cells
Wong, Alan S L; Choi, Gigi C G; Cheng, Allen A; Purcell, Oliver; Lu, Timothy K
2016-01-01
The systematic functional analysis of combinatorial genetics has been limited by the throughput that can be achieved and the order of complexity that can be studied. To enable massively parallel characterization of genetic combinations in human cells, we developed a technology for rapid, scalable assembly of high-order barcoded combinatorial genetic libraries that can be quantified with high-throughput sequencing. We applied this technology, combinatorial genetics en masse (CombiGEM), to create high-coverage libraries of 1,521 two-wise and 51,770 three-wise barcoded combinations of 39 human microRNA (miRNA) precursors. We identified miRNA combinations that synergistically sensitize drug-resistant cancer cells to chemotherapy and/or inhibit cancer cell proliferation, providing insights into complex miRNA networks. More broadly, our method will enable high-throughput profiling of multifactorial genetic combinations that regulate phenotypes of relevance to biomedicine, biotechnology and basic science. PMID:26280411
Field-Scale, Massively Parallel Simulation of Production from Oceanic Gas Hydrate Deposits
NASA Astrophysics Data System (ADS)
Reagan, M. T.; Moridis, G. J.; Freeman, C. M.; Pan, L.; Boyle, K. L.; Johnson, J. N.; Husebo, J. A.
2012-12-01
The quantity of hydrocarbon gases trapped in natural hydrate accumulations is enormous, leading to significant interest in the evaluation of their potential as an energy source. It has been shown that large volumes of gas can be readily produced at high rates for long times from some types of methane hydrate accumulations by means of depressurization-induced dissociation, and using conventional technologies with horizontal or vertical well configurations. However, these systems are currently assessed using simplified or reduced-scale 3D or even 2D production simulations. In this study, we use the massively parallel TOUGH+HYDRATE code (pT+H) to assess the production potential of a large, deep-ocean hydrate reservoir and develop strategies for effective production. The simulations model a full 3D system of over 24 km2 extent, examining the productivity of vertical and horizontal wells, single or multiple wells, and explore variations in reservoir properties. Systems of up to 2.5M gridblocks, running on thousands of supercomputing nodes, are required to simulate such large systems at the highest level of detail. The simulations reveal the challenges inherent in producing from deep, relatively cold systems with extensive water-bearing channels and connectivity to large aquifers, including the difficulty of achieving depressurizing, the challenges of high water removal rates, and the complexity of production design. Also highlighted are new frontiers in large-scale reservoir simulation of coupled flow, transport, thermodynamics, and phase behavior, including the construction of large meshes, the use parallel numerical solvers and MPI, and large-scale, parallel 3D visualization of results.
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
Faulon, Jean-Loup, Wilcox, R.T. , Hobbs, J.D. , Ford, D.M.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.
NASA Astrophysics Data System (ADS)
Meléndez, A.; Korenaga, J.; Sallarès, V.; Miniussi, A.; Ranero, C. R.
2015-10-01
We present a new 3-D traveltime tomography code (TOMO3D) for the modelling of active-source seismic data that uses the arrival times of both refracted and reflected seismic phases to derive the velocity distribution and the geometry of reflecting boundaries in the subsurface. This code is based on its popular 2-D version TOMO2D from which it inherited the methods to solve the forward and inverse problems. The traveltime calculations are done using a hybrid ray-tracing technique combining the graph and bending methods. The LSQR algorithm is used to perform the iterative regularized inversion to improve the initial velocity and depth models. In order to cope with an increased computational demand due to the incorporation of the third dimension, the forward problem solver, which takes most of the run time (˜90 per cent in the test presented here), has been parallelized with a combination of multi-processing and message passing interface standards. This parallelization distributes the ray-tracing and traveltime calculations among available computational resources. The code's performance is illustrated with a realistic synthetic example, including a checkerboard anomaly and two reflectors, which simulates the geometry of a subduction zone. The code is designed to invert for a single reflector at a time. A data-driven layer-stripping strategy is proposed for cases involving multiple reflectors, and it is tested for the successive inversion of the two reflectors. Layers are bound by consecutive reflectors, and an initial velocity model for each inversion step incorporates the results from previous steps. This strategy poses simpler inversion problems at each step, allowing the recovery of strong velocity discontinuities that would otherwise be smoothened.
The role of MHD in 3D aspects of massive gas injection
NASA Astrophysics Data System (ADS)
Izzo, V. A.; Parks, P. B.; Eidietis, N. W.; Shiraki, D.; Hollmann, E. M.; Commaux, N.; Granetz, R. S.; Humphreys, D. A.; Lasnier, C. J.; Moyer, R. A.; Paz-Soldan, C.; Raman, R.; Strait, E. J.
2015-07-01
Simulations of massive gas injection for disruption mitigation in DIII-D are carried out to compare the toroidal peaking of radiated power for the cases of one and two gas jets. The radiation toroidal peaking factor (TPF) results from a combination of the distribution of impurities and the distribution of heat flux associated with the n=1 mode. When ignoring the effects of strong uni-directional neutral beam injection and rotation present in the experiment, the injected impurities are found to spread helically along field lines preferentially toward the high-field-side, which is explained in terms of a nozzle equation. Therefore when considering the plasma rest frame, reversing the current direction also reverses the toroidal direction of impurity spreading. During the pre-thermal quench phase of the disruption, the toroidal peaking of radiated power is reduced in a straightforward manner by increasing from one to two gas jets. However, during the thermal quench phase, reduction in the TPF is achieved only for a particular arrangement of the two gas valves with respect to the field line pitch. In particular, the relationship between the two valve locations and the 1/1 mode phase is critical, where gas valve spacing that is coherent with 1/1 symmetry effectively reduces TPF.
Three-dimensional electromagnetic modeling and inversion on massively parallel computers
Newman, G.A.; Alumbaugh, D.L.
1996-03-01
This report has demonstrated techniques that can be used to construct solutions to the 3-D electromagnetic inverse problem using full wave equation modeling. To this point great progress has been made in developing an inverse solution using the method of conjugate gradients which employs a 3-D finite difference solver to construct model sensitivities and predicted data. The forward modeling code has been developed to incorporate absorbing boundary conditions for high frequency solutions (radar), as well as complex electrical properties, including electrical conductivity, dielectric permittivity and magnetic permeability. In addition both forward and inverse codes have been ported to a massively parallel computer architecture which allows for more realistic solutions that can be achieved with serial machines. While the inversion code has been demonstrated on field data collected at the Richmond field site, techniques for appraising the quality of the reconstructions still need to be developed. Here it is suggested that rather than employing direct matrix inversion to construct the model covariance matrix which would be impossible because of the size of the problem, one can linearize about the 3-D model achieved in the inverse and use Monte-Carlo simulations to construct it. Using these appraisal and construction tools, it is now necessary to demonstrate 3-D inversion for a variety of EM data sets that span the frequency range from induction sounding to radar: below 100 kHz to 100 MHz. Appraised 3-D images of the earth`s electrical properties can provide researchers opportunities to infer the flow paths, flow rates and perhaps the chemistry of fluids in geologic mediums. It also offers a means to study the frequency dependence behavior of the properties in situ. This is of significant relevance to the Department of Energy, paramount to characterizing and monitoring of environmental waste sites and oil and gas exploration.
Digital relief 3D model of the Khibiny massive (Kola peninsula)
NASA Astrophysics Data System (ADS)
Chesalova, Elena; Asavin, Alex
2015-04-01
On the basis of maps of 1: 50,000 and 1: 200,000 3D model Khibiny massif developed. We used software ARC / INFO v10.2 ESRI. This project will be organised to build background for gas pollution monitoring network. We planned to use the model to estimate local heterogeneities in the composition of the atmosphere at the emanation of greenhouse gases in the area, the construction of models of vertical distribution of the content of trace gases in the rock mass. In addition to the project GIS digital elevation model contains layers of geological and tectonic map that allows us to estimate the area of the output of certain petrographic rock groups characterized by different ratios of emitted hydrocarbons (CH4/ H2). The model allows to construct a classification of fault in the array. At first glance, there are two groups of faults - the ancient associated with the formation of the intrusive phases sequence, and the young - due to recent tectonic shifts. Ancient faults form a common semicircular structure of the pluton cause overall asymmetry Khibin heights with the transition to the border area between the Khibiny and Lovoozero. Modern tectonics mainly represented by radial and chord faults which are formed narrow mountain valleys and troughs. It remains an open question as to which system fault (old or young) is more productive to gas emanations? On the one hand the system characterized by a large old depth, on the other hand a young more active faults. Address these issues require further detailed observations. The essential question is to assess the possibility of maintaining a constant concentration gradient of these impurities in the atmosphere due to gas emanations of fracture zones and areas enriched occluded gases. In the simulation of these processes can be used initially set parameters: 1 the flow rate of the gas impurities 2 the value of wind flows in closed and open valley 3 Assessment of thermal diffusion coefficients determined by the temperature gradient
Newman, G.A.; Commer, M.
2009-06-01
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
NASA Astrophysics Data System (ADS)
Newman, Gregory A.; Commer, Michael
2009-07-01
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
Massively parallel simulations of multiphase flows using Lattice Boltzmann methods
NASA Astrophysics Data System (ADS)
Ahrenholz, Benjamin
2010-03-01
In the last two decades the lattice Boltzmann method (LBM) has matured as an alternative and efficient numerical scheme for the simulation of fluid flows and transport problems. Unlike conventional numerical schemes based on discretizations of macroscopic continuum equations, the LBM is based on microscopic models and mesoscopic kinetic equations. The fundamental idea of the LBM is to construct simplified kinetic models that incorporate the essential physics of microscopic or mesoscopic processes so that the macroscopic averaged properties obey the desired macroscopic equations. Especially applications involving interfacial dynamics, complex and/or changing boundaries and complicated constitutive relationships which can be derived from a microscopic picture are suitable for the LBM. In this talk a modified and optimized version of a Gunstensen color model is presented to describe the dynamics of the fluid/fluid interface where the flow field is based on a multi-relaxation-time model. Based on that modeling approach validation studies of contact line motion are shown. Due to the fact that the LB method generally needs only nearest neighbor information, the algorithm is an ideal candidate for parallelization. Hence, it is possible to perform efficient simulations in complex geometries at a large scale by massively parallel computations. Here, the results of drainage and imbibition (Degree of Freedom > 2E11) in natural porous media gained from microtomography methods are presented. Those fully resolved pore scale simulations are essential for a better understanding of the physical processes in porous media and therefore important for the determination of constitutive relationships.
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Reactor Dosimetry Applications Using RAPTOR-M3G:. a New Parallel 3-D Radiation Transport Code
NASA Astrophysics Data System (ADS)
Longoni, Gianluca; Anderson, Stanwood L.
2009-08-01
The numerical solution of the Linearized Boltzmann Equation (LBE) via the Discrete Ordinates method (SN) requires extensive computational resources for large 3-D neutron and gamma transport applications due to the concurrent discretization of the angular, spatial, and energy domains. This paper will discuss the development RAPTOR-M3G (RApid Parallel Transport Of Radiation - Multiple 3D Geometries), a new 3-D parallel radiation transport code, and its application to the calculation of ex-vessel neutron dosimetry responses in the cavity of a commercial 2-loop Pressurized Water Reactor (PWR). RAPTOR-M3G is based domain decomposition algorithms, where the spatial and angular domains are allocated and processed on multi-processor computer architectures. As compared to traditional single-processor applications, this approach reduces the computational load as well as the memory requirement per processor, yielding an efficient solution methodology for large 3-D problems. Measured neutron dosimetry responses in the reactor cavity air gap will be compared to the RAPTOR-M3G predictions. This paper is organized as follows: Section 1 discusses the RAPTOR-M3G methodology; Section 2 describes the 2-loop PWR model and the numerical results obtained. Section 3 addresses the parallel performance of the code, and Section 4 concludes this paper with final remarks and future work.
Lehoucq, Richard B.; Salinger, Andrew G.
1999-08-01
We present an approach for determining the linear stability of steady states of PDEs on massively parallel computers. Linearizing the transient behavior around a steady state leads to a generalized eigenvalue problem. The eigenvalues with largest real part are calculated using Arnoldi's iteration driven by a novel implementation of the Cayley transformation to recast the problem as an ordinary eigenvalue problem. The Cayley transformation requires the solution of a linear system at each Arnoldi iteration, which must be done iteratively for the algorithm to scale with problem size. A representative model problem of 3D incompressible flow and heat transfer in a rotating disk reactor is used to analyze the effect of algorithmic parameters on the performance of the eigenvalue algorithm. Successful calculations of leading eigenvalues for matrix systems of order up to 4 million were performed, identifying the critical Grashof number for a Hopf bifurcation.
Massively Parallel Interrogation of Aptamer Sequence, Structure and Function
Fischer, N O; Tok, J B; Tarasow, T M
2008-02-08
Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.
NASA Astrophysics Data System (ADS)
Huang, Qinghua; Li, Zhanhui; Wang, Yanbin
2010-12-01
We presented a parallel 3-D staggered grid pseudospectral time domain (PSTD) method for simulating ground-penetrating radar (GPR) wave propagation. We took the staggered grid method to weaken the global effect in PSTD and developed a modified fast Fourier transform (FFT) spatial derivative operator to eliminate the wraparound effect due to the implicit periodical boundary condition in FFT operator. After the above improvements, we achieved the parallel PSTD computation based on an overlap domain decomposition method without any absorbing condition for each subdomain, which can significantly reduce the required grids in each overlap subdomain comparing with other proposed algorithms. We test our parallel technique for some numerical models and obtained consistent results with the analytical ones and/or those of the nonparallel PSTD method. The above numerical tests showed that our parallel PSTD algorithm is effective in simulating 3-D GPR wave propagation, with merits of saving computation time, as well as more flexibility in dealing with complicated models without losing the accuracy. The application of our parallel PSTD method in applied geophysics and paleoseismology based on GPR data confirmed the efficiency of our algorithm and its potential applications in various subdisciplines of solid earth geophysics. This study would also provide a useful parallel PSTD approach to the simulation of other geophysical problems on distributed memory PC cluster.
Duda, Sven; Dreyer, Lutz; Behrens, Peter; Wienecke, Soenke; Chakradeo, Tanmay; Glasmacher, Birgit; Haastert-Talini, Kirsten
2014-01-01
We report on the performance of composite nerve grafts with an inner 3D multichannel porous chitosan core and an outer electrospun polycaprolactone shell. The inner chitosan core provided multiple guidance channels for regrowing axons. To analyze the in vivo properties of the bare chitosan cores, we separately implanted them into an epineural sheath. The effects of both graft types on structural and functional regeneration across a 10 mm rat sciatic nerve gap were compared to autologous nerve transplantation (ANT). The mechanical biomaterial properties and the immunological impact of the grafts were assessed with histological techniques before and after transplantation in vivo. Furthermore during a 13-week examination period functional tests and electrophysiological recordings were performed and supplemented by nerve morphometry. The sheathing of the chitosan core with a polycaprolactone shell induced massive foreign body reaction and impairment of nerve regeneration. Although the isolated novel chitosan core did allow regeneration of axons in a similar size distribution as the ANT, the ANT was superior in terms of functional regeneration. We conclude that an outer polycaprolactone shell should not be used for the purpose of bioartificial nerve grafting, while 3D multichannel porous chitosan cores could be candidate scaffolds for structured nerve grafts. PMID:24818158
Behrens, Peter; Wienecke, Soenke; Chakradeo, Tanmay; Glasmacher, Birgit
2014-01-01
We report on the performance of composite nerve grafts with an inner 3D multichannel porous chitosan core and an outer electrospun polycaprolactone shell. The inner chitosan core provided multiple guidance channels for regrowing axons. To analyze the in vivo properties of the bare chitosan cores, we separately implanted them into an epineural sheath. The effects of both graft types on structural and functional regeneration across a 10 mm rat sciatic nerve gap were compared to autologous nerve transplantation (ANT). The mechanical biomaterial properties and the immunological impact of the grafts were assessed with histological techniques before and after transplantation in vivo. Furthermore during a 13-week examination period functional tests and electrophysiological recordings were performed and supplemented by nerve morphometry. The sheathing of the chitosan core with a polycaprolactone shell induced massive foreign body reaction and impairment of nerve regeneration. Although the isolated novel chitosan core did allow regeneration of axons in a similar size distribution as the ANT, the ANT was superior in terms of functional regeneration. We conclude that an outer polycaprolactone shell should not be used for the purpose of bioartificial nerve grafting, while 3D multichannel porous chitosan cores could be candidate scaffolds for structured nerve grafts. PMID:24818158
Seismic waves modeling with the Fourier pseudo-spectral method on massively parallel machines.
NASA Astrophysics Data System (ADS)
Klin, Peter
2015-04-01
The Fourier pseudo-spectral method (FPSM) is an approach for the 3D numerical modeling of the wave propagation, which is based on the discretization of the spatial domain in a structured grid and relies on global spatial differential operators for the solution of the wave equation. This last peculiarity is advantageous from the accuracy point of view but poses difficulties for an efficient implementation of the method to be run on parallel computers with distributed memory architecture. The 1D spatial domain decomposition approach has been so far commonly adopted in the parallel implementations of the FPSM, but it implies an intensive data exchange among all the processors involved in the computation, which can degrade the performance because of communication latencies. Moreover, the scalability of the 1D domain decomposition is limited, since the number of processors can not exceed the number of grid points along the directions in which the domain is partitioned. This limitation inhibits an efficient exploitation of the computational environments with a very large number of processors. In order to overcome the limitations of the 1D domain decomposition we implemented a parallel version of the FPSM based on a 2D domain decomposition, which allows to achieve a higher degree of parallelism and scalability on massively parallel machines with several thousands of processing elements. The parallel programming is essentially achieved using the MPI protocol but OpenMP parts are also included in order to exploit the single processor multi - threading capabilities, when available. The developed tool is aimed at the numerical simulation of the seismic waves propagation and in particular is intended for earthquake ground motion research. We show the scalability tests performed up to 16k processing elements on the IBM Blue Gene/Q computer at CINECA (Italy), as well as the application to the simulation of the earthquake ground motion in the alluvial plain of the Po river (Italy).
Characterization of a parallel-beam CCD optical-CT apparatus for 3D radiation dosimetry.
Krstajić, Nikola; Doran, Simon J
2007-07-01
3D measurement of optical attenuation is of interest in a variety of fields of biomedical importance, including spectrophotometry, optical projection tomography (OPT) and analysis of 3D radiation dosimeters. Accurate, precise and economical 3D measurements of optical density (OD) are a crucial step in enabling 3D radiation dosimeters to enter wider use in clinics. Polymer gels and Fricke gels, as well as dosimeters not based around gels, have been characterized for 3D dosimetry over the last two decades. A separate problem is the verification of the best readout method. A number of different imaging modalities (magnetic resonance imaging (MRI), optical CT, x-ray CT and ultrasound) have been suggested for the readout of information from 3D dosimeters. To date only MRI and laser-based optical CT have been characterized in detail. This paper describes some initial steps we have taken in establishing charge coupled device (CCD)-based optical CT as a viable alternative to MRI for readout of 3D radiation dosimeters. The main advantage of CCD-based optical CT over traditional laser-based optical CT is a speed increase of at least an order of magnitude, while the simplicity of its architecture would lend itself to cheaper implementation than both MRI and laser-based optical CT if the camera itself were inexpensive enough. Specifically, we study the following aspects of optical metrology, using high quality test targets: (i) calibration and quality of absorbance measurements and the camera requirements for 3D dosimetry; (ii) the modulation transfer function (MTF) of individual projections; (iii) signal-to-noise ratio (SNR) in the projection and reconstruction domains; (iv) distortion in the projection domain, depth-of-field (DOF) and telecentricity. The principal results for our current apparatus are as follows: (i) SNR of optical absorbance in projections is better than 120:1 for uniform phantoms in absorbance range 0.3 to 1.6 (and better than 200:1 for absorbances 1.0 to
Characterization of a parallel-beam CCD optical-CT apparatus for 3D radiation dosimetry
NASA Astrophysics Data System (ADS)
Krstajic, Nikola; Doran, Simon J.
2007-07-01
3D measurement of optical attenuation is of interest in a variety of fields of biomedical importance, including spectrophotometry, optical projection tomography (OPT) and analysis of 3D radiation dosimeters. Accurate, precise and economical 3D measurements of optical density (OD) are a crucial step in enabling 3D radiation dosimeters to enter wider use in clinics. Polymer gels and Fricke gels, as well as dosimeters not based around gels, have been characterized for 3D dosimetry over the last two decades. A separate problem is the verification of the best readout method. A number of different imaging modalities (magnetic resonance imaging (MRI), optical CT, x-ray CT and ultrasound) have been suggested for the readout of information from 3D dosimeters. To date only MRI and laser-based optical CT have been characterized in detail. This paper describes some initial steps we have taken in establishing charge coupled device (CCD)-based optical CT as a viable alternative to MRI for readout of 3D radiation dosimeters. The main advantage of CCD-based optical CT over traditional laser-based optical CT is a speed increase of at least an order of magnitude, while the simplicity of its architecture would lend itself to cheaper implementation than both MRI and laser-based optical CT if the camera itself were inexpensive enough. Specifically, we study the following aspects of optical metrology, using high quality test targets: (i) calibration and quality of absorbance measurements and the camera requirements for 3D dosimetry; (ii) the modulation transfer function (MTF) of individual projections; (iii) signal-to-noise ratio (SNR) in the projection and reconstruction domains; (iv) distortion in the projection domain, depth-of-field (DOF) and telecentricity. The principal results for our current apparatus are as follows: (i) SNR of optical absorbance in projections is better than 120:1 for uniform phantoms in absorbance range 0.3 to 1.6 (and better than 200:1 for absorbances 1.0 to
Analysis of composite ablators using massively parallel computation
NASA Technical Reports Server (NTRS)
Shia, David
1995-01-01
In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small times steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. The gas storage term is included in the explicit pressure calculation of both
gEMfitter: a highly parallel FFT-based 3D density fitting tool with GPU texture memory acceleration.
Hoang, Thai V; Cavin, Xavier; Ritchie, David W
2013-11-01
Fitting high resolution protein structures into low resolution cryo-electron microscopy (cryo-EM) density maps is an important technique for modeling the atomic structures of very large macromolecular assemblies. This article presents "gEMfitter", a highly parallel fast Fourier transform (FFT) EM density fitting program which can exploit the special hardware properties of modern graphics processor units (GPUs) to accelerate both the translational and rotational parts of the correlation search. In particular, by using the GPU's special texture memory hardware to rotate 3D voxel grids, the cost of rotating large 3D density maps is almost completely eliminated. Compared to performing 3D correlations on one core of a contemporary central processor unit (CPU), running gEMfitter on a modern GPU gives up to 26-fold speed-up. Furthermore, using our parallel processing framework, this speed-up increases linearly with the number of CPUs or GPUs used. Thus, it is now possible to use routinely more robust but more expensive 3D correlation techniques. When tested on low resolution experimental cryo-EM data for the GroEL-GroES complex, we demonstrate the satisfactory fitting results that may be achieved by using a locally normalised cross-correlation with a Laplacian pre-filter, while still being up to three orders of magnitude faster than the well-known COLORES program. PMID:24060989
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.
Cloud identification using genetic algorithms and massively parallel computation
NASA Technical Reports Server (NTRS)
Buckles, Bill P.; Petry, Frederick E.
1996-01-01
As a Guest Computational Investigator under the NASA administered component of the High Performance Computing and Communication Program, we implemented a massively parallel genetic algorithm on the MasPar SIMD computer. Experiments were conducted using Earth Science data in the domains of meteorology and oceanography. Results obtained in these domains are competitive with, and in most cases better than, similar problems solved using other methods. In the meteorological domain, we chose to identify clouds using AVHRR spectral data. Four cloud speciations were used although most researchers settle for three. Results were remarkedly consistent across all tests (91% accuracy). Refinements of this method may lead to more timely and complete information for Global Circulation Models (GCMS) that are prevalent in weather forecasting and global environment studies. In the oceanographic domain, we chose to identify ocean currents from a spectrometer having similar characteristics to AVHRR. Here the results were mixed (60% to 80% accuracy). Given that one is willing to run the experiment several times (say 10), then it is acceptable to claim the higher accuracy rating. This problem has never been successfully automated. Therefore, these results are encouraging even though less impressive than the cloud experiment. Successful conclusion of an automated ocean current detection system would impact coastal fishing, naval tactics, and the study of micro-climates. Finally we contributed to the basic knowledge of GA (genetic algorithm) behavior in parallel environments. We developed better knowledge of the use of subpopulations in the context of shared breeding pools and the migration of individuals. Rigorous experiments were conducted based on quantifiable performance criteria. While much of the work confirmed current wisdom, for the first time we were able to submit conclusive evidence. The software developed under this grant was placed in the public domain. An extensive user
Comparing current cluster, massively parallel, and accelerated systems
Barker, Kevin J; Davis, Kei; Hoisie, Adolfy; Kerbyson, Darren J; Pakin, Scott; Lang, Mike; Sancho Pitarch, Jose C
2010-01-01
Currently there is large architectural diversity in high perfonnance computing systems. They include 'commodity' cluster systems that optimize per-node performance for small jobs, massively parallel processors (MPPs) that optimize aggregate perfonnance for large jobs, and accelerated systems that optimize both per-node and aggregate performance but only for applications custom-designed to take advantage of such systems. Because of these dissimilarities, meaningful comparisons of achievable performance are not straightforward. In this work we utilize a methodology that combines both empirical analysis and performance modeling to compare clusters (represented by a 4,352-core IB cluster), MPPs (represented by a 147,456-core BG/P), and accelerated systems (represented by the 129,600-core Roadrunner) across a workload of four applications. Strengths of our approach include the ability to compare architectures - as opposed to specific implementations of an architecture - attribute each application's performance bottlenecks to characteristics unique to each system, and to explore performance scenarios in advance of their availability for measurement. Our analysis illustrates that application performance is essentially unrelated to relative peak performance but that application performance can be both predicted and explained using modeling.
Investigation of reflective notching with massively parallel simulation
NASA Astrophysics Data System (ADS)
Tadros, Karim H.; Neureuther, Andrew R.; Gamelin, John K.; Guerrieri, Roberto
1990-06-01
A massively parallel simulation program TEMPEST is used to investigate the role of topography in generating reflective notching and to study the possibility of reducing effects through the introduction of special properties of resists and antireflection coating materials. The emphasis is on examining physical scattering mechanisms such as focused specular reflections resist thickness interference effects reflections from substrate grains and focusing of incident light by the resist curvature. Specular reflection from topography can focus incident radiation causing a 10-fold increase in effective exposure. Further complications such as dimples in the surface of positive resist features can result from a second reflection of focused energy by the resist/air interface. Variations in line-edge exposure due to substrate grain structure are primarily specular in nature and can become significant for grains larger than )tresi Local exposure variations due to vertical standing waves and changes in energy coupling due to changes in resist thickness are displaced laterally and are significant effects even though they are slightly less severe than vertical wave propagation theory suggests. Focusing effects due to refraction by the curved surface of the resist produce only minor changes in exposure. Increased resist contrast and resist absorption offer some improvement in reducing notching effects though minimizing substrate reflectivity is more effective. CPU time using 32 virtual nodes to simulate a 4 pm by 2 pm isolated domain with 13 bleaching steps was 30 minutes
A high-plex PCR approach for massively parallel sequencing.
Nguyen-Dumont, Tú; Pope, Bernard J; Hammet, Fleur; Southey, Melissa C; Park, Daniel J
2013-08-01
Current methods for targeted massively parallel sequencing (MPS) have several drawbacks, including limited design flexibility, expense, and protocol complexity, which restrict their application to settings involving modest target size and requiring low cost and high throughput. To address this, we have developed Hi-Plex, a PCR-MPS strategy intended for high-throughput screening of multiple genomic target regions that integrates simple, automated primer design software to control product size. Featuring permissive thermocycling conditions and clamp bias reduction, our protocol is simple, cost- and time-effective, uses readily available reagents, does not require expensive instrumentation, and requires minimal optimization. In a 60-plex assay targeting the breast cancer predisposition genes PALB2 and XRCC2, we applied Hi-Plex to 100 ng LCL-derived DNA, and 100 ng and 25 ng FFPE tumor-derived DNA. Altogether, at least 86.94% of the human genome-mapped reads were on target, and 100% of targeted amplicons were represented within 25-fold of the mean. Using 25 ng FFPE-derived DNA, 95.14% of mapped reads were on-target and relative representation ranged from 10.1-fold lower to 5.8-fold higher than the mean. These results were obtained using only the initial automatically-designed primers present in equal concentration. Hi-Plex represents a powerful new approach for screening panels of genomic target regions. PMID:23931594
Wavelet-Based DFT calculations on Massively Parallel Hybrid Architectures
NASA Astrophysics Data System (ADS)
Genovese, Luigi
2011-03-01
In this contribution, we present an implementation of a full DFT code that can run on massively parallel hybrid CPU-GPU clusters. Our implementation is based on modern GPU architectures which support double-precision floating-point numbers. This DFT code, named BigDFT, is delivered within the GNU-GPL license either in a stand-alone version or integrated in the ABINIT software package. Hybrid BigDFT routines were initially ported with NVidia's CUDA language, and recently more functionalities have been added with new routines writeen within Kronos' OpenCL standard. The formalism of this code is based on Daubechies wavelets, which is a systematic real-space based basis set. As we will see in the presentation, the properties of this basis set are well suited for an extension on a GPU-accelerated environment. In addition to focusing on the implementation of the operators of the BigDFT code, this presentation also relies of the usage of the GPU resources in a complex code with different kinds of operations. A discussion on the interest of present and expected performances of Hybrid architectures computation in the framework of electronic structure calculations is also adressed.
Massively parallel processor networks with optical express channels
Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.
1999-08-24
An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.
Massively parallel processor networks with optical express channels
Deri, Robert J.; Brooks, III, Eugene D.; Haigh, Ronald E.; DeGroot, Anthony J.
1999-01-01
An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.
Massively parallel support for a case-based planning system
NASA Technical Reports Server (NTRS)
Kettler, Brian P.; Hendler, James A.; Anderson, William A.
1993-01-01
Case-based planning (CBP), a kind of case-based reasoning, is a technique in which previously generated plans (cases) are stored in memory and can be reused to solve similar planning problems in the future. CBP can save considerable time over generative planning, in which a new plan is produced from scratch. CBP thus offers a potential (heuristic) mechanism for handling intractable problems. One drawback of CBP systems has been the need for a highly structured memory to reduce retrieval times. This approach requires significant domain engineering and complex memory indexing schemes to make these planners efficient. In contrast, our CBP system, CaPER, uses a massively parallel frame-based AI language (PARKA) and can do extremely fast retrieval of complex cases from a large, unindexed memory. The ability to do fast, frequent retrievals has many advantages: indexing is unnecessary; very large case bases can be used; memory can be probed in numerous alternate ways; and queries can be made at several levels, allowing more specific retrieval of stored plans that better fit the target problem with less adaptation. In this paper we describe CaPER's case retrieval techniques and some experimental results showing its good performance, even on large case bases.
Parallel Adaptive Computation of Blood Flow in a 3D ``Whole'' Body Model
NASA Astrophysics Data System (ADS)
Zhou, M.; Figueroa, C. A.; Taylor, C. A.; Sahni, O.; Jansen, K. E.
2008-11-01
Accurate numerical simulations of vascular trauma require the consideration of a larger portion of the vasculature than previously considered, due to the systemic nature of the human body's response. A patient-specific 3D model composed of 78 connected arterial branches extending from the neck to the lower legs is constructed to effectively represent the entire body. Recently developed outflow boundary conditions that appropriately represent the downstream vasculature bed which is not included in the 3D computational domain are applied at 78 outlets. In this work, the pulsatile blood flow simulations are started on a fairly uniform, unstructured mesh that is subsequently adapted using a solution-based approach to efficiently resolve the flow features. The adapted mesh contains non-uniform, anisotropic elements resulting in resolution that conforms with the physical length scales present in the problem. The effects of the mesh resolution on the flow field are studied, specifically on relevant quantities of pressure, velocity and wall shear stress.
Parallel load balancing strategy for Volume-of-Fluid methods on 3-D unstructured meshes
NASA Astrophysics Data System (ADS)
Jofre, Lluís; Borrell, Ricard; Lehmkuhl, Oriol; Oliva, Assensi
2015-02-01
Volume-of-Fluid (VOF) is one of the methods of choice to reproduce the interface motion in the simulation of multi-fluid flows. One of its main strengths is its accuracy in capturing sharp interface geometries, although requiring for it a number of geometric calculations. Under these circumstances, achieving parallel performance on current supercomputers is a must. The main obstacle for the parallelization is that the computing costs are concentrated only in the discrete elements that lie on the interface between fluids. Consequently, if the interface is not homogeneously distributed throughout the domain, standard domain decomposition (DD) strategies lead to imbalanced workload distributions. In this paper, we present a new parallelization strategy for general unstructured VOF solvers, based on a dynamic load balancing process complementary to the underlying DD. Its parallel efficiency has been analyzed and compared to the DD one using up to 1024 CPU-cores on an Intel SandyBridge based supercomputer. The results obtained on the solution of several artificially generated test cases show a speedup of up to ∼12× with respect to the standard DD, depending on the interface size, the initial distribution and the number of parallel processes engaged. Moreover, the new parallelization strategy presented is of general purpose, therefore, it could be used to parallelize any VOF solver without requiring changes on the coupled flow solver. Finally, note that although designed for the VOF method, our approach could be easily adapted to other interface-capturing methods, such as the Level-Set, which may present similar workload imbalances.
Implementation of a 3D mixing layer code on parallel computers
NASA Technical Reports Server (NTRS)
Roe, K.; Thakur, R.; Dang, T.; Bogucz, E.
1995-01-01
This paper summarizes our progress and experience in the development of a Computational-Fluid-Dynamics code on parallel computers to simulate three-dimensional spatially-developing mixing layers. In this initial study, the three-dimensional time-dependent Euler equations are solved using a finite-volume explicit time-marching algorithm. The code was first programmed in Fortran 77 for sequential computers. The code was then converted for use on parallel computers using the conventional message-passing technique, while we have not been able to compile the code with the present version of HPF compilers.
Focusing optics of a parallel beam CCD optical tomography apparatus for 3D radiation gel dosimetry.
Krstajić, Nikola; Doran, Simon J
2006-04-21
Optical tomography of gel dosimeters is a promising and cost-effective avenue for quality control of radiotherapy treatments such as intensity-modulated radiotherapy (IMRT). Systems based on a laser coupled to a photodiode have so far shown the best results within the context of optical scanning of radiosensitive gels, but are very slow ( approximately 9 min per slice) and poorly suited to measurements that require many slices. Here, we describe a fast, three-dimensional (3D) optical computed tomography (optical-CT) apparatus, based on a broad, collimated beam, obtained from a high power LED and detected by a charged coupled detector (CCD). The main advantages of such a system are (i) an acquisition speed approximately two orders of magnitude higher than a laser-based system when 3D data are required, and (ii) a greater simplicity of design. This paper advances our previous work by introducing a new design of focusing optics, which take information from a suitably positioned focal plane and project an image onto the CCD. An analysis of the ray optics is presented, which explains the roles of telecentricity, focusing, acceptance angle and depth-of-field (DOF) in the formation of projections. A discussion of the approximation involved in measuring the line integrals required for filtered backprojection reconstruction is given. Experimental results demonstrate (i) the effect on projections of changing the position of the focal plane of the apparatus, (ii) how to measure the acceptance angle of the optics, and (iii) the ability of the new scanner to image both absorbing and scattering gel phantoms. The quality of reconstructed images is very promising and suggests that the new apparatus may be useful in a clinical setting for fast and accurate 3D dosimetry. PMID:16585845
Parallel 3-D particle-in-cell modelling of charged ultrarelativistic beam dynamics
NASA Astrophysics Data System (ADS)
Boronina, Marina A.; Vshivkov, Vitaly A.
2015-12-01
> ) in supercolliders. We use the 3-D set of Maxwell's equations for the electromagnetic fields, and the Vlasov equation for the distribution function of the beam particles. The model incorporates automatically the longitudinal effects, which can play a significant role in the cases of super-high densities. We present numerical results for the dynamics of two focused ultrarelativistic beams with a size ratio 10:1:100. The results demonstrate high efficiency of the proposed computational methods and algorithms, which are applicable to a variety of problems in relativistic plasma physics.
Characterization of a parallel beam CCD optical-CT apparatus for 3D radiation dosimetry
NASA Astrophysics Data System (ADS)
Krstajić, Nikola; Doran, Simon J.
2006-12-01
This paper describes the initial steps we have taken in establishing CCD based optical-CT as a viable alternative for 3-D radiation dosimetry. First, we compare the optical density (OD) measurements from a high quality test target and variable neutral density filter (VNDF). A modulation transfer function (MTF) of individual projections is derived for three positions of the sinusoidal test target within the scanning tank. Our CCD is then characterized in terms of its signal-to-noise ratio (SNR). Finally, a sample reconstruction of a scan of a PRESAGETM (registered trademark of Heuris Pharma, NJ, Skillman, USA.) dosimeter is given, demonstrating the capabilities of the apparatus.
Simulations of implosions with a 3D, parallel, unstructured-grid, radiation-hydrodynamics code
Kaiser, T B; Milovich, J L; Prasad, M K; Rathkopf, J; Shestakov, A I
1998-12-28
An unstructured-grid, radiation-hydrodynamics code is used to simulate implosions. Although most of the problems are spherically symmetric, they are run on 3D, unstructured grids in order to test the code's ability to maintain spherical symmetry of the converging waves. Three problems, of increasing complexity, are presented. In the first, a cold, spherical, ideal gas bubble is imploded by an enclosing high pressure source. For the second, we add non-linear heat conduction and drive the implosion with twelve laser beams centered on the vertices of an icosahedron. In the third problem, a NIF capsule is driven with a Planckian radiation source.
3D multi-scale analysis of coupled heat and moisture transport and its parallel implementation
NASA Astrophysics Data System (ADS)
Kruis, Jaroslav
2016-06-01
Parallel implementation of two-scale model of coupled heat and moisture transport is described. The coupled heat and moisture transport is based on the Künzel model. Motivation for the two-scale analysis comes from the requirement to describe distribution of the relative humidity and temperature in historical masonry structures.
PFLOTRAN: Recent Developments Facilitating Massively-Parallel Reactive Biogeochemical Transport
NASA Astrophysics Data System (ADS)
Hammond, G. E.
2015-12-01
With the recent shift towards modeling carbon and nitrogen cycling in support of climate-related initiatives, emphasis has been placed on incorporating increasingly mechanistic biogeochemistry within Earth system models to more accurately predict the response of terrestrial processes to natural and anthropogenic climate cycles. PFLOTRAN is an open-source subsurface code that is specialized for simulating multiphase flow and multicomponent biogeochemical transport on supercomputers. The object-oriented code was designed with modularity in mind and has been coupled with several third-party simulators (e.g. CLM to simulate land surface processes and E4D for coupled hydrogeophysical inversion). Central to PFLOTRAN's capabilities is its ability to simulate tightly-coupled reactive transport processes. This presentation focuses on recent enhancements to the code that enable the solution of large parameterized biogeochemical reaction networks with numerous chemical species. PFLOTRAN's "reaction sandbox" is described, which facilitates the implementation of user-defined reaction networks without the need for a comprehensive understanding of PFLOTRAN software infrastructure. The reaction sandbox is written in modern Fortran (2003-2008) and leverages encapsulation, inheritance, and polymorphism to provide the researcher with a flexible workspace for prototyping reactions within a massively parallel flow and transport simulation framework. As these prototypical reactions mature into well-accepted implementations, they can be incorporated into PFLOTRAN as native biogeochemistry capability. Users of the reaction sandbox are encouraged to upload their source code to PFLOTRAN's main source code repository, including the addition of simple regression tests to better ensure the long-term code compatibility and validity of simulation results.
A new method to combine 3D reconstruction volumes for multiple parallel circular cone beam orbits
Baek, Jongduk; Pelc, Norbert J.
2010-01-01
Purpose: This article presents a new reconstruction method for 3D imaging using a multiple 360° circular orbit cone beam CT system, specifically a way to combine 3D volumes reconstructed with each orbit. The main goal is to improve the noise performance in the combined image while avoiding cone beam artifacts. Methods: The cone beam projection data of each orbit are reconstructed using the FDK algorithm. When at least a portion of the total volume can be reconstructed by more than one source, the proposed combination method combines these overlap regions using weighted averaging in frequency space. The local exactness and the noise performance of the combination method were tested with computer simulations of a Defrise phantom, a FORBILD head phantom, and uniform noise in the raw data. Results: A noiseless simulation showed that the local exactness of the reconstructed volume from the source with the smallest tilt angle was preserved in the combined image. A noise simulation demonstrated that the combination method improved the noise performance compared to a single orbit reconstruction. Conclusions: In CT systems which have overlap volumes that can be reconstructed with data from more than one orbit and in which the spatial frequency content of each reconstruction can be calculated, the proposed method offers improved noise performance while keeping the local exactness of data from the source with the smallest tilt angle. PMID:21089770
Simulation of the 3D viscoelastic free surface flow by a parallel corrected particle scheme
NASA Astrophysics Data System (ADS)
Jin-Lian, Ren; Tao, Jiang
2016-02-01
In this work, the behavior of the three-dimensional (3D) jet coiling based on the viscoelastic Oldroyd-B model is investigated by a corrected particle scheme, which is named the smoothed particle hydrodynamics with corrected symmetric kernel gradient and shifting particle technique (SPH_CS_SP) method. The accuracy and stability of SPH_CS_SP method is first tested by solving Poiseuille flow and Taylor-Green flow. Then the capacity for the SPH_CS_SP method to solve the viscoelastic fluid is verified by the polymer flow through a periodic array of cylinders. Moreover, the convergence of the SPH_CS_SP method is also investigated. Finally, the proposed method is further applied to the 3D viscoelastic jet coiling problem, and the influences of macroscopic parameters on the jet coiling are discussed. The numerical results show that the SPH_CS_SP method has higher accuracy and better stability than the traditional SPH method and other corrected SPH method, and can improve the tensile instability. Project supported by the Natural Science Foundation of Jiangsu Province, China (Grant Nos. BK20130436 and BK20150436) and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 15KJB110025).
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code
Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten
2008-05-27
TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used
Lim, Jong-Min; Bertrand, Nicolas; Valencia, Pedro M.; Rhee, Minsoung; Langer, Robert; Jon, Sangyong; Farokhzad, Omid C.; Karnik, Rohit
2014-01-01
Microfluidic synthesis of nanoparticles (NPs) can enhance the controllability and reproducibility in physicochemical properties of NPs compared to bulk synthesis methods. However, applications of microfluidic synthesis are typically limited to in vitro studies due to low production rates. Herein, we report the parallelization of NP synthesis by 3D hydrodynamic flow focusing (HFF) using a multilayer microfluidic system to enhance the production rate without losing the advantages of reproducibility, controllability, and robustness. Using parallel 3D HFF, polymeric poly(lactide-co-glycolide)-b-polyethyleneglycol (PLGA-PEG) NPs with sizes tunable in the range of 13–150 nm could be synthesized reproducibly with high production rate. As a proof of concept, we used this system to perform in vivo pharmacokinetic and biodistribution study of small (20 nm diameter) PLGA-PEG NPs that are otherwise difficult to synthesize. Microfluidic parallelization thus enables synthesis of NPs with tunable properties with production rates suitable for both in vitro and in vivo studies. PMID:23969105
Payne, J.L.; Hassan, B.
1998-09-01
Massively parallel computers have enabled the analyst to solve complicated flow fields (turbulent, chemically reacting) that were previously intractable. Calculations are presented using a massively parallel CFD code called SACCARA (Sandia Advanced Code for Compressible Aerothermodynamics Research and Analysis) currently under development at Sandia National Laboratories as part of the Department of Energy (DOE) Accelerated Strategic Computing Initiative (ASCI). Computations were made on a generic reentry vehicle in a hypersonic flowfield utilizing three different distributed parallel computers to assess the parallel efficiency of the code with increasing numbers of processors. The parallel efficiencies for the SACCARA code will be presented for cases using 1, 150, 100 and 500 processors. Computations were also made on a subsonic/transonic vehicle using both 236 and 521 processors on a grid containing approximately 14.7 million grid points. Ongoing and future plans to implement a parallel overset grid capability and couple SACCARA with other mechanics codes in a massively parallel environment are discussed.
Large-Scale Parallel Unstructured Mesh Computations for 3D High-Lift Analysis
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Pirzadeh, S.
1999-01-01
A complete "geometry to drag-polar" analysis capability for three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries which arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.; Pirzadeh, S.
1999-01-01
A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
Large-Scale Parallel Unstructured Mesh Computations for 3D High-Lift Analysis
NASA Technical Reports Server (NTRS)
Mavriplis, D. J.; Pirzadeh, S.
1999-01-01
A complete "geometry to drag-polar" analysis capability for three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries which arise in high-lift con gurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.
A parallel dynamic load balancing algorithm for 3-D adaptive unstructured grids
NASA Technical Reports Server (NTRS)
Vidwans, A.; Kallinderis, Y.; Venkatakrishnan, V.
1993-01-01
Adaptive local grid refinement and coarsening results in unequal distribution of workload among the processors of a parallel system. A novel method for balancing the load in cases of dynamically changing tetrahedral grids is developed. The approach employs local exchange of cells among processors in order to redistribute the load equally. An important part of the load balancing algorithm is the method employed by a processor to determine which cells within its subdomain are to be exchanged. Two such methods are presented and compared. The strategy for load balancing is based on the Divide-and-Conquer approach which leads to an efficient parallel algorithm. This method is implemented on a distributed-memory MIMD system.
High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation
Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn
2014-11-14
Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbor points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on themore » performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.« less
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.
Climate system modeling on massively parallel systems: LDRD Project 95-ERP-47 final report
Mirin, A.A.; Dannevik, W.P.; Chan, B.; Duffy, P.B.; Eltgroth, P.G.; Wehner, M.F.
1996-12-01
Global warming, acid rain, ozone depletion, and biodiversity loss are some of the major climate-related issues presently being addressed by climate and environmental scientists. Because unexpected changes in the climate could have significant effect on our economy, it is vitally important to improve the scientific basis for understanding and predicting the earth`s climate. The impracticality of modeling the earth experimentally in the laboratory together with the fact that the model equations are highly nonlinear has created a unique and vital role for computer-based climate experiments. However, today`s computer models, when run at desired spatial and temporal resolution and physical complexity, severely overtax the capabilities of our most powerful computers. Parallel processing offers significant potential for attaining increased performance and making tractable simulations that cannot be performed today. The principal goals of this project have been to develop and demonstrate the capability to perform large-scale climate simulations on high-performance computing systems (using methodology that scales to the systems of tomorrow), and to carry out leading-edge scientific calculations using parallelized models. The demonstration platform for these studies has been the 256-processor Cray-T3D located at Lawrence Livermore National Laboratory. Our plan was to undertake an ambitious program in optimization, proof-of-principle and scientific study. These goals have been met. We are now regularly using massively parallel processors for scientific study of the ocean and atmosphere, and preliminary parallel coupled ocean/atmosphere calculations are being carried out as well. Furthermore, our work suggests that it should be possible to develop an advanced comprehensive climate system model with performance scalable to the teraflops range. 9 refs., 3 figs.
Task-parallel implementation of 3D shortest path raytracing for geophysical applications
NASA Astrophysics Data System (ADS)
Giroux, Bernard; Larouche, Benoît
2013-04-01
This paper discusses two variants of the shortest path method and their parallel implementation on a shared-memory system. One variant is designed to perform raytracing in models with stepwise distributions of interval velocity while the other is better suited for continuous velocity models. Both rely on a discretization scheme where primary nodes are located at the corners of cuboid cells and where secondary nodes are found on the edges and sides of the cells. The parallel implementations allow raytracing concurrently for different sources, providing an attractive framework for ray-based tomography. The accuracy and performance of the implementations were measured by comparison with the analytic solution for a layered model and for a vertical gradient model. Mean relative error less than 0.2% was obtained with 5 secondary nodes for the layered model and 9 secondary nodes for the gradient model. Parallel performance depends on the level of discretization refinement, on the number of threads, and on the problem size, with the most determinant variable being the level of discretization refinement (number of secondary nodes). The results indicate that a good trade-off between speed and accuracy is achieved with the number of secondary nodes equal to 5. The programs are written in C++ and rely on the Standard Template Library and OpenMP.
Stankovski, Z.
1995-12-31
The collision probability method in neutron transport, as applied to 2D geometries, consume a great amount of computer time, for a typical 2D assembly calculation about 90% of the computing time is consumed in the collision probability evaluations. Consequently RZ or 3D calculations became prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message passing host/node programmation model. Parallelization was applied to the energy group treatment. Such approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for a industrial code. Sequential performances are also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SPI and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SPI. Because of heterogeneity of the workstation network, the author did not ask high performances for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors.
3D parallel-detection microwave tomography for clinical breast imaging
Epstein, N. R.; Meaney, P. M.; Paulsen, K. D.
2014-12-15
A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate
3D parallel-detection microwave tomography for clinical breast imaging
NASA Astrophysics Data System (ADS)
Epstein, N. R.; Meaney, P. M.; Paulsen, K. D.
2014-12-01
A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to -130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500-2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate recovery
3D parallel-detection microwave tomography for clinical breast imaging.
Epstein, N R; Meaney, P M; Paulsen, K D
2014-12-01
A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to -130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500-2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate recovery
3D parallel-detection microwave tomography for clinical breast imaging
Meaney, P. M.; Paulsen, K. D.
2014-01-01
A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate
NASA Astrophysics Data System (ADS)
Awatsuji, Yasuhiro; Xia, Peng; Wang, Yexin; Matoba, Osamu
2016-03-01
Digital holography is a technique of 3D measurement of object. The technique uses an image sensor to record the interference fringe image containing the complex amplitude of object, and numerically reconstructs the complex amplitude by computer. Parallel phase-shifting digital holography is capable of accurate 3D measurement of dynamic object. This is because this technique can reconstruct the complex amplitude of object, on which the undesired images are not superimposed, form a single hologram. The undesired images are the non-diffraction wave and the conjugate image which are associated with holography. In parallel phase-shifting digital holography, a hologram, whose phase of the reference wave is spatially and periodically shifted every other pixel, is recorded to obtain complex amplitude of object by single-shot exposure. The recorded hologram is decomposed into multiple holograms required for phase-shifting digital holography. The complex amplitude of the object is free from the undesired images is reconstructed from the multiple holograms. To validate parallel phase-shifting digital holography, a high-speed parallel phase-shifting digital holography system was constructed. The system consists of a Mach-Zehnder interferometer, a continuous-wave laser, and a high-speed polarization imaging camera. Phase motion picture of dynamic air flow sprayed from a nozzle was recorded at 180,000 frames per second (FPS) have been recorded by the system. Also phase motion picture of dynamic air induced by discharge between two electrodes has been recorded at 1,000,000 FPS, when high voltage was applied between the electrodes.
High-speed 3D imaging using two-wavelength parallel-phase-shift interferometry.
Safrani, Avner; Abdulhalim, Ibrahim
2015-10-15
High-speed three dimensional imaging based on two-wavelength parallel-phase-shift interferometry is presented. The technique is demonstrated using a high-resolution polarization-based Linnik interferometer operating with three high-speed phase-masked CCD cameras and two quasi-monochromatic modulated light sources. The two light sources allow for phase unwrapping the single source wrapped phase so that relatively high step profiles having heights as large as 3.7 μm can be imaged in video rate with ±2 nm accuracy and repeatability. The technique is validated using a certified very large scale integration (VLSI) step standard followed by a demonstration from the semiconductor industry showing an integrated chip with 2.75 μm height copper micro pillars at different packing densities. PMID:26469586
Parallel deconvolution of large 3D images obtained by confocal laser scanning microscopy.
Pawliczek, Piotr; Romanowska-Pawliczek, Anna; Soltys, Zbigniew
2010-03-01
Various deconvolution algorithms are often used for restoration of digital images. Image deconvolution is especially needed for the correction of three-dimensional images obtained by confocal laser scanning microscopy. Such images suffer from distortions, particularly in the Z dimension. As a result, reliable automatic segmentation of these images may be difficult or even impossible. Effective deconvolution algorithms are memory-intensive and time-consuming. In this work, we propose a parallel version of the well-known Richardson-Lucy deconvolution algorithm developed for a system with distributed memory and implemented with the use of Message Passing Interface (MPI). It enables significantly more rapid deconvolution of two-dimensional and three-dimensional images by efficiently splitting the computation across multiple computers. The implementation of this algorithm can be used on professional clusters provided by computing centers as well as on simple networks of ordinary PC machines. PMID:19725070
Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.OX speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2
NASA Technical Reports Server (NTRS)
Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak
1996-01-01
Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.
Enhancements, Parallelization and Future Directions of the V3FIT 3-D Equilibrium Reconstruction Code
NASA Astrophysics Data System (ADS)
Cianciosa, M. R.; Hanson, J. D.; Maurer, D. A.; Hartwell, G. J.; Archmiller, M. C.; Ma, X.; Herfindal, J.
2014-10-01
Three-dimensional equilibrium reconstruction is spreading beyond its original application to stellarators. Three-dimensional effects in nominally axisymmetric systems, including quasi-helical states in reversed field pinches and error fields in tokamaks, are becoming increasingly important. V3FIT is a fully three dimensional equilibrium reconstruction code in widespread use throughout the fusion community. The code has recently undergone extensive revision to prepare for the next generation of equilibrium reconstruction problems. The most notable changes are the abstraction of the equilibrium model, the propagation of experimental errors to the reconstructed results, support for multicolor soft x-ray emissivity cameras, and recent efforts to add parallelization for efficient computation on multi-processor system. Work presented will contain discussions on these new capabilities. We will compare probability distributions of reconstructed parameters with results from whole shot reconstructions. We will show benchmarking and profiling results of initial performance improvements through the addition of OpenMP and MPI support. We will discuss future directions of the V3FIT code including steps taken for support of the W-7X stellarator. Work supported by US. Department of Energy Grant No. DEFG-0203-ER-54692B.
A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters
NASA Astrophysics Data System (ADS)
Pathak, Ashish; Raessi, Mehdi
2015-11-01
We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.
Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis
2013-01-01
We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331
NASA Astrophysics Data System (ADS)
Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María
2014-06-01
We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the
NASA Astrophysics Data System (ADS)
Morgan, J. P.; Hasenclever, J.; Shi, C.
2009-12-01
Computational studies of mantle convection face large challenges to obtain fast and accurate solutions for variable viscosity 3d flow. Recently we have been using parallel (MPI-based) MATLAB to more thoroughly explore possible pitfalls and algorithmic improvements to current ‘best-practice’ variable viscosity Stokes and D’Arcy flow solvers. Here we focus on study of finite-element solvers based on a decomposition of the equations for incompressible Stokes flow: Ku + Gp = f and G’u = 0 (K-velocity stiffness matrix, G-discretized gradient operator, G’=transpose(G)-discretized divergence operator) into a single equation for pressure Sp==G’K^-1Gp =G’K^-1f, in which the velocity is also updated as part of each pressure iteration. The outer pressure iteration is solved with preconditioned conjugate gradients (CG) (Maday and Patera, 1989), with a multigrid-preconditioned CG solver for the z=K^-1 (Gq) step of each pressure iteration. One fairly well-known pitfall (Fortin, 1985) is that constant-pressure elements can generate a spurious non-zero flow under a constant body force within non-rectangular geometries. We found a new pitfall when using an iterative method to solve the Kz=y operation in evaluating each G’K^-1Gq product -- even if the residual of the outer pressure equation converges to zero, the discrete divergence of this equation does not correspondingly converge; the error in the incompressibility depends on roughly the square of the tolerance used to solve each Kz=y velocity-like subproblem. Our current best recipe is: (1) Use flexible CG (cf. Notay, 2001) to solve the outer pressure problem. This is analogous to GMRES for a symmetric positive definite problem. It allows use of numerically unsymmetric and/or inexact preconditioners with CG. (2) In this outer-iteration, use an ‘alpha-bar’ technique to find the appropriate magnitude alpha to change the solution in each search direction. This improvement allows a similar iterative tolerance of
SWAMP+: multiple subsequence alignment using associative massive parallelism
Steinfadt, Shannon Irene; Baker, Johnnie W
2010-10-18
A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.
NASA Astrophysics Data System (ADS)
Gainullin, I. K.; Sonkin, M. A.
2015-03-01
A parallelized three-dimensional (3D) time-dependent Schrodinger equation (TDSE) solver for one-electron systems is presented in this paper. The TDSE Solver is based on the finite-difference method (FDM) in Cartesian coordinates and uses a simple and explicit leap-frog numerical scheme. The simplicity of the numerical method provides very efficient parallelization and high performance of calculations using Graphics Processing Units (GPUs). For example, calculation of 106 time-steps on the 1000ṡ1000ṡ1000 numerical grid (109 points) takes only 16 hours on 16 Tesla M2090 GPUs. The TDSE Solver demonstrates scalability (parallel efficiency) close to 100% with some limitations on the problem size. The TDSE Solver is validated by calculation of energy eigenstates of the hydrogen atom (13.55 eV) and affinity level of H- ion (0.75 eV). The comparison with other TDSE solvers shows that a GPU-based TDSE Solver is 3 times faster for the problems of the same size and with the same cost of computational resources. The usage of a non-regular Cartesian grid or problem-specific non-Cartesian coordinates increases this benefit up to 10 times. The TDSE Solver was applied to the calculation of the resonant charge transfer (RCT) in nanosystems, including several related physical problems, such as electron capture during H+-H0 collision and electron tunneling between H- ion and thin metallic island film.
NASA Astrophysics Data System (ADS)
Kim, Jaewook; Ghim, Young-Chul; Nuclear Fusion and Plasma Lab Team
2014-10-01
A BES (beam emission spectroscopy) system and an MIR (Microwave Imaging Reflectometer) system installed in KSTAR measure 2D (radial and poloidal) density fluctuations at two different toroidal locations. This gives a possibility of measuring the parallel correlation length of ion-scale turbulence in KSTAR. Due to lack of measurement points in toroidal direction and shorter separation distance between the diagnostics compared to an expected parallel correlation length, it is necessary to confirm whether a conventional statistical method, i.e., using a cross-correlation function, is valid for measuring the parallel correlation length. For this reason, we generated synthetic 3D density fluctuation data following Gaussian random field in a toroidal coordinate system that mimic real density fluctuation data. We measure the correlation length of the synthetic data by fitting a Gaussian function to the cross-correlation function. We observe that there is disagreement between the measured and actual correlation lengths, and the degree of disagreement is a function of at least, correlation length, correlation time and advection velocity of synthetic data. We identify the cause of disagreement and propose an appropriate method to measure correct correlation length.
NASA Astrophysics Data System (ADS)
Petersson, Anders; Rodgers, Arthur
2010-05-01
conserving, coupling procedure for the elastic wave equation at grid refinement interfaces. When used together with our single grid finite difference scheme, it results in a method which is provably stable, without artificial dissipation, for arbitrary heterogeneous isotropic elastic materials. The new coupling procedure is based on satisfying the summation-by-parts principle across refinement interfaces. From a practical standpoint, an important advantage of the proposed method is the absence of tunable numerical parameters, which seldom are appreciated by application experts. In WPP, the composite grid discretization is combined with a curvilinear grid approach that enables accurate modeling of free surfaces on realistic (non-planar) topography. The overall method satisfies the summation-by-parts principle and is stable under a CFL time step restriction. A feature of great practical importance is that WPP automatically generates the composite grid based on the user provided topography and the depths of the grid refinement interfaces. The WPP code has been verified extensively, for example using the method of manufactured solutions, by solving Lamb's problem, by solving various layer over half- space problems and comparing to semi-analytic (FK) results, and by simulating scenario earthquakes where results from other seismic simulation codes are available. WPP has also been validated against seismographic recordings of moderate earthquakes. WPP performs well on large parallel computers and has been run on up to 32,768 processors using about 26 Billion grid points (78 Billion DOF) and 41,000 time steps. WPP is an open source code that is available under the Gnu general public license.
Chiang, Mao-Hsiung; Lin, Hao-Ting
2011-01-01
This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot's end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H(∞) tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measuring position of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot consider the manufacturing and assembly tolerance of the joints and the parallel mechanism so that errors between the actual position and the calculated 3D position of the end-effector exist. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system combining two CCD serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. Furthermore, to
NASA Astrophysics Data System (ADS)
Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.
2016-02-01
We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.
ALEGRA -- A massively parallel h-adaptive code for solid dynamics
Summers, R.M.; Wong, M.K.; Boucheron, E.A.; Weatherby, J.R.
1997-12-31
ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.
QCD on the Massively Parallel Computer AP1000
NASA Astrophysics Data System (ADS)
Akemi, K.; Fujisaki, M.; Okuda, M.; Tago, Y.; Hashimoto, T.; Hioki, S.; Miyamura, O.; Takaishi, T.; Nakamura, A.; de Forcrand, Ph.; Hege, C.; Stamatescu, I. O.
We present the QCD-TARO program of calculations which uses the parallel computer AP1000 of Fujitsu. We discuss the results on scaling, correlation times and hadronic spectrum, some aspects of the implementation and the future prospects.
Ma, Yingliang; Saetzler, Kurt
2008-01-01
In this paper we describe a novel 3D subdivision strategy to extract the surface of binary image data. This iterative approach generates a series of surface meshes that capture different levels of detail of the underlying structure. At the highest level of detail, the resulting surface mesh generated by our approach uses only about 10% of the triangles in comparison to the marching cube algorithm (MC) even in settings were almost no image noise is present. Our approach also eliminates the so-called "staircase effect" which voxel based algorithms like the MC are likely to show, particularly if non-uniformly sampled images are processed. Finally, we show how the presented algorithm can be parallelized by subdividing 3D image space into rectilinear blocks of subimages. As the algorithm scales very well with an increasing number of processors in a multi-threaded setting, this approach is suited to process large image data sets of several gigabytes. Although the presented work is still computationally more expensive than simple voxel-based algorithms, it produces fewer surface triangles while capturing the same level of detail, is more robust towards image noise and eliminates the above-mentioned "staircase" effect in anisotropic settings. These properties make it particularly useful for biomedical applications, where these conditions are often encountered. PMID:17993710
A development plan for a massively parallel version of the hydrocode CTH
Robinson, A.C.; Fang, E.; Holdridge, D.; McGlaun, J.M.
1990-07-01
Massively parallel computers and computer networks are beginning to appear as an integral part of the scientific computing workplace. This report documents the goals and the corresponding development plan of the massively parallel project of Departments 1530 and 1420. The main goal of the project is to provide a clear understanding of the issues and difficulties involved in bringing the current production hydrocode CTH to the state of being portable to a number of currently available parallel computing architectures. In the process of this research, various working versions of the code will be produced. 6 refs., 6 figs.
A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures
Lashuk, Ilya; Chandramowlishwaran, Aparna; Langston, Harper; Nguyen, Tuan-Anh; Sampath, Rahul S; Shringarpure, Aashay; Vuduc, Richard; Ying, Lexing; Zorin, Denis; Biros, George
2012-01-01
We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech/ORNL), we observed 30x speedup over a single core CPU and 7x speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.
Massively parallel switch-level simulation: A feasibility study
Kravitz, S.A.
1989-01-01
This thesis addresses the feasibility of mapping the COSMOS switch-level simulator onto computers with thousands of simple processors. COSMOS Preprocesses transistor networks into equivalent Boolean behavioral models, capturing the switch-level behavior of a circuit in a set of Boolean formulas. The author shows that thousand-fold parallelism exists in the formulas derived by COSMOS for some actual circuits. He exposes this parallelism by eliminating the event list from the simulator, and he demonstrates that this represents an attractive tradeoff given sufficient parallelism in the circuit model. To investigate the feasibility of this approach, he has developed a prototype implementation of the COSMOS simulator on a 32k processor Connection Machine.
Massively parallel determination and modeling of endonuclease substrate specificity
Thyme, Summer B.; Song, Yifan; Brunette, T. J.; Szeto, Mindy D.; Kusak, Lara; Bradley, Philip; Baker, David
2014-01-01
We describe the identification and characterization of novel homing endonucleases using genome database mining to identify putative target sites, followed by high throughput activity screening in a bacterial selection system. We characterized the substrate specificity and kinetics of these endonucleases by monitoring DNA cleavage events with deep sequencing. The endonuclease specificities revealed by these experiments can be partially recapitulated using 3D structure-based computational models. Analysis of these models together with genome sequence data provide insights into how alternative endonuclease specificities were generated during natural evolution. PMID:25389263