Efficient, massively parallel eigenvalue computation
NASA Technical Reports Server (NTRS)
Huo, Yan; Schreiber, Robert
1993-01-01
In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
NASA Technical Reports Server (NTRS)
Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)
1990-01-01
Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle.more » The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.« less
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
A massively parallel computational approach to coupled thermoelastic/porous gas flow problems
NASA Technical Reports Server (NTRS)
Shia, David; Mcmanus, Hugh L.
1995-01-01
A new computational scheme for coupled thermoelastic/porous gas flow problems is presented. Heat transfer, gas flow, and dynamic thermoelastic governing equations are expressed in fully explicit form, and solved on a massively parallel computer. The transpiration cooling problem is used as an example problem. The numerical solutions have been verified by comparison to available analytical solutions. Transient temperature, pressure, and stress distributions have been obtained. Small spatial oscillations in pressure and stress have been observed, which would be impractical to predict with previously available schemes. Comparisons between serial and massively parallel versions of the scheme have also been made. The results indicate that for small scale problems the serial and parallel versions use practically the same amount of CPU time. However, as the problem size increases the parallel version becomes more efficient than the serial version.
NASA Technical Reports Server (NTRS)
Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos
1996-01-01
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modificaitons to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.
Massively parallel GPU-accelerated minimization of classical density functional theory
NASA Astrophysics Data System (ADS)
Stopper, Daniel; Roth, Roland
2017-08-01
In this paper, we discuss the ability to numerically minimize the grand potential of hard disks in two-dimensional and of hard spheres in three-dimensional space within the framework of classical density functional and fundamental measure theory on modern graphics cards. Our main finding is that a massively parallel minimization leads to an enormous performance gain in comparison to standard sequential minimization schemes. Furthermore, the results indicate that in complex multi-dimensional situations, a heavy parallel minimization of the grand potential seems to be mandatory in order to reach a reasonable balance between accuracy and computational cost.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
NASA Technical Reports Server (NTRS)
Logan, Terry G.
1994-01-01
The purpose of this study is to investigate the performance of the integral equation computations using numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of computational performance of the MPP CM-5 computer and conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on CM-5 with 32, 62, 128 nodes along with those on Cray-YMP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code near or out-performs the equivalent serial FORTRAN code for some cases.
Topical perspective on massive threading and parallelism.
Farber, Robert M
2011-09-01
Unquestionably computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three-orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive-threading and massive-parallelism such as the 1.3 million threads Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world is how to capitalize on these new massively threaded computational architectures--especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelize with some insight into the future. Published by Elsevier Inc.
A new parallel-vector finite element analysis software on distributed-memory computers
NASA Technical Reports Server (NTRS)
Qin, Jiangning; Nguyen, Duc T.
1993-01-01
A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less
Massive parallel 3D PIC simulation of negative ion extraction
NASA Astrophysics Data System (ADS)
Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu
2017-09-01
The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
Tough2{_}MP: A parallel version of TOUGH2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris
2003-04-09
TOUGH2{_}MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. Inmore » addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2{_}MP, and discuss the basic features, modules, and their applications.« less
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max
1999-01-01
A massively parallel ensemble Kalman filter (EnKF)is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)ls quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This inplementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the ENKF are also examined.
NASA Astrophysics Data System (ADS)
Wei, Xiaohui; Li, Weishan; Tian, Hailong; Li, Hongliang; Xu, Haixiao; Xu, Tianfu
2015-07-01
The numerical simulation of multiphase flow and reactive transport in the porous media on complex subsurface problem is a computationally intensive application. To meet the increasingly computational requirements, this paper presents a parallel computing method and architecture. Derived from TOUGHREACT that is a well-established code for simulating subsurface multi-phase flow and reactive transport problems, we developed a high performance computing THC-MP based on massive parallel computer, which extends greatly on the computational capability for the original code. The domain decomposition method was applied to the coupled numerical computing procedure in the THC-MP. We designed the distributed data structure, implemented the data initialization and exchange between the computing nodes and the core solving module using the hybrid parallel iterative and direct solver. Numerical accuracy of the THC-MP was verified through a CO2 injection-induced reactive transport problem by comparing the results obtained from the parallel computing and sequential computing (original code). Execution efficiency and code scalability were examined through field scale carbon sequestration applications on the multicore cluster. The results demonstrate successfully the enhanced performance using the THC-MP on parallel computing facilities.
Large-scale enrichment and discovery of gene-associated SNPs
USDA-ARS?s Scientific Manuscript database
With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...
GPAW - massively parallel electronic structure calculations with Python-based software.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Enkovaara, J.; Romero, N.; Shende, S.
2011-01-01
Electronic structure calculations are a widely used tool in materials science and large consumer of supercomputing resources. Traditionally, the software packages for these kind of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages, such as Python, can increase the effciency of programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity enhancing features together with a good numerical performance. We have used thismore » approach in implementing an electronic structure simulation software GPAW using the combination of Python and C programming languages. While the chosen approach works well in standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python based software.« less
A biconjugate gradient type algorithm on massively parallel architectures
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Hochbruck, Marlis
1991-01-01
The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
Applications of Massive Mathematical Computations
1990-04-01
particles from the first principles of QCD . This problem is under intensive numerical study 11-6 using special purpose parallel supercomputers in...several places around the world. The method used here is the Monte Carlo integration for a fixed 3-D plus time lattices . Reliable results are still years...mathematical and theoretical physics, but its most promising applications are in the numerical realization of QCD computations. Our programs for the solution
NASA Technical Reports Server (NTRS)
Bruno, John
1984-01-01
The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations is presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed and from these were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This combined with the prospect of building a faster and larger MMP-like machine holds the promise of achieving sustained gigaflop rates that are required for the numerical simulations in CFD.
Phase space simulation of collisionless stellar systems on the massively parallel processor
NASA Technical Reports Server (NTRS)
White, Richard L.
1987-01-01
A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.
NASA Astrophysics Data System (ADS)
Stellmach, Stephan; Hansen, Ulrich
2008-05-01
Numerical simulations of the process of convection and magnetic field generation in planetary cores still fail to reach geophysically realistic control parameter values. Future progress in this field depends crucially on efficient numerical algorithms which are able to take advantage of the newest generation of parallel computers. Desirable features of simulation algorithms include (1) spectral accuracy, (2) an operation count per time step that is small and roughly proportional to the number of grid points, (3) memory requirements that scale linear with resolution, (4) an implicit treatment of all linear terms including the Coriolis force, (5) the ability to treat all kinds of common boundary conditions, and (6) reasonable efficiency on massively parallel machines with tens of thousands of processors. So far, algorithms for fully self-consistent dynamo simulations in spherical shells do not achieve all these criteria simultaneously, resulting in strong restrictions on the possible resolutions. In this paper, we demonstrate that local dynamo models in which the process of convection and magnetic field generation is only simulated for a small part of a planetary core in Cartesian geometry can achieve the above goal. We propose an algorithm that fulfills the first five of the above criteria and demonstrate that a model implementation of our method on an IBM Blue Gene/L system scales impressively well for up to O(104) processors. This allows for numerical simulations at rather extreme parameter values.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steich, D J; Brugger, S T; Kallman, J S
2000-02-01
This final report describes our efforts on the Three-Dimensional Massively Parallel CEM Technologies LDRD project (97-ERD-009). Significant need exists for more advanced time domain computational electromagnetics modeling. Bookkeeping details and modifying inflexible software constitute a vast majority of the effort required to address such needs. The required effort escalates rapidly as problem complexity increases. For example, hybrid meshes requiring hybrid numerics on massively parallel platforms (MPPs). This project attempts to alleviate the above limitations by investigating flexible abstractions for these numerical algorithms on MPPs using object-oriented methods, providing a programming environment insulating physics from bookkeeping. The three major design iterationsmore » during the project, known as TIGER-I to TIGER-III, are discussed. Each version of TIGER is briefly discussed along with lessons learned during the development and implementation. An Application Programming Interface (API) of the object-oriented interface for Tiger-III is included in three appendices. The three appendices contain the Utilities, Entity-Attribute, and Mesh libraries developed during the project. The API libraries represent a snapshot of our latest attempt at insulated the physics from the bookkeeping.« less
Parallel computing on Unix workstation arrays
NASA Astrophysics Data System (ADS)
Reale, F.; Bocchino, F.; Sciortino, S.
1994-12-01
We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by exucuting it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.
Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi
2016-08-05
The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
LDRD final report on massively-parallel linear programming : the parPCx system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar
2005-02-01
This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
2015-08-01
Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) Software by N Scott Weingarten and James P Larentzos Approved for...Massively Parallel Simulator ( LAMMPS ) Software by N Scott Weingarten Weapons and Materials Research Directorate, ARL James P Larentzos Engility...Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) Software 5a. CONTRACT NUMBER 5b
Parallel implementation of geometrical shock dynamics for two dimensional converging shock waves
NASA Astrophysics Data System (ADS)
Qiu, Shi; Liu, Kuang; Eliasson, Veronica
2016-10-01
Geometrical shock dynamics (GSD) theory is an appealing method to predict the shock motion in the sense that it is more computationally efficient than solving the traditional Euler equations, especially for converging shock waves. However, to solve and optimize large scale configurations, the main bottleneck is the computational cost. Among the existing numerical GSD schemes, there is only one that has been implemented on parallel computers, with the purpose to analyze detonation waves. To extend the computational advantage of the GSD theory to more general applications such as converging shock waves, a numerical implementation using a spatial decomposition method has been coupled with a front tracking approach on parallel computers. In addition, an efficient tridiagonal system solver for massively parallel computers has been applied to resolve the most expensive function in this implementation, resulting in an efficiency of 0.93 while using 32 HPCC cores. Moreover, symmetric boundary conditions have been developed to further reduce the computational cost, achieving a speedup of 19.26 for a 12-sided polygonal converging shock.
Conformal anomaly of some 2-d Z (n) models
NASA Astrophysics Data System (ADS)
William, Peter
1991-01-01
We describe a numerical calculation of the conformal anomaly in the case of some two-dimensional statistical models undergoing a second-order phase transition, utilizing a recently developed method to compute the partition function exactly. This computation is carried out on a massively parallel CM2 machine, using the finite size scaling behaviour of the free energy.
A Strassen-Newton algorithm for high-speed parallelizable matrix inversion
NASA Technical Reports Server (NTRS)
Bailey, David H.; Ferguson, Helaman R. P.
1988-01-01
Techniques are described for computing matrix inverses by algorithms that are highly suited to massively parallel computation. The techniques are based on an algorithm suggested by Strassen (1969). Variations of this scheme use matrix Newton iterations and other methods to improve the numerical stability while at the same time preserving a very high level of parallelism. One-processor Cray-2 implementations of these schemes range from one that is up to 55 percent faster than a conventional library routine to one that is slower than a library routine but achieves excellent numerical stability. The problem of computing the solution to a single set of linear equations is discussed, and it is shown that this problem can also be solved efficiently using these techniques.
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Substructured multibody molecular dynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grest, Gary Stephen; Stevens, Mark Jackson; Plimpton, Steven James
2006-11-01
We have enhanced our parallel molecular dynamics (MD) simulation software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator, lammps.sandia.gov) to include many new features for accelerated simulation including articulated rigid body dynamics via coupling to the Rensselaer Polytechnic Institute code POEMS (Parallelizable Open-source Efficient Multibody Software). We use new features of the LAMMPS software package to investigate rhodopsin photoisomerization, and water model surface tension and capillary waves at the vapor-liquid interface. Finally, we motivate the recipes of MD for practitioners and researchers in numerical analysis and computational mechanics.
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Hixon, Duane; Sankar, L. N.
1993-01-01
During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
Parallel Computational Fluid Dynamics: Current Status and Future Requirements
NASA Technical Reports Server (NTRS)
Simon, Horst D.; VanDalsem, William R.; Dagum, Leonardo; Kutler, Paul (Technical Monitor)
1994-01-01
One or the key objectives of the Applied Research Branch in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Allies Research Center is the accelerated introduction of highly parallel machines into a full operational environment. In this report we discuss the performance results obtained from the implementation of some computational fluid dynamics (CFD) applications on the Connection Machine CM-2 and the Intel iPSC/860. We summarize some of the experiences made so far with the parallel testbed machines at the NAS Applied Research Branch. Then we discuss the long term computational requirements for accomplishing some of the grand challenge problems in computational aerosciences. We argue that only massively parallel machines will be able to meet these grand challenge requirements, and we outline the computer science and algorithm research challenges ahead.
Explicit finite-difference simulation of optical integrated devices on massive parallel computers.
Sterkenburgh, T; Michels, R M; Dress, P; Franke, H
1997-02-20
An explicit method for the numerical simulation of optical integrated circuits by means of the finite-difference time-domain (FDTD) method is presented. This method, based on an explicit solution of Maxwell's equations, is well established in microwave technology. Although the simulation areas are small, we verified the behavior of three interesting problems, especially nonparaxial problems, with typical aspects of integrated optical devices. Because numerical losses are within acceptable limits, we suggest the use of the FDTD method to achieve promising quantitative simulation results.
New Parallel Algorithms for Landscape Evolution Model
NASA Astrophysics Data System (ADS)
Jin, Y.; Zhang, H.; Shi, Y.
2017-12-01
Most landscape evolution models (LEM) developed in the last two decades solve the diffusion equation to simulate the transportation of surface sediments. This numerical approach is difficult to parallelize due to the computation of drainage area for each node, which needs huge amount of communication if run in parallel. In order to overcome this difficulty, we developed two parallel algorithms for LEM with a stream net. One algorithm handles the partition of grid with traditional methods and applies an efficient global reduction algorithm to do the computation of drainage areas and transport rates for the stream net; the other algorithm is based on a new partition algorithm, which partitions the nodes in catchments between processes first, and then partitions the cells according to the partition of nodes. Both methods focus on decreasing communication between processes and take the advantage of massive computing techniques, and numerical experiments show that they are both adequate to handle large scale problems with millions of cells. We implemented the two algorithms in our program based on the widely used finite element library deal.II, so that it can be easily coupled with ASPECT.
NASA Astrophysics Data System (ADS)
Ukawa, Akira
1998-05-01
The CP-PACS computer is a massively parallel computer consisting of 2048 processing units and having a peak speed of 614 GFLOPS and 128 GByte of main memory. It was developed over the four years from 1992 to 1996 at the Center for Computational Physics, University of Tsukuba, for large-scale numerical simulations in computational physics, especially those of lattice QCD. The CP-PACS computer has been in full operation for physics computations since October 1996. In this article we describe the chronology of the development, the hardware and software characteristics of the computer, and its performance for lattice QCD simulations.
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Atlas : A library for numerical weather prediction and climate modelling
NASA Astrophysics Data System (ADS)
Deconinck, Willem; Bauer, Peter; Diamantakis, Michail; Hamrud, Mats; Kühnlein, Christian; Maciel, Pedro; Mengaldo, Gianmarco; Quintino, Tiago; Raoult, Baudouin; Smolarkiewicz, Piotr K.; Wedi, Nils P.
2017-11-01
The algorithms underlying numerical weather prediction (NWP) and climate models that have been developed in the past few decades face an increasing challenge caused by the paradigm shift imposed by hardware vendors towards more energy-efficient devices. In order to provide a sustainable path to exascale High Performance Computing (HPC), applications become increasingly restricted by energy consumption. As a result, the emerging diverse and complex hardware solutions have a large impact on the programming models traditionally used in NWP software, triggering a rethink of design choices for future massively parallel software frameworks. In this paper, we present Atlas, a new software library that is currently being developed at the European Centre for Medium-Range Weather Forecasts (ECMWF), with the scope of handling data structures required for NWP applications in a flexible and massively parallel way. Atlas provides a versatile framework for the future development of efficient NWP and climate applications on emerging HPC architectures. The applications range from full Earth system models, to specific tools required for post-processing weather forecast products. The Atlas library thus constitutes a step towards affordable exascale high-performance simulations by providing the necessary abstractions that facilitate the application in heterogeneous HPC environments by promoting the co-design of NWP algorithms with the underlying hardware.
Analysis of composite ablators using massively parallel computation
NASA Technical Reports Server (NTRS)
Shia, David
1995-01-01
In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small times steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. The gas storage term is included in the explicit pressure calculation of both problems. Results from ablative composite plate problems are compared with previous numerical results which did not include the gas storage term. It is found that the through-thickness temperature distribution is not affected much by the gas storage term. However, the through-thickness pressure and stress distributions, and the extent of chemical reactions are different from the previous numerical results. Two types of chemical reaction models are used in the restrained thermal growth testing problem: (1) pressure-independent Arrhenius type rate equations and (2) pressure-dependent Arrhenius type rate equations. The numerical results are compared to experimental results and the pressure-dependent model is able to capture the trend better than the pressure-independent one. Finally, a performance study is done on the hybrid algorithm using the ablative composite plate problem. It is found that there is a good speedup of performance on the CM-5. For 32 CPU's, the speedup of performance is 20. The efficiency of the algorithm is found to be a function of the size and execution time of a given problem and the effective parallelization of the algorithm. It also seems that there is an optimum number of CPU's to use for a given problem.
NASA Astrophysics Data System (ADS)
Kumari, Komal; Donzis, Diego
2017-11-01
Highly resolved computational simulations on massively parallel machines are critical in understanding the physics of a vast number of complex phenomena in nature governed by partial differential equations. Simulations at extreme levels of parallelism present many challenges with communication between processing elements (PEs) being a major bottleneck. In order to fully exploit the computational power of exascale machines one needs to devise numerical schemes that relax global synchronizations across PEs. This asynchronous computations, however, have a degrading effect on the accuracy of standard numerical schemes.We have developed asynchrony-tolerant (AT) schemes that maintain order of accuracy despite relaxed communications. We show, analytically and numerically, that these schemes retain their numerical properties with multi-step higher order temporal Runge-Kutta schemes. We also show that for a range of optimized parameters,the computation time and error for AT schemes is less than their synchronous counterpart. Stability of the AT schemes which depends upon history and random nature of delays, are also discussed. Support from NSF is gratefully acknowledged.
Visualization of Pulsar Search Data
NASA Astrophysics Data System (ADS)
Foster, R. S.; Wolszczan, A.
1993-05-01
The search for periodic signals from rotating neutron stars or pulsars has been a computationally taxing problem to astronomers for more than twenty-five years. Over this time interval, increases in computational capability have allowed ever more sensitive searches, covering a larger parameter space. The volume of input data and the general presence of radio frequency interference typically produce numerous spurious signals. Visualization of the search output and enhanced real-time processing of significant candidate events allow the pulsar searcher to optimally processes and search for new radio pulsars. The pulsar search algorithm and visualization system presented in this paper currently runs on serial RISC based workstations, a traditional vector based super computer, and a massively parallel computer. A description of the serial software algorithm and its modifications for massively parallel computing are describe. The results of four successive searches for millisecond period radio pulsars using the Arecibo telescope at 430 MHz have resulted in the successful detection of new long-period and millisecond period radio pulsars.
Cazzaniga, Paolo; Nobile, Marco S.; Besozzi, Daniela; Bellini, Matteo; Mauri, Giancarlo
2014-01-01
The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need of performing large numbers of in silico analysis to study the behavior of biological systems in different conditions, which necessitate a computing power that usually overtakes the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performances up to a 181× speedup compared to the corresponding sequential simulations. PMID:25025072
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fijany, A.; Milman, M.; Redding, D.
1994-12-31
In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optic system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although, this represents an optimal computation, due the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm,more » designated as Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.« less
RAMA: A file system for massively parallel computers
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1993-01-01
This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated in lo the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
Gravitational field calculations on a dynamic lattice by distributed computing.
NASA Astrophysics Data System (ADS)
Mähönen, P.; Punkka, V.
A new method of calculating numerically time evolution of a gravitational field in general relativity is introduced. Vierbein (tetrad) formalism, dynamic lattice and massively parallelized computation are suggested as they are expected to speed up the calculations considerably and facilitate the solution of problems previously considered too hard to be solved, such as the time evolution of a system consisting of two or more black holes or the structure of worm holes.
Gravitation Field Calculations on a Dynamic Lattice by Distributed Computing
NASA Astrophysics Data System (ADS)
Mähönen, Petri; Punkka, Veikko
A new method of calculating numerically time evolution of a gravitational field in General Relatity is introduced. Vierbein (tetrad) formalism, dynamic lattice and massively parallelized computation are suggested as they are expected to speed up the calculations considerably and facilitate the solution of problems previously considered too hard to be solved, such as the time evolution of a system consisting of two or more black holes or the structure of worm holes.
2008-04-01
Space GmbH as follows: B. TECHNICAL PRPOPOSA/DESCRIPTION OF WORK Cell: A Revolutionary High Performance Computing Platform On 29 June 2005 [1...IBM has announced that is has partnered with Mercury Computer Systems, a maker of specialized computers . The Cell chip provides massive floating-point...the computing industry away from the traditional processor technology dominated by Intel. While in the past, the development of computing power has
Parallel Index and Query for Large Scale Data Analysis
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chou, Jerry; Wu, Kesheng; Ruebel, Oliver
2011-07-18
Modern scientific datasets present numerous data management and analysis challenges. State-of-the-art index and query technologies are critical for facilitating interactive exploration of large datasets, but numerous challenges remain in terms of designing a system for process- ing general scientific datasets. The system needs to be able to run on distributed multi-core platforms, efficiently utilize underlying I/O infrastructure, and scale to massive datasets. We present FastQuery, a novel software framework that address these challenges. FastQuery utilizes a state-of-the-art index and query technology (FastBit) and is designed to process mas- sive datasets on modern supercomputing platforms. We apply FastQuery to processing ofmore » a massive 50TB dataset generated by a large scale accelerator modeling code. We demonstrate the scalability of the tool to 11,520 cores. Motivated by the scientific need to search for inter- esting particles in this dataset, we use our framework to reduce search time from hours to tens of seconds.« less
Massively Parallel Real-Time TDDFT Simulations of Electronic Stopping Processes
NASA Astrophysics Data System (ADS)
Yost, Dillon; Lee, Cheng-Wei; Draeger, Erik; Correa, Alfredo; Schleife, Andre; Kanai, Yosuke
Electronic stopping describes transfer of kinetic energy from fast-moving charged particles to electrons, producing massive electronic excitations in condensed matter. Understanding this phenomenon for ion irradiation has implications in modern technologies, ranging from nuclear reactors, to semiconductor devices for aerospace missions, to proton-based cancer therapy. Recent advances in high-performance computing allow us to achieve an accurate parameter-free description of these phenomena through numerical simulations. Here we discuss results from our recently-developed large-scale real-time TDDFT implementation for electronic stopping processes in important example materials such as metals, semiconductors, liquid water, and DNA. We will illustrate important insight into the physics underlying electronic stopping and we discuss current limitations of our approach both regarding physical and numerical approximations. This work is supported by the DOE through the INCITE awards and by the NSF. Part of this work was performed under the auspices of U.S. DOE by LLNL under Contract DE-AC52-07NA27344.
Lattice dynamics calculations based on density-functional perturbation theory in real space
NASA Astrophysics Data System (ADS)
Shang, Honghui; Carbogno, Christian; Rinke, Patrick; Scheffler, Matthias
2017-06-01
A real-space formalism for density-functional perturbation theory (DFPT) is derived and applied for the computation of harmonic vibrational properties in molecules and solids. The practical implementation using numeric atom-centered orbitals as basis functions is demonstrated exemplarily for the all-electron Fritz Haber Institute ab initio molecular simulations (FHI-aims) package. The convergence of the calculations with respect to numerical parameters is carefully investigated and a systematic comparison with finite-difference approaches is performed both for finite (molecules) and extended (periodic) systems. Finally, the scaling tests and scalability tests on massively parallel computer systems demonstrate the computational efficiency.
The upwind control volume scheme for unstructured triangular grids
NASA Technical Reports Server (NTRS)
Giles, Michael; Anderson, W. Kyle; Roberts, Thomas W.
1989-01-01
A new algorithm for the numerical solution of the Euler equations is presented. This algorithm is particularly suited to the use of unstructured triangular meshes, allowing geometric flexibility. Solutions are second-order accurate in the steady state. Implementation of the algorithm requires minimal grid connectivity information, resulting in modest storage requirements, and should enhance the implementation of the scheme on massively parallel computers. A novel form of upwind differencing is developed, and is shown to yield sharp resolution of shocks. Two new artificial viscosity models are introduced that enhance the performance of the new scheme. Numerical results for transonic airfoil flows are presented, which demonstrate the performance of the algorithm.
Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP
NASA Technical Reports Server (NTRS)
Long, Lyle N.; Brentner, Kenneth S.
2000-01-01
This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
Numerical Prediction of Non-Reacting and Reacting Flow in a Model Gas Turbine Combustor
NASA Technical Reports Server (NTRS)
Davoudzadeh, Farhad; Liu, Nan-Suey
2005-01-01
The three-dimensional, viscous, turbulent, reacting and non-reacting flow characteristics of a model gas turbine combustor operating on air/methane are simulated via an unstructured and massively parallel Reynolds-Averaged Navier-Stokes (RANS) code. This serves to demonstrate the capabilities of the code for design and analysis of real combustor engines. The effects of some design features of combustors are examined. In addition, the computed results are validated against experimental data.
Aircraft optimization by a system approach: Achievements and trends
NASA Technical Reports Server (NTRS)
Sobieszczanski-Sobieski, Jaroslaw
1992-01-01
Recently emerging methodology for optimal design of aircraft treated as a system of interacting physical phenomena and parts is examined. The methodology is found to coalesce into methods for hierarchic, non-hierarchic, and hybrid systems all dependent on sensitivity analysis. A separate category of methods has also evolved independent of sensitivity analysis, hence suitable for discrete problems. References and numerical applications are cited. Massively parallel computer processing is seen as enabling technology for practical implementation of the methodology.
A domain-specific compiler for a parallel multiresolution adaptive numerical simulation environment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rajbhandari, Samyam; Kim, Jinsung; Krishnamoorthy, Sriram
This paper describes the design and implementation of a layered domain-specific compiler to support MADNESS---Multiresolution ADaptive Numerical Environment for Scientific Simulation. MADNESS is a high-level software environment for the solution of integral and differential equations in many dimensions, using adaptive and fast harmonic analysis methods with guaranteed precision. MADNESS uses k-d trees to represent spatial functions and implements operators like addition, multiplication, differentiation, and integration on the numerical representation of functions. The MADNESS runtime system provides global namespace support and a task-based execution model including futures. MADNESS is currently deployed on massively parallel supercomputers and has enabled many science advances.more » Due to the highly irregular and statically unpredictable structure of the k-d trees representing the spatial functions encountered in MADNESS applications, only purely runtime approaches to optimization have previously been implemented in the MADNESS framework. This paper describes a layered domain-specific compiler developed to address some performance bottlenecks in MADNESS. The newly developed static compile-time optimizations, in conjunction with the MADNESS runtime support, enable significant performance improvement for the MADNESS framework.« less
Parallel Logic Programming and Parallel Systems Software and Hardware
1989-07-29
Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Uilisation", Group d’ Intelligence Artificielle , Universite d...completed. Tools were provided for software development using artificial intelligence techniques. Al software for massively parallel architectures was...using artificial intelligence tech- niques. Al software for massively parallel architectures was started. 1. Introduction We describe research conducted
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project
NASA Technical Reports Server (NTRS)
Woo, Alex C.; Hill, Kueichien C.
1996-01-01
The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Program Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
The design and implementation of a parallel unstructured Euler solver using software primitives
NASA Technical Reports Server (NTRS)
Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.
1992-01-01
This paper is concerned with the implementation of a three-dimensional unstructured grid Euler-solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effect of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY-YMP vector supercomputer.
Programming a hillslope water movement model on the MPP
NASA Technical Reports Server (NTRS)
Devaney, J. E.; Irving, A. R.; Camillo, P. J.; Gurney, R. J.
1987-01-01
A physically based numerical model was developed of heat and moisture flow within a hillslope on a parallel architecture computer, as a precursor to a model of a complete catchment. Moisture flow within a catchment includes evaporation, overland flow, flow in unsaturated soil, and flow in saturated soil. Because of the empirical evidence that moisture flow in unsaturated soil is mainly in the vertical direction, flow in the unsaturated zone can be modeled as a series of one dimensional columns. This initial version of the hillslope model includes evaporation and a single column of one dimensional unsaturated zone flow. This case has already been solved on an IBM 3081 computer and is now being applied to the massively parallel processor architecture so as to make the extension to the one dimensional case easier and to check the problems and benefits of using a parallel architecture machine.
Graphics processing unit (GPU)-based computation of heat conduction in thermally anisotropic solids
NASA Astrophysics Data System (ADS)
Nahas, C. A.; Balasubramaniam, Krishnan; Rajagopal, Prabhu
2013-01-01
Numerical modeling of anisotropic media is a computationally intensive task since it brings additional complexity to the field problem in such a way that the physical properties are different in different directions. Largely used in the aerospace industry because of their lightweight nature, composite materials are a very good example of thermally anisotropic media. With advancements in video gaming technology, parallel processors are much cheaper today and accessibility to higher-end graphical processing devices has increased dramatically over the past couple of years. Since these massively parallel GPUs are very good in handling floating point arithmetic, they provide a new platform for engineers and scientists to accelerate their numerical models using commodity hardware. In this paper we implement a parallel finite difference model of thermal diffusion through anisotropic media using the NVIDIA CUDA (Compute Unified device Architecture). We use the NVIDIA GeForce GTX 560 Ti as our primary computing device which consists of 384 CUDA cores clocked at 1645 MHz with a standard desktop pc as the host platform. We compare the results from standard CPU implementation for its accuracy and speed and draw implications for simulation using the GPU paradigm.
Box schemes and their implementation on the iPSC/860
NASA Technical Reports Server (NTRS)
Chattot, J. J.; Merriam, M. L.
1991-01-01
Research on algoriths for efficiently solving fluid flow problems on massively parallel computers is continued in the present paper. Attention is given to the implementation of a box scheme on the iPSC/860, a massively parallel computer with a peak speed of 10 Gflops and a memory of 128 Mwords. A domain decomposition approach to parallelism is used.
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM
NASA Astrophysics Data System (ADS)
de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl
2002-03-01
We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 105-106) in reasonable CPU times. The performances of our implementation of the pa! rallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
Large-scale large eddy simulation of nuclear reactor flows: Issues and perspectives
DOE Office of Scientific and Technical Information (OSTI.GOV)
Merzari, Elia; Obabko, Aleks; Fischer, Paul
Numerical simulation has been an intrinsic part of nuclear engineering research since its inception. In recent years a transition is occurring toward predictive, first-principle-based tools such as computational fluid dynamics. Even with the advent of petascale computing, however, such tools still have significant limitations. In the present work some of these issues, and in particular the presence of massive multiscale separation, are discussed, as well as some of the research conducted to mitigate them. Petascale simulations at high fidelity (large eddy simulation/direct numerical simulation) were conducted with the massively parallel spectral element code Nek5000 on a series of representative problems.more » These simulations shed light on the requirements of several types of simulation: (1) axial flow around fuel rods, with particular attention to wall effects; (2) natural convection in the primary vessel; and (3) flow in a rod bundle in the presence of spacing devices. Finally, the focus of the work presented here is on the lessons learned and the requirements to perform these simulations at exascale. Additional physical insight gained from these simulations is also emphasized.« less
Large-scale large eddy simulation of nuclear reactor flows: Issues and perspectives
Merzari, Elia; Obabko, Aleks; Fischer, Paul; ...
2016-11-03
Numerical simulation has been an intrinsic part of nuclear engineering research since its inception. In recent years a transition is occurring toward predictive, first-principle-based tools such as computational fluid dynamics. Even with the advent of petascale computing, however, such tools still have significant limitations. In the present work some of these issues, and in particular the presence of massive multiscale separation, are discussed, as well as some of the research conducted to mitigate them. Petascale simulations at high fidelity (large eddy simulation/direct numerical simulation) were conducted with the massively parallel spectral element code Nek5000 on a series of representative problems.more » These simulations shed light on the requirements of several types of simulation: (1) axial flow around fuel rods, with particular attention to wall effects; (2) natural convection in the primary vessel; and (3) flow in a rod bundle in the presence of spacing devices. Finally, the focus of the work presented here is on the lessons learned and the requirements to perform these simulations at exascale. Additional physical insight gained from these simulations is also emphasized.« less
Increasing the reach of forensic genetics with massively parallel sequencing.
Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R
2017-09-01
The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.
Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.
Slażyński, Leszek; Bohte, Sander
2012-01-01
The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures to the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate in better-than-realtime plausible spiking neural networks of up to 50 000 neurons, processing over 35 million spiking events per second.
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.
Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio
2014-07-05
A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048(3) grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective to analyze the hydration properties of the large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
Parallel computing of a climate model on the dawn 1000 by domain decomposition method
NASA Astrophysics Data System (ADS)
Bi, Xunqiang
1997-12-01
In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
Multi-threading: A new dimension to massively parallel scientific computation
NASA Astrophysics Data System (ADS)
Nielsen, Ida M. B.; Janssen, Curtis L.
2000-06-01
Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.
Tam, Wing-Kin; Yang, Zhi
2018-05-01
Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
Massively parallel support for a case-based planning system
NASA Technical Reports Server (NTRS)
Kettler, Brian P.; Hendler, James A.; Anderson, William A.
1993-01-01
Case-based planning (CBP), a kind of case-based reasoning, is a technique in which previously generated plans (cases) are stored in memory and can be reused to solve similar planning problems in the future. CBP can save considerable time over generative planning, in which a new plan is produced from scratch. CBP thus offers a potential (heuristic) mechanism for handling intractable problems. One drawback of CBP systems has been the need for a highly structured memory to reduce retrieval times. This approach requires significant domain engineering and complex memory indexing schemes to make these planners efficient. In contrast, our CBP system, CaPER, uses a massively parallel frame-based AI language (PARKA) and can do extremely fast retrieval of complex cases from a large, unindexed memory. The ability to do fast, frequent retrievals has many advantages: indexing is unnecessary; very large case bases can be used; memory can be probed in numerous alternate ways; and queries can be made at several levels, allowing more specific retrieval of stored plans that better fit the target problem with less adaptation. In this paper we describe CaPER's case retrieval techniques and some experimental results showing its good performance, even on large case bases.
The CP-PACS Project and Lattice QCD Results
NASA Astrophysics Data System (ADS)
Iwasaki, Y.
The aim of the CP-PACS project was to develop a massively parallel computer for performing numerical research in computational physics with primary emphasis on lattice QCD. The CP-PACS computer with a peak speed of 614 GFLOPS with 2048 processors was completed in September 1996, and has been in full operation since October 1996. We present an overview of the CP-PACS project and describe characteristics of the CP-PACS computer. The CP-PACS has been mainly used for hadron spectroscopy studies in lattice QCD. Main results in lattice QCD simulations are given.
Multidisciplinary propulsion simulation using NPSS
NASA Technical Reports Server (NTRS)
Claus, Russell W.; Evans, Austin L.; Follen, Gregory J.
1992-01-01
The current status of the Numerical Propulsion System Simulation (NPSS) program, a cooperative effort of NASA, industry, and universities to reduce the cost and time of advanced technology propulsion system development, is reviewed. The technologies required for this program include (1) interdisciplinary analysis to couple the relevant disciplines, such as aerodynamics, structures, heat transfer, combustion, acoustics, controls, and materials; (2) integrated systems analysis; (3) a high-performance computing platform, including massively parallel processing; and (4) a simulation environment providing a user-friendly interface. Several research efforts to develop these technologies are discussed.
GPU COMPUTING FOR PARTICLE TRACKING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, Hiroshi; Song, Kai; Muriki, Krishna
2011-03-25
This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) bring massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculationmore » of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processois (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands ot threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.« less
NASA Technical Reports Server (NTRS)
Hussaini, M. Y. (Editor); Kumar, A. (Editor); Salas, M. D. (Editor)
1993-01-01
The purpose here is to assess the state of the art in the areas of numerical analysis that are particularly relevant to computational fluid dynamics (CFD), to identify promising new developments in various areas of numerical analysis that will impact CFD, and to establish a long-term perspective focusing on opportunities and needs. Overviews are given of discretization schemes, computational fluid dynamics, algorithmic trends in CFD for aerospace flow field calculations, simulation of compressible viscous flow, and massively parallel computation. Also discussed are accerelation methods, spectral and high-order methods, multi-resolution and subcell resolution schemes, and inherently multidimensional schemes.
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earth Sciences Division; Zhang, Keni; Zhang, Keni
TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator ismore » to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.« less
The factorization of large composite numbers on the MPP
NASA Technical Reports Server (NTRS)
Mckurdy, Kathy J.; Wunderlich, Marvin C.
1987-01-01
The continued fraction method for factoring large integers (CFRAC) was an ideal algorithm to be implemented on a massively parallel computer such as the Massively Parallel Processor (MPP). After much effort, the first 60 digit number was factored on the MPP using about 6 1/2 hours of array time. Although this result added about 10 digits to the size number that could be factored using CFRAC on a serial machine, it was already badly beaten by the implementation of Davis and Holdridge on the CRAY-1 using the quadratic sieve, an algorithm which is clearly superior to CFRAC for large numbers. An algorithm is illustrated which is ideally suited to the single instruction multiple data (SIMD) massively parallel architecture and some of the modifications which were needed in order to make the parallel implementation effective and efficient are described.
Matthew Parks; Richard Cronn; Aaron Liston
2009-01-01
We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. We found that 30/33 ingroup nodes resolved wlth > 95-percent bootstrap support; this is a substantial improvement relative...
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures
NASA Technical Reports Server (NTRS)
Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.
1998-01-01
In this paper a new algorithm, designated as Fast Invariant Imbedding algorithm, for solution of Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other Fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate size problems.
Proxy-equation paradigm: A strategy for massively parallel asynchronous computations
NASA Astrophysics Data System (ADS)
Mittal, Ankita; Girimaji, Sharath
2017-09-01
Massively parallel simulations of transport equation systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing synchronization requirement introduces error and slows down convergence. In this paper, we propose and develop a novel "proxy equation" concept for a general transport equation that (i) tolerates asynchrony with minimal added error, (ii) preserves convergence order and thus, (iii) expected to scale efficiently on massively parallel machines. The central idea is to modify a priori the transport equation at the PE boundaries to offset asynchrony errors. Proof-of-concept computations are performed using a one-dimensional advection (convection) diffusion equation. The results demonstrate the promise and advantages of the present strategy.
Helical vortices generated by flapping wings of bumblebees
NASA Astrophysics Data System (ADS)
Engels, Thomas; Kolomenskiy, Dmitry; Schneider, Kai; Farge, Marie; Lehmann, Fritz-Olaf; Sesterhenn, Jörn
2018-02-01
High resolution direct numerical simulations of rotating and flapping bumblebee wings are presented and their aerodynamics is studied focusing on the role of leading edge vortices and the associated helicity production. We first study the flow generated by only one rotating bumblebee wing in circular motion with 45◦ angle of attack. We then consider a model bumblebee flying in a numerical wind tunnel, which is tethered and has rigid wings flapping with a prescribed generic motion. The inflow condition of the wind varies from laminar to strongly turbulent regimes. Massively parallel simulations show that inflow turbulence does not significantly alter the wings’ leading edge vortex, which enhances lift production. Finally, we focus on studying the helicity of the generated vortices and analyze their contribution at different scales using orthogonal wavelets.
Relativistic Collisions of Highly-Charged Ions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ionescu, Dorin; Belkacem, Ali
1998-11-19
The physics of elementary atomic processes in relativistic collisions between highly-charged ions and atoms or other ions is briefly discussed, and some recent theoretical and experimental results in this field are summarized. They include excitation, capture, ionization, and electron-positron pair creation. The numerical solution of the two-center Dirac equation in momentum space is shown to be a powerful nonperturbative method for describing atomic processes in relativistic collisions involving heavy and highly-charged ions. By propagating negative-energy wave packets in time the evolution of the QED vacuum around heavy ions in relativistic motion is investigated. Recent results obtained from numerical calculations usingmore » massively parallel processing on the Cray-T3E supercomputer of the National Energy Research Scientific Computer Center (NERSC) at Berkeley National Laboratory are presented.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao
2013-12-01
TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporatedmore » into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.« less
Haimovich, Adrian D.; Muir, Paul; Isaacs, Farren J.
2016-01-01
Next-generation DNA sequencing has revealed the complete genome sequences of numerous organisms, establishing a fundamental and growing understanding of genetic variation and phenotypic diversity. Engineering at the gene, network and whole-genome scale aims to introduce targeted genetic changes both to explore emergent phenotypes and to introduce new functionalities. Expansion of these approaches into massively parallel platforms establishes the ability to generate targeted genome modifications, elucidating causal links between genotype and phenotype, as well as the ability to design and reprogramme organisms. In this Review, we explore techniques and applications in genome engineering, outlining key advances and defining challenges. PMID:26260262
2013-08-01
potential for HMX / RDX (3, 9). ...................................................................................8 1 1. Purpose This work...6 dispersion and electrostatic interactions. Constants for the SB potential are given in table 1. 8 Table 1. SB potential for HMX / RDX (3, 9...modeling dislocations in the energetic molecular crystal RDX using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) molecular
ALEGRA -- A massively parallel h-adaptive code for solid dynamics
DOE Office of Scientific and Technical Information (OSTI.GOV)
Summers, R.M.; Wong, M.K.; Boucheron, E.A.
1997-12-31
ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Usingmore » this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.« less
Algorithms and programming tools for image processing on the MPP
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1985-01-01
Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
The architecture of tomorrow's massively parallel computer
NASA Technical Reports Server (NTRS)
Batcher, Ken
1987-01-01
Goodyear Aerospace delivered the Massively Parallel Processor (MPP) to NASA/Goddard in May 1983, over three years ago. Ever since then, Goodyear has tried to look in a forward direction. There is always some debate as to which way is forward when it comes to supercomputer architecture. Improvements to the MPP's massively parallel architecture are discussed in the areas of data I/O, memory capacity, connectivity, and indirect (or local) addressing. In I/O, transfer rates up to 640 megabytes per second can be achieved. There are devices that can supply the data and accept it at this rate. The memory capacity can be increased up to 128 megabytes in the ARU and over a gigabyte in the staging memory. For connectivity, there are several different kinds of multistage networks that should be considered.
NASA Technical Reports Server (NTRS)
Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)
1992-01-01
A conference was held on parallel computational fluid dynamics and produced related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation om massively parallel systems.
NASA Technical Reports Server (NTRS)
Barnden, John; Srinivas, Kankanahalli
1990-01-01
Symbol manipulation as used in traditional Artificial Intelligence has been criticized by neural net researchers for being excessively inflexible and sequential. On the other hand, the application of neural net techniques to the types of high-level cognitive processing studied in traditional artificial intelligence presents major problems as well. A promising way out of this impasse is to build neural net models that accomplish massively parallel case-based reasoning. Case-based reasoning, which has received much attention recently, is essentially the same as analogy-based reasoning, and avoids many of the problems leveled at traditional artificial intelligence. Further problems are avoided by doing many strands of case-based reasoning in parallel, and by implementing the whole system as a neural net. In addition, such a system provides an approach to some aspects of the problems of noise, uncertainty and novelty in reasoning systems. The current neural net system (Conposit), which performs standard rule-based reasoning, is being modified into a massively parallel case-based reasoning version.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sierra Thermal/Fluid Team
SIERRA/Aero is a compressible fluid dynamics program intended to solve a wide variety compressible fluid flows including transonic and hypersonic problems. This document describes the commands for assembling a fluid model for analysis with this module, henceforth referred to simply as Aero for brevity. Aero is an application developed using the SIERRA Toolkit (STK). The intent of STK is to provide a set of tools for handling common tasks that programmers encounter when developing a code for numerical simulation. For example, components of STK provide field allocation and management, and parallel input/output of field and mesh data. These services alsomore » allow the development of coupled mechanics analysis software for a massively parallel computing environment. In the definitions of the commands that follow, the term Real_Max denotes the largest floating point value that can be represented on a given computer. Int_Max is the largest such integer value.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Griebel, M., E-mail: griebel@ins.uni-bonn.de, E-mail: ruettgers@ins.uni-bonn.de; Rüttgers, A., E-mail: griebel@ins.uni-bonn.de, E-mail: ruettgers@ins.uni-bonn.de
The multiscale FENE model is applied to a 3D square-square contraction flow problem. For this purpose, the stochastic Brownian configuration field method (BCF) has been coupled with our fully parallelized three-dimensional Navier-Stokes solver NaSt3DGPF. The robustness of the BCF method enables the numerical simulation of high Deborah number flows for which most macroscopic methods suffer from stability issues. The results of our simulations are compared with that of experimental measurements from literature and show a very good agreement. In particular, flow phenomena such as a strong vortex enhancement, streamline divergence and a flow inversion for highly elastic flows are reproduced.more » Due to their computational complexity, our simulations require massively parallel computations. Using a domain decomposition approach with MPI, the implementation achieves excellent scale-up results for up to 128 processors.« less
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC
NASA Astrophysics Data System (ADS)
Alruwaili, Manal
With the growing technology, the number of the processors is becoming massive. Current supercomputer processing will be available on desktops in the next decade. For mass scale application software development on massive parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massive parallel computing and distributed memory models while retaining the user-friendliness. Currently, available object oriented languages for massive parallel computing such as Chapel, X10 and UPC++ exploit distributed computing, data parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not incorporate: 1) any extension at for object distribution to exploit PGAS model; 2) the programs lack the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) lack the programming paradigms that will result from the integration of data and thread-level parallelism and object distribution. In the proposed thesis, I compare different languages in PGAS model; propose new constructs that extend C++ with object distribution and object migration; and integrate PGAS based process constructs with these extensions on distributed objects. Object cloning and object migration. Also a new paradigm MIDD (Multiple Invocation Distributed Data) is presented when different copies of the same class can be invoked, and work on different elements of a distributed data concurrently using remote method invocations. I present new constructs, their grammar and their behavior. The new constructs have been explained using simple programs utilizing these constructs.
Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease
2016-10-01
deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3
2010-10-14
High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and Massively Parallel Sequencing...Venezuelan equine encephalitis virus (VEEV) genome. We initially used a capillary electrophoresis method to gain insight into the role of the VEEV...Smith JM, Schmaljohn CS (2010) High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and
2012-10-01
using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS
NASA Astrophysics Data System (ADS)
Heister, Timo; Dannberg, Juliane; Gassmöller, Rene; Bangerth, Wolfgang
2017-08-01
Computations have helped elucidate the dynamics of Earth's mantle for several decades already. The numerical methods that underlie these simulations have greatly evolved within this time span, and today include dynamically changing and adaptively refined meshes, sophisticated and efficient solvers, and parallelization to large clusters of computers. At the same time, many of the methods - discussed in detail in a previous paper in this series - were developed and tested primarily using model problems that lack many of the complexities that are common to the realistic models our community wants to solve today. With several years of experience solving complex and realistic models, we here revisit some of the algorithm designs of the earlier paper and discuss the incorporation of more complex physics. In particular, we re-consider time stepping and mesh refinement algorithms, evaluate approaches to incorporate compressibility, and discuss dealing with strongly varying material coefficients, latent heat, and how to track chemical compositions and heterogeneities. Taken together and implemented in a high-performance, massively parallel code, the techniques discussed in this paper then allow for high resolution, 3-D, compressible, global mantle convection simulations with phase transitions, strongly temperature dependent viscosity and realistic material properties based on mineral physics data.
GRAVIDY, a GPU modular, parallel direct-summation N-body integrator: dynamics with softening
NASA Astrophysics Data System (ADS)
Maureira-Fredes, Cristián; Amaro-Seoane, Pau
2018-01-01
A wide variety of outstanding problems in astrophysics involve the motion of a large number of particles under the force of gravity. These include the global evolution of globular clusters, tidal disruptions of stars by a massive black hole, the formation of protoplanets and sources of gravitational radiation. The direct-summation of N gravitational forces is a complex problem with no analytical solution and can only be tackled with approximations and numerical methods. To this end, the Hermite scheme is a widely used integration method. With different numerical techniques and special-purpose hardware, it can be used to speed up the calculations. But these methods tend to be computationally slow and cumbersome to work with. We present a new graphics processing unit (GPU), direct-summation N-body integrator written from scratch and based on this scheme, which includes relativistic corrections for sources of gravitational radiation. GRAVIDY has high modularity, allowing users to readily introduce new physics, it exploits available computational resources and will be maintained by regular updates. GRAVIDY can be used in parallel on multiple CPUs and GPUs, with a considerable speed-up benefit. The single-GPU version is between one and two orders of magnitude faster than the single-CPU version. A test run using four GPUs in parallel shows a speed-up factor of about 3 as compared to the single-GPU version. The conception and design of this first release is aimed at users with access to traditional parallel CPU clusters or computational nodes with one or a few GPU cards.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
3D brain tumor localization and parameter estimation using thermographic approach on GPU.
Bousselham, Abdelmajid; Bouattane, Omar; Youssfi, Mohamed; Raihani, Abdelhadi
2018-01-01
The aim of this paper is to present a GPU parallel algorithm for brain tumor detection to estimate its size and location from surface temperature distribution obtained by thermography. The normal brain tissue is modeled as a rectangular cube including spherical tumor. The temperature distribution is calculated using forward three dimensional Pennes bioheat transfer equation, it's solved using massively parallel Finite Difference Method (FDM) and implemented on Graphics Processing Unit (GPU). Genetic Algorithm (GA) was used to solve the inverse problem and estimate the tumor size and location by minimizing an objective function involving measured temperature on the surface to those obtained by numerical simulation. The parallel implementation of Finite Difference Method reduces significantly the time of bioheat transfer and greatly accelerates the inverse identification of brain tumor thermophysical and geometrical properties. Experimental results show significant gains in the computational speed on GPU and achieve a speedup of around 41 compared to the CPU. The analysis performance of the estimation based on tumor size inside brain tissue also presented. Copyright © 2017 Elsevier Ltd. All rights reserved.
Reconstructing evolutionary trees in parallel for massive sequences.
Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam
2017-12-14
Building the evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time/space consuming. Hadoop and Spark are developed recently, which bring spring light for the classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on the >1GB files, and gets better performance than other evolutionary reconstruction tools. Users could use HPTree for reonstructing evolutioanry trees on the computer clusters or cloud platform (eg. Amazon Cloud). HPTree could help on population evolution research and metagenomics analysis. In this paper, we employ the Hadoop and Spark platform and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel. Neighbour-joining model was employed for the evolutionary tree building. We opened our software together with source codes via http://lab.malab.cn/soft/HPtree/ .
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ohsuga, Ken; Takahashi, Hiroyuki R.
2016-02-20
We develop a numerical scheme for solving the equations of fully special relativistic, radiation magnetohydrodynamics (MHDs), in which the frequency-integrated, time-dependent radiation transfer equation is solved to calculate the specific intensity. The radiation energy density, the radiation flux, and the radiation stress tensor are obtained by the angular quadrature of the intensity. In the present method, conservation of total mass, momentum, and energy of the radiation magnetofluids is guaranteed. We treat not only the isotropic scattering but also the Thomson scattering. The numerical method of MHDs is the same as that of our previous work. The advection terms are explicitlymore » solved, and the source terms, which describe the gas–radiation interaction, are implicitly integrated. Our code is suitable for massive parallel computing. We present that our code shows reasonable results in some numerical tests for propagating radiation and radiation hydrodynamics. Particularly, the correct solution is given even in the optically very thin or moderately thin regimes, and the special relativistic effects are nicely reproduced.« less
Load balancing for massively-parallel soft-real-time systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hailperin, M.
1988-09-01
Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-problems. The challenge is to replace quick global communications, which is impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to equal that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics.more » It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.« less
Evaluation of massively parallel sequencing for forensic DNA methylation profiling.
Richards, Rebecca; Patel, Jayshree; Stevenson, Kate; Harbison, SallyAnn
2018-05-11
Epigenetics is an emerging area of interest in forensic science. DNA methylation, a type of epigenetic modification, can be applied to chronological age estimation, identical twin differentiation and body fluid identification. However, there is not yet an agreed, established methodology for targeted detection and analysis of DNA methylation markers in forensic research. Recently a massively parallel sequencing-based approach has been suggested. The use of massively parallel sequencing is well established in clinical epigenetics and is emerging as a new technology in the forensic field. This review investigates the potential benefits, limitations and considerations of this technique for the analysis of DNA methylation in a forensic context. The importance of a robust protocol, regardless of the methodology used, that minimises potential sources of bias is highlighted. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Zhang, Hong; Zapol, Peter; Dixon, David A.; ...
2015-11-17
The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Hong; Zapol, Peter; Dixon, David A.
The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less
A Novel Implementation of Massively Parallel Three Dimensional Monte Carlo Radiation Transport
NASA Astrophysics Data System (ADS)
Robinson, P. B.; Peterson, J. D. L.
2005-12-01
The goal of our summer project was to implement the difference formulation for radiation transport into Cosmos++, a multidimensional, massively parallel, magneto hydrodynamics code for astrophysical applications (Peter Anninos - AX). The difference formulation is a new method for Symbolic Implicit Monte Carlo thermal transport (Brooks and Szöke - PAT). Formerly, simultaneous implementation of fully implicit Monte Carlo radiation transport in multiple dimensions on multiple processors had not been convincingly demonstrated. We found that a combination of the difference formulation and the inherent structure of Cosmos++ makes such an implementation both accurate and straightforward. We developed a "nearly nearest neighbor physics" technique to allow each processor to work independently, even with a fully implicit code. This technique coupled with the increased accuracy of an implicit Monte Carlo solution and the efficiency of parallel computing systems allows us to demonstrate the possibility of massively parallel thermal transport. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48
NASA Astrophysics Data System (ADS)
Furuichi, M.; Nishiura, D.
2015-12-01
Fully Lagrangian methods such as Smoothed Particle Hydrodynamics (SPH) and Discrete Element Method (DEM) have been widely used to solve the continuum and particles motions in the computational geodynamics field. These mesh-free methods are suitable for the problems with the complex geometry and boundary. In addition, their Lagrangian nature allows non-diffusive advection useful for tracking history dependent properties (e.g. rheology) of the material. These potential advantages over the mesh-based methods offer effective numerical applications to the geophysical flow and tectonic processes, which are for example, tsunami with free surface and floating body, magma intrusion with fracture of rock, and shear zone pattern generation of granular deformation. In order to investigate such geodynamical problems with the particle based methods, over millions to billion particles are required for the realistic simulation. Parallel computing is therefore important for handling such huge computational cost. An efficient parallel implementation of SPH and DEM methods is however known to be difficult especially for the distributed-memory architecture. Lagrangian methods inherently show workload imbalance problem for parallelization with the fixed domain in space, because particles move around and workloads change during the simulation. Therefore dynamic load balance is key technique to perform the large scale SPH and DEM simulation. In this work, we present the parallel implementation technique of SPH and DEM method utilizing dynamic load balancing algorithms toward the high resolution simulation over large domain using the massively parallel super computer system. Our method utilizes the imbalances of the executed time of each MPI process as the nonlinear term of parallel domain decomposition and minimizes them with the Newton like iteration method. In order to perform flexible domain decomposition in space, the slice-grid algorithm is used. Numerical tests show that our approach is suitable for solving the particles with different calculation costs (e.g. boundary particles) as well as the heterogeneous computer architecture. We analyze the parallel efficiency and scalability on the super computer systems (K-computer, Earth simulator 3, etc.).
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Logarithmic Superdiffusion in Two Dimensional Driven Lattice Gases
NASA Astrophysics Data System (ADS)
Krug, J.; Neiss, R. A.; Schadschneider, A.; Schmidt, J.
2018-03-01
The spreading of density fluctuations in two-dimensional driven diffusive systems is marginally anomalous. Mode coupling theory predicts that the diffusivity in the direction of the drive diverges with time as (ln t)^{2/3} with a prefactor depending on the macroscopic current-density relation and the diffusion tensor of the fluctuating hydrodynamic field equation. Here we present the first numerical verification of this behavior for a particular version of the two-dimensional asymmetric exclusion process. Particles jump strictly asymmetrically along one of the lattice directions and symmetrically along the other, and an anisotropy parameter p governs the ratio between the two rates. Using a novel massively parallel coupling algorithm that strongly reduces the fluctuations in the numerical estimate of the two-point correlation function, we are able to accurately determine the exponent of the logarithmic correction. In addition, the variation of the prefactor with p provides a stringent test of mode coupling theory.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rutland, Christopher J.
2009-04-26
The Terascale High-Fidelity Simulations of Turbulent Combustion (TSTC) project is a multi-university collaborative effort to develop a high-fidelity turbulent reacting flow simulation capability utilizing terascale, massively parallel computer technology. The main paradigm of the approach is direct numerical simulation (DNS) featuring the highest temporal and spatial accuracy, allowing quantitative observations of the fine-scale physics found in turbulent reacting flows as well as providing a useful tool for development of sub-models needed in device-level simulations. Under this component of the TSTC program the simulation code named S3D, developed and shared with coworkers at Sandia National Laboratories, has been enhanced with newmore » numerical algorithms and physical models to provide predictive capabilities for turbulent liquid fuel spray dynamics. Major accomplishments include improved fundamental understanding of mixing and auto-ignition in multi-phase turbulent reactant mixtures and turbulent fuel injection spray jets.« less
Using CLIPS in the domain of knowledge-based massively parallel programming
NASA Technical Reports Server (NTRS)
Dvorak, Jiri J.
1994-01-01
The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Massively parallel sparse matrix function calculations with NTPoly
NASA Astrophysics Data System (ADS)
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
Porting AMG2013 to Heterogeneous CPU+GPU Nodes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Samfass, Philipp
LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM PowerV9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: While GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in ful lling its national mission. Onemore » of these libraries, called HYPRE, provides parallel solvers and precondi- tioners for large, sparse linear systems of equations. In the context of this intern- ship project, I consider AMG2013 which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving di erent systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for di erent experiments that demonstrate a successful parallel implementation on the heterogeneous ma- chines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).« less
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.
1997-01-01
Applications are described of high-performance computing methods to the numerical simulation of complete jet engines. The methodology focuses on the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field elements. New partitioned analysis procedures to treat this coupled three-component problem were developed. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. The NASA-sponsored ENG10 program was used for the global steady state analysis of the whole engine. This program uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed as well as the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames.
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue
NASA Astrophysics Data System (ADS)
Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.
2017-11-01
The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...
2017-03-05
Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
Capturing the 'ome': the expanding molecular toolbox for RNA and DNA library construction.
Boone, Morgane; De Koker, Andries; Callewaert, Nico
2018-04-06
All sequencing experiments and most functional genomics screens rely on the generation of libraries to comprehensively capture pools of targeted sequences. In the past decade especially, driven by the progress in the field of massively parallel sequencing, numerous studies have comprehensively assessed the impact of particular manipulations on library complexity and quality, and characterized the activities and specificities of several key enzymes used in library construction. Fortunately, careful protocol design and reagent choice can substantially mitigate many of these biases, and enable reliable representation of sequences in libraries. This review aims to guide the reader through the vast expanse of literature on the subject to promote informed library generation, independent of the application.
A fast ultrasonic simulation tool based on massively parallel implementations
NASA Astrophysics Data System (ADS)
Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain
2014-02-01
This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphical processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are : multi and mono-element probes, planar specimens made of simple isotropic materials, planar rectangular defects or side drilled holes of small diameter. Validations on the model accuracy and performances measurements are presented.
Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Tong, Charles; Swarztrauber, Paul N.
1991-01-01
The present evaluation of alternative, massively parallel hypercube processor-applicable designs for ordered radix-2 decimation-in-frequency FFT algorithms gives attention to the reduction of computation time-dominating communication. A combination of the order and computational phases of the FFT is accordingly employed, in conjunction with sequence-to-processor maps which reduce communication. Two orderings, 'standard' and 'cyclic', in which the order of the transform is the same as that of the input sequence, can be implemented with ease on the Connection Machine (where orderings are determined by geometries and priorities. A parallel method for trigonometric coefficient computation is presented which does not employ trigonometric functions or interprocessor communication.
O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...
1995-01-01
Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. Wemore » have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.« less
Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.
2011-01-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
MHD code using multi graphical processing units: SMAUG+
NASA Astrophysics Data System (ADS)
Gyenge, N.; Griffiths, M. K.; Erdélyi, R.
2018-01-01
This paper introduces the Sheffield Magnetohydrodynamics Algorithm Using GPUs (SMAUG+), an advanced numerical code for solving magnetohydrodynamic (MHD) problems, using multi-GPU systems. Multi-GPU systems facilitate the development of accelerated codes and enable us to investigate larger model sizes and/or more detailed computational domain resolutions. This is a significant advancement over the parent single-GPU MHD code, SMAUG (Griffiths et al., 2015). Here, we demonstrate the validity of the SMAUG + code, describe the parallelisation techniques and investigate performance benchmarks. The initial configuration of the Orszag-Tang vortex simulations are distributed among 4, 16, 64 and 100 GPUs. Furthermore, different simulation box resolutions are applied: 1000 × 1000, 2044 × 2044, 4000 × 4000 and 8000 × 8000 . We also tested the code with the Brio-Wu shock tube simulations with model size of 800 employing up to 10 GPUs. Based on the test results, we observed speed ups and slow downs, depending on the granularity and the communication overhead of certain parallel tasks. The main aim of the code development is to provide massively parallel code without the memory limitation of a single GPU. By using our code, the applied model size could be significantly increased. We demonstrate that we are able to successfully compute numerically valid and large 2D MHD problems.
Final Report: Subcontract B623868 Algebraic Multigrid solvers for coupled PDE systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brannick, J.
The Pennsylvania State University (“Subcontractor”) continued to work on the design of algebraic multigrid solvers for coupled systems of partial differential equations (PDEs) arising in numerical modeling of various applications, with a main focus on solving the Dirac equation arising in Quantum Chromodynamics (QCD). The goal of the proposed work was to develop combined geometric and algebraic multilevel solvers that are robust and lend themselves to efficient implementation on massively parallel heterogeneous computers for these QCD systems. The research in these areas built on previous works, focusing on the following three topics: (1) the development of parallel full-multigrid (PFMG) andmore » non-Galerkin coarsening techniques in this frame work for solving the Wilson Dirac system; (2) the use of these same Wilson MG solvers for preconditioning the Overlap and Domain Wall formulations of the Dirac equation; and (3) the design and analysis of algebraic coarsening algorithms for coupled PDE systems including Stokes equation, Maxwell equation and linear elasticity.« less
NASA Astrophysics Data System (ADS)
Hadjidoukas, P. E.; Angelikopoulos, P.; Papadimitriou, C.; Koumoutsakos, P.
2015-03-01
We present Π4U, an extensible framework, for non-intrusive Bayesian Uncertainty Quantification and Propagation (UQ+P) of complex and computationally demanding physical models, that can exploit massively parallel computer architectures. The framework incorporates Laplace asymptotic approximations as well as stochastic algorithms, along with distributed numerical differentiation and task-based parallelism for heterogeneous clusters. Sampling is based on the Transitional Markov Chain Monte Carlo (TMCMC) algorithm and its variants. The optimization tasks associated with the asymptotic approximations are treated via the Covariance Matrix Adaptation Evolution Strategy (CMA-ES). A modified subset simulation method is used for posterior reliability measurements of rare events. The framework accommodates scheduling of multiple physical model evaluations based on an adaptive load balancing library and shows excellent scalability. In addition to the software framework, we also provide guidelines as to the applicability and efficiency of Bayesian tools when applied to computationally demanding physical models. Theoretical and computational developments are demonstrated with applications drawn from molecular dynamics, structural dynamics and granular flow.
Analysis of multigrid methods on massively parallel computers: Architectural implications
NASA Technical Reports Server (NTRS)
Matheson, Lesley R.; Tarjan, Robert E.
1993-01-01
We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
Fast I/O for Massively Parallel Applications
NASA Technical Reports Server (NTRS)
OKeefe, Matthew T.
1996-01-01
The two primary goals for this report were the design, contruction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the IO requirements of highly parallel applications. In addition, further work in parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.
Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon
2012-01-01
We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
NASA Astrophysics Data System (ADS)
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai; Ng, Cho-Kuen; Rivetta, Claudio
2017-10-01
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we present the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. The simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.
High-order asynchrony-tolerant finite difference schemes for partial differential equations
NASA Astrophysics Data System (ADS)
Aditya, Konduri; Donzis, Diego A.
2017-12-01
Synchronizations of processing elements (PEs) in massively parallel simulations, which arise due to communication or load imbalances between PEs, significantly affect the scalability of scientific applications. We have recently proposed a method based on finite-difference schemes to solve partial differential equations in an asynchronous fashion - synchronization between PEs is relaxed at a mathematical level. While standard schemes can maintain their stability in the presence of asynchrony, their accuracy is drastically affected. In this work, we present a general methodology to derive asynchrony-tolerant (AT) finite difference schemes of arbitrary order of accuracy, which can maintain their accuracy when synchronizations are relaxed. We show that there are several choices available in selecting a stencil to derive these schemes and discuss their effect on numerical and computational performance. We provide a simple classification of schemes based on the stencil and derive schemes that are representative of different classes. Their numerical error is rigorously analyzed within a statistical framework to obtain the overall accuracy of the solution. Results from numerical experiments are used to validate the performance of the schemes.
Notes on the KIVA-2 software and chemically reactive fluid mechanics
NASA Astrophysics Data System (ADS)
Holst, M. J.
1992-09-01
Working notes regarding the mechanics of chemically reactive fluids with sprays, and their numerical simulation with the KIVA-2 software are presented. KIVA-2 is a large FORTRAN program developed at Los Alamos National Laboratory for internal combustion engine simulation. It is our hope that these notes summarize some of the necessary background material in fluid mechanics and combustion, explain the numerical methods currently used in KIVA-2 and similar combustion codes, and provide an outline of the overall structure of KIVA-2 as a representative combustion program, in order to aid the researcher in the task of implementing KIVA-2 or a similar combustion code on a massively parallel computer. The notes are organized into three parts as follows. In Part 1, a brief introduction to continuum mechanics, to fluid mechanics, and to the mechanics of chemically reactive fluids with sprays is presented. In Part 2, a close look at the governing equations of KIVA-2 is taken, and the methods employed in the numerical solution of these equations is discussed. Some conclusions are drawn and some observations are made in Part 3.
Biesecker, Leslie G
2012-04-01
The debate surrounding the return of results from high-throughput genomic interrogation encompasses many important issues including ethics, law, economics, and social policy. As well, the debate is also informed by the molecular, genetic, and clinical foundations of the emerging field of clinical genomics, which is based on this new technology. This article outlines the main biomedical considerations of sequencing technologies and demonstrates some of the early clinical experiences with the technology to enable the debate to stay focused on real-world practicalities. These experiences are based on early data from the ClinSeq project, which is a project to pilot the use of massively parallel sequencing in a clinical research context with a major aim to develop modes of returning results to individual subjects. The study has enrolled >900 subjects and generated exome sequence data on 572 subjects. These data are beginning to be interpreted and returned to the subjects, which provides examples of the potential usefulness and pitfalls of clinical genomics. There are numerous genetic results that can be readily derived from a genome including rare, high-penetrance traits, and carrier states. However, much work needs to be done to develop the tools and resources for genomic interpretation. The main lesson learned is that a genome sequence may be better considered as a health-care resource, rather than a test, one that can be interpreted and used over the lifetime of the patient.
Block iterative restoration of astronomical images with the massively parallel processor
NASA Technical Reports Server (NTRS)
Heap, Sara R.; Lindler, Don J.
1987-01-01
A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
A Massively Parallel Code for Polarization Calculations
NASA Astrophysics Data System (ADS)
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Routing performance analysis and optimization within a massively parallel computer
Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen
2013-04-16
An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.
NASA Technical Reports Server (NTRS)
Manohar, Mareboyana; Tilton, James C.
1994-01-01
A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.
Regional-scale calculation of the LS factor using parallel processing
NASA Astrophysics Data System (ADS)
Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong
2015-05-01
With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Massively parallel processor computer
NASA Technical Reports Server (NTRS)
Fung, L. W. (Inventor)
1983-01-01
An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Compact holographic optical neural network system for real-time pattern recognition
NASA Astrophysics Data System (ADS)
Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.
1996-08-01
One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited because of the limited parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high- resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attache case has been developed. Rotation- shift-scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan
This report presents a specification for the Portals 4 networ k programming interface. Portals 4 is intended to allow scalable, high-performance network communication betwee n nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded syste ms. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platfor ms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is tarmore » geted to the next generation of machines employing advanced network interface architectures that support enh anced offload capabilities.« less
The Portals 4.0 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2012-11-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities.« less
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters
Bajaj, Chandrajit
2009-01-01
Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction, and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progress extraction, transmission and storage of isosurfaces. PMID:19756231
Massively parallel multicanonical simulations
NASA Astrophysics Data System (ADS)
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
Thought Leaders during Crises in Massive Social Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corley, Courtney D.; Farber, Robert M.; Reynolds, William
The vast amount of social media data that can be gathered from the internet coupled with workflows that utilize both commodity systems and massively parallel supercomputers, such as the Cray XMT, open new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges along with metrics and massively parallel algorithms that exhibit near-linear scalability according to number of processors. The challenge lies in making this massive data and analysis comprehensible to an analyst and end-users that require actionable knowledge tomore » carry out their duties. Simply stated, we have developed language and content agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.« less
A FAST ITERATIVE METHOD FOR SOLVING THE EIKONAL EQUATION ON TETRAHEDRAL DOMAINS
Fu, Zhisong; Kirby, Robert M.; Whitaker, Ross T.
2014-01-01
Generating numerical solutions to the eikonal equation and its many variations has a broad range of applications in both the natural and computational sciences. Efficient solvers on cutting-edge, parallel architectures require new algorithms that may not be theoretically optimal, but that are designed to allow asynchronous solution updates and have limited memory access patterns. This paper presents a parallel algorithm for solving the eikonal equation on fully unstructured tetrahedral meshes. The method is appropriate for the type of fine-grained parallelism found on modern massively-SIMD architectures such as graphics processors and takes into account the particular constraints and capabilities of these computing platforms. This work builds on previous work for solving these equations on triangle meshes; in this paper we adapt and extend previous two-dimensional strategies to accommodate three-dimensional, unstructured, tetrahedralized domains. These new developments include a local update strategy with data compaction for tetrahedral meshes that provides solutions on both serial and parallel architectures, with a generalization to inhomogeneous, anisotropic speed functions. We also propose two new update schemes, specialized to mitigate the natural data increase observed when moving to three dimensions, and the data structures necessary for efficiently mapping data to parallel SIMD processors in a way that maintains computational density. Finally, we present descriptions of the implementations for a single CPU, as well as multicore CPUs with shared memory and SIMD architectures, with comparative results against state-of-the-art eikonal solvers. PMID:25221418
Parallel algorithm of real-time infrared image restoration based on total variation theory
NASA Astrophysics Data System (ADS)
Zhu, Ran; Li, Miao; Long, Yunli; Zeng, Yaoyuan; An, Wei
2015-10-01
Image restoration is a necessary preprocessing step for infrared remote sensing applications. Traditional methods allow us to remove the noise but penalize too much the gradients corresponding to edges. Image restoration techniques based on variational approaches can solve this over-smoothing problem for the merits of their well-defined mathematical modeling of the restore procedure. The total variation (TV) of infrared image is introduced as a L1 regularization term added to the objective energy functional. It converts the restoration process to an optimization problem of functional involving a fidelity term to the image data plus a regularization term. Infrared image restoration technology with TV-L1 model exploits the remote sensing data obtained sufficiently and preserves information at edges caused by clouds. Numerical implementation algorithm is presented in detail. Analysis indicates that the structure of this algorithm can be easily implemented in parallelization. Therefore a parallel implementation of the TV-L1 filter based on multicore architecture with shared memory is proposed for infrared real-time remote sensing systems. Massive computation of image data is performed in parallel by cooperating threads running simultaneously on multiple cores. Several groups of synthetic infrared image data are used to validate the feasibility and effectiveness of the proposed parallel algorithm. Quantitative analysis of measuring the restored image quality compared to input image is presented. Experiment results show that the TV-L1 filter can restore the varying background image reasonably, and that its performance can achieve the requirement of real-time image processing.
NASA Astrophysics Data System (ADS)
Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves
2009-03-01
This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
Roever, Stefan
2012-01-01
A massively parallel, low cost molecular analysis platform will dramatically change the nature of protein, molecular and genomics research, DNA sequencing, and ultimately, molecular diagnostics. An integrated circuit (IC) with 264 sensors was fabricated using standard CMOS semiconductor processing technology. Each of these sensors is individually controlled with precision analog circuitry and is capable of single molecule measurements. Under electronic and software control, the IC was used to demonstrate the feasibility of creating and detecting lipid bilayers and biological nanopores using wild type α-hemolysin. The ability to dynamically create bilayers over each of the sensors will greatly accelerate pore development and pore mutation analysis. In addition, the noise performance of the IC was measured to be 30fA(rms). With this noise performance, single base detection of DNA was demonstrated using α-hemolysin. The data shows that a single molecule, electrical detection platform using biological nanopores can be operationalized and can ultimately scale to millions of sensors. Such a massively parallel platform will revolutionize molecular analysis and will completely change the field of molecular diagnostics in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wylie, Brian Neil; Moreland, Kenneth D.
Graphs are a vital way of organizing data with complex correlations. A good visualization of a graph can fundamentally change human understanding of the data. Consequently, there is a rich body of work on graph visualization. Although there are many techniques that are effective on small to medium sized graphs (tens of thousands of nodes), there is a void in the research for visualizing massive graphs containing millions of nodes. Sandia is one of the few entities in the world that has the means and motivation to handle data on such a massive scale. For example, homeland security generates graphsmore » from prolific media sources such as television, telephone, and the Internet. The purpose of this project is to provide the groundwork for visualizing such massive graphs. The research provides for two major feature gaps: a parallel, interactive visualization framework and scalable algorithms to make the framework usable to a practical application. Both the frameworks and algorithms are designed to run on distributed parallel computers, which are already available at Sandia. Some features are integrated into the ThreatView{trademark} application and future work will integrate further parallel algorithms.« less
Optical fiber systems for the BigBOSS instrument
NASA Astrophysics Data System (ADS)
Edelstein, Jerry; Poppett, Claire; Sirk, Martin; Besuner, Robert; Lafever, Robin; Allington-Smith, Jeremy R.; Murray, Graham J.
2012-09-01
We describe the fiber optics systems for use in BigBOSS, a proposed massively parallel multi-object spectrograph for the Kitt Peak Mayall 4-m telescope that will measure baryon acoustic oscillations to explore dark energy. BigBOSS will include 5,000 optical fibers each precisely actuator-positioned to collect an astronomical target’s flux at the telescope prime-focus. The fibers are to be routed 40m through the telescope facility to feed ten visible-band imaging spectrographs. We report on our fiber component development and performance measurement program. Results include the numerical modeling of focal ratio degradation (FRD), observations of actual fibers’ collimated and converging beam FRD, and observations of FRD from different types of fiber terminations, mechanical connectors, and fusion-splice connections.
Genetically Engineered Microelectronic Infrared Filters
NASA Technical Reports Server (NTRS)
Cwik, Tom; Klimeck, Gerhard
1998-01-01
A genetic algorithm is used for design of infrared filters and in the understanding of the material structure of a resonant tunneling diode. These two components are examples of microdevices and nanodevices that can be numerically simulated using fundamental mathematical and physical models. Because the number of parameters that can be used in the design of one of these devices is large, and because experimental exploration of the design space is unfeasible, reliable software models integrated with global optimization methods are examined The genetic algorithm and engineering design codes have been implemented on massively parallel computers to exploit their high performance. Design results are presented for the infrared filter showing new and optimized device design. Results for nanodevices are presented in a companion paper at this workshop.
A Performance Evaluation of the Cray X1 for Scientific Applications
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David
2004-01-01
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
Revealing the Physics of Galactic Winds Through Massively-Parallel Hydrodynamics Simulations
NASA Astrophysics Data System (ADS)
Schneider, Evan Elizabeth
This thesis documents the hydrodynamics code Cholla and a numerical study of multiphase galactic winds. Cholla is a massively-parallel, GPU-based code designed for astrophysical simulations that is freely available to the astrophysics community. A static-mesh Eulerian code, Cholla is ideally suited to carrying out massive simulations (> 20483 cells) that require very high resolution. The code incorporates state-of-the-art hydrodynamics algorithms including third-order spatial reconstruction, exact and linearized Riemann solvers, and unsplit integration algorithms that account for transverse fluxes on multidimensional grids. Operator-split radiative cooling and a dual-energy formalism for high mach number flows are also included. An extensive test suite demonstrates Cholla's superior ability to model shocks and discontinuities, while the GPU-native design makes the code extremely computationally efficient - speeds of 5-10 million cell updates per GPU-second are typical on current hardware for 3D simulations with all of the aforementioned physics. The latter half of this work comprises a comprehensive study of the mixing between a hot, supernova-driven wind and cooler clouds representative of those observed in multiphase galactic winds. Both adiabatic and radiatively-cooling clouds are investigated. The analytic theory of cloud-crushing is applied to the problem, and adiabatic turbulent clouds are found to be mixed with the hot wind on similar timescales as the classic spherical case (4-5 t cc) with an appropriate rescaling of the cloud-crushing time. Radiatively cooling clouds survive considerably longer, and the differences in evolution between turbulent and spherical clouds cannot be reconciled with a simple rescaling. The rapid incorporation of low-density material into the hot wind implies efficient mass-loading of hot phases of galactic winds. At the same time, the extreme compression of high-density cloud material leads to long-lived but slow-moving clumps that are unlikely to escape the galaxy.
High Order Semi-Lagrangian Advection Scheme
NASA Astrophysics Data System (ADS)
Malaga, Carlos; Mandujano, Francisco; Becerra, Julian
2014-11-01
In most fluid phenomena, advection plays an important roll. A numerical scheme capable of making quantitative predictions and simulations must compute correctly the advection terms appearing in the equations governing fluid flow. Here we present a high order forward semi-Lagrangian numerical scheme specifically tailored to compute material derivatives. The scheme relies on the geometrical interpretation of material derivatives to compute the time evolution of fields on grids that deform with the material fluid domain, an interpolating procedure of arbitrary order that preserves the moments of the interpolated distributions, and a nonlinear mapping strategy to perform interpolations between undeformed and deformed grids. Additionally, a discontinuity criterion was implemented to deal with discontinuous fields and shocks. Tests of pure advection, shock formation and nonlinear phenomena are presented to show performance and convergence of the scheme. The high computational cost is considerably reduced when implemented on massively parallel architectures found in graphic cards. The authors acknowledge funding from Fondo Sectorial CONACYT-SENER Grant Number 42536 (DGAJ-SPI-34-170412-217).
The portals 4.0.1 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2013-04-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities. 3« less
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Massively parallel quantum computer simulator
NASA Astrophysics Data System (ADS)
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, a SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.
Archer, Charles Jens [Rochester, MN; Musselman, Roy Glenn [Rochester, MN; Peters, Amanda [Rochester, MN; Pinnow, Kurt Walter [Rochester, MN; Swartz, Brent Allen [Chippewa Falls, WI; Wallenfelt, Brian Paul [Eden Prairie, MN
2011-10-04
A massively parallel nodal computer system periodically collects and broadcasts usage data for an internal communications network. A node sending data over the network makes a global routing determination using the network usage data. Preferably, network usage data comprises an N-bit usage value for each output buffer associated with a network link. An optimum routing is determined by summing the N-bit values associated with each link through which a data packet must pass, and comparing the sums associated with different possible routes.
Scalable Visual Analytics of Massive Textual Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.
2007-04-01
This paper describes the first scalable implementation of text processing engine used in Visual Analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive dataset. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as Pubmed. This approach enables interactive analysis of large datasets beyond capabilities of existing state-of-the art visual analytics tools.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-03-16
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
State Transition Matrix for Perturbed Orbital Motion Using Modified Chebyshev Picard Iteration
NASA Astrophysics Data System (ADS)
Read, Julie L.; Younes, Ahmad Bani; Macomber, Brent; Turner, James; Junkins, John L.
2015-06-01
The Modified Chebyshev Picard Iteration (MCPI) method has recently proven to be highly efficient for a given accuracy compared to several commonly adopted numerical integration methods, as a means to solve for perturbed orbital motion. This method utilizes Picard iteration, which generates a sequence of path approximations, and Chebyshev Polynomials, which are orthogonal and also enable both efficient and accurate function approximation. The nodes consistent with discrete Chebyshev orthogonality are generated using cosine sampling; this strategy also reduces the Runge effect and as a consequence of orthogonality, there is no matrix inversion required to find the basis function coefficients. The MCPI algorithms considered herein are parallel-structured so that they are immediately well-suited for massively parallel implementation with additional speedup. MCPI has a wide range of applications beyond ephemeris propagation, including the propagation of the State Transition Matrix (STM) for perturbed two-body motion. A solution is achieved for a spherical harmonic series representation of earth gravity (EGM2008), although the methodology is suitable for application to any gravity model. Included in this representation the normalized, Associated Legendre Functions are given and verified numerically. Modifications of the classical algorithm techniques, such as rewriting the STM equations in a second-order cascade formulation, gives rise to additional speedup. Timing results for the baseline formulation and this second-order formulation are given.
NASA Astrophysics Data System (ADS)
Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo
2017-08-01
We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
NASA Astrophysics Data System (ADS)
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.
2012-06-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L
2012-06-13
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Parallel Tensor Compression for Large-Scale Scientific Data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memorymore » parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.« less
NASA Astrophysics Data System (ADS)
Neff, John A.
1989-12-01
Experiments originating from Gestalt psychology have shown that representing information in a symbolic form provides a more effective means to understanding. Computer scientists have been struggling for the last two decades to determine how best to create, manipulate, and store collections of symbolic structures. In the past, much of this struggling led to software innovations because that was the path of least resistance. For example, the development of heuristics for organizing the searching through knowledge bases was much less expensive than building massively parallel machines that could search in parallel. That is now beginning to change with the emergence of parallel architectures which are showing the potential for handling symbolic structures. This paper will review the relationships between symbolic computing and parallel computing architectures, and will identify opportunities for optics to significantly impact the performance of such computing machines. Although neural networks are an exciting subset of massively parallel computing structures, this paper will not touch on this area since it is receiving a great deal of attention in the literature. That is, the concepts presented herein do not consider the distributed representation of knowledge.
Massively Parallel DNA Sequencing Facilitates Diagnosis of Patients with Usher Syndrome Type 1
Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-ichi
2014-01-01
Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance. PMID:24618850
Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.
Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-Ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-Ichi
2014-01-01
Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai; ...
2017-10-10
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we presentmore » the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. Furthermore, the simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.« less
Hesford, Andrew J; Tillett, Jason C; Astheimer, Jeffrey P; Waag, Robert C
2014-08-01
Accurate and efficient modeling of ultrasound propagation through realistic tissue models is important to many aspects of clinical ultrasound imaging. Simplified problems with known solutions are often used to study and validate numerical methods. Greater confidence in a time-domain k-space method and a frequency-domain fast multipole method is established in this paper by analyzing results for realistic models of the human breast. Models of breast tissue were produced by segmenting magnetic resonance images of ex vivo specimens into seven distinct tissue types. After confirming with histologic analysis by pathologists that the model structures mimicked in vivo breast, the tissue types were mapped to variations in sound speed and acoustic absorption. Calculations of acoustic scattering by the resulting model were performed on massively parallel supercomputer clusters using parallel implementations of the k-space method and the fast multipole method. The efficient use of these resources was confirmed by parallel efficiency and scalability studies using large-scale, realistic tissue models. Comparisons between the temporal and spectral results were performed in representative planes by Fourier transforming the temporal results. An RMS field error less than 3% throughout the model volume confirms the accuracy of the methods for modeling ultrasound propagation through human breast.
3D multiphysics modeling of superconducting cavities with a massively parallel simulation suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kononenko, Oleksiy; Adolphsen, Chris; Li, Zenghai
Radiofrequency cavities based on superconducting technology are widely used in particle accelerators for various applications. The cavities usually have high quality factors and hence narrow bandwidths, so the field stability is sensitive to detuning from the Lorentz force and external loads, including vibrations and helium pressure variations. If not properly controlled, the detuning can result in a serious performance degradation of a superconducting accelerator, so an understanding of the underlying detuning mechanisms can be very helpful. Recent advances in the simulation suite ace3p have enabled realistic multiphysics characterization of such complex accelerator systems on supercomputers. In this paper, we presentmore » the new capabilities in ace3p for large-scale 3D multiphysics modeling of superconducting cavities, in particular, a parallel eigensolver for determining mechanical resonances, a parallel harmonic response solver to calculate the response of a cavity to external vibrations, and a numerical procedure to decompose mechanical loads, such as from the Lorentz force or piezoactuators, into the corresponding mechanical modes. These capabilities have been used to do an extensive rf-mechanical analysis of dressed TESLA-type superconducting cavities. Furthermore, the simulation results and their implications for the operational stability of the Linac Coherent Light Source-II are discussed.« less
A hierarchical, automated target recognition algorithm for a parallel analog processor
NASA Technical Reports Server (NTRS)
Woodward, Gail; Padgett, Curtis
1997-01-01
A hierarchical approach is described for an automated target recognition (ATR) system, VIGILANTE, that uses a massively parallel, analog processor (3DANN). The 3DANN processor is capable of performing 64 concurrent inner products of size 1x4096 every 250 nanoseconds.
Araujo, Luiz H.; Timmers, Cynthia; Bell, Erica Hlavin; Shilo, Konstantin; Lammers, Philip E.; Zhao, Weiqiang; Natarajan, Thanemozhi G.; Miller, Clinton J.; Zhang, Jianying; Yilmaz, Ayse S.; Liu, Tom; Coombes, Kevin; Amann, Joseph; Carbone, David P.
2015-01-01
Purpose Technologic advances have enabled the comprehensive analysis of genetic perturbations in non–small-cell lung cancer (NSCLC); however, African Americans have often been underrepresented in these studies. This ethnic group has higher lung cancer incidence and mortality rates, and some studies have suggested a lower incidence of epidermal growth factor receptor mutations. Herein, we report the most in-depth molecular profile of NSCLC in African Americans to date. Methods A custom panel was designed to cover the coding regions of 81 NSCLC-related genes and 40 ancestry-informative markers. Clinical samples were sequenced on a massively parallel sequencing instrument, and anaplastic lymphoma kinase translocation was evaluated by fluorescent in situ hybridization. Results The study cohort included 99 patients (61% males, 94% smokers) comprising 31 squamous and 68 nonsquamous cell carcinomas. We detected 227 nonsilent variants in the coding sequence, including 24 samples with nonoverlapping, classic driver alterations. The frequency of driver mutations was not significantly different from that of whites, and no association was found between genetic ancestry and the presence of somatic mutations. Copy number alteration analysis disclosed distinguishable amplifications in the 3q chromosome arm in squamous cell carcinomas and pointed toward a handful of targetable alterations. We also found frequent SMARCA4 mutations and protein loss, mostly in driver-negative tumors. Conclusion Our data suggest that African American ancestry may not be significantly different from European/white background for the presence of somatic driver mutations in NSCLC. Furthermore, we demonstrated that using a comprehensive genotyping approach could identify numerous targetable alterations, with potential impact on therapeutic decisions. PMID:25918285
Amores, Angel; Catchen, Julian; Ferrara, Allyse; Fontenot, Quenton; Postlethwait, John H.
2011-01-01
Genomic resources for hundreds of species of evolutionary, agricultural, economic, and medical importance are unavailable due to the expense of well-assembled genome sequences and difficulties with multigenerational studies. Teleost fish provide many models for human disease but possess anciently duplicated genomes that sometimes obfuscate connectivity. Genomic information representing a fish lineage that diverged before the teleost genome duplication (TGD) would provide an outgroup for exploring the mechanisms of evolution after whole-genome duplication. We exploited massively parallel DNA sequencing to develop meiotic maps with thrift and speed by genotyping F1 offspring of a single female and a single male spotted gar (Lepisosteus oculatus) collected directly from nature utilizing only polymorphisms existing in these two wild individuals. Using Stacks, software that automates the calling of genotypes from polymorphisms assayed by Illumina sequencing, we constructed a map containing 8406 markers. RNA-seq on two map-cross larvae provided a reference transcriptome that identified nearly 1000 mapped protein-coding markers and allowed genome-wide analysis of conserved synteny. Results showed that the gar lineage diverged from teleosts before the TGD and its genome is organized more similarly to that of humans than teleosts. Thus, spotted gar provides a critical link between medical models in teleost fish, to which gar is biologically similar, and humans, to which gar is genomically similar. Application of our F1 dense mapping strategy to species with no prior genome information promises to facilitate comparative genomics and provide a scaffold for ordering the numerous contigs arising from next generation genome sequencing. PMID:21828280
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-08-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce
Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel
2013-01-01
Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650
Real-time electron dynamics for massively parallel excited-state simulations
NASA Astrophysics Data System (ADS)
Andrade, Xavier
The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...
2017-11-14
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
NASA Astrophysics Data System (ADS)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.
2017-11-01
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
The NASA Neutron Star Grand Challenge: The coalescences of Neutron Star Binary System
NASA Astrophysics Data System (ADS)
Suen, Wai-Mo
1998-04-01
NASA funded a Grand Challenge Project (9/1996-1999) for the development of a multi-purpose numerical treatment for relativistic astrophysics and gravitational wave astronomy. The coalescence of binary neutron stars is chosen as the model problem for the code development. The institutes involved in it are the Argonne Lab, Livermore lab, Max-Planck Institute at Potsdam, StonyBrook, U of Illinois and Washington U. We have recently succeeded in constructing a highly optimized parallel code which is capable of solving the full Einstein equations coupled with relativistic hydrodynamics, running at over 50 GFLOPS on a T3E (the second milestone point of the project). We are presently working on the head-on collisions of two neutron stars, and the inclusion of realistic equations of state into the code. The code will be released to the relativity and astrophysics community in April of 1998. With the full dynamics of the spacetime, relativistic hydro and microphysics all combined into a unified 3D code for the first time, many interesting large scale calculations in general relativistic astrophysics can now be carried out on massively parallel computers.
Magnetosphere simulations with a high-performance 3D AMR MHD Code
NASA Astrophysics Data System (ADS)
Gombosi, Tamas; Dezeeuw, Darren; Groth, Clinton; Powell, Kenneth; Song, Paul
1998-11-01
BATS-R-US is a high-performance 3D AMR MHD code for space physics applications running on massively parallel supercomputers. In BATS-R-US the electromagnetic and fluid equations are solved with a high-resolution upwind numerical scheme in a tightly coupled manner. The code is very robust and it is capable of spanning a wide range of plasma parameters (such as β, acoustic and Alfvénic Mach numbers). Our code is highly scalable: it achieved a sustained performance of 233 GFLOPS on a Cray T3E-1200 supercomputer with 1024 PEs. This talk reports results from the BATS-R-US code for the GGCM (Geospace General Circularculation Model) Phase 1 Standard Model Suite. This model suite contains 10 different steady-state configurations: 5 IMF clock angles (north, south, and three equally spaced angles in- between) with 2 IMF field strengths for each angle (5 nT and 10 nT). The other parameters are: solar wind speed =400 km/sec; solar wind number density = 5 protons/cc; Hall conductance = 0; Pedersen conductance = 5 S; parallel conductivity = ∞.
NASA Technical Reports Server (NTRS)
Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David
1987-01-01
The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.
A cost-effective methodology for the design of massively-parallel VLSI functional units
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Sriram, G.; Desouza, J.
1993-01-01
In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.
Spacecraft charging analysis with the implicit particle-in-cell code iPic3D
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deca, J.; Lapenta, G.; Marchand, R.
2013-10-15
We present the first results on the analysis of spacecraft charging with the implicit particle-in-cell code iPic3D, designed for running on massively parallel supercomputers. The numerical algorithm is presented, highlighting the implementation of the electrostatic solver and the immersed boundary algorithm; the latter which creates the possibility to handle complex spacecraft geometries. As a first step in the verification process, a comparison is made between the floating potential obtained with iPic3D and with Orbital Motion Limited theory for a spherical particle in a uniform stationary plasma. Second, the numerical model is verified for a CubeSat benchmark by comparing simulation resultsmore » with those of PTetra for space environment conditions with increasing levels of complexity. In particular, we consider spacecraft charging from plasma particle collection, photoelectron and secondary electron emission. The influence of a background magnetic field on the floating potential profile near the spacecraft is also considered. Although the numerical approaches in iPic3D and PTetra are rather different, good agreement is found between the two models, raising the level of confidence in both codes to predict and evaluate the complex plasma environment around spacecraft.« less
Massive black hole and gas dynamics in galaxy nuclei mergers - I. Numerical implementation
NASA Astrophysics Data System (ADS)
Lupi, Alessandro; Haardt, Francesco; Dotti, Massimo
2015-01-01
Numerical effects are known to plague adaptive mesh refinement (AMR) codes when treating massive particles, e.g. representing massive black holes (MBHs). In an evolving background, they can experience strong, spurious perturbations and then follow unphysical orbits. We study by means of numerical simulations the dynamical evolution of a pair MBHs in the rapidly and violently evolving gaseous and stellar background that follows a galaxy major merger. We confirm that spurious numerical effects alter the MBH orbits in AMR simulations, and show that numerical issues are ultimately due to a drop in the spatial resolution during the simulation, drastically reducing the accuracy in the gravitational force computation. We therefore propose a new refinement criterion suited for massive particles, able to solve in a fast and precise way for their orbits in highly dynamical backgrounds. The new refinement criterion we designed enforces the region around each massive particle to remain at the maximum resolution allowed, independently upon the local gas density. Such maximally resolved regions then follow the MBHs along their orbits, and effectively avoids all spurious effects caused by resolution changes. Our suite of high-resolution, AMR hydrodynamic simulations, including different prescriptions for the sub-grid gas physics, shows that the new refinement implementation has the advantage of not altering the physical evolution of the MBHs, accounting for all the non-trivial physical processes taking place in violent dynamical scenarios, such as the final stages of a galaxy major merger.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-04-27
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a final destination. The default routing strategy is altered responsive to detection of overutilization of a particular path of one or more links, and at least some traffic is re-routed by distributing the traffic among multiple paths (which may include the default path). An alternative path may require a greater number of link traversals to reach the destination node.
A Massively Parallel Bayesian Approach to Planetary Protection Trajectory Analysis and Design
NASA Technical Reports Server (NTRS)
Wallace, Mark S.
2015-01-01
The NASA Planetary Protection Office has levied a requirement that the upper stage of future planetary launches have a less than 10(exp -4) chance of impacting Mars within 50 years after launch. A brute-force approach requires a decade of computer time to demonstrate compliance. By using a Bayesian approach and taking advantage of the demonstrated reliability of the upper stage, the required number of fifty-year propagations can be massively reduced. By spreading the remaining embarrassingly parallel Monte Carlo simulations across multiple computers, compliance can be demonstrated in a reasonable time frame. The method used is described here.
Systems and methods for rapid processing and storage of data
Stalzer, Mark A.
2017-01-24
Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
Neupauerová, Jana; Grečmalová, Dagmar; Seeman, Pavel; Laššuthová, Petra
2016-05-01
We describe a patient with early onset severe axonal Charcot-Marie-Tooth disease (CMT2) with dominant inheritance, in whom Sanger sequencing failed to detect a mutation in the mitofusin 2 (MFN2) gene because of a single nucleotide polymorphism (rs2236057) under the PCR primer sequence. The severe early onset phenotype and the family history with severely affected mother (died after delivery) was very suggestive of CMT2A and this suspicion was finally confirmed by a MFN2 mutation. The mutation p.His361Tyr was later detected in the patient by massively parallel sequencing with a gene panel for hereditary neuropathies. According to this information, new primers for amplification and sequencing were designed which bind away from the polymorphic sites of the patient's DNA. Sanger sequencing with these new primers then confirmed the heterozygous mutation in the MFN2 gene in this patient. This case report shows that massively parallel sequencing may in some rare cases be more sensitive than Sanger sequencing and highlights the importance of accurate primer design which requires special attention. © 2016 John Wiley & Sons Ltd/University College London.
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
A Performance Evaluation of the Cray X1 for Scientific Applications
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David
2003-01-01
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
Freiberger, Manuel; Egger, Herbert; Liebmann, Manfred; Scharfetter, Hermann
2011-11-01
Image reconstruction in fluorescence optical tomography is a three-dimensional nonlinear ill-posed problem governed by a system of partial differential equations. In this paper we demonstrate that a combination of state of the art numerical algorithms and a careful hardware optimized implementation allows to solve this large-scale inverse problem in a few seconds on standard desktop PCs with modern graphics hardware. In particular, we present methods to solve not only the forward but also the non-linear inverse problem by massively parallel programming on graphics processors. A comparison of optimized CPU and GPU implementations shows that the reconstruction can be accelerated by factors of about 15 through the use of the graphics hardware without compromising the accuracy in the reconstructed images.
NASA Astrophysics Data System (ADS)
Vrugt, J. A.
2012-12-01
In the past decade much progress has been made in the treatment of uncertainty in earth systems modeling. Whereas initial approaches has focused mostly on quantification of parameter and predictive uncertainty, recent methods attempt to disentangle the effects of parameter, forcing (input) data, model structural and calibration data errors. In this talk I will highlight some of our recent work involving theory, concepts and applications of Bayesian parameter and/or state estimation. In particular, new methods for sequential Monte Carlo (SMC) and Markov Chain Monte Carlo (MCMC) simulation will be presented with emphasis on massively parallel distributed computing and quantification of model structural errors. The theoretical and numerical developments will be illustrated using model-data synthesis problems in hydrology, hydrogeology and geophysics.
Aerodynamic simulation on massively parallel systems
NASA Technical Reports Server (NTRS)
Haeuser, Jochem; Simon, Horst D.
1992-01-01
This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but message passing MIMD systems seem to be best suited for large miltiblock applications.
Application of high-performance computing to numerical simulation of human movement
NASA Technical Reports Server (NTRS)
Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.
1995-01-01
We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
High-performance parallel analysis of coupled problems for aircraft propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Chen, P.-S.; Gumaste, U.; Leoinne, M.; Stern, P.
1995-01-01
This research program deals with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a by-pass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled 3-component problem were developed in 1994. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 has been developed. It is planned to use the steady-state global solution provided by ENG10 as input to a localized three-dimensional FSI analysis for engine regions where aeroelastic effects may be important.
A Survey of Parallel Computing
1988-07-01
Evaluating Two Massively Parallel Machines. Communications of the ACM .9, , , 176 BIBLIOGRAPHY 29, 8 (August), pp. 752-758. Gajski , D.D., Padua, D.A., Kuck...Computer Architecture, edited by Gajski , D. D., Milutinovic, V. M. Siegel, H. J. and Furht, B. P. IEEE Computer Society Press, Washington, D.C., pp. 387-407
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Yamamoto, Hajime; Pruess, Karsten
2008-02-15
TMVOC-MP is a massively parallel version of the TMVOC code (Pruess and Battistelli, 2002), a numerical simulator for three-phase non-isothermal flow of water, gas, and a multicomponent mixture of volatile organic chemicals (VOCs) in multidimensional heterogeneous porous/fractured media. TMVOC-MP was developed by introducing massively parallel computing techniques into TMVOC. It retains the physical process model of TMVOC, designed for applications to contamination problems that involve hydrocarbon fuels or organic solvents in saturated and unsaturated zones. TMVOC-MP can model contaminant behavior under 'natural' environmental conditions, as well as for engineered systems, such as soil vapor extraction, groundwater pumping, or steam-assisted sourcemore » remediation. With its sophisticated parallel computing techniques, TMVOC-MP can handle much larger problems than TMVOC, and can be much more computationally efficient. TMVOC-MP models multiphase fluid systems containing variable proportions of water, non-condensible gases (NCGs), and water-soluble volatile organic chemicals (VOCs). The user can specify the number and nature of NCGs and VOCs. There are no intrinsic limitations to the number of NCGs or VOCs, although the arrays for fluid components are currently dimensioned as 20, accommodating water plus 19 components that may be either NCGs or VOCs. Among them, NCG arrays are dimensioned as 10. The user may select NCGs from a data bank provided in the software. The currently available choices include O{sub 2}, N{sub 2}, CO{sub 2}, CH{sub 4}, ethane, ethylene, acetylene, and air (a pseudo-component treated with properties averaged from N{sub 2} and O{sub 2}). Thermophysical property data of VOCs can be selected from a chemical data bank, included with TMVOC-MP, that provides parameters for 26 commonly encountered chemicals. Users also can input their own data for other fluids. The fluid components may partition (volatilize and/or dissolve) among gas, aqueous, and NAPL phases. Any combination of the three phases may present, and phases may appear and disappear in the course of a simulation. In addition, VOCs may be adsorbed by the porous medium, and may biodegrade according to a simple half-life model. Detailed discussion of physical processes, assumptions, and fluid properties used in TMVOC-MP can be found in the TMVOC user's guide (Pruess and Battistelli, 2002). TMVOC-MP was developed based on the parallel framework of the TOUGH2-MP code (Zhang et al. 2001, Wu et al. 2002). It uses the MPI (Message Passing Forum, 1994) for parallel implementation. A domain decomposition approach is adopted for the parallelization. The code partitions a simulation domain, defined by an unstructured grid, using partitioning algorithm from the METIS software package (Karypsis and Kumar, 1998). In parallel simulation, each processor is in charge of one part of the simulation domain for assembling mass and energy balance equations, solving linear equation systems, updating thermophysical properties, and performing other local computations. The local linear-equation systems are solved in parallel by multiple processors with the Aztec linear solver package (Tuminaro et al., 1999). Although each processor solves the linearized equations of subdomains independently, the entire linear equation system is solved together by all processors collaboratively via communication between neighboring processors during each iteration. Detailed discussion of the prototype of the data-exchange scheme can be found in Elmroth et al. (2001). In addition, FORTRAN 90 features are introduced to TMVOC-MP, such as dynamic memory allocation, array operation, matrix manipulation, and replacing 'common blocks' (used in the original TMVOC) with modules. All new subroutines are written in FORTRAN 90. Program units imported from the original TMVOC remain in standard FORTRAN 77. This report provides a quick starting guide for using the TMVOC-MP program. We suppose that the users have basic knowledge of using the original TMVOC code. The users can find the detailed technical description of the physical processes modeled, and the mathematical and numerical methods in the user's guide for TMVOC (Pruess and Battistelli, 2002).« less
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
Parallel Preconditioning for CFD Problems on the CM-5
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)
1994-01-01
Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
Massive parallelization of serial inference algorithms for a complex generalized linear model
Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David
2014-01-01
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...
2015-12-21
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Somemore » specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000 ® problems. These benchmark and scaling studies show promising results.« less
Crystal MD: The massively parallel molecular dynamics software for metal with BCC structure
NASA Astrophysics Data System (ADS)
Hu, Changjun; Bai, He; He, Xinfu; Zhang, Boyao; Nie, Ningming; Wang, Xianmeng; Ren, Yingwen
2017-02-01
Material irradiation effect is one of the most important keys to use nuclear power. However, the lack of high-throughput irradiation facility and knowledge of evolution process, lead to little understanding of the addressed issues. With the help of high-performance computing, we could make a further understanding of micro-level-material. In this paper, a new data structure is proposed for the massively parallel simulation of the evolution of metal materials under irradiation environment. Based on the proposed data structure, we developed the new molecular dynamics software named Crystal MD. The simulation with Crystal MD achieved over 90% parallel efficiency in test cases, and it takes more than 25% less memory on multi-core clusters than LAMMPS and IMD, which are two popular molecular dynamics simulation software. Using Crystal MD, a two trillion particles simulation has been performed on Tianhe-2 cluster.
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2014-02-28
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2015-01-01
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230
Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles
2004-07-15
Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)(2), which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.
NASA Astrophysics Data System (ADS)
Hayano, Akira; Ishii, Eiichi
2016-10-01
This study investigates the mechanical relationship between bedding-parallel and bedding-oblique faults in a Neogene massive siliceous mudstone at the site of the Horonobe Underground Research Laboratory (URL) in Hokkaido, Japan, on the basis of observations of drill-core recovered from pilot boreholes and fracture mapping on shaft and gallery walls. Four bedding-parallel faults with visible fault gouge, named respectively the MM Fault, the Last MM Fault, the S1 Fault, and the S2 Fault (stratigraphically, from the highest to the lowest), were observed in two pilot boreholes (PB-V01 and SAB-1). The distribution of the bedding-parallel faults at 350 m depth in the Horonobe URL indicates that these faults are spread over at least several tens of meters in parallel along a bedding plane. The observation that the bedding-oblique fault displaces the Last MM fault is consistent with the previous interpretation that the bedding- oblique faults formed after the bedding-parallel faults. In addition, the bedding-parallel faults terminate near the MM and S1 faults, indicating that the bedding-parallel faults with visible fault gouge act to terminate the propagation of younger bedding-oblique faults. In particular, the MM and S1 faults, which have a relatively thick fault gouge, appear to have had a stronger control on the propagation of bedding-oblique faults than did the Last MM fault, which has a relatively thin fault gouge.
3-D readout-electronics packaging for high-bandwidth massively paralleled imager
Kwiatkowski, Kris; Lyke, James
2007-12-18
Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells are disposed on a stacked CMOS chip's surface at a 45.degree. angle from light reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections for distributing power and signals to components associated with each stacked CSMO chip layer.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.; Joy, K. I.
2013-07-01
In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrencemore » Livermore National Laboratory. (authors)« less
Gooding, Thomas Michael; McCarthy, Patrick Joseph
2010-03-02
A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
Gooding, Thomas Michael [Rochester, MN
2011-04-19
An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.
Estimating water flow through a hillslope using the massively parallel processor
NASA Technical Reports Server (NTRS)
Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.
1988-01-01
A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chiang, Patrick
2014-01-31
The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thou sands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.
De novo assembly of human genomes with massively parallel short read sequencing.
Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2010-02-01
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.
A Generic Mesh Data Structure with Parallel Applications
ERIC Educational Resources Information Center
Cochran, William Kenneth, Jr.
2009-01-01
High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uhr, L.
1987-01-01
This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
DICE/ColDICE: 6D collisionless phase space hydrodynamics using a lagrangian tesselation
NASA Astrophysics Data System (ADS)
Sousbie, Thierry
2018-01-01
DICE is a C++ template library designed to solve collisionless fluid dynamics in 6D phase space using massively parallel supercomputers via an hybrid OpenMP/MPI parallelization. ColDICE, based on DICE, implements a cosmological and physical VLASOV-POISSON solver for cold systems such as dark matter (CDM) dynamics.
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonlyfound patterns in current parallel codes. Moreover, wemore » study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.« less
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
High order parallel numerical schemes for solving incompressible flows
NASA Technical Reports Server (NTRS)
Lin, Avi; Milner, Edward J.; Liou, May-Fun; Belch, Richard A.
1992-01-01
The use of parallel computers for numerically solving flow fields has gained much importance in recent years. This paper introduces a new high order numerical scheme for computational fluid dynamics (CFD) specifically designed for parallel computational environments. A distributed MIMD system gives the flexibility of treating different elements of the governing equations with totally different numerical schemes in different regions of the flow field. The parallel decomposition of the governing operator to be solved is the primary parallel split. The primary parallel split was studied using a hypercube like architecture having clusters of shared memory processors at each node. The approach is demonstrated using examples of simple steady state incompressible flows. Future studies should investigate the secondary split because, depending on the numerical scheme that each of the processors applies and the nature of the flow in the specific subdomain, it may be possible for a processor to seek better, or higher order, schemes for its particular subcase.
Simulation of Quantum Many-Body Dynamics for Generic Strongly-Interacting Systems
NASA Astrophysics Data System (ADS)
Meyer, Gregory; Machado, Francisco; Yao, Norman
2017-04-01
Recent experimental advances have enabled the bottom-up assembly of complex, strongly interacting quantum many-body systems from individual atoms, ions, molecules and photons. These advances open the door to studying dynamics in isolated quantum systems as well as the possibility of realizing novel out-of-equilibrium phases of matter. Numerical studies provide insight into these systems; however, computational time and memory usage limit common numerical methods such as exact diagonalization to relatively small Hilbert spaces of dimension 215 . Here we present progress toward a new software package for dynamical time evolution of large generic quantum systems on massively parallel computing architectures. By projecting large sparse Hamiltonians into a much smaller Krylov subspace, we are able to compute the evolution of strongly interacting systems with Hilbert space dimension nearing 230. We discuss and benchmark different design implementations, such as matrix-free methods and GPU based calculations, using both pre-thermal time crystals and the Sachdev-Ye-Kitaev model as examples. We also include a simple symbolic language to describe generic Hamiltonians, allowing simulation of diverse quantum systems without any modification of the underlying C and Fortran code.
MoMaS reactive transport benchmark using PFLOTRAN
NASA Astrophysics Data System (ADS)
Park, H.
2017-12-01
MoMaS benchmark was developed to enhance numerical simulation capability for reactive transport modeling in porous media. The benchmark was published in late September of 2009; it is not taken from a real chemical system, but realistic and numerically challenging tests. PFLOTRAN is a state-of-art massively parallel subsurface flow and reactive transport code that is being used in multiple nuclear waste repository projects at Sandia National Laboratories including Waste Isolation Pilot Plant and Used Fuel Disposition. MoMaS benchmark has three independent tests with easy, medium, and hard chemical complexity. This paper demonstrates how PFLOTRAN is applied to this benchmark exercise and shows results of the easy benchmark test case which includes mixing of aqueous components and surface complexation. Surface complexations consist of monodentate and bidentate reactions which introduces difficulty in defining selectivity coefficient if the reaction applies to a bulk reference volume. The selectivity coefficient becomes porosity dependent for bidentate reaction in heterogeneous porous media. The benchmark is solved by PFLOTRAN with minimal modification to address the issue and unit conversions were made properly to suit PFLOTRAN.
NASA Technical Reports Server (NTRS)
Lopez, Isaac; Follen, Gregory J.; Gutierrez, Richard; Foster, Ian; Ginsburg, Brian; Larsson, Olle; Martin, Stuart; Tuecke, Steven; Woodford, David
2000-01-01
This paper describes a project to evaluate the feasibility of combining Grid and Numerical Propulsion System Simulation (NPSS) technologies, with a view to leveraging the numerous advantages of commodity technologies in a high-performance Grid environment. A team from the NASA Glenn Research Center and Argonne National Laboratory has been studying three problems: a desktop-controlled parameter study using Excel (Microsoft Corporation); a multicomponent application using ADPAC, NPSS, and a controller program-, and an aviation safety application running about 100 jobs in near real time. The team has successfully demonstrated (1) a Common-Object- Request-Broker-Architecture- (CORBA-) to-Globus resource manager gateway that allows CORBA remote procedure calls to be used to control the submission and execution of programs on workstations and massively parallel computers, (2) a gateway from the CORBA Trader service to the Grid information service, and (3) a preliminary integration of CORBA and Grid security mechanisms. We have applied these technologies to two applications related to NPSS, namely a parameter study and a multicomponent simulation.
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis to distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors, as well as high massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience made at the NAS facility at NASA Ames Research Center. Over the last five years at NAS massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside with traditional vector supercomputers such as the Cray Y-MP and C90.
Takano, Yu; Nakata, Kazuto; Yonezawa, Yasushige; Nakamura, Haruki
2016-05-05
A massively parallel program for quantum mechanical-molecular mechanical (QM/MM) molecular dynamics simulation, called Platypus (PLATform for dYnamic Protein Unified Simulation), was developed to elucidate protein functions. The speedup and the parallelization ratio of Platypus in the QM and QM/MM calculations were assessed for a bacteriochlorophyll dimer in the photosynthetic reaction center (DIMER) on the K computer, a massively parallel computer achieving 10 PetaFLOPs with 705,024 cores. Platypus exhibited the increase in speedup up to 20,000 core processors at the HF/cc-pVDZ and B3LYP/cc-pVDZ, and up to 10,000 core processors by the CASCI(16,16)/6-31G** calculations. We also performed excited QM/MM-MD simulations on the chromophore of Sirius (SIRIUS) in water. Sirius is a pH-insensitive and photo-stable ultramarine fluorescent protein. Platypus accelerated on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water, using over 4000 core processors. In addition, it also succeeded in 50-ps (200,000-step) on-the-fly excited-state QM/MM-MD simulations for the SIRIUS in water. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion
NASA Technical Reports Server (NTRS)
Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.
1996-01-01
This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor tor parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability tor the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1196 Computational Aeroscience meeting.
Long-time atomistic simulations with the Parallel Replica Dynamics method
NASA Astrophysics Data System (ADS)
Perez, Danny
Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
NASA Astrophysics Data System (ADS)
Cervone, G.; Clemente-Harding, L.; Alessandrini, S.; Delle Monache, L.
2016-12-01
A methodology based on Artificial Neural Networks (ANN) and an Analog Ensemble (AnEn) is presented to generate 72-hour deterministic and probabilistic forecasts of power generated by photovoltaic (PV) power plants using input from a numerical weather prediction model and computed astronomical variables. ANN and AnEn are used individually and in combination to generate forecasts for three solar power plant located in Italy. The computational scalability of the proposed solution is tested using synthetic data simulating 4,450 PV power stations. The NCAR Yellowstone supercomputer is employed to test the parallel implementation of the proposed solution, ranging from 1 node (32 cores) to 4,450 nodes (141,140 cores). Results show that a combined AnEn + ANN solution yields best results, and that the proposed solution is well suited for massive scale computation.
Zhu, Xiang; Zhang, Dianwen
2013-01-01
We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
CMOS VLSI Layout and Verification of a SIMD Computer
NASA Technical Reports Server (NTRS)
Zheng, Jianqing
1996-01-01
A CMOS VLSI layout and verification of a 3 x 3 processor parallel computer has been completed. The layout was done using the MAGIC tool and the verification using HSPICE. Suggestions for expanding the computer into a million processor network are presented. Many problems that might be encountered when implementing a massively parallel computer are discussed.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-11-16
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a destination. Some packets are constrained to be routed through respective designated transporter nodes, the automated routing strategy determining a path from a respective source node to a respective transporter node, and from a respective transporter node to a respective destination node. Preferably, the source node chooses a routing policy from among multiple possible choices, and that policy is followed by all intermediate nodes. The use of transporter nodes allows greater flexibility in routing.
Parallel design patterns for a low-power, software-defined compressed video encoder
NASA Astrophysics Data System (ADS)
Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar
2011-06-01
Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2013-11-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
A hybrid interface tracking - level set technique for multiphase flow with soluble surfactant
NASA Astrophysics Data System (ADS)
Shin, Seungwon; Chergui, Jalel; Juric, Damir; Kahouadji, Lyes; Matar, Omar K.; Craster, Richard V.
2018-04-01
A formulation for soluble surfactant transport in multiphase flows recently presented by Muradoglu and Tryggvason (JCP 274 (2014) 737-757) [17] is adapted to the context of the Level Contour Reconstruction Method, LCRM, (Shin et al. IJNMF 60 (2009) 753-778, [8]) which is a hybrid method that combines the advantages of the Front-tracking and Level Set methods. Particularly close attention is paid to the formulation and numerical implementation of the surface gradients of surfactant concentration and surface tension. Various benchmark tests are performed to demonstrate the accuracy of different elements of the algorithm. To verify surfactant mass conservation, values for surfactant diffusion along the interface are compared with the exact solution for the problem of uniform expansion of a sphere. The numerical implementation of the discontinuous boundary condition for the source term in the bulk concentration is compared with the approximate solution. Surface tension forces are tested for Marangoni drop translation. Our numerical results for drop deformation in simple shear are compared with experiments and results from previous simulations. All benchmarking tests compare well with existing data thus providing confidence that the adapted LCRM formulation for surfactant advection and diffusion is accurate and effective in three-dimensional multiphase flows with a structured mesh. We also demonstrate that this approach applies easily to massively parallel simulations.
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Tong, Charles; Swarztrauber, Paul N.
1989-01-01
Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.
2010-01-01
We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
Optimisation of a parallel ocean general circulation model
NASA Astrophysics Data System (ADS)
Beare, M. I.; Stevens, D. P.
1997-10-01
This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Graphics Processing Unit Assisted Thermographic Compositing
NASA Technical Reports Server (NTRS)
Ragasa, Scott; McDougal, Matthew; Russell, Sam
2012-01-01
Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often great, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques. Technical Methodology/Approach: Apply massively parallel algorithms and data structures to the specific analysis requirements presented when working with thermographic data sets.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.
Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong
2015-12-01
SOAPsnv is the software used for identifying the single nucleotide variation in cancer genes. However, its performance is yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv software is the pileup algorithm. The original pileup algorithm's I/O process is time-consuming and inefficient to read input files. Moreover, the scalability of the pileup algorithm is also poor. Therefore, we designed a new algorithm, named BamPileup, aiming to improve the performance of sequential read, and the new pileup algorithm implemented a parallel read mode based on index. Using this method, each thread can directly read the data start from a specific position. The results of experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of algorithm is reduced to 3.9 s and the application program can achieve a speedup up to 100×. Moreover, the scalability of the new algorithm is also satisfying.
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.
Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J
2015-07-05
We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Numerical simulation of disperse particle flows on a graphics processing unit
NASA Astrophysics Data System (ADS)
Sierakowski, Adam J.
In both nature and technology, we commonly encounter solid particles being carried within fluid flows, from dust storms to sediment erosion and from food processing to energy generation. The motion of uncountably many particles in highly dynamic flow environments characterizes the tremendous complexity of such phenomena. While methods exist for the full-scale numerical simulation of such systems, current computational capabilities require the simplification of the numerical task with significant approximation using closure models widely recognized as insufficient. There is therefore a fundamental need for the investigation of the underlying physical processes governing these disperse particle flows. In the present work, we develop a new tool based on the Physalis method for the first-principles numerical simulation of thousands of particles (a small fraction of an entire disperse particle flow system) in order to assist in the search for new reduced-order closure models. We discuss numerous enhancements to the efficiency and stability of the Physalis method, which introduces the influence of spherical particles to a fixed-grid incompressible Navier-Stokes flow solver using a local analytic solution to the flow equations. Our first-principles investigation demands the modeling of unresolved length and time scales associated with particle collisions. We introduce a collision model alongside Physalis, incorporating lubrication effects and proposing a new nonlinearly damped Hertzian contact model. By reproducing experimental studies from the literature, we document extensive validation of the methods. We discuss the implementation of our methods for massively parallel computation using a graphics processing unit (GPU). We combine Eulerian grid-based algorithms with Lagrangian particle-based algorithms to achieve computational throughput up to 90 times faster than the legacy implementation of Physalis for a single central processing unit. By avoiding all data communication between the GPU and the host system during the simulation, we utilize with great efficacy the GPU hardware with which many high performance computing systems are currently equipped. We conclude by looking forward to the future of Physalis with multi-GPU parallelization in order to perform resolved disperse flow simulations of more than 100,000 particles and further advance the development of reduced-order closure models.
Applications of Parallel Process HiMAP for Large Scale Multidisciplinary Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Potsdam, Mark; Rodriguez, David; Kwak, Dochay (Technical Monitor)
2000-01-01
HiMAP is a three level parallel middleware that can be interfaced to a large scale global design environment for code independent, multidisciplinary analysis using high fidelity equations. Aerospace technology needs are rapidly changing. Computational tools compatible with the requirements of national programs such as space transportation are needed. Conventional computation tools are inadequate for modern aerospace design needs. Advanced, modular computational tools are needed, such as those that incorporate the technology of massively parallel processors (MPP).
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)
2002-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)
2001-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Experience in highly parallel processing using DAP
NASA Technical Reports Server (NTRS)
Parkinson, D.
1987-01-01
Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.
Dynamic load balancing of applications
Wheat, Stephen R.
1997-01-01
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
Practical aspects of prestack depth migration with finite differences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.; Oldfield, R.A.; Womble, D.E.
1997-07-01
Finite-difference, prestack, depth migrations offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented a finite-difference prestack depth migration algorithm for use on massively parallel computers which is discussed. The image quality of the finite-difference scheme has been investigated and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatialmore » parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple Cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.« less
Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling
NASA Technical Reports Server (NTRS)
Rios, Joseph Lucio; Ross, Kevin
2009-01-01
Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 millionmore » vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
NASA Astrophysics Data System (ADS)
Zhao, Feng; Frietman, Edward E. E.; Han, Zhong; Chen, Ray T.
1999-04-01
A characteristic feature of a conventional von Neumann computer is that computing power is delivered by a single processing unit. Although increasing the clock frequency improves the performance of the computer, the switching speed of the semiconductor devices and the finite speed at which electrical signals propagate along the bus set the boundaries. Architectures containing large numbers of nodes can solve this performance dilemma, with the comment that main obstacles in designing such systems are caused by difficulties to come up with solutions that guarantee efficient communications among the nodes. Exchanging data becomes really a bottleneck should al nodes be connected by a shared resource. Only optics, due to its inherent parallelism, could solve that bottleneck. Here, we explore a multi-faceted free space image distributor to be used in optical interconnects in massively parallel processing. In this paper, physical and optical models of the image distributor are focused on from diffraction theory of light wave to optical simulations. the general features and the performance of the image distributor are also described. The new structure of an image distributor and the simulations for it are discussed. From the digital simulation and experiment, it is found that the multi-faceted free space image distributing technique is quite suitable for free space optical interconnection in massively parallel processing and new structure of the multifaceted free space image distributor would perform better.
Computer Sciences and Data Systems, volume 2
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
The singular behavior of one-loop massive QCD amplitudes with one external soft gluon
NASA Astrophysics Data System (ADS)
Bierenbaum, Isabella; Czakon, Michał; Mitov, Alexander
2012-03-01
We calculate the one-loop correction to the soft-gluon current with massive fermions. This current is process independent and controls the singular behavior of one-loop massive QCD amplitudes in the limit when one external gluon becomes soft. The result derived in this work is the last missing process-independent ingredient needed for numerical evaluation of observables with massive fermions at hadron colliders at the next-to-next-to-leading order.
Collisionless stellar hydrodynamics as an efficient alternative to N-body methods
NASA Astrophysics Data System (ADS)
Mitchell, Nigel L.; Vorobyov, Eduard I.; Hensler, Gerhard
2013-01-01
The dominant constituents of the Universe's matter are believed to be collisionless in nature and thus their modelling in any self-consistent simulation is extremely important. For simulations that deal only with dark matter or stellar systems, the conventional N-body technique is fast, memory efficient and relatively simple to implement. However when extending simulations to include the effects of gas physics, mesh codes are at a distinct disadvantage compared to Smooth Particle Hydrodynamics (SPH) codes. Whereas implementing the N-body approach into SPH codes is fairly trivial, the particle-mesh technique used in mesh codes to couple collisionless stars and dark matter to the gas on the mesh has a series of significant scientific and technical limitations. These include spurious entropy generation resulting from discreteness effects, poor load balancing and increased communication overhead which spoil the excellent scaling in massively parallel grid codes. In this paper we propose the use of the collisionless Boltzmann moment equations as a means to model the collisionless material as a fluid on the mesh, implementing it into the massively parallel FLASH Adaptive Mesh Refinement (AMR) code. This approach which we term `collisionless stellar hydrodynamics' enables us to do away with the particle-mesh approach and since the parallelization scheme is identical to that used for the hydrodynamics, it preserves the excellent scaling of the FLASH code already demonstrated on peta-flop machines. We find that the classic hydrodynamic equations and the Boltzmann moment equations can be reconciled under specific conditions, allowing us to generate analytic solutions for collisionless systems using conventional test problems. We confirm the validity of our approach using a suite of demanding test problems, including the use of a modified Sod shock test. By deriving the relevant eigenvalues and eigenvectors of the Boltzmann moment equations, we are able to use high order accurate characteristic tracing methods with Riemann solvers to generate numerical solutions which show excellent agreement with our analytic solutions. We conclude by demonstrating the ability of our code to model complex phenomena by simulating the evolution of a two-armed spiral galaxy whose properties agree with those predicted by the swing amplification theory.
NASA Astrophysics Data System (ADS)
Lange, Jacob; O'Shaughnessy, Richard; Healy, James; Lousto, Carlos; Shoemaker, Deirdre; Lovelace, Geoffrey; Scheel, Mark; Ossokine, Serguei
2016-03-01
In this talk, we describe a procedure to reconstruct the parameters of sufficiently massive coalescing compact binaries via direct comparison with numerical relativity simulations. For sufficiently massive sources, existing numerical relativity simulations are long enough to cover the observationally accessible part of the signal. Due to the signal's brevity, the posterior parameter distribution it implies is broad, simple, and easily reconstructed from information gained by comparing to only the sparse sample of existing numerical relativity simulations. We describe how followup simulations can corroborate and improve our understanding of a detected source. Since our method can include all physics provided by full numerical relativity simulations of coalescing binaries, it provides a valuable complement to alternative techniques which employ approximations to reconstruct source parameters. Supported by NSF Grant PHY-1505629.
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-11-23
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
Applications of massively parallel computers in telemetry processing
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek A.; Pritchard, Jim; Knoble, Gordon
1994-01-01
Telemetry processing refers to the reconstruction of full resolution raw instrumentation data with artifacts, of space and ground recording and transmission, removed. Being the first processing phase of satellite data, this process is also referred to as level-zero processing. This study is aimed at investigating the use of massively parallel computing technology in providing level-zero processing to spaceflights that adhere to the recommendations of the Consultative Committee on Space Data Systems (CCSDS). The workload characteristics, of level-zero processing, are used to identify processing requirements in high-performance computing systems. An example of level-zero functions on a SIMD MPP, such as the MasPar, is discussed. The requirements in this paper are based in part on the Earth Observing System (EOS) Data and Operation System (EDOS).
Genetic heterogeneity of RPMI-8402, a T-acute lymphoblastic leukemia cell line
STOCZYNSKA-FIDELUS, EWELINA; PIASKOWSKI, SYLWESTER; PAWLOWSKA, ROZA; SZYBKA, MALGORZATA; PECIAK, JOANNA; HULAS-BIGOSZEWSKA, KRYSTYNA; WINIECKA-KLIMEK, MARTA; RIESKE, PIOTR
2016-01-01
Thorough examination of genetic heterogeneity of cell lines is uncommon. In order to address this issue, the present study analyzed the genetic heterogeneity of RPMI-8402, a T-acute lymphoblastic leukemia (T-ALL) cell line. For this purpose, traditional techniques such as fluorescence in situ hybridization and immunocytochemistry were used, in addition to more advanced techniques, including cell sorting, Sanger sequencing and massive parallel sequencing. The results indicated that the RPMI-8402 cell line consists of several genetically different cell subpopulations. Furthermore, massive parallel sequencing of RPMI-8402 provided insight into the evolution of T-ALL carcinogenesis, since this cell line exhibited the genetic heterogeneity typical of T-ALL. Therefore, the use of cell lines for drug testing in future studies may aid the progress of anticancer drug research. PMID:26870252
Big data mining analysis method based on cloud computing
NASA Astrophysics Data System (ADS)
Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao
2017-08-01
Information explosion era, large data super-large, discrete and non-(semi) structured features have gone far beyond the traditional data management can carry the scope of the way. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze the massive data mining, which can effectively solve the problem that the traditional data mining method cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs the mining algorithm of association rules based on MapReduce parallel processing architecture, and carries out the experimental verification. The algorithm of parallel association rule mining based on cloud computing platform can greatly improve the execution speed of data mining.
Ab initio molecular simulations with numeric atom-centered orbitals
NASA Astrophysics Data System (ADS)
Blum, Volker; Gehrke, Ralf; Hanke, Felix; Havu, Paula; Havu, Ville; Ren, Xinguo; Reuter, Karsten; Scheffler, Matthias
2009-11-01
We describe a complete set of algorithms for ab initio molecular simulations based on numerically tabulated atom-centered orbitals (NAOs) to capture a wide range of molecular and materials properties from quantum-mechanical first principles. The full algorithmic framework described here is embodied in the Fritz Haber Institute "ab initio molecular simulations" (FHI-aims) computer program package. Its comprehensive description should be relevant to any other first-principles implementation based on NAOs. The focus here is on density-functional theory (DFT) in the local and semilocal (generalized gradient) approximations, but an extension to hybrid functionals, Hartree-Fock theory, and MP2/GW electron self-energies for total energies and excited states is possible within the same underlying algorithms. An all-electron/full-potential treatment that is both computationally efficient and accurate is achieved for periodic and cluster geometries on equal footing, including relaxation and ab initio molecular dynamics. We demonstrate the construction of transferable, hierarchical basis sets, allowing the calculation to range from qualitative tight-binding like accuracy to meV-level total energy convergence with the basis set. Since all basis functions are strictly localized, the otherwise computationally dominant grid-based operations scale as O(N) with system size N. Together with a scalar-relativistic treatment, the basis sets provide access to all elements from light to heavy. Both low-communication parallelization of all real-space grid based algorithms and a ScaLapack-based, customized handling of the linear algebra for all matrix operations are possible, guaranteeing efficient scaling (CPU time and memory) up to massively parallel computer systems with thousands of CPUs.
Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations
NASA Astrophysics Data System (ADS)
Detrixhe, Miles; Gibou, Frédéric
2016-10-01
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan
PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less
Ice-sheet modelling accelerated by graphics cards
NASA Astrophysics Data System (ADS)
Brædstrup, Christian Fredborg; Damsgaard, Anders; Egholm, David Lundbek
2014-11-01
Studies of glaciers and ice sheets have increased the demand for high performance numerical ice flow models over the past decades. When exploring the highly non-linear dynamics of fast flowing glaciers and ice streams, or when coupling multiple flow processes for ice, water, and sediment, researchers are often forced to use super-computing clusters. As an alternative to conventional high-performance computing hardware, the Graphical Processing Unit (GPU) is capable of massively parallel computing while retaining a compact design and low cost. In this study, we present a strategy for accelerating a higher-order ice flow model using a GPU. By applying the newest GPU hardware, we achieve up to 180× speedup compared to a similar but serial CPU implementation. Our results suggest that GPU acceleration is a competitive option for ice-flow modelling when compared to CPU-optimised algorithms parallelised by the OpenMP or Message Passing Interface (MPI) protocols.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Clough, Katy; Figueras, Pau; Finkel, Hal
In this work, we introduce GRChombo: a new numerical relativity code which incorporates full adaptive mesh refinement (AMR) using block structured Berger-Rigoutsos grid generation. The code supports non-trivial 'many-boxes-in-many-boxes' mesh hierarchies and massive parallelism through the message passing interface. GRChombo evolves the Einstein equation using the standard BSSN formalism, with an option to turn on CCZ4 constraint damping if required. The AMR capability permits the study of a range of new physics which has previously been computationally infeasible in a full 3 + 1 setting, while also significantly simplifying the process of setting up the mesh for these problems. Wemore » show that GRChombo can stably and accurately evolve standard spacetimes such as binary black hole mergers and scalar collapses into black holes, demonstrate the performance characteristics of our code, and discuss various physics problems which stand to benefit from the AMR technique.« less
Covering Resilience: A Recent Development for Binomial Checkpointing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Walther, Andrea; Narayanan, Sri Hari Krishna
In terms of computing time, adjoint methods offer a very attractive alternative to compute gradient information, required, e.g., for optimization purposes. However, together with this very favorable temporal complexity result comes a memory requirement that is in essence proportional with the operation count of the underlying function, e.g., if algorithmic differentiation is used to provide the adjoints. For this reason, checkpointing approaches in many variants have become popular. This paper analyzes an extension of the so-called binomial approach to cover also possible failures of the computing systems. Such a measure of precaution is of special interest for massive parallel simulationsmore » and adjoint calculations where the mean time between failure of the large scale computing system is smaller than the time needed to complete the calculation of the adjoint information. We describe the extensions of standard checkpointing approaches required for such resilience, provide a corresponding implementation and discuss first numerical results.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pingenot, J; Rieben, R; White, D
2005-10-31
We present a computational study of signal propagation and attenuation of a 200 MHz planar loop antenna in a cave environment. The cave is modeled as a straight and lossy random rough wall. To simulate a broad frequency band, the full wave Maxwell equations are solved directly in the time domain via a high order vector finite element discretization using the massively parallel CEM code EMSolve. The numerical technique is first verified against theoretical results for a planar loop antenna in a smooth lossy cave. The simulation is then performed for a series of random rough surface meshes in ordermore » to generate statistical data for the propagation and attenuation properties of the antenna in a cave environment. Results for the mean and variance of the power spectral density of the electric field are presented and discussed.« less
Biomorphic architectures for autonomous Nanosat designs
NASA Technical Reports Server (NTRS)
Hasslacher, Brosl; Tilden, Mark W.
1995-01-01
Modern space tool design is the science of making a machine both massively complex while at the same time extremely robust and dependable. We propose a novel nonlinear control technique that produces capable, self-organizing, micron-scale space machines at low cost and in large numbers by parallel silicon assembly. Experiments using biomorphic architectures (with ideal space attributes) have produced a wide spectrum of survival-oriented machines that are reliably domesticated for work applications in specific environments. In particular, several one-chip satellite prototypes show interesting control properties that can be turned into numerous application-specific machines for autonomous, disposable space tasks. We believe that the real power of these architectures lies in their potential to self-assemble into larger, robust, loosely coupled structures. Assembly takes place at hierarchical space scales, with different attendant properties, allowing for inexpensive solutions to many daunting work tasks. The nature of biomorphic control, design, engineering options, and applications are discussed.
3D simulation of floral oil storage in the scopa of South American insects
NASA Astrophysics Data System (ADS)
Ruettgers, Alexander; Griebel, Michael; Pastrik, Lars; Schmied, Heiko; Wittmann, Dieter; Scherrieble, Andreas; Dinkelmann, Albrecht; Stegmaier, Thomas; InstituteNumerical Simulation Team; Institute of Crop Science; Resource Conservation Team; Institute of Textile Technology; Process Engineering Team
2014-11-01
Several species of bees in South America possess structures to store and transport floral oils. By using closely spaced hairs at their back legs, the so called scopa, these bees can absorb and release oil droplets without loss. The high efficiency of this process is a matter of ongoing research. Basing on recent x-ray microtomography scans from the scopa of these bees at the Institute of Textile Technology and Process Engineering Denkendorf, we build a three-dimensional computer model. Using NaSt3DGPF, a two-phase flow solver developed at the Institute for Numerical Simulation of the University of Bonn, we perform massively parallel flow simulations with the complex micro-CT data. In this talk, we discuss the results of our simulations and the transfer of the x-ray measurement into a computer model. This research was funded under GR 1144/18-1 by the Deutsche Forschungsgemeinschaft (DFG).
Dynamic Imbalance Would Counter Offcenter Thrust
NASA Technical Reports Server (NTRS)
Mccanna, Jason
1994-01-01
Dynamic imbalance generated by offcenter thrust on rotating body eliminated by shifting some of mass of body to generate opposing dynamic imbalance. Technique proposed originally for spacecraft including massive crew module connected via long, lightweight intermediate structure to massive engine module, such that artificial gravitation in crew module generated by rotating spacecraft around axis parallel to thrust generated by engine. Also applicable to dynamic balancing of rotating terrestrial equipment to which offcenter forces applied.
ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains
Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz
2016-01-01
With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
Dynamic load balancing of applications
Wheat, S.R.
1997-05-13
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers is disclosed. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated. 13 figs.
Computational methods and software systems for dynamics and control of large space structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Felippa, C. A.; Farhat, C.; Pramono, E.
1990-01-01
Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers.
DGDFT: A massively parallel method for large scale density functional theory calculations.
Hu, Wei; Lin, Lin; Yang, Chao
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Parallel language constructs for tensor product computations on loosely coupled architectures
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Van Rosendale, John
1989-01-01
A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. The authors focus on tensor product array computations, a simple but important class of numerical algorithms. They consider first the problem of programming one-dimensional kernel routines, such as parallel tridiagonal solvers, and then look at how such parallel kernels can be combined to form parallel tensor product algorithms.
Seamless contiguity method for parallel segmentation of remote sensing image
NASA Astrophysics Data System (ADS)
Wang, Geng; Wang, Guanghui; Yu, Mei; Cui, Chengling
2015-12-01
Seamless contiguity is the key technology for parallel segmentation of remote sensing data with large quantities. It can be effectively integrate fragments of the parallel processing into reasonable results for subsequent processes. There are numerous methods reported in the literature for seamless contiguity, such as establishing buffer, area boundary merging and data sewing. et. We proposed a new method which was also based on building buffers. The seamless contiguity processes we adopt are based on the principle: ensuring the accuracy of the boundary, ensuring the correctness of topology. Firstly, block number is computed based on data processing ability, unlike establishing buffer on both sides of block line, buffer is established just on the right side and underside of the line. Each block of data is segmented respectively and then gets the segmentation objects and their label value. Secondly, choose one block(called master block) and do stitching on the adjacent blocks(called slave block), process the rest of the block in sequence. Through the above processing, topological relationship and boundaries of master block are guaranteed. Thirdly, if the master block polygons boundaries intersect with buffer boundary and the slave blocks polygons boundaries intersect with block line, we adopt certain rules to merge and trade-offs them. Fourthly, check the topology and boundary in the buffer area. Finally, a set of experiments were conducted and prove the feasibility of this method. This novel seamless contiguity algorithm provides an applicable and practical solution for efficient segmentation of massive remote sensing image.
Ocean Modeling and Visualization on Massively Parallel Computer
NASA Technical Reports Server (NTRS)
Chao, Yi; Li, P. Peggy; Wang, Ping; Katz, Daniel S.; Cheng, Benny N.
1997-01-01
Climate modeling is one of the grand challenges of computational science, and ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change.
Wedi, Nils P
2014-06-28
The steady path of doubling the global horizontal resolution approximately every 8 years in numerical weather prediction (NWP) at the European Centre for Medium Range Weather Forecasts may be substantially altered with emerging novel computing architectures. It coincides with the need to appropriately address and determine forecast uncertainty with increasing resolution, in particular, when convective-scale motions start to be resolved. Blunt increases in the model resolution will quickly become unaffordable and may not lead to improved NWP forecasts. Consequently, there is a need to accordingly adjust proven numerical techniques. An informed decision on the modelling strategy for harnessing exascale, massively parallel computing power thus also requires a deeper understanding of the sensitivity to uncertainty--for each part of the model--and ultimately a deeper understanding of multi-scale interactions in the atmosphere and their numerical realization in ultra-high-resolution NWP and climate simulations. This paper explores opportunities for substantial increases in the forecast efficiency by judicious adjustment of the formal accuracy or relative resolution in the spectral and physical space. One path is to reduce the formal accuracy by which the spectral transforms are computed. The other pathway explores the importance of the ratio used for the horizontal resolution in gridpoint space versus wavenumbers in spectral space. This is relevant for both high-resolution simulations as well as ensemble-based uncertainty estimation. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
NASA Technical Reports Server (NTRS)
Swinbank, Richard; Purser, James
2006-01-01
Recent years have seen a resurgence of interest in a variety of non-standard computational grids for global numerical prediction. The motivation has been to reduce problems associated with the converging meridians and the polar singularities of conventional regular latitude-longitude grids. A further impetus has come from the adoption of massively parallel computers, for which it is necessary to distribute work equitably across the processors; this is more practicable for some non-standard grids. Desirable attributes of a grid for high-order spatial finite differencing are: (i) geometrical regularity; (ii) a homogeneous and approximately isotropic spatial resolution; (iii) a low proportion of the grid points where the numerical procedures require special customization (such as near coordinate singularities or grid edges). One family of grid arrangements which, to our knowledge, has never before been applied to numerical weather prediction, but which appears to offer several technical advantages, are what we shall refer to as "Fibonacci grids". They can be thought of as mathematically ideal generalizations of the patterns occurring naturally in the spiral arrangements of seeds and fruit found in sunflower heads and pineapples (to give two of the many botanical examples). These grids possess virtually uniform and highly isotropic resolution, with an equal area for each grid point. There are only two compact singular regions on a sphere that require customized numerics. We demonstrate the practicality of these grids in shallow water simulations, and discuss the prospects for efficiently using these frameworks in three-dimensional semi-implicit and semi-Lagrangian weather prediction or climate models.
Moskalev, Evgeny A; Frohnauer, Judith; Merkelbach-Bruse, Sabine; Schildhaus, Hans-Ulrich; Dimmler, Arno; Schubert, Thomas; Boltze, Carsten; König, Helmut; Fuchs, Florian; Sirbu, Horia; Rieker, Ralf J; Agaimy, Abbas; Hartmann, Arndt; Haller, Florian
2014-06-01
Recurrent gene fusions of anaplastic lymphoma receptor tyrosine kinase (ALK) and echinoderm microtubule-associated protein-like 4 (EML4) have been recently identified in ∼5% of non-small cell lung cancers (NSCLCs) and are targets for selective tyrosine kinase inhibitors. While fluorescent in situ hybridization (FISH) is the current gold standard for detection of EML4-ALK rearrangements, several limitations exist including high costs, time-consuming evaluation and somewhat equivocal interpretation of results. In contrast, targeted massive parallel sequencing has been introduced as a powerful method for simultaneous and sensitive detection of multiple somatic mutations even in limited biopsies, and is currently evolving as the method of choice for molecular diagnostic work-up of NSCLCs. We developed a novel approach for indirect detection of EML4-ALK rearrangements based on 454 massive parallel sequencing after reverse transcription and subsequent multiplex amplification (multiplex ALK RNA-seq) which takes advantage of unbalanced expression of the 5' and 3' ALK mRNA regions. Two lung cancer cell lines and a selected series of 32 NSCLC samples including 11 cases with EML4-ALK rearrangement were analyzed with this novel approach in comparison to ALK FISH, ALK qRT-PCR and EML4-ALK RT-PCR. The H2228 cell line with known EML4-ALK rearrangement showed 171 and 729 reads for 5' and 3' ALK regions, respectively, demonstrating a clearly unbalanced expression pattern. In contrast, the H1299 cell line with ALK wildtype status displayed no reads for both ALK regions. Considering a threshold of 100 reads for 3' ALK region as indirect indicator of EML4-ALK rearrangement, there was 100% concordance between the novel multiplex ALK RNA-seq approach and ALK FISH among all 32 NSCLC samples. Multiplex ALK RNA-seq is a sensitive and specific method for indirect detection of EML4-ALK rearrangements, and can be easily implemented in panel based molecular diagnostic work-up of NSCLCs by massive parallel sequencing. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
Speeding up parallel processing
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, William Michael; Plimpton, Steven James; Wang, Peng
2010-03-01
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
Iterative methods for 3D implicit finite-difference migration using the complex Padé approximation
NASA Astrophysics Data System (ADS)
Costa, Carlos A. N.; Campos, Itamara S.; Costa, Jessé C.; Neto, Francisco A.; Schleicher, Jörg; Novais, Amélia
2013-08-01
Conventional implementations of 3D finite-difference (FD) migration use splitting techniques to accelerate performance and save computational cost. However, such techniques are plagued with numerical anisotropy that jeopardises the correct positioning of dipping reflectors in the directions not used for the operator splitting. We implement 3D downward continuation FD migration without splitting using a complex Padé approximation. In this way, the numerical anisotropy is eliminated at the expense of a computationally more intensive solution of a large-band linear system. We compare the performance of the iterative stabilized biconjugate gradient (BICGSTAB) and that of the multifrontal massively parallel direct solver (MUMPS). It turns out that the use of the complex Padé approximation not only stabilizes the solution, but also acts as an effective preconditioner for the BICGSTAB algorithm, reducing the number of iterations as compared to the implementation using the real Padé expansion. As a consequence, the iterative BICGSTAB method is more efficient than the direct MUMPS method when solving a single term in the Padé expansion. The results of both algorithms, here evaluated by computing the migration impulse response in the SEG/EAGE salt model, are of comparable quality.
Modeling Anisotropic Elastic Wave Propagation in Jointed Rock Masses
NASA Astrophysics Data System (ADS)
Hurley, R.; Vorobiev, O.; Ezzedine, S. M.; Antoun, T.
2016-12-01
We present a numerical approach for determining the anisotropic stiffness of materials with nonlinearly-compliant joints capable of sliding. The proposed method extends existing ones for upscaling the behavior of a medium with open cracks and inclusions to cases relevant to natural fractured and jointed rocks, where nonlinearly-compliant joints can undergo plastic slip. The method deviates from existing techniques by incorporating the friction and closure states of the joints, and recovers an anisotropic elastic form in the small-strain limit when joints are not sliding. We present the mathematical formulation of our method and use Representative Volume Element (RVE) simulations to evaluate its accuracy for joint sets with varying complexity. We then apply the formulation to determine anisotropic elastic constants of jointed granite found at the Nevada Nuclear Security Site (NNSS) where the Source Physics Experiments (SPE), a campaign of underground chemical explosions, are performed. Finally, we discuss the implementation of our numerical approach in a massively parallel Lagrangian code Geodyn-L and its use for studying wave propagation from underground explosions. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Numerical Modeling of Mixing and Venting from Explosions in Bunkers
NASA Astrophysics Data System (ADS)
Liu, Benjamin
2005-07-01
2D and 3D numerical simulations were performed to study the dynamic interaction of explosion products in a concrete bunker with ambient air, stored chemical or biological warfare (CBW) agent simulant, and the surrounding walls and structure. The simulations were carried out with GEODYN, a multi-material, Godunov-based Eulerian code, that employs adaptive mesh refinement and runs efficiently on massively parallel computer platforms. Tabular equations of state were used for all materials with the exception of any high explosives employed, which were characterized with conventional JWL models. An appropriate constitutive model was used to describe the concrete. Interfaces between materials were either tracked with a volume-of-fluid method that used high-order reconstruction to specify the interface location and orientation, or a capturing approach was employed with the assumption of local thermal and mechanical equilibrium. A major focus of the study was to estimate the extent of agent heating that could be obtained prior to venting of the bunker and resultant agent dispersal. Parameters investigated included the bunker construction, agent layout, energy density in the bunker and the yield-to-agent mass ratio. Turbulent mixing was found to be the dominant heat transfer mechanism for heating the agent.
Development of an EMC3-EIRENE Synthetic Imaging Diagnostic
NASA Astrophysics Data System (ADS)
Meyer, William; Allen, Steve; Samuell, Cameron; Lore, Jeremy
2017-10-01
2D and 3D flow measurements are critical for validating numerical codes such as EMC3-EIRENE. Toroidal symmetry assumptions preclude tomographic reconstruction of 3D flows from single camera views. In addition, the resolution of the grids utilized in numerical code models can easily surpass the resolution of physical camera diagnostic geometries. For these reasons we have developed a Synthetic Imaging Diagnostic capability for forward projection comparisons of EMC3-EIRENE model solutions with the line integrated images from the Doppler Coherence Imaging diagnostic on DIII-D. The forward projection matrix is 2.8 Mpixel by 6.4 Mcells for the non-axisymmetric case we present. For flow comparisons, both simple line integral, and field aligned component matrices must be calculated. The calculation of these matrices is a massive embarrassingly parallel problem and performed with a custom dispatcher that allows processing platforms to join mid-problem as they become available, or drop out if resources are needed for higher priority tasks. The matrices are handled using standard sparse matrix techniques. Prepared by LLNL under Contract DE-AC52-07NA27344. This material is based upon work supported by the U.S. DOE, Office of Science, Office of Fusion Energy Sciences. LLNL-ABS-734800.
AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System
NASA Astrophysics Data System (ADS)
Wang, R.; Harris, C.; Wicenec, A.
2016-07-01
In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. Having this in mind, we looked into various storage backend techniques that can possibly enable parallel I/O for CTDS by implementing new storage managers. After carrying on benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.
Forming spectroscopic massive protobinaries by disc fragmentation
NASA Astrophysics Data System (ADS)
Meyer, D. M.-A.; Kuiper, R.; Kley, W.; Johnston, K. G.; Vorobyov, E.
2018-01-01
The surroundings of massive protostars constitute an accretion disc which has numerically been shown to be subject to fragmentation and responsible for luminous accretion-driven outbursts. Moreover, it is suspected to produce close binary companions which will later strongly influence the star's future evolution in the Hertzsprung-Russel diagram. We present three-dimensional gravitation-radiation-hydrodynamic numerical simulations of 100 M⊙ pre-stellar cores. We find that accretion discs of young massive stars violently fragment without preventing the (highly variable) accretion of gaseous clumps on to the protostars. While acquiring the characteristics of a nascent low-mass companion, some disc fragments migrate on to the central massive protostar with dynamical properties showing that its final Keplerian orbit is close enough to constitute a close massive protobinary system, having a young high- and a low-mass components. We conclude on the viability of the disc fragmentation channel for the formation of such short-period binaries, and that both processes - close massive binary formation and accretion bursts - may happen at the same time. FU-Orionis-type bursts, such as observed in the young high-mass star S255IR-NIRS3, may not only indicate ongoing disc fragmentation, but also be considered as a tracer for the formation of close massive binaries - progenitors of the subsequent massive spectroscopic binaries - once the high-mass component of the system will enter the main-sequence phase of its evolution. Finally, we investigate the Atacama Large (sub-)Millimeter Array observability of the disc fragments.
Dubinett - Targeted Sequencing 2012 — EDRN Public Portal
we propose to use targeted massively parallel DNA sequencing to identify somatic alterations within mutational hotspots in matched sets of primary lung tumors, premalignant lesions, and adjacent,histologically normal lung tissue.
Visualization of Octree Adaptive Mesh Refinement (AMR) in Astrophysical Simulations
NASA Astrophysics Data System (ADS)
Labadens, M.; Chapon, D.; Pomaréde, D.; Teyssier, R.
2012-09-01
Computer simulations are important in current cosmological research. Those simulations run in parallel on thousands of processors, and produce huge amount of data. Adaptive mesh refinement is used to reduce the computing cost while keeping good numerical accuracy in regions of interest. RAMSES is a cosmological code developed by the Commissariat à l'énergie atomique et aux énergies alternatives (English: Atomic Energy and Alternative Energies Commission) which uses Octree adaptive mesh refinement. Compared to grid based AMR, the Octree AMR has the advantage to fit very precisely the adaptive resolution of the grid to the local problem complexity. However, this specific octree data type need some specific software to be visualized, as generic visualization tools works on Cartesian grid data type. This is why the PYMSES software has been also developed by our team. It relies on the python scripting language to ensure a modular and easy access to explore those specific data. In order to take advantage of the High Performance Computer which runs the RAMSES simulation, it also uses MPI and multiprocessing to run some parallel code. We would like to present with more details our PYMSES software with some performance benchmarks. PYMSES has currently two visualization techniques which work directly on the AMR. The first one is a splatting technique, and the second one is a custom ray tracing technique. Both have their own advantages and drawbacks. We have also compared two parallel programming techniques with the python multiprocessing library versus the use of MPI run. The load balancing strategy has to be smartly defined in order to achieve a good speed up in our computation. Results obtained with this software are illustrated in the context of a massive, 9000-processor parallel simulation of a Milky Way-like galaxy.
NASA Astrophysics Data System (ADS)
Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.
2016-10-01
Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
NASA Astrophysics Data System (ADS)
Newman, Gregory A.; Commer, Michael
2009-07-01
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
NASA Astrophysics Data System (ADS)
Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter
2015-12-01
AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
NASA Astrophysics Data System (ADS)
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Progress report on PIXIE3D, a fully implicit 3D extended MHD solver
NASA Astrophysics Data System (ADS)
Chacon, Luis
2008-11-01
Recently, invited talk at DPP07 an optimal, massively parallel implicit algorithm for 3D resistive magnetohydrodynamics (PIXIE3D) was demonstrated. Excellent algorithmic and parallel results were obtained with up to 4096 processors and 138 million unknowns. While this is a remarkable result, further developments are still needed for PIXIE3D to become a 3D extended MHD production code in general geometries. In this poster, we present an update on the status of PIXIE3D on several fronts. On the physics side, we will describe our progress towards the full Braginskii model, including: electron Hall terms, anisotropic heat conduction, and gyroviscous corrections. Algorithmically, we will discuss progress towards a robust, optimal, nonlinear solver for arbitrary geometries, including preconditioning for the new physical effects described, the implementation of a coarse processor-grid solver (to maintain optimal algorithmic performance for an arbitrarily large number of processors in massively parallel computations), and of a multiblock capability to deal with complicated geometries. L. Chac'on, Phys. Plasmas 15, 056103 (2008);
Towards implementation of cellular automata in Microbial Fuel Cells.
Tsompanas, Michail-Antisthenis I; Adamatzky, Andrew; Sirakoulis, Georgios Ch; Greenman, John; Ieropoulos, Ioannis
2017-01-01
The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway's Game of Life as the 'benchmark' CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions-compared to silicon circuitry-between the different states during computation.
Towards implementation of cellular automata in Microbial Fuel Cells
Adamatzky, Andrew; Sirakoulis, Georgios Ch.; Greenman, John; Ieropoulos, Ioannis
2017-01-01
The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway’s Game of Life as the ‘benchmark’ CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions—compared to silicon circuitry—between the different states during computation. PMID:28498871
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu; University of California Santa Barbara, Santa Barbara, CA, 93106; Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling,more » and show state-of-the-art speedup values for the fast sweeping method.« less
Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing
2015-01-01
A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in massive parallel using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982
Large-eddy simulations of compressible convection on massively parallel computers. [stellar physics
NASA Technical Reports Server (NTRS)
Xie, Xin; Toomre, Juri
1993-01-01
We report preliminary implementation of the large-eddy simulation (LES) technique in 2D simulations of compressible convection carried out on the CM-2 massively parallel computer. The convective flow fields in our simulations possess structures similar to those found in a number of direct simulations, with roll-like flows coherent across the entire depth of the layer that spans several density scale heights. Our detailed assessment of the effects of various subgrid scale (SGS) terms reveals that they may affect the gross character of convection. Yet, somewhat surprisingly, we find that our LES solutions, and another in which the SGS terms are turned off, only show modest differences. The resulting 2D flows realized here are rather laminar in character, and achieving substantial turbulence may require stronger forcing and less dissipation.
Contextual classification on the massively parallel processor
NASA Technical Reports Server (NTRS)
Tilton, James C.
1987-01-01
Classifiers are often used to produce land cover maps from multispectral Earth observation imagery. Conventionally, these classifiers have been designed to exploit the spectral information contained in the imagery. Very few classifiers exploit the spatial information content of the imagery, and the few that do rarely exploit spatial information content in conjunction with spectral and/or temporal information. A contextual classifier that exploits spatial and spectral information in combination through a general statistical approach was studied. Early test results obtained from an implementation of the classifier on a VAX-11/780 minicomputer were encouraging, but they are of limited meaning because they were produced from small data sets. An implementation of the contextual classifier is presented on the Massively Parallel Processor (MPP) at Goddard that for the first time makes feasible the testing of the classifier on large data sets.
Brett, Maggie; McPherson, John; Zang, Zhi Jiang; Lai, Angeline; Tan, Ee-Shien; Ng, Ivy; Ong, Lai-Choo; Cham, Breana; Tan, Patrick; Rozen, Steve; Tan, Ene-Choo
2014-01-01
Developmental delay and/or intellectual disability (DD/ID) affects 1–3% of all children. At least half of these are thought to have a genetic etiology. Recent studies have shown that massively parallel sequencing (MPS) using a targeted gene panel is particularly suited for diagnostic testing for genetically heterogeneous conditions. We report on our experiences with using massively parallel sequencing of a targeted gene panel of 355 genes for investigating the genetic etiology of eight patients with a wide range of phenotypes including DD/ID, congenital anomalies and/or autism spectrum disorder. Targeted sequence enrichment was performed using the Agilent SureSelect Target Enrichment Kit and sequenced on the Illumina HiSeq2000 using paired-end reads. For all eight patients, 81–84% of the targeted regions achieved read depths of at least 20×, with average read depths overlapping targets ranging from 322× to 798×. Causative variants were successfully identified in two of the eight patients: a nonsense mutation in the ATRX gene and a canonical splice site mutation in the L1CAM gene. In a third patient, a canonical splice site variant in the USP9X gene could likely explain all or some of her clinical phenotypes. These results confirm the value of targeted MPS for investigating DD/ID in children for diagnostic purposes. However, targeted gene MPS was less likely to provide a genetic diagnosis for children whose phenotype includes autism. PMID:24690944
Eduardoff, M; Gross, T E; Santos, C; de la Puente, M; Ballard, D; Strobl, C; Børsting, C; Morling, N; Fusco, L; Hussing, C; Egyed, B; Souto, L; Uacyisrael, J; Syndercombe Court, D; Carracedo, Á; Lareu, M V; Schneider, P M; Parson, W; Phillips, C; Parson, W; Phillips, C
2016-07-01
The EUROFORGEN Global ancestry-informative SNP (AIM-SNPs) panel is a forensic multiplex of 128 markers designed to differentiate an individual's ancestry from amongst the five continental population groups of Africa, Europe, East Asia, Native America, and Oceania. A custom multiplex of AmpliSeq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures, and the ancestry differentiation power of the final panel design, which required substitution of three original ancestry-informative SNPs with alternatives. Fourteen populations that had not been previously analyzed were genotyped using the custom multiplex and these studies allowed assessment of genotyping performance by comparison of data across five laboratories. Results indicate a low level of genotyping error can still occur from sequence misalignment caused by homopolymeric tracts close to the target SNP, despite careful scrutiny of candidate SNPs at the design stage. Such sequence misalignment required the exclusion of component SNP rs2080161 from the Global AIM-SNPs panel. However, the overall genotyping precision and sensitivity of this custom multiplex indicates the Ion PGM™ assay for the Global AIM-SNPs is highly suitable for forensic ancestry analysis with massively parallel sequencing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Waugh, Caryll; Cromer, Deborah; Grimm, Andrew; Chopra, Abha; Mallal, Simon; Davenport, Miles; Mak, Johnson
2015-04-09
Massive, parallel sequencing is a potent tool for dissecting the regulation of biological processes by revealing the dynamics of the cellular RNA profile under different conditions. Similarly, massive, parallel sequencing can be used to reveal the complexity of viral quasispecies that are often found in the RNA virus infected host. However, the production of cDNA libraries for next-generation sequencing (NGS) necessitates the reverse transcription of RNA into cDNA and the amplification of the cDNA template using PCR, which may introduce artefact in the form of phantom nucleic acids species that can bias the composition and interpretation of original RNA profiles. Using HIV as a model we have characterised the major sources of error during the conversion of viral RNA to cDNA, namely excess RNA template and the RNaseH activity of the polymerase enzyme, reverse transcriptase. In addition we have analysed the effect of PCR cycle on detection of recombinants and assessed the contribution of transfection of highly similar plasmid DNA to the formation of recombinant species during the production of our control viruses. We have identified RNA template concentrations, RNaseH activity of reverse transcriptase, and PCR conditions as key parameters that must be carefully optimised to minimise chimeric artefacts. Using our optimised RT-PCR conditions, in combination with our modified PCR amplification procedure, we have developed a reliable technique for accurate determination of RNA species using NGS technology.
Sierra Structural Dynamics User's Notes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reese, Garth M.
2015-10-19
Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
Reverse time migration: A seismic processing application on the connection machine
NASA Technical Reports Server (NTRS)
Fiebrich, Rolf-Dieter
1987-01-01
The implementation of a reverse time migration algorithm on the Connection Machine, a massively parallel computer is described. Essential architectural features of this machine as well as programming concepts are presented. The data structures and parallel operations for the implementation of the reverse time migration algorithm are described. The algorithm matches the Connection Machine architecture closely and executes almost at the peak performance of this machine.
Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals
1988-10-12
Secusrity Clamifieation, Nlassively-Parallel Architectures for Automa ic Recognitio of Visua, Speech Signals 12. PERSONAL AUTHOR(S) Terrence J...characteristics of speech from tJhe, visual speech signals. Neural networks have been trained on a database of vowels. The rqw images of faces , aligned and...images of faces , aligned and preprocessed, were used as input to these network which were trained to estimate the corresponding envelope of the
Solving Navier-Stokes equations on a massively parallel processor; The 1 GFLOP performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saati, A.; Biringen, S.; Farhat, C.
This paper reports on experience in solving large-scale fluid dynamics problems on the Connection Machine model CM-2. The authors have implemented a parallel version of the MacCormack scheme for the solution of the Navier-Stokes equations. By using triad floating point operations and reducing the number of interprocessor communications, they have achieved a sustained performance rate of 1.42 GFLOPS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Munday, Lynn Brendon; Day, David M.; Bunting, Gregory
Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
Representing and computing regular languages on massively parallel networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, M.I.; O'Sullivan, J.A.; Boysam, B.
1991-01-01
This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first established the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibb's representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grow. These representations are coupled to stochasticmore » diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs' probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determines the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.« less
NASA Astrophysics Data System (ADS)
Sun, Rui; Xiao, Heng
2016-04-01
With the growth of available computational resource, CFD-DEM (computational fluid dynamics-discrete element method) becomes an increasingly promising and feasible approach for the study of sediment transport. Several existing CFD-DEM solvers are applied in chemical engineering and mining industry. However, a robust CFD-DEM solver for the simulation of sediment transport is still desirable. In this work, the development of a three-dimensional, massively parallel, and open-source CFD-DEM solver SediFoam is detailed. This solver is built based on open-source solvers OpenFOAM and LAMMPS. OpenFOAM is a CFD toolbox that can perform three-dimensional fluid flow simulations on unstructured meshes; LAMMPS is a massively parallel DEM solver for molecular dynamics. Several validation tests of SediFoam are performed using cases of a wide range of complexities. The results obtained in the present simulations are consistent with those in the literature, which demonstrates the capability of SediFoam for sediment transport applications. In addition to the validation test, the parallel efficiency of SediFoam is studied to test the performance of the code for large-scale and complex simulations. The parallel efficiency tests show that the scalability of SediFoam is satisfactory in the simulations using up to O(107) particles.
García-Calvo, Raúl; Guisado, JL; Diaz-del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco
2018-01-01
Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes—master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)—is carried out for this problem. Several procedures that optimize the use of the GPU’s resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics in how to take advantage of the parallelization on massively parallel devices and GPUs to apply novel metaheuristic algorithms powered by nature for real-world applications (like the method to solve the temporal dynamics of GRNs). PMID:29662297
García-Calvo, Raúl; Guisado, J L; Diaz-Del-Rio, Fernando; Córdoba, Antonio; Jiménez-Morales, Francisco
2018-01-01
Understanding the regulation of gene expression is one of the key problems in current biology. A promising method for that purpose is the determination of the temporal dynamics between known initial and ending network states, by using simple acting rules. The huge amount of rule combinations and the nonlinear inherent nature of the problem make genetic algorithms an excellent candidate for finding optimal solutions. As this is a computationally intensive problem that needs long runtimes in conventional architectures for realistic network sizes, it is fundamental to accelerate this task. In this article, we study how to develop efficient parallel implementations of this method for the fine-grained parallel architecture of graphics processing units (GPUs) using the compute unified device architecture (CUDA) platform. An exhaustive and methodical study of various parallel genetic algorithm schemes-master-slave, island, cellular, and hybrid models, and various individual selection methods (roulette, elitist)-is carried out for this problem. Several procedures that optimize the use of the GPU's resources are presented. We conclude that the implementation that produces better results (both from the performance and the genetic algorithm fitness perspectives) is simulating a few thousands of individuals grouped in a few islands using elitist selection. This model comprises 2 mighty factors for discovering the best solutions: finding good individuals in a short number of generations, and introducing genetic diversity via a relatively frequent and numerous migration. As a result, we have even found the optimal solution for the analyzed gene regulatory network (GRN). In addition, a comparative study of the performance obtained by the different parallel implementations on GPU versus a sequential application on CPU is carried out. In our tests, a multifold speedup was obtained for our optimized parallel implementation of the method on medium class GPU over an equivalent sequential single-core implementation running on a recent Intel i7 CPU. This work can provide useful guidance to researchers in biology, medicine, or bioinformatics in how to take advantage of the parallelization on massively parallel devices and GPUs to apply novel metaheuristic algorithms powered by nature for real-world applications (like the method to solve the temporal dynamics of GRNs).
Charged scalar perturbations on charged black holes in de Rham-Gabadadze-Tolley massive gravity
NASA Astrophysics Data System (ADS)
Burikham, Piyabut; Ponglertsakul, Supakchai; Tannukij, Lunchakorn
2017-12-01
We explore the quasistationary profile of a massive charged scalar field in a class of charged black holes in de Rham-Gabadadze-Tolley (dRGT) massive gravity. We discuss how the linear term in the metric, which is a unique character of the dRGT massive gravity, affects the structure of the spacetime. Numerical calculations of the quasinormal modes are performed for a charged scalar field in the dRGT black hole background. For an asymptotically de Sitter (dS) black hole, an improved asymptotic iteration method is used to obtain the associated quasinormal frequencies. The unstable modes are found for the ℓ=0 case, and their corresponding real parts satisfy the superradiant condition. For ℓ=2 , the results show that all the de Sitter black holes considered here are stable against a small perturbation. For an asymptotically dRGT anti-de Sitter (AdS) black hole, unstable modes are found with the frequency satisfying the superradiant condition. Effects of massive-gravity parameters are discussed. Analytic calculation reveals the unique diffusive nature of quasinormal modes in the massive-gravity model with the linear term. Numerical results confirm the existence of the characteristic diffusive modes in both the dS and AdS cases.
Parallel Implementation of a High Order Implicit Collocation Method for the Heat Equation
NASA Technical Reports Server (NTRS)
Kouatchou, Jules; Halem, Milton (Technical Monitor)
2000-01-01
We combine a high order compact finite difference approximation and collocation techniques to numerically solve the two dimensional heat equation. The resulting method is implicit arid can be parallelized with a strategy that allows parallelization across both time and space. We compare the parallel implementation of the new method with a classical implicit method, namely the Crank-Nicolson method, where the parallelization is done across space only. Numerical experiments are carried out on the SGI Origin 2000.
Wiggly tails: A gravitational wave signature of massive fields around black holes
NASA Astrophysics Data System (ADS)
Degollado, Juan Carlos; Herdeiro, Carlos A. R.
2014-09-01
Massive fields can exist in long-lived configurations around black holes. We examine how the gravitational wave signal of a perturbed black hole is affected by such "dirtiness" within linear theory. As a concrete example, we consider the gravitational radiation emitted by the infall of a massive scalar field into a Schwarzschild black hole. Whereas part of the scalar field is absorbed/scattered by the black hole and triggers gravitational wave emission, another part lingers in long-lived quasibound states. Solving numerically the Teukolsky master equation for gravitational perturbations coupled to the massive Klein-Gordon equation, we find a characteristic gravitational wave signal, composed by a quasinormal ringing followed by a late time tail. In contrast to "clean" black holes, however, the late time tail contains small amplitude wiggles with the frequency of the dominating quasibound state. Additionally, an observer dependent beating pattern may also be seen. These features were already observed in fully nonlinear studies; our analysis shows they are present at linear level, and, since it reduces to a 1+1 dimensional numerical problem, allows for cleaner numerical data. Moreover, we discuss the power law of the tail and that it only becomes universal sufficiently far away from the dirty black hole. The wiggly tails, by constrast, are a generic feature that may be used as a smoking gun for the presence of massive fields around black holes, either as a linear cloud or as fully nonlinear hair.
A 3D staggered-grid finite difference scheme for poroelastic wave equation
NASA Astrophysics Data System (ADS)
Zhang, Yijie; Gao, Jinghuai
2014-10-01
Three dimensional numerical modeling has been a viable tool for understanding wave propagation in real media. The poroelastic media can better describe the phenomena of hydrocarbon reservoirs than acoustic and elastic media. However, the numerical modeling in 3D poroelastic media demands significantly more computational capacity, including both computational time and memory. In this paper, we present a 3D poroelastic staggered-grid finite difference (SFD) scheme. During the procedure, parallel computing is implemented to reduce the computational time. Parallelization is based on domain decomposition, and communication between processors is performed using message passing interface (MPI). Parallel analysis shows that the parallelized SFD scheme significantly improves the simulation efficiency and 3D decomposition in domain is the most efficient. We also analyze the numerical dispersion and stability condition of the 3D poroelastic SFD method. Numerical results show that the 3D numerical simulation can provide a real description of wave propagation.
Cost-effective GPU-grid for genome-wide epistasis calculations.
Pütz, B; Kam-Thong, T; Karbalai, N; Altmann, A; Müller-Myhsok, B
2013-01-01
Until recently, genotype studies were limited to the investigation of single SNP effects due to the computational burden incurred when studying pairwise interactions of SNPs. However, some genetic effects as simple as coloring (in plants and animals) cannot be ascribed to a single locus but only understood when epistasis is taken into account [1]. It is expected that such effects are also found in complex diseases where many genes contribute to the clinical outcome of affected individuals. Only recently have such problems become feasible computationally. The inherently parallel structure of the problem makes it a perfect candidate for massive parallelization on either grid or cloud architectures. Since we are also dealing with confidential patient data, we were not able to consider a cloud-based solution but had to find a way to process the data in-house and aimed to build a local GPU-based grid structure. Sequential epistatsis calculations were ported to GPU using CUDA at various levels. Parallelization on the CPU was compared to corresponding GPU counterparts with regards to performance and cost. A cost-effective solution was created by combining custom-built nodes equipped with relatively inexpensive consumer-level graphics cards with highly parallel GPUs in a local grid. The GPU method outperforms current cluster-based systems on a price/performance criterion, as a single GPU shows speed performance comparable up to 200 CPU cores. The outlined approach will work for problems that easily lend themselves to massive parallelization. Code for various tasks has been made available and ongoing development of tools will further ease the transition from sequential to parallel algorithms.
CICE, The Los Alamos Sea Ice Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hunke, Elizabeth; Lipscomb, William; Jones, Philip
The Los Alamos sea ice model (CICE) is the result of an effort to develop a computationally efficient sea ice component for a fully coupled atmosphere–land–ocean–ice global climate model. It was originally designed to be compatible with the Parallel Ocean Program (POP), an ocean circulation model developed at Los Alamos National Laboratory for use on massively parallel computers. CICE has several interacting components: a vertical thermodynamic model that computes local growth rates of snow and ice due to vertical conductive, radiative and turbulent fluxes, along with snowfall; an elastic-viscous-plastic model of ice dynamics, which predicts the velocity field of themore » ice pack based on a model of the material strength of the ice; an incremental remapping transport model that describes horizontal advection of the areal concentration, ice and snow volume and other state variables; and a ridging parameterization that transfers ice among thickness categories based on energetic balances and rates of strain. It also includes a biogeochemical model that describes evolution of the ice ecosystem. The CICE sea ice model is used for climate research as one component of complex global earth system models that include atmosphere, land, ocean and biogeochemistry components. It is also used for operational sea ice forecasting in the polar regions and in numerical weather prediction models.« less
Photochemical numerics for global-scale modeling: Fidelity and GCM testing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Elliott, S.; Jim Kao, Chih-Yue; Zhao, X.
1995-03-01
Atmospheric photochemistry lies at the heart of global-scale pollution problems, but it is a nonlinear system embedded in nonlinear transport and so must be modeled in three dimensions. Total earth grids are massive and kinetics require dozens of interacting tracers, taxing supercomputers to their limits in global calculations. A matrix-free and noniterative family scheme is described that permits chemical step sizes an order of magnitude or more larger than time constants for molecular groupings, in the 1-h range used for transport. Families are partitioned through linearized implicit integrations that produce stabilizing species concentrations for a mass-conserving forward solver. The kineticsmore » are also parallelized by moving geographic loops innermost and changes in the continuity equations are automated through list reading. The combination of speed, parallelization and automation renders the programs naturally modular. Accuracy lies within 1% for all species in week-long fidelity tests. A 50-species, 150-reaction stratospheric module tested in a spectral GCM benchmarks at 10 min CPU time per day and agrees with lower-dimensionality simulations. Tropospheric nonmethane hydrocarbon chemistry will soon be added, and inherently three-dimensional phenomena will be investigated both decoupled from dynamics and in a complete chemical GCM. 225 refs., 11 figs., 2 tabs.« less
Continental-scale patterns of canopy tree composition and function across Amazonia.
ter Steege, Hans; Pitman, Nigel C A; Phillips, Oliver L; Chave, Jerome; Sabatier, Daniel; Duque, Alvaro; Molino, Jean-François; Prévost, Marie-Françoise; Spichiger, Rodolphe; Castellanos, Hernán; von Hildebrand, Patricio; Vásquez, Rodolfo
2006-09-28
The world's greatest terrestrial stores of biodiversity and carbon are found in the forests of northern South America, where large-scale biogeographic patterns and processes have recently begun to be described. Seven of the nine countries with territory in the Amazon basin and the Guiana shield have carried out large-scale forest inventories, but such massive data sets have been little exploited by tropical plant ecologists. Although forest inventories often lack the species-level identifications favoured by tropical plant ecologists, their consistency of measurement and vast spatial coverage make them ideally suited for numerical analyses at large scales, and a valuable resource to describe the still poorly understood spatial variation of biomass, diversity, community composition and forest functioning across the South American tropics. Here we show, by using the seven forest inventories complemented with trait and inventory data collected elsewhere, two dominant gradients in tree composition and function across the Amazon, one paralleling a major gradient in soil fertility and the other paralleling a gradient in dry season length. The data set also indicates that the dominance of Fabaceae in the Guiana shield is not necessarily the result of root adaptations to poor soils (nodulation or ectomycorrhizal associations) but perhaps also the result of their remarkably high seed mass there as a potential adaptation to low rates of disturbance.
Novel Scalable 3-D MT Inverse Solver
NASA Astrophysics Data System (ADS)
Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.
2016-12-01
We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine a highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits adjoint sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set up. To parameterize an inverse domain a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to high-performance clusters demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
Kenneth Wilson and Lattice QCD
NASA Astrophysics Data System (ADS)
Ukawa, Akira
2015-09-01
We discuss the physics and computation of lattice QCD, a space-time lattice formulation of quantum chromodynamics, and Kenneth Wilson's seminal role in its development. We start with the fundamental issue of confinement of quarks in the theory of the strong interactions, and discuss how lattice QCD provides a framework for understanding this phenomenon. A conceptual issue with lattice QCD is a conflict of space-time lattice with chiral symmetry of quarks. We discuss how this problem is resolved. Since lattice QCD is a non-linear quantum dynamical system with infinite degrees of freedom, quantities which are analytically calculable are limited. On the other hand, it provides an ideal case of massively parallel numerical computations. We review the long and distinguished history of parallel-architecture supercomputers designed and built for lattice QCD. We discuss algorithmic developments, in particular the difficulties posed by the fermionic nature of quarks, and their resolution. The triad of efforts toward better understanding of physics, better algorithms, and more powerful supercomputers have produced major breakthroughs in our understanding of the strong interactions. We review the salient results of this effort in understanding the hadron spectrum, the Cabibbo-Kobayashi-Maskawa matrix elements and CP violation, and quark-gluon plasma at high temperatures. We conclude with a brief summary and a future perspective.
Continental-scale patterns of canopy tree composition and function across Amazonia
NASA Astrophysics Data System (ADS)
Ter Steege, Hans; Pitman, Nigel C. A.; Phillips, Oliver L.; Chave, Jerome; Sabatier, Daniel; Duque, Alvaro; Molino, Jean-François; Prévost, Marie-Françoise; Spichiger, Rodolphe; Castellanos, Hernán; von Hildebrand, Patricio; Vásquez, Rodolfo
2006-09-01
The world's greatest terrestrial stores of biodiversity and carbon are found in the forests of northern South America, where large-scale biogeographic patterns and processes have recently begun to be described. Seven of the nine countries with territory in the Amazon basin and the Guiana shield have carried out large-scale forest inventories, but such massive data sets have been little exploited by tropical plant ecologists. Although forest inventories often lack the species-level identifications favoured by tropical plant ecologists, their consistency of measurement and vast spatial coverage make them ideally suited for numerical analyses at large scales, and a valuable resource to describe the still poorly understood spatial variation of biomass, diversity, community composition and forest functioning across the South American tropics. Here we show, by using the seven forest inventories complemented with trait and inventory data collected elsewhere, two dominant gradients in tree composition and function across the Amazon, one paralleling a major gradient in soil fertility and the other paralleling a gradient in dry season length. The data set also indicates that the dominance of Fabaceae in the Guiana shield is not necessarily the result of root adaptations to poor soils (nodulation or ectomycorrhizal associations) but perhaps also the result of their remarkably high seed mass there as a potential adaptation to low rates of disturbance.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce
Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng
2016-01-01
The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
Terascale direct numerical simulations of turbulent combustion using S3D
NASA Astrophysics Data System (ADS)
Chen, J. H.; Choudhary, A.; de Supinski, B.; DeVries, M.; Hawkes, E. R.; Klasky, S.; Liao, W. K.; Ma, K. L.; Mellor-Crummey, J.; Podhorszki, N.; Sankaran, R.; Shende, S.; Yoo, C. S.
2009-01-01
Computational science is paramount to the understanding of underlying processes in internal combustion engines of the future that will utilize non-petroleum-based alternative fuels, including carbon-neutral biofuels, and burn in new combustion regimes that will attain high efficiency while minimizing emissions of particulates and nitrogen oxides. Next-generation engines will likely operate at higher pressures, with greater amounts of dilution and utilize alternative fuels that exhibit a wide range of chemical and physical properties. Therefore, there is a significant role for high-fidelity simulations, direct numerical simulations (DNS), specifically designed to capture key turbulence-chemistry interactions in these relatively uncharted combustion regimes, and in particular, that can discriminate the effects of differences in fuel properties. In DNS, all of the relevant turbulence and flame scales are resolved numerically using high-order accurate numerical algorithms. As a consequence terascale DNS are computationally intensive, require massive amounts of computing power and generate tens of terabytes of data. Recent results from terascale DNS of turbulent flames are presented here, illustrating its role in elucidating flame stabilization mechanisms in a lifted turbulent hydrogen/air jet flame in a hot air coflow, and the flame structure of a fuel-lean turbulent premixed jet flame. Computing at this scale requires close collaborations between computer and combustion scientists to provide optimized scaleable algorithms and software for terascale simulations, efficient collective parallel I/O, tools for volume visualization of multiscale, multivariate data and automating the combustion workflow. The enabling computer science, applied to combustion science, is also required in many other terascale physics and engineering simulations. In particular, performance monitoring is used to identify the performance of key kernels in the DNS code, S3D and especially memory intensive loops in the code. Through the careful application of loop transformations, data reuse in cache is exploited thereby reducing memory bandwidth needs, and hence, improving S3D's nodal performance. To enhance collective parallel I/O in S3D, an MPI-I/O caching design is used to construct a two-stage write-behind method for improving the performance of write-only operations. The simulations generate tens of terabytes of data requiring analysis. Interactive exploration of the simulation data is enabled by multivariate time-varying volume visualization. The visualization highlights spatial and temporal correlations between multiple reactive scalar fields using an intuitive user interface based on parallel coordinates and time histogram. Finally, an automated combustion workflow is designed using Kepler to manage large-scale data movement, data morphing, and archival and to provide a graphical display of run-time diagnostics.
2013-11-21
Fanconi Anemia; Autosomal or Sex Linked Recessive Genetic Disease; Bone Marrow Hematopoiesis Failure, Multiple Congenital Abnormalities, and Susceptibility to Neoplastic Diseases.; Hematopoiesis Maintainance.
2015-11-03
scale optical projection system powered by spatial light modulators, such as digital micro-mirror device ( DMD ). Figure 4 shows the parallel lithography ...1Scientific RepoRts | 5:16192 | DOi: 10.1038/srep16192 www.nature.com/scientificreports High throughput optical lithography by scanning a massive...array of bowtie aperture antennas at near-field X. Wen1,2,3,*, A. Datta1,*, L. M. Traverso1, L. Pan1, X. Xu1 & E. E. Moon4 Optical lithography , the
Parallelization of elliptic solver for solving 1D Boussinesq model
NASA Astrophysics Data System (ADS)
Tarwidi, D.; Adytia, D.
2018-03-01
In this paper, a parallel implementation of an elliptic solver in solving 1D Boussinesq model is presented. Numerical solution of Boussinesq model is obtained by implementing a staggered grid scheme to continuity, momentum, and elliptic equation of Boussinesq model. Tridiagonal system emerging from numerical scheme of elliptic equation is solved by cyclic reduction algorithm. The parallel implementation of cyclic reduction is executed on multicore processors with shared memory architectures using OpenMP. To measure the performance of parallel program, large number of grids is varied from 28 to 214. Two test cases of numerical experiment, i.e. propagation of solitary and standing wave, are proposed to evaluate the parallel program. The numerical results are verified with analytical solution of solitary and standing wave. The best speedup of solitary and standing wave test cases is about 2.07 with 214 of grids and 1.86 with 213 of grids, respectively, which are executed by using 8 threads. Moreover, the best efficiency of parallel program is 76.2% and 73.5% for solitary and standing wave test cases, respectively.
NASA Astrophysics Data System (ADS)
Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.; Fox, D. T.; Fujita, Y.
2010-12-01
Inducing mineral precipitation in the subsurface is one potential strategy for immobilizing trace metal and radionuclide contaminants. Generating mineral precipitates in situ can be achieved by manipulating chemical conditions, typically through injection or in situ generation of reactants. How these reactants transport, mix and react within the medium controls the spatial distribution and composition of the resulting mineral phases. Multiple processes, including fluid flow, dispersive/diffusive transport of reactants, biogeochemical reactions and changes in porosity-permeability, are tightly coupled over a number of scales. Numerical modeling can be used to investigate the nonlinear coupling effects of these processes which are quite challenging to explore experimentally. Many subsurface reactive transport simulators employ a de-coupled or operator-splitting approach where transport equations and batch chemistry reactions are solved sequentially. However, such an approach has limited applicability for biogeochemical systems with fast kinetics and strong coupling between chemical reactions and medium properties. A massively parallel, fully coupled, fully implicit Reactive Transport simulator (referred to as “RAT”) based on a parallel multi-physics object-oriented simulation framework (MOOSE) has been developed at the Idaho National Laboratory. Within this simulator, systems of transport and reaction equations can be solved simultaneously in a fully coupled, fully implicit manner using the Jacobian Free Newton-Krylov (JFNK) method with additional advanced computing capabilities such as (1) physics-based preconditioning for solution convergence acceleration, (2) massively parallel computing and scalability, and (3) adaptive mesh refinements for 2D and 3D structured and unstructured mesh. The simulator was first tested against analytical solutions, then applied to simulating induced calcium carbonate mineral precipitation in 1D columns and 2D flow cells as analogs to homogeneous and heterogeneous porous media, respectively. In 1D columns, calcium carbonate mineral precipitation was driven by urea hydrolysis catalyzed by urease enzyme, and in 2D flow cells, calcium carbonate mineral forming reactants were injected sequentially, forming migrating reaction fronts that are typically highly nonuniform. The RAT simulation results for the spatial and temporal distributions of precipitates, reaction rates and major species in the system, and also for changes in porosity and permeability, were compared to both laboratory experimental data and computational results obtained using other reactive transport simulators. The comparisons demonstrate the ability of RAT to simulate complex nonlinear systems and the advantages of fully coupled approaches, over de-coupled methods, for accurate simulation of complex, dynamic processes such as engineered mineral precipitation in subsurface environments.
Role of APOE Isoforms in the Pathogenesis of TBI Induced Alzheimer’s Disease
2015-10-01
global deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel ...ATP binding cassette transporter A1 (ABCA1) is a lipid transporter that controls the generation of HDL in plasma and ApoE-containing lipoproteins in... parallel sequencing, mRNA-seq, behavioral testing, mem- ory impairement, recovery. 3 Overall Project Summary During the reported period, we have been able
Applications and accuracy of the parallel diagonal dominant algorithm
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1993-01-01
The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
Scalability and Portability of Two Parallel Implementations of ADI
NASA Technical Reports Server (NTRS)
Phung, Thanh; VanderWijngaart, Rob F.
1994-01-01
Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
Nuclide Depletion Capabilities in the Shift Monte Carlo Code
Davidson, Gregory G.; Pandya, Tara M.; Johnson, Seth R.; ...
2017-12-21
A new depletion capability has been developed in the Exnihilo radiation transport code suite. This capability enables massively parallel domain-decomposed coupling between the Shift continuous-energy Monte Carlo solver and the nuclide depletion solvers in ORIGEN to perform high-performance Monte Carlo depletion calculations. This paper describes this new depletion capability and discusses its various features, including a multi-level parallel decomposition, high-order transport-depletion coupling, and energy-integrated power renormalization. Several test problems are presented to validate the new capability against other Monte Carlo depletion codes, and the parallel performance of the new capability is analyzed.
NASA Astrophysics Data System (ADS)
Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.
2015-12-01
Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models in sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling works. Two-phase flow simulations in heterogeneous media usually require much longer computational time than that in homogeneous media. Uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with more than thousands of processors become available in scientific and engineering communities. Such supercomputers may attract attentions from geoscientist and reservoir engineers for solving the large and non-linear models in higher resolutions within a reasonable time. However, for making it a useful tool, it is essential to tackle several practical obstacles to utilize large number of processors effectively for general-purpose reservoir simulators. We have implemented massively-parallel versions of two TOUGH2 family codes (a multi-phase flow simulator TOUGH2 and a chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones in a reservoir scale. The performance measurement confirmed that the both simulators exhibit excellent scalabilities showing almost linear speedup against number of processors up to over ten thousand cores. Generally this allows us to perform coupled multi-physics (THC) simulations on high resolution geologic models with multi-million grid in a practical time (e.g., less than a second per time step).
NASA Astrophysics Data System (ADS)
Aalto, R. E.; Lauer, J. W.; Darby, S. E.; Best, J.; Dietrich, W. E.
2015-12-01
During glacial-marine transgressions vast volumes of sediment are deposited due to the infilling of lowland fluvial systems and shallow shelves, material that is removed during ensuing regressions. Modelling these processes would illuminate system morphodynamics, fluxes, and 'complexity' in response to base level change, yet such problems are computationally formidable. Environmental systems are characterized by strong interconnectivity, yet traditional supercomputers have slow inter-node communication -- whereas rapidly advancing Graphics Processing Unit (GPU) technology offers vastly higher (>100x) bandwidths. GULLEM (GpU-accelerated Lowland Landscape Evolution Model) employs massively parallel code to simulate coupled fluvial-landscape evolution for complex lowland river systems over large temporal and spatial scales. GULLEM models the accommodation space carved/infilled by representing a range of geomorphic processes, including: river & tributary incision within a multi-directional flow regime, non-linear diffusion, glacial-isostatic flexure, hydraulic geometry, tectonic deformation, sediment production, transport & deposition, and full 3D tracking of all resulting stratigraphy. Model results concur with the Holocene dynamics of the Fly River, PNG -- as documented with dated cores, sonar imaging of floodbasin stratigraphy, and the observations of topographic remnants from LGM conditions. Other supporting research was conducted along the Mekong River, the largest fluvial system of the Sunda Shelf. These and other field data provide tantalizing empirical glimpses into the lowland landscapes of large rivers during glacial-interglacial transitions, observations that can be explored with this powerful numerical model. GULLEM affords estimates for the timing and flux budgets within the Fly and Sunda Systems, illustrating complex internal system responses to the external forcing of sea level and climate. Furthermore, GULLEM can be applied to most ANY fluvial system to explore processes across a wide range of temporal and spatial scales. The presentation will provide insights (& many animations) illustrating river morphodynamics & resulting landscapes formed as a result of sea level oscillations. [Image: The incised 3.2e6 km^2 Sundaland domain @ 431ka
NASA Astrophysics Data System (ADS)
Konduri, Aditya
Many natural and engineering systems are governed by nonlinear partial differential equations (PDEs) which result in a multiscale phenomena, e.g. turbulent flows. Numerical simulations of these problems are computationally very expensive and demand for extreme levels of parallelism. At realistic conditions, simulations are being carried out on massively parallel computers with hundreds of thousands of processing elements (PEs). It has been observed that communication between PEs as well as their synchronization at these extreme scales take up a significant portion of the total simulation time and result in poor scalability of codes. This issue is likely to pose a bottleneck in scalability of codes on future Exascale systems. In this work, we propose an asynchronous computing algorithm based on widely used finite difference methods to solve PDEs in which synchronization between PEs due to communication is relaxed at a mathematical level. We show that while stability is conserved when schemes are used asynchronously, accuracy is greatly degraded. Since message arrivals at PEs are random processes, so is the behavior of the error. We propose a new statistical framework in which we show that average errors drop always to first-order regardless of the original scheme. We propose new asynchrony-tolerant schemes that maintain accuracy when synchronization is relaxed. The quality of the solution is shown to depend, not only on the physical phenomena and numerical schemes, but also on the characteristics of the computing machine. A novel algorithm using remote memory access communications has been developed to demonstrate excellent scalability of the method for large-scale computing. Finally, we present a path to extend this method in solving complex multi-scale problems on Exascale machines.
Simulation of an array-based neural net model
NASA Technical Reports Server (NTRS)
Barnden, John A.
1987-01-01
Research in cognitive science suggests that much of cognition involves the rapid manipulation of complex data structures. However, it is very unclear how this could be realized in neural networks or connectionist systems. A core question is: how could the interconnectivity of items in an abstract-level data structure be neurally encoded? The answer appeals mainly to positional relationships between activity patterns within neural arrays, rather than directly to neural connections in the traditional way. The new method was initially devised to account for abstract symbolic data structures, but it also supports cognitively useful spatial analogue, image-like representations. As the neural model is based on massive, uniform, parallel computations over 2D arrays, the massively parallel processor is a convenient tool for simulation work, although there are complications in using the machine to the fullest advantage. An MPP Pascal simulation program for a small pilot version of the model is running.
Computations on the massively parallel processor at the Goddard Space Flight Center
NASA Technical Reports Server (NTRS)
Strong, James P.
1991-01-01
Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
The CAnadian NIRISS Unbiased Cluster Survey (CANUCS)
NASA Astrophysics Data System (ADS)
Ravindranath, Swara; NIRISS GTO Team
2017-06-01
CANUCS GTO program is a JWST spectroscopy and imaging survey of five massive galaxy clusters and ten parallel fields using the NIRISS low-resolution grisms, NIRCam imaging and NIRSpec multi-object spectroscopy. The primary goal is to understand the evolution of low mass galaxies across cosmic time. The resolved emission line maps and line ratios for many galaxies, with some at resolution of 100pc via the magnification by gravitational lensing will enable determining the spatial distribution of star formation, dust and metals. Other science goals include the detection and characterization of galaxies within the reionization epoch, using multiply-imaged lensed galaxies to constrain cluster mass distributions and dark matter substructure, and understanding star-formation suppression in the most massive galaxy clusters. In this talk I will describe the science goals of the CANUCS program. The proposed prime and parallel observations will be presented with details of the implementation of the observation strategy using JWST proposal planning tools.
The MasPar MP-1 As a Computer Arithmetic Laboratory
Anuta, Michael A.; Lozier, Daniel W.; Turner, Peter R.
1996-01-01
This paper is a blueprint for the use of a massively parallel SIMD computer architecture for the simulation of various forms of computer arithmetic. The particular system used is a DEC/MasPar MP-1 with 4096 processors in a square array. This architecture has many advantages for such simulations due largely to the simplicity of the individual processors. Arithmetic operations can be spread across the processor array to simulate a hardware chip. Alternatively they may be performed on individual processors to allow simulation of a massively parallel implementation of the arithmetic. Compromises between these extremes permit speed-area tradeoffs to be examined. The paper includes a description of the architecture and its features. It then summarizes some of the arithmetic systems which have been, or are to be, implemented. The implementation of the level-index and symmetric level-index, LI and SLI, systems is described in some detail. An extensive bibliography is included. PMID:27805123
You, Yanqin; Sun, Yan; Li, Xuchao; Li, Yali; Wei, Xiaoming; Chen, Fang; Ge, Huijuan; Lan, Zhangzhang; Zhu, Qian; Tang, Ying; Wang, Shujuan; Gao, Ya; Jiang, Fuman; Song, Jiaping; Shi, Quan; Zhu, Xuan; Mu, Feng; Dong, Wei; Gao, Vince; Jiang, Hui; Yi, Xin; Wang, Wei; Gao, Zhiying
2014-08-01
This article demonstrates a prominent noninvasive prenatal approach to assist the clinical diagnosis of a single-gene disorder disease, maple syrup urine disease, using targeted sequencing knowledge from the affected family. The method reported here combines novel mutant discovery in known genes by targeted massively parallel sequencing with noninvasive prenatal testing. By applying this new strategy, we successfully revealed novel mutations in the gene BCKDHA (Ex2_4dup and c.392A>G) in this Chinese family and developed a prenatal haplotype-assisted approach to noninvasively detect the genotype of the fetus (transmitted from both parents). This is the first report of integration of targeted sequencing and noninvasive prenatal testing into clinical practice. Our study has demonstrated that this massively parallel sequencing-based strategy can potentially be used for single-gene disorder diagnosis in the future.
Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H
2013-08-01
Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao
2017-12-07
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.
Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin
2014-10-14
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
Gole, Jeff; Gore, Athurva; Richards, Andrew; Chiu, Yu-Jui; Fung, Ho-Lim; Bushman, Diane; Chiang, Hsin-I; Chun, Jerold; Lo, Yu-Hwa; Zhang, Kun
2013-01-01
Genome sequencing of single cells has a variety of applications, including characterizing difficult-to-culture microorganisms and identifying somatic mutations in single cells from mammalian tissues. A major hurdle in this process is the bias in amplifying the genetic material from a single cell, a procedure known as polymerase cloning. Here we describe the microwell displacement amplification system (MIDAS), a massively parallel polymerase cloning method in which single cells are randomly distributed into hundreds to thousands of nanoliter wells and simultaneously amplified for shotgun sequencing. MIDAS reduces amplification bias because polymerase cloning occurs in physically separated nanoliter-scale reactors, facilitating the de novo assembly of near-complete microbial genomes from single E. coli cells. In addition, MIDAS allowed us to detect single-copy number changes in primary human adult neurons at 1–2 Mb resolution. MIDAS will further the characterization of genomic diversity in many heterogeneous cell populations. PMID:24213699
Tsui, Nancy B. Y.; Jiang, Peiyong; Chow, Katherine C. K.; Su, Xiaoxi; Leung, Tak Y.; Sun, Hao; Chan, K. C. Allen; Chiu, Rossa W. K.; Lo, Y. M. Dennis
2012-01-01
Background Fetal DNA in maternal urine, if present, would be a valuable source of fetal genetic material for noninvasive prenatal diagnosis. However, the existence of fetal DNA in maternal urine has remained controversial. The issue is due to the lack of appropriate technology to robustly detect the potentially highly degraded fetal DNA in maternal urine. Methodology We have used massively parallel paired-end sequencing to investigate cell-free DNA molecules in maternal urine. Catheterized urine samples were collected from seven pregnant women during the third trimester of pregnancies. We detected fetal DNA by identifying sequenced reads that contained fetal-specific alleles of the single nucleotide polymorphisms. The sizes of individual urinary DNA fragments were deduced from the alignment positions of the paired reads. We measured the fractional fetal DNA concentration as well as the size distributions of fetal and maternal DNA in maternal urine. Principal Findings Cell-free fetal DNA was detected in five of the seven maternal urine samples, with the fractional fetal DNA concentrations ranged from 1.92% to 4.73%. Fetal DNA became undetectable in maternal urine after delivery. The total urinary cell-free DNA molecules were less intact when compared with plasma DNA. Urinary fetal DNA fragments were very short, and the most dominant fetal sequences were between 29 bp and 45 bp in length. Conclusions With the use of massively parallel sequencing, we have confirmed the existence of transrenal fetal DNA in maternal urine, and have shown that urinary fetal DNA was heavily degraded. PMID:23118982
Energy-efficient STDP-based learning circuits with memristor synapses
NASA Astrophysics Data System (ADS)
Wu, Xinyu; Saxena, Vishal; Campbell, Kristy A.
2014-05-01
It is now accepted that the traditional von Neumann architecture, with processor and memory separation, is ill suited to process parallel data streams which a mammalian brain can efficiently handle. Moreover, researchers now envision computing architectures which enable cognitive processing of massive amounts of data by identifying spatio-temporal relationships in real-time and solving complex pattern recognition problems. Memristor cross-point arrays, integrated with standard CMOS technology, are expected to result in massively parallel and low-power Neuromorphic computing architectures. Recently, significant progress has been made in spiking neural networks (SNN) which emulate data processing in the cortical brain. These architectures comprise of a dense network of neurons and the synapses formed between the axons and dendrites. Further, unsupervised or supervised competitive learning schemes are being investigated for global training of the network. In contrast to a software implementation, hardware realization of these networks requires massive circuit overhead for addressing and individually updating network weights. Instead, we employ bio-inspired learning rules such as the spike-timing-dependent plasticity (STDP) to efficiently update the network weights locally. To realize SNNs on a chip, we propose to use densely integrating mixed-signal integrate-andfire neurons (IFNs) and cross-point arrays of memristors in back-end-of-the-line (BEOL) of CMOS chips. Novel IFN circuits have been designed to drive memristive synapses in parallel while maintaining overall power efficiency (<1 pJ/spike/synapse), even at spike rate greater than 10 MHz. We present circuit design details and simulation results of the IFN with memristor synapses, its response to incoming spike trains and STDP learning characterization.
Massively parallel cis-regulatory analysis in the mammalian central nervous system
Shen, Susan Q.; Myers, Connie A.; Hughes, Andrew E.O.; Byrne, Leah C.; Flannery, John G.; Corbo, Joseph C.
2016-01-01
Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell–derived organoids. PMID:26576614
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faulon, Jean-Loup, Wilcox, R.T.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in themore » center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.« less
NASA Astrophysics Data System (ADS)
Heinzeller, Dominikus; Duda, Michael G.; Kunstmann, Harald
2017-04-01
With strong financial and political support from national and international initiatives, exascale computing is projected for the end of this decade. Energy requirements and physical limitations imply the use of accelerators and the scaling out to orders of magnitudes larger numbers of cores then today to achieve this milestone. In order to fully exploit the capabilities of these Exascale computing systems, existing applications need to undergo significant development. The Model for Prediction Across Scales (MPAS) is a novel set of Earth system simulation components and consists of an atmospheric core, an ocean core, a land-ice core and a sea-ice core. Its distinct features are the use of unstructured Voronoi meshes and C-grid discretisation to address shortcomings of global models on regular grids and the use of limited area models nested in a forcing data set, with respect to parallel scalability, numerical accuracy and physical consistency. Here, we present work towards the application of the atmospheric core (MPAS-A) on current and future high performance computing systems for problems at extreme scale. In particular, we address the issue of massively parallel I/O by extending the model to support the highly scalable SIONlib library. Using global uniform meshes with a convection-permitting resolution of 2-3km, we demonstrate the ability of MPAS-A to scale out to half a million cores while maintaining a high parallel efficiency. We also demonstrate the potential benefit of a hybrid parallelisation of the code (MPI/OpenMP) on the latest generation of Intel's Many Integrated Core Architecture, the Intel Xeon Phi Knights Landing.
NASA Astrophysics Data System (ADS)
González, J. A.; Guzmán, F. S.
2018-03-01
We present a method for estimating the velocity of a wandering black hole and the equation of state for the gas around it based on a catalog of numerical simulations. The method uses machine-learning methods based on convolutional neural networks applied to the classification of images resulting from numerical simulations. Specifically we focus on the supersonic velocity regime and choose the direction of the black hole to be parallel to its spin. We build a catalog of 900 simulations by numerically solving Euler's equations onto the fixed space-time background of a black hole, for two parameters: the adiabatic index Γ with values in the range [1.1, 5 /3 ], and the asymptotic relative velocity of the black hole with respect to the surroundings v∞, with values within [0.2 ,0.8 ]c . For each simulation we produce a 2D image of the gas density once the process of accretion has approached a stationary regime. The results obtained show that the implemented convolutional neural networks are able to correctly classify the adiabatic index 87.78% of the time within an uncertainty of ±0.0284 , while the prediction of the velocity is correct 96.67% of the time within an uncertainty of ±0.03 c . We expect that this combination of a massive number of numerical simulations and machine-learning methods will help us analyze more complicated scenarios related to future high-resolution observations of black holes, like those from the Event Horizon Telescope.
NASA Technical Reports Server (NTRS)
Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.
1995-01-01
A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three-dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech; a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64(sup 3) particles per processing node (approximately 134 million particles of 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.
Solving very large, sparse linear systems on mesh-connected parallel computers
NASA Technical Reports Server (NTRS)
Opsahl, Torstein; Reif, John
1987-01-01
The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
Karasick, Michael S.; Strip, David R.
1996-01-01
A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
An Overview of Mesoscale Modeling Software for Energetic Materials Research
2010-03-01
12 2.9 Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ...13 Table 10. LAMMPS summary...extensive reviews, lectures and workshops are available on multiscale modeling of materials applications (76-78). • Multi-phase mixtures of
GPU computing in medical physics: a review.
Pratx, Guillem; Xing, Lei
2011-05-01
The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
NASA Technical Reports Server (NTRS)
Sobieszczanski-Sobieski, Jaroslaw
1998-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate a radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimization (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behavior by interaction of a large number of very simple models may be an inspiration for the above algorithms, the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should be now, even though the widespread availability of massively parallel processing is still a few years away.
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2
Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.
2014-01-01
SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
NASA Technical Reports Server (NTRS)
Sobieszczanski-Sobieski, Jaroslaw
1999-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
A massively asynchronous, parallel brain.
Zeki, Semir
2015-05-19
Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously--with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain.
NASA Technical Reports Server (NTRS)
Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.
1999-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Suppressing correlations in massively parallel simulations of lattice models
NASA Astrophysics Data System (ADS)
Kelling, Jeffrey; Ódor, Géza; Gemming, Sibylle
2017-11-01
For lattice Monte Carlo simulations parallelization is crucial to make studies of large systems and long simulation time feasible, while sequential simulations remain the gold-standard for correlation-free dynamics. Here, various domain decomposition schemes are compared, concluding with one which delivers virtually correlation-free simulations on GPUs. Extensive simulations of the octahedron model for 2 + 1 dimensional Kardar-Parisi-Zhang surface growth, which is very sensitive to correlation in the site-selection dynamics, were performed to show self-consistency of the parallel runs and agreement with the sequential algorithm. We present a GPU implementation providing a speedup of about 30 × over a parallel CPU implementation on a single socket and at least 180 × with respect to the sequential reference.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce
NASA Astrophysics Data System (ADS)
Chen, Xi; Zhou, Liqing
2015-12-01
With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
Line-drawing algorithms for parallel machines
NASA Technical Reports Server (NTRS)
Pang, Alex T.
1990-01-01
The fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient codes is addressed. It is suggested that instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives; distance to a line (a point is on the line if sufficiently close to it) and intersection with a line (a point on the line if an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.
Simulating Gravitational Radiation from Binary Black Holes Mergers as LISA Sources
NASA Technical Reports Server (NTRS)
Baker, John
2005-01-01
A viewgraph presentation on the simulation of gravitational waves from Binary Massive Black Holes with LISA observations is shown. The topics include: 1) Massive Black Holes (MBHs); 2) MBH Binaries; 3) Gravitational Wavws from MBH Binaries; 4) Observing with LISA; 5) How LISA sees MBH binary mergers; 6) MBH binary inspirals to LISA; 7) Numerical Relativity Simulations; 8) Numerical Relativity Challenges; 9) Recent Successes; 10) Goddard Team; 11) Binary Black Hole Simulations at Goddard; 12) Goddard Recent Advances; 13) Baker, et al.:GSFC; 13) Starting Farther Out; 14) Comparing Initial Separation; 15) Now with AMR; and 16) Conclusion.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Massive Open Online Courses (MOOCs): Current Applications and Future Potential
ERIC Educational Resources Information Center
Milheim, William D.
2013-01-01
Massive Open Online Courses (or MOOCs) are the subject of numerous recent articles in "The Chronicle of Higher Education," "The New York Times," and other publications related to their increasing use by a variety of universities to reach large numbers of online students. This article describes the current state of these online…
Rehfeldt, Ruth Anne; Jung, Heidi L; Aguirre, Angelica; Nichols, Jane L; Root, William B
2016-03-01
The e-Transformation in higher education, in which Massive Open Online Courses (MOOCs) are playing a pivotal role, has had an impact on the modality in which behavior analysis is taught. In this paper, we survey the history and implications of online education including MOOCs and describe the implementation and results for the discipline's first MOOC, delivered at Southern Illinois University in spring 2015. Implications for the globalization and free access of higher education are discussed, as well as the parallel between MOOCs and Skinner's teaching machines.
Relativistic N-body simulations with massive neutrinos
NASA Astrophysics Data System (ADS)
Adamek, Julian; Durrer, Ruth; Kunz, Martin
2017-11-01
Some of the dark matter in the Universe is made up of massive neutrinos. Their impact on the formation of large scale structure can be used to determine their absolute mass scale from cosmology, but to this end accurate numerical simulations have to be developed. Due to their relativistic nature, neutrinos pose additional challenges when one tries to include them in N-body simulations that are traditionally based on Newtonian physics. Here we present the first numerical study of massive neutrinos that uses a fully relativistic approach. Our N-body code, gevolution, is based on a weak-field formulation of general relativity that naturally provides a self-consistent framework for relativistic particle species. This allows us to model neutrinos from first principles, without invoking any ad-hoc recipes. Our simulation suite comprises some of the largest neutrino simulations performed to date. We study the effect of massive neutrinos on the nonlinear power spectra and the halo mass function, focusing on the interesting mass range between 0.06 eV and 0.3 eV and including a case for an inverted mass hierarchy.
Massive stars, disks, and clustered star formation
NASA Astrophysics Data System (ADS)
Moeckel, Nickolas Barry
The formation of an isolated massive star is inherently more complex than the relatively well-understood collapse of an isolated, low-mass star. The dense, clustered environment where massive stars are predominantly found further complicates the picture, and suggests that interactions with other stars may play an important role in the early life of these objects. In this thesis we present the results of numerical hydrodynamic experiments investigating interactions between a massive protostar and its lower-mass cluster siblings. We explore the impact of these interactions on the orientation of disks and outflows, which are potentially observable indications of encounters during the formation of a star. We show that these encounters efficiently form eccentric binary systems, and in clusters similar to Orion they occur frequently enough to contribute to the high multiplicity of massive stars. We suggest that the massive protostar in Cepheus A is currently undergoing a series of interactions, and present simulations tailored to that system. We also apply the numerical techniques used in the massive star investigations to a much lower-mass regime, the formation of planetary systems around Solar- mass stars. We perform a small number of illustrative planet-planet scattering experiments, which have been used to explain the eccentricity distribution of extrasolar planets. We add the complication of a remnant gas disk, and show that this feature has the potential to stabilize the system against strong encounters between planets. We present preliminary simulations of Bondi-Hoyle accretion onto a protoplanetary disk, and consider the impact of the flow on the disk properties as well as the impact of the disk on the accretion flow.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Geveci, Berk
2014-11-01
The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based off what is known as the visualization pipeline. In the pipelinemore » model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.« less
Global Ocean Prediction with the HYbrid Coordinate Ocean Model, HYCOM
NASA Astrophysics Data System (ADS)
Chassignet, E.
A broad partnership of institutions is collaborating in developing and demonstrating the performance and application of eddy-resolving, real-time global and Atlantic ocean prediction systems using the the HYbrid Coordinate Ocean Model (HYCOM). These systems will be transitioned for operational use by both the U.S. Navy at the Naval Oceanographic Office (NAVOCEANO), Stennis Space Center, MS, and the Fleet Numerical Meteorology and Oceanography Centre (FNMOC), Monterey, CA, and by NOAA at the National Centers for Environmental Prediction (NCEP), Washington, D.C. These systems will run efficiently on a variety of massively parallel computers and will include sophisticated data assimilation techniques for assimilation of satellite altimeter sea surface height and sea surface temperature as well as in situ temperature, salinity, and float displacement. The Partnership addresses the Global Ocean Data Assimilation Experiment (GODAE) goals of three-dimensional (3D) depiction of the ocean state at fine resolution in real-time and provision of boundary conditions for coastal and regional models. An overview of the effort will be presented.
Near-surface coherent structures explored by large eddy simulation of entire tropical cyclones.
Ito, Junshi; Oizumi, Tsutao; Niino, Hiroshi
2017-06-19
Taking advantage of the huge computational power of a massive parallel supercomputer (K-supercomputer), this study conducts large eddy simulations of entire tropical cyclones by employing a numerical weather prediction model, and explores near-surface coherent structures. The maximum of the near-surface wind changes little from that simulated based on coarse-resolution runs. Three kinds of coherent structures appeared inside the boundary layer. The first is a Type-A roll, which is caused by an inflection-point instability of the radial flow and prevails outside the radius of maximum wind. The second is a Type-B roll that also appears to be caused by an inflection-point instability but of both radial and tangential winds. Its roll axis is almost orthogonal to the Type-A roll. The third is a Type-C roll, which occurs inside the radius of maximum wind and only near the surface. It transports horizontal momentum in an up-gradient sense and causes the largest gusts.
Component-based integration of chemistry and optimization software.
Kenny, Joseph P; Benson, Steven J; Alexeev, Yuri; Sarich, Jason; Janssen, Curtis L; McInnes, Lois Curfman; Krishnan, Manojkumar; Nieplocha, Jarek; Jurrus, Elizabeth; Fahlstrom, Carl; Windus, Theresa L
2004-11-15
Typical scientific software designs make rigid assumptions regarding programming language and data structures, frustrating software interoperability and scientific collaboration. Component-based software engineering is an emerging approach to managing the increasing complexity of scientific software. Component technology facilitates code interoperability and reuse. Through the adoption of methodology and tools developed by the Common Component Architecture Forum, we have developed a component architecture for molecular structure optimization. Using the NWChem and Massively Parallel Quantum Chemistry packages, we have produced chemistry components that provide capacity for energy and energy derivative evaluation. We have constructed geometry optimization applications by integrating the Toolkit for Advanced Optimization, Portable Extensible Toolkit for Scientific Computation, and Global Arrays packages, which provide optimization and linear algebra capabilities. We present a brief overview of the component development process and a description of abstract interfaces for chemical optimizations. The components conforming to these abstract interfaces allow the construction of applications using different chemistry and mathematics packages interchangeably. Initial numerical results for the component software demonstrate good performance, and highlight potential research enabled by this platform.
NASA Technical Reports Server (NTRS)
Swift, Daniel W.
1991-01-01
The primary methodology during the grant period has been the use of micro or meso-scale simulations to address specific questions concerning magnetospheric processes related to the aurora and substorm morphology. This approach, while useful in providing some answers, has its limitations. Many of the problems relating to the magnetosphere are inherently global and kinetic. Effort during the last year of the grant period has increasingly focused on development of a global-scale hybrid code to model the entire, coupled magnetosheath - magnetosphere - ionosphere system. In particular, numerical procedures for curvilinear coordinate generation and exactly conservative differencing schemes for hybrid codes in curvilinear coordinates have been developed. The new computer algorithms and the massively parallel computer architectures now make this global code a feasible proposition. Support provided by this project has played an important role in laying the groundwork for the eventual development or a global-scale code to model and forecast magnetospheric weather.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hurley, R. C.; Vorobiev, O. Y.; Ezzedine, S. M.
Here, we present a numerical method for modeling the mechanical effects of nonlinearly-compliant joints in elasto-plastic media. The method uses a series of strain-rate and stress update algorithms to determine joint closure, slip, and solid stress within computational cells containing multiple “embedded” joints. This work facilitates efficient modeling of nonlinear wave propagation in large spatial domains containing a large number of joints that affect bulk mechanical properties. We implement the method within the massively parallel Lagrangian code GEODYN-L and provide verification and examples. We highlight the ability of our algorithms to capture joint interactions and multiple weakness planes within individualmore » computational cells, as well as its computational efficiency. We also discuss the motivation for developing the proposed technique: to simulate large-scale wave propagation during the Source Physics Experiments (SPE), a series of underground explosions conducted at the Nevada National Security Site (NNSS).« less
Hurley, R. C.; Vorobiev, O. Y.; Ezzedine, S. M.
2017-04-06
Here, we present a numerical method for modeling the mechanical effects of nonlinearly-compliant joints in elasto-plastic media. The method uses a series of strain-rate and stress update algorithms to determine joint closure, slip, and solid stress within computational cells containing multiple “embedded” joints. This work facilitates efficient modeling of nonlinear wave propagation in large spatial domains containing a large number of joints that affect bulk mechanical properties. We implement the method within the massively parallel Lagrangian code GEODYN-L and provide verification and examples. We highlight the ability of our algorithms to capture joint interactions and multiple weakness planes within individualmore » computational cells, as well as its computational efficiency. We also discuss the motivation for developing the proposed technique: to simulate large-scale wave propagation during the Source Physics Experiments (SPE), a series of underground explosions conducted at the Nevada National Security Site (NNSS).« less
Iterative Importance Sampling Algorithms for Parameter Estimation
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grout, Ray W; Morzfeld, Matthias; Day, Marcus S.
In parameter estimation problems one computes a posterior distribution over uncertain parameters defined jointly by a prior distribution, a model, and noisy data. Markov chain Monte Carlo (MCMC) is often used for the numerical solution of such problems. An alternative to MCMC is importance sampling, which can exhibit near perfect scaling with the number of cores on high performance computing systems because samples are drawn independently. However, finding a suitable proposal distribution is a challenging task. Several sampling algorithms have been proposed over the past years that take an iterative approach to constructing a proposal distribution. We investigate the applicabilitymore » of such algorithms by applying them to two realistic and challenging test problems, one in subsurface flow, and one in combustion modeling. More specifically, we implement importance sampling algorithms that iterate over the mean and covariance matrix of Gaussian or multivariate t-proposal distributions. Our implementation leverages massively parallel computers, and we present strategies to initialize the iterations using 'coarse' MCMC runs or Gaussian mixture models.« less
A Fast Algorithm for Massively Parallel, Long-Term, Simulation of Complex Molecular Dynamics Systems
NASA Technical Reports Server (NTRS)
Jaramillo-Botero, Andres; Goddard, William A, III; Fijany, Amir
1997-01-01
The advances in theory and computing technology over the last decade have led to enormous progress in applying atomistic molecular dynamics (MD) methods to the characterization, prediction, and design of chemical, biological, and material systems,.
Branched Polymers for Enhancing Polymer Gel Strength and Toughness
2013-02-01
Molecular Massively Parallel Simulator ( LAMMPS ) program and the stress-strain relations were calculated with varying strain-rates (figure 6). A...Acronyms ARL U.S. Army Research Laboratory D3 hexamethylcyclotrisiloxane FTIR Fourier transform infrared GPC gel permeation chromatography LAMMPS
DOE Office of Scientific and Technical Information (OSTI.GOV)
SmartImport.py is a Python source-code file that implements a replacement for the standard Python module importer. The code is derived from knee.py, a file in the standard Python diestribution , and adds functionality to improve the performance of Python module imports in massively parallel contexts.
NASA Astrophysics Data System (ADS)
Boyko, Oleksiy; Zheleznyak, Mark
2015-04-01
The original numerical code TOPKAPI-IMMS of the distributed rainfall-runoff model TOPKAPI ( Todini et al, 1996-2014) is developed and implemented in Ukraine. The parallel version of the code has been developed recently to be used on multiprocessors systems - multicore/processors PC and clusters. Algorithm is based on binary-tree decomposition of the watershed for the balancing of the amount of computation for all processors/cores. Message passing interface (MPI) protocol is used as a parallel computing framework. The numerical efficiency of the parallelization algorithms is demonstrated for the case studies for the flood predictions of the mountain watersheds of the Ukrainian Carpathian regions. The modeling results is compared with the predictions based on the lumped parameters models.
NASA Technical Reports Server (NTRS)
Tilton, James C.
1988-01-01
Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
Low profile, highly configurable, current sharing paralleled wide band gap power device power module
McPherson, Brice; Killeen, Peter D.; Lostetter, Alex; Shaw, Robert; Passmore, Brandon; Hornberger, Jared; Berry, Tony M
2016-08-23
A power module with multiple equalized parallel power paths supporting multiple parallel bare die power devices constructed with low inductance equalized current paths for even current sharing and clean switching events. Wide low profile power contacts provide low inductance, short current paths, and large conductor cross section area provides for massive current carrying. An internal gate & source kelvin interconnection substrate is provided with individual ballast resistors and simple bolted construction. Gate drive connectors are provided on either left or right size of the module. The module is configurable as half bridge, full bridge, common source, and common drain topologies.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
Sensitivity Characterization of Pressed Energetic Materials using Flyer Plate Mesoscale Simulations
NASA Astrophysics Data System (ADS)
Rai, Nirmal; Udaykumar, H. S.
Heterogeneous energetic materials like pressed explosives have complicated microstructure and contain various forms of heterogeneities such as pores, micro-cracks, energetic crystals etc. It is widely accepted that the presence of these heterogeneities can affect the sensitivity of these materials under shock load. The interaction of shock load with the microstructural heterogeneities may leads to the formation of local heated regions known as ``hot spots''. Chemical reaction may trigger at the hot spot regions depending on the hot spot temperature and the duration over which the temperature can be maintained before phenomenon like heat conduction, rarefaction waves withdraws energy from it. There are different mechanisms which can lead to the formation of hot spots including void collapse. The current work is focused towards the sensitivity characterization of two HMX based pressed energetic materials using flyer plate mesoscale simulations. The aim of the current work is to develop mesoscale numerical framework which can perform simulations by replicating the laboratory based flyer plate experiments. The current numerical framework uses an image processing approach to represent the microstructural heterogeneities incorporated in a massively parallel Eulerian code SCIMITAR3D. The chemical decomposition of HMX is modeled using Henson-Smilowitz reaction mechanism. The sensitivity characterization is aimed towards obtaining James initiation threshold curve and comparing it with the experimental results.
Optical distributed sensors for feedback control: Characterization of photorefractive resonator
NASA Technical Reports Server (NTRS)
Indebetouw, Guy; Lindner, D. K.
1992-01-01
The aim of the project was to explore, define, and assess the possibilities of optical distributed sensing for feedback control. This type of sensor, which may have some impacts in the dynamic control of deformable structures and the monitoring of small displacements, can be divided into data acquisition, data processing, and control design. Analogue optical techniques, because they are noninvasive and afford massive parallelism may play a significant role in the acquisition and the preprocessing of the data for such a sensor. Assessing these possibilities was the aim of the first stage of this project. The scope of the proposed research was limited to: (1) the characterization of photorefractive resonators and the assessment of their possible use as a distributed optical processing element; and (2) the design of a control system utilizing signals from distributed sensors. The results include a numerical and experimental study of the resonator below threshold, an experimental study of the effect of the resonator's transverse confinement on its dynamics above threshold, a numerical study of the resonator above threshold using a modal expansion approach, and the experimental test of this model. A detailed account of each investigation, including methodology and analysis of the results are also included along with reprints of published and submitted papers.
Bisetti, Fabrizio; Attili, Antonio; Pitsch, Heinz
2014-01-01
Combustion of fossil fuels is likely to continue for the near future due to the growing trends in energy consumption worldwide. The increase in efficiency and the reduction of pollutant emissions from combustion devices are pivotal to achieving meaningful levels of carbon abatement as part of the ongoing climate change efforts. Computational fluid dynamics featuring adequate combustion models will play an increasingly important role in the design of more efficient and cleaner industrial burners, internal combustion engines, and combustors for stationary power generation and aircraft propulsion. Today, turbulent combustion modelling is hindered severely by the lack of data that are accurate and sufficiently complete to assess and remedy model deficiencies effectively. In particular, the formation of pollutants is a complex, nonlinear and multi-scale process characterized by the interaction of molecular and turbulent mixing with a multitude of chemical reactions with disparate time scales. The use of direct numerical simulation (DNS) featuring a state of the art description of the underlying chemistry and physical processes has contributed greatly to combustion model development in recent years. In this paper, the analysis of the intricate evolution of soot formation in turbulent flames demonstrates how DNS databases are used to illuminate relevant physico-chemical mechanisms and to identify modelling needs. PMID:25024412
Biomimetic Models for An Ecological Approach to Massively-Deployed Sensor Networks
NASA Technical Reports Server (NTRS)
Jones, Kennie H.; Lodding, Kenneth N.; Olariu, Stephan; Wilson, Larry; Xin, Chunsheng
2005-01-01
Promises of ubiquitous control of the physical environment by massively-deployed wireless sensor networks open avenues for new applications that will redefine the way we live and work. Due to small size and low cost of sensor devices, visionaries promise systems enabled by deployment of massive numbers of sensors ubiquitous throughout our environment working in concert. Recent research has concentrated on developing techniques for performing relatively simple tasks with minimal energy expense, assuming some form of centralized control. Unfortunately, centralized control is not conducive to parallel activities and does not scale to massive size networks. Execution of simple tasks in sparse networks will not lead to the sophisticated applications predicted. We propose a new way of looking at massively-deployed sensor networks, motivated by lessons learned from the way biological ecosystems are organized. We demonstrate that in such a model, fully distributed data aggregation can be performed in a scalable fashion in massively deployed sensor networks, where motes operate on local information, making local decisions that are aggregated across the network to achieve globally-meaningful effects. We show that such architectures may be used to facilitate communication and synchronization in a fault-tolerant manner, while balancing workload and required energy expenditure throughout the network.
Massive stars in the Sagittarius Dwarf Irregular Galaxy
NASA Astrophysics Data System (ADS)
Garcia, Miriam
2018-02-01
Low metallicity massive stars hold the key to interpret numerous processes in the past Universe including re-ionization, starburst galaxies, high-redshift supernovae, and γ-ray bursts. The Sagittarius Dwarf Irregular Galaxy [SagDIG, 12+log(O/H) = 7.37] represents an important landmark in the quest for analogues accessible with 10-m class telescopes. This Letter presents low-resolution spectroscopy executed with the Gran Telescopio Canarias that confirms that SagDIG hosts massive stars. The observations unveiled three OBA-type stars and one red supergiant candidate. Pending confirmation from high-resolution follow-up studies, these could be the most metal-poor massive stars of the Local Group.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
A 3-D Finite-Volume Non-hydrostatic Icosahedral Model (NIM)
NASA Astrophysics Data System (ADS)
Lee, Jin
2014-05-01
The Nonhydrostatic Icosahedral Model (NIM) formulates the latest numerical innovation of the three-dimensional finite-volume control volume on the quasi-uniform icosahedral grid suitable for ultra-high resolution simulations. NIM's modeling goal is to improve numerical accuracy for weather and climate simulations as well as to utilize the state-of-art computing architecture such as massive parallel CPUs and GPUs to deliver routine high-resolution forecasts in timely manner. NIM dynamic corel innovations include: * A local coordinate system remapped spherical surface to plane for numerical accuracy (Lee and MacDonald, 2009), * Grid points in a table-driven horizontal loop that allow any horizontal point sequence (A.E. MacDonald, et al., 2010), * Flux-Corrected Transport formulated on finite-volume operators to maintain conservative positive definite transport (J.-L, Lee, ET. Al., 2010), *Icosahedral grid optimization (Wang and Lee, 2011), * All differentials evaluated as three-dimensional finite-volume integrals around the control volume. The three-dimensional finite-volume solver in NIM is designed to improve pressure gradient calculation and orographic precipitation over complex terrain. NIM dynamical core has been successfully verified with various non-hydrostatic benchmark test cases such as internal gravity wave, and mountain waves in Dynamical Cores Model Inter-comparisons Projects (DCMIP). Physical parameterizations suitable for NWP are incorporated into NIM dynamical core and successfully tested with multimonth aqua-planet simulations. Recently, NIM has started real data simulations using GFS initial conditions. Results from the idealized tests as well as real-data simulations will be shown in the conference.
NASA Astrophysics Data System (ADS)
Roubinet, D.; Russian, A.; Dentz, M.; Gouze, P.
2017-12-01
Characterizing and modeling hydrodynamic reactive transport in fractured rock are critical challenges for various research fields and applications including environmental remediation, geological storage, and energy production. To this end, we consider a recently developed time domain random walk (TDRW) approach, which is adapted to reproduce anomalous transport behaviors and capture heterogeneous structural and physical properties. This method is also very well suited to optimize numerical simulations by memory-shared massive parallelization and provide numerical results at various scales. So far, the TDRW approach has been applied for modeling advective-diffusive transport with mass transfer between mobile and immobile regions and simple (theoretical) reactions in heterogeneous porous media represented as single continuum domains. We extend this approach to dual-continuum representations considering a highly permeable fracture network embedded into a poorly permeable rock matrix with heterogeneous geochemical reactions occurring in both geological structures. The resulting numerical model enables us to extend the range of the modeled heterogeneity scales with an accurate representation of solute transport processes and no assumption on the Fickianity of these processes. The proposed model is compared to existing particle-based methods that are usually used to model reactive transport in fractured rocks assuming a homogeneous surrounding matrix, and is used to evaluate the impact of the matrix heterogeneity on the apparent reaction rates for different 2D and 3D simple-to-complex fracture network configurations.
Yang, L. H.; Brooks III, E. D.; Belak, J.
1992-01-01
A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evoles. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.
Reconstruction of coded aperture images
NASA Technical Reports Server (NTRS)
Bielefeld, Michael J.; Yin, Lo I.
1987-01-01
Balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM method has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM method in future coded-aperture experiments with the help of the MPP.
Function algorithms for MPP scientific subroutines, volume 1
NASA Technical Reports Server (NTRS)
Gouch, J. G.
1984-01-01
Design documentation and user documentation for function algorithms for the Massively Parallel Processor (MPP) are presented. The contract specifies development of MPP assembler instructions to perform the following functions: natural logarithm; exponential (e to the x power); square root; sine; cosine; and arctangent. To fulfill the requirements of the contract, parallel array and solar implementations for these functions were developed on the PDP11/34 Program Development and Management Unit (PDMU) that is resident at the MPP testbed installation located at the NASA Goddard facility.
GPU-completeness: theory and implications
NASA Astrophysics Data System (ADS)
Lin, I.-Jong
2011-01-01
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a welldefined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.
How to Build an AppleSeed: A Parallel Macintosh Cluster for Numerically Intensive Computing
NASA Astrophysics Data System (ADS)
Decyk, V. K.; Dauger, D. E.
We have constructed a parallel cluster consisting of a mixture of Apple Macintosh G3 and G4 computers running the Mac OS, and have achieved very good performance on numerically intensive, parallel plasma particle-incell simulations. A subset of the MPI message-passing library was implemented in Fortran77 and C. This library enabled us to port code, without modification, from other parallel processors to the Macintosh cluster. Unlike Unix-based clusters, no special expertise in operating systems is required to build and run the cluster. This enables us to move parallel computing from the realm of experts to the main stream of computing.
Data intensive computing at Sandia.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Andrew T.
2010-09-01
Data-Intensive Computing is parallel computing where you design your algorithms and your software around efficient access and traversal of a data set; where hardware requirements are dictated by data size as much as by desired run times usually distilling compact results from massive data.
Optimization of Particle-in-Cell Codes on RISC Processors
NASA Technical Reports Server (NTRS)
Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.
1996-01-01
General strategies are developed to optimize particle-cell-codes written in Fortran for RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve efficiency of arithmetic pipelines.
Optimized Landing of Autonomous Unmanned Aerial Vehicle Swarms
2012-06-01
understanding about the world. Examples of these emergent behaviors include construction of complex structures (e.g., hives, termite mounds), trends in economic...Sep. 2007. [16] M. Resnick, Turtles, Termites , and Traffic Jams: Explorations in Massively Parallel Microworlds. MIT Press, 1997. [Online]. Available
Constraint-Based Scheduling System
NASA Technical Reports Server (NTRS)
Zweben, Monte; Eskey, Megan; Stock, Todd; Taylor, Will; Kanefsky, Bob; Drascher, Ellen; Deale, Michael; Daun, Brian; Davis, Gene
1995-01-01
Report describes continuing development of software for constraint-based scheduling system implemented eventually on massively parallel computer. Based on machine learning as means of improving scheduling. Designed to learn when to change search strategy by analyzing search progress and learning general conditions under which resource bottleneck occurs.
NASA Technical Reports Server (NTRS)
Sharma, Naveen
1992-01-01
In this paper we briefly describe a combined symbolic and numeric approach for solving mathematical models on parallel computers. An experimental software system, PIER, is being developed in Common Lisp to synthesize computationally intensive and domain formulation dependent phases of finite element analysis (FEA) solution methods. Quantities for domain formulation like shape functions, element stiffness matrices, etc., are automatically derived using symbolic mathematical computations. The problem specific information and derived formulae are then used to generate (parallel) numerical code for FEA solution steps. A constructive approach to specify a numerical program design is taken. The code generator compiles application oriented input specifications into (parallel) FORTRAN77 routines with the help of built-in knowledge of the particular problem, numerical solution methods and the target computer.
The novel high-performance 3-D MT inverse solver
NASA Astrophysics Data System (ADS)
Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey
2016-04-01
We present novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in multi-language paradigm to make it as efficient, readable and maintainable as possible. Separation of concerns and single responsibility concepts go through implementation of the solver. As a forward modelling engine a modern scalable solver extrEMe, based on contracting integral equation approach, is used. Iterative gradient-type (quasi-Newton) optimization scheme is invoked to search for (regularized) inverse problem solution, and adjoint source approach is used to calculate efficiently the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT responses, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of available computational resources for a given problem statement. To parameterize an inverse domain the so-called mask parameterization is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to HPC Piz Daint (6th supercomputer in the world) demonstrate practically linear scalability of the code up to thousands of nodes.
Mueller, Jennifer J; Schlappe, Brooke A; Kumar, Rahul; Olvera, Narciso; Dao, Fanny; Abu-Rustum, Nadeem; Aghajanian, Carol; DeLair, Deborah; Hussein, Yaser R; Soslow, Robert A; Levine, Douglas A; Weigelt, Britta
2018-05-21
Mucinous ovarian cancer (MOC) is a rare type of epithelial ovarian cancer resistant to standard chemotherapy regimens. We sought to characterize the repertoire of somatic mutations in MOCs and to define the contribution of massively parallel sequencing to the classification of tumors diagnosed as primary MOCs. Following gynecologic pathology and chart review, DNA samples obtained from primary MOCs and matched normal tissues/blood were subjected to whole-exome (n = 9) or massively parallel sequencing targeting 341 cancer genes (n = 15). Immunohistochemical analysis of estrogen receptor, progesterone receptor, PTEN, ARID1A/BAF250a, and the DNA mismatch (MMR) proteins MSH6 and PMS2 was performed for all cases. Mutational frequencies of MOCs were compared to those of high-grade serous ovarian cancers (HGSOCs) and mucinous tumors from other sites. MOCs were heterogeneous at the genetic level, frequently harboring TP53 (75%) mutations, KRAS (71%) mutations and/or CDKN2A/B homozygous deletions/mutations (33%). Although established criteria for diagnosis were employed, four cases harbored mutational and immunohistochemical profiles similar to those of endometrioid carcinomas, and one case for colorectal or endometrioid carcinoma. Significant differences in the frequencies of KRAS, TP53, CDKN2A, FBXW7, PIK3CA and/or APC mutations between the confirmed primary MOCs (n = 19) and HGSOCs, mucinous gastric and/or mucinous colorectal carcinomas were found, whereas no differences in the 341 genes studied between MOCs and mucinous pancreatic carcinomas were identified. Our findings suggest that the assessment of mutations affecting TP53, KRAS, PIK3CA, ARID1A and POLE, and DNA MMR protein expression may be used to further aid the diagnosis and treatment decision-making of primary MOC. Copyright © 2018 Elsevier Inc. All rights reserved.
Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E
2012-09-28
Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.
2012-01-01
Background Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841
PARAVT: Parallel Voronoi tessellation code
NASA Astrophysics Data System (ADS)
González, R. E.
2016-10-01
In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
Parallel dispatch: a new paradigm of electrical power system dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang
Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools are of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely the parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complexmore » power grids, extend system operators U+02BC capabilities, suggest optimal dispatch strategies, and to provide decision-making recommendations according to power system operational goals.« less
Parallel Grid Manipulations in Earth Science Calculations
NASA Technical Reports Server (NTRS)
Sawyer, W.; Lucchesi, R.; daSilva, A.; Takacs, L. L.
1999-01-01
The National Aeronautics and Space Administration (NASA) Data Assimilation Office (DAO) at the Goddard Space Flight Center is moving its data assimilation system to massively parallel computing platforms. This parallel implementation of GEOS DAS will be used in the DAO's normal activities, which include reanalysis of data, and operational support for flight missions. Key components of GEOS DAS, including the gridpoint-based general circulation model and a data analysis system, are currently being parallelized. The parallelization of GEOS DAS is also one of the HPCC Grand Challenge Projects. The GEOS-DAS software employs several distinct grids. Some examples are: an observation grid- an unstructured grid of points at which observed or measured physical quantities from instruments or satellites are associated- a highly-structured latitude-longitude grid of points spanning the earth at given latitude-longitude coordinates at which prognostic quantities are determined, and a computational lat-lon grid in which the pole has been moved to a different location to avoid computational instabilities. Each of these grids has a different structure and number of constituent points. In spite of that, there are numerous interactions between the grids, e.g., values on one grid must be interpolated to another, or, in other cases, grids need to be redistributed on the underlying parallel platform. The DAO has designed a parallel integrated library for grid manipulations (PILGRIM) to support the needed grid interactions with maximum efficiency. It offers a flexible interface to generate new grids, define transformations between grids and apply them. Basic communication is currently MPI, however the interfaces defined here could conceivably be implemented with other message-passing libraries, e.g., Cray SHMEM, or with shared-memory constructs. The library is written in Fortran 90. First performance results indicate that even difficult problems, such as above-mentioned pole rotation- a sparse interpolation with little data locality between the physical lat-lon grid and a pole rotated computational grid- can be solved efficiently and at the GFlop/s rates needed to solve tomorrow's high resolution earth science models. In the subsequent presentation we will discuss the design and implementation of PILGRIM as well as a number of the problems it is required to solve. Some conclusions will be drawn about the potential performance of the overall earth science models on the supercomputer platforms foreseen for these problems.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Parallel Algorithms for Computational Models of Geophysical Systems
NASA Astrophysics Data System (ADS)
Carrillo Ledesma, A.; Herrera, I.; de la Cruz, L. M.; Hernández, G.; Grupo de Modelacion Matematica y Computacional
2013-05-01
Mathematical models of many systems of interest, including very important continuous systems of Earth Sciences and Engineering, lead to a great variety of partial differential equations (PDEs) whose solution methods are based on the computational processing of large-scale algebraic systems. Furthermore, the incredible expansion experienced by the existing computational hardware and software has made amenable to effective treatment problems of an ever increasing diversity and complexity, posed by scientific and engineering applications. Parallel computing is outstanding among the new computational tools and, in order to effectively use the most advanced computers available today, massively parallel software is required. Domain decomposition methods (DDMs) have been developed precisely for effectively treating PDEs in paralle. Ideally, the main objective of domain decomposition research is to produce algorithms capable of 'obtaining the global solution by exclusively solving local problems', but up-to-now this has only been an aspiration; that is, a strong desire for achieving such a property and so we call it 'the DDM-paradigm'. In recent times, numerically competitive DDM-algorithms are non-overlapping, preconditioned and necessarily incorporate constraints which pose an additional challenge for achieving the DDM-paradigm. Recently a group of four algorithms, referred to as the 'DVS-algorithms', which fulfill the DDM-paradigm, was developed. To derive them a new discretization method, which uses a non-overlapping system of nodes (the derived-nodes), was introduced. This discretization procedure can be applied to any boundary-value problem, or system of such equations. In turn, the resulting system of discrete equations can be treated using any available DDM-algorithm. In particular, two of the four DVS-algorithms mentioned above were obtained by application of the well-known and very effective algorithms BDDC and FETI-DP; these will be referred to as the DVS-BDDC and DVS-FETI-DP algorithms. The other two, which will be referred to as the DVS-PRIMAL and DVS-DUAL algorithms, were obtained by application of two new algorithms that had not been previously reported in the literature. As said before, the four DVS-algorithms constitute a group of preconditioned and constrained algorithms that, for the first time, fulfill the DDM-paradigm. Both, BDDC and FETI-DP, are very well-known; and both are highly efficient. Recently, it was established that these two methods are closely related and its numerical performance is quite similar. On the other hand, through numerical experiments, we have established that the numerical performances of each one of the members of DVS-algorithms group (DVS-BDDC, DVS-FETI-DP, DVS-PRIMAL and DVS-DUAL) are very similar too. Furthermore, we have carried out comparisons of the performances of the standard versions of BDDC and FETI-DP with DVS-BDDC and DVS-FETI-DP, and in all such numerical experiments the DVS algorithms have performed significantly better.
Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...
2016-11-16
Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalabilitymore » of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.« less
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.
Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias
2011-01-01
The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
Sewage Reflects the Distriubtion of Human Faecal Lachnospiraceae
Faecal pollution contains a rich and diverse community of bacteria derived from animals and humans,many of which might serve as alternatives to the traditional enterococci and Escherichia coli faecal indicators. We used massively parallel sequencing (MPS)of the 16S rRNA gene to ...
Genetics Home Reference: medullary cystic kidney disease type 1
... They lead to the production of an altered protein. It is unclear how this change causes kidney disease. ... cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nat Genet. 2013 Mar;45(3):299-303. ...
Software Applications on the Peregrine System | High-Performance Computing
programming and optimization. Gaussian Chemistry Program for calculating molecular electronic structure and Materials Science Open-source classical molecular dynamics program designed for massively parallel systems framework Q-Chem Chemistry ab initio quantum chemistry package for predictin molecular structures
Flow cytometry for enrichment and titration in massively parallel DNA sequencing
Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim
2009-01-01
Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748
2017-01-01
Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described. PMID:28542338
Automation of a Wave-Optics Simulation and Image Post-Processing Package on Riptide
NASA Astrophysics Data System (ADS)
Werth, M.; Lucas, J.; Thompson, D.; Abercrombie, M.; Holmes, R.; Roggemann, M.
Detailed wave-optics simulations and image post-processing algorithms are computationally expensive and benefit from the massively parallel hardware available at supercomputing facilities. We created an automated system that interfaces with the Maui High Performance Computing Center (MHPCC) Distributed MATLAB® Portal interface to submit massively parallel waveoptics simulations to the IBM iDataPlex (Riptide) supercomputer. This system subsequently postprocesses the output images with an improved version of physically constrained iterative deconvolution (PCID) and analyzes the results using a series of modular algorithms written in Python. With this architecture, a single person can simulate thousands of unique scenarios and produce analyzed, archived, and briefing-compatible output products with very little effort. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Scudder, Nathan; McNevin, Dennis; Kelty, Sally F; Walsh, Simon J; Robertson, James
2018-03-01
Use of DNA in forensic science will be significantly influenced by new technology in coming years. Massively parallel sequencing and forensic genomics will hasten the broadening of forensic DNA analysis beyond short tandem repeats for identity towards a wider array of genetic markers, in applications as diverse as predictive phenotyping, ancestry assignment, and full mitochondrial genome analysis. With these new applications come a range of legal and policy implications, as forensic science touches on areas as diverse as 'big data', privacy and protected health information. Although these applications have the potential to make a more immediate and decisive forensic intelligence contribution to criminal investigations, they raise policy issues that will require detailed consideration if this potential is to be realised. The purpose of this paper is to identify the scope of the issues that will confront forensic and user communities. Copyright © 2017 The Chartered Society of Forensic Sciences. All rights reserved.
Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi
2017-01-01
Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described.
Massively parallel de novo protein design for targeted therapeutics.
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J; Hicks, Derrick R; Vergara, Renan; Murapa, Patience; Bernard, Steffen M; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T; Koday, Merika T; Jenkins, Cody M; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M; Fernández-Velasco, D Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A; Fuller, Deborah H; Baker, David
2017-10-05
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics
NASA Astrophysics Data System (ADS)
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David
2017-10-01
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.
2015-01-01
RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >107 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714
Massively parallel de novo protein design for targeted therapeutics
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David
2018-01-01
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37–43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing. PMID:28953867
Bailey, Jeffrey A; Mvalo, Tisungane; Aragam, Nagesh; Weiser, Matthew; Congdon, Seth; Kamwendo, Debbie; Martinson, Francis; Hoffman, Irving; Meshnick, Steven R; Juliano, Jonathan J
2012-08-15
The development of an effective malaria vaccine has been hampered by the genetic diversity of commonly used target antigens. This diversity has led to concerns about allele-specific immunity limiting the effectiveness of vaccines. Despite extensive genetic diversity of circumsporozoite protein (CS), the most successful malaria vaccine is RTS/S, a monovalent CS vaccine. By use of massively parallel pyrosequencing, we evaluated the diversity of CS haplotypes across the T-cell epitopes in parasites from Lilongwe, Malawi. We identified 57 unique parasite haplotypes from 100 participants. By use of ecological and molecular indexes of diversity, we saw no difference in the diversity of CS haplotypes between adults and children. We saw evidence of weak variant-specific selection within this region of CS, suggesting naturally acquired immunity does induce variant-specific selection on CS. Therefore, the impact of CS vaccines on variant frequencies with widespread implementation of vaccination requires further study.
Drögemüller, Cord; Tetens, Jens; Sigurdsson, Snaevar; Gentile, Arcangelo; Testoni, Stefania; Lindblad-Toh, Kerstin; Leeb, Tosso
2010-01-01
Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development. PMID:20865119
NASA Astrophysics Data System (ADS)
Brockmann, J. M.; Schuh, W.-D.
2011-07-01
The estimation of the global Earth's gravity field parametrized as a finite spherical harmonic series is computationally demanding. The computational effort depends on the one hand on the maximal resolution of the spherical harmonic expansion (i.e. the number of parameters to be estimated) and on the other hand on the number of observations (which are several millions for e.g. observations from the GOCE satellite missions). To circumvent these restrictions, a massive parallel software based on high-performance computing (HPC) libraries as ScaLAPACK, PBLAS and BLACS was designed in the context of GOCE HPF WP6000 and the GOCO consortium. A prerequisite for the use of these libraries is that all matrices are block-cyclic distributed on a processor grid comprised by a large number of (distributed memory) computers. Using this set of standard HPC libraries has the benefit that once the matrices are distributed across the computer cluster, a huge set of efficient and highly scalable linear algebra operations can be used.
Track finding in ATLAS using GPUs
NASA Astrophysics Data System (ADS)
Mattmann, J.; Schmitt, C.
2012-12-01
The reconstruction and simulation of collision events is a major task in modern HEP experiments involving several ten thousands of standard CPUs. On the other hand the graphics processors (GPUs) have become much more powerful and are by far outperforming the standard CPUs in terms of floating point operations due to their massive parallel approach. The usage of these GPUs could therefore significantly reduce the overall reconstruction time per event or allow for the usage of more sophisticated algorithms. In this paper the track finding in the ATLAS experiment will be used as an example on how the GPUs can be used in this context: the implementation on the GPU requires a change in the algorithmic flow to allow the code to work in the rather limited environment on the GPU in terms of memory, cache, and transfer speed from and to the GPU and to make use of the massive parallel computation. Both, the specific implementation of parts of the ATLAS track reconstruction chain and the performance improvements obtained will be discussed.
3D numerical simulations of multiphase continental rifting
NASA Astrophysics Data System (ADS)
Naliboff, J.; Glerum, A.; Brune, S.
2017-12-01
Observations of rifted margin architecture suggest continental breakup occurs through multiple phases of extension with distinct styles of deformation. The initial rifting stages are often characterized by slow extension rates and distributed normal faulting in the upper crust decoupled from deformation in the lower crust and mantle lithosphere. Further rifting marks a transition to higher extension rates and coupling between the crust and mantle lithosphere, with deformation typically focused along large-scale detachment faults. Significantly, recent detailed reconstructions and high-resolution 2D numerical simulations suggest that rather than remaining focused on a single long-lived detachment fault, deformation in this phase may progress toward lithospheric breakup through a complex process of fault interaction and development. The numerical simulations also suggest that an initial phase of distributed normal faulting can play a key role in the development of these complex fault networks and the resulting finite deformation patterns. Motivated by these findings, we will present 3D numerical simulations of continental rifting that examine the role of temporal increases in extension velocity on rifted margin structure. The numerical simulations are developed with the massively parallel finite-element code ASPECT. While originally designed to model mantle convection using advanced solvers and adaptive mesh refinement techniques, ASPECT has been extended to model visco-plastic deformation that combines a Drucker Prager yield criterion with non-linear dislocation and diffusion creep. To promote deformation localization, the internal friction angle and cohesion weaken as a function of accumulated plastic strain. Rather than prescribing a single zone of weakness to initiate deformation, an initial random perturbation of the plastic strain field combined with rapid strain weakening produces distributed normal faulting at relatively slow rates of extension in both 2D and 3D simulations. Our presentation will focus on both the numerical assumptions required to produce these results and variations in 3D rifted margin architecture arising from a transition from slow to rapid rates of extension.
Karasick, M.S.; Strip, D.R.
1996-01-30
A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
Parallel language constructs for tensor product computations on loosely coupled architectures
NASA Technical Reports Server (NTRS)
Mehrotra, Piyush; Vanrosendale, John
1989-01-01
Distributed memory architectures offer high levels of performance and flexibility, but have proven awkard to program. Current languages for nonshared memory architectures provide a relatively low level programming environment, and are poorly suited to modular programming, and to the construction of libraries. A set of language primitives designed to allow the specification of parallel numerical algorithms at a higher level is described. Tensor product array computations are focused on along with a simple but important class of numerical algorithms. The problem of programming 1-D kernal routines is focused on first, such as parallel tridiagonal solvers, and then how such parallel kernels can be combined to form parallel tensor product algorithms is examined.
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Parallel k-means++ for Multiple Shared-Memory Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Patrick S.; Lewis, Robert R.
2016-09-22
In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varyingmore » data sizes.« less
NASA Astrophysics Data System (ADS)
Shahzad, M.; Rizvi, H.; Panwar, A.; Ryu, C. M.
2017-06-01
We have re-visited the existence criterion of the reverse shear Alfven eigenmodes (RSAEs) in the presence of the parallel equilibrium current by numerically solving the eigenvalue equation using a fast eigenvalue solver code KAES. The parallel equilibrium current can bring in the kink effect and is known to be strongly unfavorable for the RSAE. We have numerically estimated the critical value of the toroidicity factor Qtor in a circular tokamak plasma, above which RSAEs can exist, and compared it to the analytical one. The difference between the numerical and analytical critical values is small for low frequency RSAEs, but it increases as the frequency of the mode increases, becoming greater for higher poloidal harmonic modes.
NASA Technical Reports Server (NTRS)
Raju, Manthena S.
1998-01-01
Sprays occur in a wide variety of industrial and power applications and in the processing of materials. A liquid spray is a phase flow with a gas as the continuous phase and a liquid as the dispersed phase (in the form of droplets or ligaments). Interactions between the two phases, which are coupled through exchanges of mass, momentum, and energy, can occur in different ways at different times and locations involving various thermal, mass, and fluid dynamic factors. An understanding of the flow, combustion, and thermal properties of a rapidly vaporizing spray requires careful modeling of the rate-controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as other phenomena. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on sprays to unstructured grids and parallel computing. LSPRAY, which was developed by M.S. Raju of Nyma, Inc., is designed to be massively parallel and could easily be coupled with any existing gas-phase flow and/or Monte Carlo probability density function (PDF) solver. The LSPRAY solver accommodates the use of an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements in the gas-phase solvers. It is used specifically for fuel sprays within gas turbine combustors, but it has many other uses. The spray model used in LSPRAY provided favorable results when applied to stratified-charge rotary combustion (Wankel) engines and several other confined and unconfined spray flames. The source code will be available with the National Combustion Code (NCC) as a complete package.
Password Cracking Using Sony Playstations
NASA Astrophysics Data System (ADS)
Kleinhans, Hugo; Butts, Jonathan; Shenoi, Sujeet
Law enforcement agencies frequently encounter encrypted digital evidence for which the cryptographic keys are unknown or unavailable. Password cracking - whether it employs brute force or sophisticated cryptanalytic techniques - requires massive computational resources. This paper evaluates the benefits of using the Sony PlayStation 3 (PS3) to crack passwords. The PS3 offers massive computational power at relatively low cost. Moreover, multiple PS3 systems can be introduced easily to expand parallel processing when additional power is needed. This paper also describes a distributed framework designed to enable law enforcement agents to crack encrypted archives and applications in an efficient and cost-effective manner.
A massively asynchronous, parallel brain
Zeki, Semir
2015-01-01
Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871
3D Numerical Rift Modeling with Application to the East African Rift System
NASA Astrophysics Data System (ADS)
Glerum, A.; Brune, S.; Naliboff, J.
2017-12-01
As key components of plate tectonics, continental rifting and the formation of passive margins have been extensively studied with both analogue models and numerical techniques. Only recently however, technical advances have enabled numerical investigations into rift evolution in three dimensions, as is actually required for including those processes that cause rift-parallel variability, such as structural inheritance and oblique extension (Brune 2016). We use the massively parallel finite element code ASPECT (Kronbichler et al. 2012; Heister et al. 2017) to investigate rift evolution. ASPECT's adaptive mesh refinement enables us to focus resolution on the regions of interest (i.e. the rift center), while leaving other areas such as the asthenospheric mantle at coarse resolution, leading to kilometer-scale local mesh resolution in 3D. Furthermore, we implemented plastic and viscous strain weakening of the nonlinear viscoplastic rheology required to develop asymmetric rift geometries (e.g. Huismans and Beaumont 2003). Additionally created plugins to ASPECT allow us to specify initial temperature and composition conditions based on geophysical data (e.g. LITHO1.0, Pasyanos et al. 2014) or to prescribe more general along-strike variation in the initial strain seeding the rift. Employing the above functionality, we construct regional models of the East African Rift System (EARS), the world's largest currently active rift. As the EARS is characterized by both orthogonal and oblique rift sections, multi-phase extension histories as well as magmatic and a-magmatic branches (e.g. Chorowicz 2005; Ebinger and Scholz 2011), it constitutes an extensive natural laboratory for our research into the 3D nature of continental rifting. References:Brune, S. (2016), in Plate boundaries and natural hazards, AGU Geophysical Monograph 219, J. C. Duarte and W. P. Schellart (Eds.). Chorowicz, J. (2005). J. Afr. Earth Sci., 43, 379-410. Ebinger, C. and Scholz, C. A. (2011), in Tectonics of Sedimentary Basins: Recent Advances, Wiley, C. Busby and A. Azor (Eds.). Heister et al. (2017). Geophys. J. Int., 210, 833-851. Huismans, R. S. and Beaumont, C. (2003). J. Geophys. Res., 108, B10, 2496. Kronbichler et al. (2012). Geophys. J. Int., 191, 12-29. Pasyanos et al. (2014). J. of Geophys. Res., 119, 3, 2153-2173.
NASA Technical Reports Server (NTRS)
Dorband, John E.
1987-01-01
Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
Multiplexed microsatellite recovery using massively parallel sequencing
T.N. Jennings; B.J. Knaus; T.D. Mullins; S.M. Haig; R.C. Cronn
2011-01-01
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of...
DNA methylation profiling using HpaII tiny fragment enrichment by ligation-mediated PCR (HELP)
Suzuki, Masako; Greally, John M.
2010-01-01
The HELP assay is a technique that allows genome-wide analysis of cytosine methylation. Here we describe the assay, its relative strengths and weaknesses, and the transition of the assay from a microarray to massively-parallel sequencing-based foundation. PMID:20434563
Genetics Home Reference: Ochoa syndrome
... Other researchers believe that a defective heparanase 2 protein may lead to problems with the development of the urinary tract or with muscle ... Peng W, Xu J, Li J, Owens KM, Bloom D, Innis JW. Exome capture and massively parallel sequencing identifies a novel HPSE2 mutation in a Saudi ...
USDA-ARS?s Scientific Manuscript database
Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptom...
Medical applications for high-performance computers in SKIF-GRID network.
Zhuchkov, Alexey; Tverdokhlebov, Nikolay
2009-01-01
The paper presents a set of software services for massive mammography image processing by using high-performance parallel computers of SKIF-family which are linked into a service-oriented grid-network. An experience of a prototype system implementation in two medical institutions is also described.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chacon, Luis; del-Castillo-Negrete, Diego; Hauck, Cory D.
2014-09-01
We propose a Lagrangian numerical algorithm for a time-dependent, anisotropic temperature transport equation in magnetized plasmas in the large guide field regime. The approach is based on an analytical integral formal solution of the parallel (i.e., along the magnetic field) transport equation with sources, and it is able to accommodate both local and non-local parallel heat flux closures. The numerical implementation is based on an operator-split formulation, with two straightforward steps: a perpendicular transport step (including sources), and a Lagrangian (field-line integral) parallel transport step. Algorithmically, the first step is amenable to the use of modern iterative methods, while themore » second step has a fixed cost per degree of freedom (and is therefore scalable). Accuracy-wise, the approach is free from the numerical pollution introduced by the discrete parallel transport term when the perpendicular to parallel transport coefficient ratio X ⊥ /X ∥ becomes arbitrarily small, and is shown to capture the correct limiting solution when ε = X⊥L 2 ∥/X1L 2 ⊥ → 0 (with L∥∙ L⊥ , the parallel and perpendicular diffusion length scales, respectively). Therefore, the approach is asymptotic-preserving. We demonstrate the capabilities of the scheme with several numerical experiments with varying magnetic field complexity in two dimensions, including the case of transport across a magnetic island.« less
Data decomposition method for parallel polygon rasterization considering load balancing
NASA Astrophysics Data System (ADS)
Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun
2015-12-01
It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
Belczynski, Krzysztof; Holz, Daniel E; Bulik, Tomasz; O'Shaughnessy, Richard
2016-06-23
The merger of two massive (about 30 solar masses) black holes has been detected in gravitational waves. This discovery validates recent predictions that massive binary black holes would constitute the first detection. Previous calculations, however, have not sampled the relevant binary-black-hole progenitors--massive, low-metallicity binary stars--with sufficient accuracy nor included sufficiently realistic physics to enable robust predictions to better than several orders of magnitude. Here we report high-precision numerical simulations of the formation of binary black holes via the evolution of isolated binary stars, providing a framework within which to interpret the first gravitational-wave source, GW150914, and to predict the properties of subsequent binary-black-hole gravitational-wave events. Our models imply that these events form in an environment in which the metallicity is less than ten per cent of solar metallicity, and involve stars with initial masses of 40-100 solar masses that interact through mass transfer and a common-envelope phase. These progenitor stars probably formed either about 2 billion years or, with a smaller probability, 11 billion years after the Big Bang. Most binary black holes form without supernova explosions, and their spins are nearly unchanged since birth, but do not have to be parallel. The classical field formation of binary black holes we propose, with low natal kicks (the velocity of the black hole at birth) and restricted common-envelope evolution, produces approximately 40 times more binary-black-holes mergers than do dynamical formation channels involving globular clusters; our predicted detection rate of these mergers is comparable to that from homogeneous evolution channels. Our calculations predict detections of about 1,000 black-hole mergers per year with total masses of 20-80 solar masses once second-generation ground-based gravitational-wave observatories reach full sensitivity.
NASA Astrophysics Data System (ADS)
Belczynski, Krzysztof; Holz, Daniel E.; Bulik, Tomasz; O'Shaughnessy, Richard
2016-06-01
The merger of two massive (about 30 solar masses) black holes has been detected in gravitational waves. This discovery validates recent predictions that massive binary black holes would constitute the first detection. Previous calculations, however, have not sampled the relevant binary-black-hole progenitors—massive, low-metallicity binary stars—with sufficient accuracy nor included sufficiently realistic physics to enable robust predictions to better than several orders of magnitude. Here we report high-precision numerical simulations of the formation of binary black holes via the evolution of isolated binary stars, providing a framework within which to interpret the first gravitational-wave source, GW150914, and to predict the properties of subsequent binary-black-hole gravitational-wave events. Our models imply that these events form in an environment in which the metallicity is less than ten per cent of solar metallicity, and involve stars with initial masses of 40-100 solar masses that interact through mass transfer and a common-envelope phase. These progenitor stars probably formed either about 2 billion years or, with a smaller probability, 11 billion years after the Big Bang. Most binary black holes form without supernova explosions, and their spins are nearly unchanged since birth, but do not have to be parallel. The classical field formation of binary black holes we propose, with low natal kicks (the velocity of the black hole at birth) and restricted common-envelope evolution, produces approximately 40 times more binary-black-holes mergers than do dynamical formation channels involving globular clusters; our predicted detection rate of these mergers is comparable to that from homogeneous evolution channels. Our calculations predict detections of about 1,000 black-hole mergers per year with total masses of 20-80 solar masses once second-generation ground-based gravitational-wave observatories reach full sensitivity.
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Quakefinder: A scalable data mining system for detecting earthquakes from space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stolorz, P.; Dean, C.
1996-12-31
We present an application of novel massively parallel datamining techniques to highly precise inference of important physical processes from remote sensing imagery. Specifically, we have developed and applied a system, Quakefinder, that automatically detects and measures tectonic activity in the earth`s crust by examination of satellite data. We have used Quakefinder to automatically map the direction and magnitude of ground displacements due to the 1992 Landers earthquake in Southern California, over a spatial region of several hundred square kilometers, at a resolution of 10 meters, to a (sub-pixel) precision of 1 meter. This is the first calculation that has evermore » been able to extract area-mapped information about 2D tectonic processes at this level of detail. We outline the architecture of the Quakefinder system, based upon a combination of techniques drawn from the fields of statistical inference, massively parallel computing and global optimization. We confirm the overall correctness of the procedure by comparison of our results with known locations of targeted faults obtained by careful and time-consuming field measurements. The system also performs knowledge discovery by indicating novel unexplained tectonic activity away from the primary faults that has never before been observed. We conclude by discussing the future potential of this data mining system in the broad context of studying subtle spatio-temporal processes within massive image streams.« less
Tumor Genomic Profiling in Breast Cancer Patients Using Targeted Massively Parallel Sequencing
2015-04-30
recently, we identified several novel alterations in in ER+ breast tumors, including translocations in ESR1 , the gene that encodes the estrogen receptor...modified our bait design to include genomic coordinates across select introns in ESR1 . In addition, two recent papers from the Broad Institute published
The Five Central Psychological Challenges Facing Effective Mobile Learning
ERIC Educational Resources Information Center
Terras, Melody M.; Ramsay, Judith
2012-01-01
Web 2.0 technology not only offers the opportunity of massively parallel interconnected networks that support the provision of information and communication anytime and anywhere but also offers immense opportunities for collaboration and sharing of user-generated content. This information-rich environment may support both formal and informal…
Associative Networks on a Massively Parallel Computer.
1985-10-01
lgbt (as a group of numbers, in this case), but this only leads to sensible queries when a statistical function is applied: "What is the largest salary...34.*"* . •.,. 64 the siW~pe operations being used during ascend, each movement step costs the same as executing an operation
SALT Spectroscopy of Evolved Massive Stars
NASA Astrophysics Data System (ADS)
Kniazev, A. Y.; Gvaramadze, V. V.; Berdnikov, L. N.
2017-06-01
Long-slit spectroscopy with the Southern African Large Telescope (SALT) of central stars of mid-infrared nebulae detected with the Spitzer Space Telescope and Wide-Field Infrared Survey Explorer (WISE) led to the discovery of numerous candidate luminous blue variables (cLBVs) and other rare evolved massive stars. With the recent advent of the SALT fiber-fed high-resolution echelle spectrograph (HRS), a new perspective for the study of these interesting objects is appeared. Using the HRS we obtained spectra of a dozen newly identified massive stars. Some results on the recently identified cLBV Hen 3-729 are presented.
Sigüenza, Julien; Pott, Desiree; Mendez, Simon; Sonntag, Simon J; Kaufmann, Tim A S; Steinseifer, Ulrich; Nicoud, Franck
2018-04-01
The complex fluid-structure interaction problem associated with the flow of blood through a heart valve with flexible leaflets is investigated both experimentally and numerically. In the experimental test rig, a pulse duplicator generates a pulsatile flow through a biomimetic rigid aortic root where a model of aortic valve with polymer flexible leaflets is implanted. High-speed recordings of the leaflets motion and particle image velocimetry measurements were performed together to investigate the valve kinematics and the dynamics of the flow. Large eddy simulations of the same configuration, based on a variant of the immersed boundary method, are also presented. A massively parallel unstructured finite-volume flow solver is coupled with a finite-element solid mechanics solver to predict the fluid-structure interaction between the unsteady flow and the valve. Detailed analysis of the dynamics of opening and closure of the valve are conducted, showing a good quantitative agreement between the experiment and the simulation regarding the global behavior, in spite of some differences regarding the individual dynamics of the valve leaflets. A multicycle analysis (over more than 20 cycles) enables to characterize the generation of turbulence downstream of the valve, showing similar flow features between the experiment and the simulation. The flow transitions to turbulence after peak systole, when the flow starts to decelerate. Fluctuations are observed in the wake of the valve, with maximum amplitude observed at the commissure side of the aorta. Overall, a very promising experiment-vs-simulation comparison is shown, demonstrating the potential of the numerical method. Copyright © 2017 John Wiley & Sons, Ltd.
Parareal in time 3D numerical solver for the LWR Benchmark neutron diffusion transient model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Baudron, Anne-Marie, E-mail: anne-marie.baudron@cea.fr; CEA-DRN/DMT/SERMA, CEN-Saclay, 91191 Gif sur Yvette Cedex; Lautard, Jean-Jacques, E-mail: jean-jacques.lautard@cea.fr
2014-12-15
In this paper we present a time-parallel algorithm for the 3D neutrons calculation of a transient model in a nuclear reactor core. The neutrons calculation consists in numerically solving the time dependent diffusion approximation equation, which is a simplified transport equation. The numerical resolution is done with finite elements method based on a tetrahedral meshing of the computational domain, representing the reactor core, and time discretization is achieved using a θ-scheme. The transient model presents moving control rods during the time of the reaction. Therefore, cross-sections (piecewise constants) are taken into account by interpolations with respect to the velocity ofmore » the control rods. The parallelism across the time is achieved by an adequate use of the parareal in time algorithm to the handled problem. This parallel method is a predictor corrector scheme that iteratively combines the use of two kinds of numerical propagators, one coarse and one fine. Our method is made efficient by means of a coarse solver defined with large time step and fixed position control rods model, while the fine propagator is assumed to be a high order numerical approximation of the full model. The parallel implementation of our method provides a good scalability of the algorithm. Numerical results show the efficiency of the parareal method on large light water reactor transient model corresponding to the Langenbuch–Maurer–Werner benchmark.« less
NASA Astrophysics Data System (ADS)
Kollet, S. J.; Goergen, K.; Gasper, F.; Shresta, P.; Sulis, M.; Rihani, J.; Simmer, C.; Vereecken, H.
2013-12-01
In studies of the terrestrial hydrologic, energy and biogeochemical cycles, integrated multi-physics simulation platforms take a central role in characterizing non-linear interactions, variances and uncertainties of system states and fluxes in reciprocity with observations. Recently developed integrated simulation platforms attempt to honor the complexity of the terrestrial system across multiple time and space scales from the deeper subsurface including groundwater dynamics into the atmosphere. Technically, this requires the coupling of atmospheric, land surface, and subsurface-surface flow models in supercomputing environments, while ensuring a high-degree of efficiency in the utilization of e.g., standard Linux clusters and massively parallel resources. A systematic performance analysis including profiling and tracing in such an application is crucial in the understanding of the runtime behavior, to identify optimum model settings, and is an efficient way to distinguish potential parallel deficiencies. On sophisticated leadership-class supercomputers, such as the 28-rack 5.9 petaFLOP IBM Blue Gene/Q 'JUQUEEN' of the Jülich Supercomputing Centre (JSC), this is a challenging task, but even more so important, when complex coupled component models are to be analysed. Here we want to present our experience from coupling, application tuning (e.g. 5-times speedup through compiler optimizations), parallel scaling and performance monitoring of the parallel Terrestrial Systems Modeling Platform TerrSysMP. The modeling platform consists of the weather prediction system COSMO of the German Weather Service; the Community Land Model, CLM of NCAR; and the variably saturated surface-subsurface flow code ParFlow. The model system relies on the Multiple Program Multiple Data (MPMD) execution model where the external Ocean-Atmosphere-Sea-Ice-Soil coupler (OASIS3) links the component models. TerrSysMP has been instrumented with the performance analysis tool Scalasca and analyzed on JUQUEEN with processor counts on the order of 10,000. The instrumentation is used in weak and strong scaling studies with real data cases and hypothetical idealized numerical experiments for detailed profiling and tracing analysis. The profiling is not only useful in identifying wait states that are due to the MPMD execution model, but also in fine-tuning resource allocation to the component models in search of the most suitable load balancing. This is especially necessary, as with numerical experiments that cover multiple (high resolution) spatial scales, the time stepping, coupling frequencies, and communication overheads are constantly shifting, which makes it necessary to re-determine the model setup with each new experimental design.
On the suitability of the connection machine for direct particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonard
1990-01-01
The algorithmic structure was examined of the vectorizable Stanford particle simulation (SPS) method and the structure is reformulated in data parallel form. Some of the SPS algorithms can be directly translated to data parallel, but several of the vectorizable algorithms have no direct data parallel equivalent. This requires the development of new, strictly data parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation and a master/slave algorithm is developed to minimize communication cost in large table look up. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of lack of a broad based tradition of data parallel programming. An important outcome of this work has been new data parallel algorithms specifically of use for direct particle simulation but which also expand the data parallel diction.
Mendoza-Parra, Marco-Antonio; Saravaki, Vincent; Cholley, Pierre-Etienne; Blum, Matthias; Billoré, Benjamin; Gronemeyer, Hinrich
2016-01-01
We have established a certification system for antibodies to be used in chromatin immunoprecipitation assays coupled to massive parallel sequencing (ChIP-seq). This certification comprises a standardized ChIP procedure and the attribution of a numerical quality control indicator (QCi) to biological replicate experiments. The QCi computation is based on a universally applicable quality assessment that quantitates the global deviation of randomly sampled subsets of ChIP-seq dataset with the original genome-aligned sequence reads. Comparison with a QCi database for >28,000 ChIP-seq assays were used to attribute quality grades (ranging from 'AAA' to 'DDD') to a given dataset. In the present report we used the numerical QC system to assess the factors influencing the quality of ChIP-seq assays, including the nature of the target, the sequencing depth and the commercial source of the antibody. We have used this approach specifically to certify mono and polyclonal antibodies obtained from Active Motif directed against the histone modification marks H3K4me3, H3K27ac and H3K9ac for ChIP-seq. The antibodies received the grades AAA to BBC ( www.ngs-qc.org). We propose to attribute such quantitative grading of all antibodies attributed with the label "ChIP-seq grade".
Modeling and Simulation of Radiative Compressible Flows in Aerodynamic Heating Arc-Jet Facility
NASA Technical Reports Server (NTRS)
Bensassi, Khalil; Laguna, Alejandro A.; Lani, Andrea; Mansour, Nagi N.
2016-01-01
Numerical simulations of an arc heated flow inside NASA's 20 [MW] Aerodynamics heating facility (AHF) are performed in order to investigate the three-dimensional swirling flow and the current distribution inside the wind tunnel. The plasma is considered in Local Thermodynamics Equilibrium(LTE) and is composed of Air-Argon gas mixture. The governing equations are the Navier-Stokes equations that include source terms corresponding to Joule heating and radiative cooling. The former is obtained by solving an electric potential equation, while the latter is calculated using an innovative massively parallel ray-tracing algorithm. The fully coupled system is closed by the thermodynamics relations and transport properties which are obtained from Chapman-Enskog method. A novel strategy was developed in order to enable the flow solver and the radiation calculation to be preformed independently and simultaneously using a different number of processors. Drastic reduction in the computational cost was achieved using this strategy. Details on the numerical methods used for space discretization, time integration and ray-tracing algorithm will be presented. The effect of the radiative cooling on the dynamics of the flow will be investigated. The complete set of equations were implemented within the COOLFluiD Framework. Fig. 1 shows the geometry of the Anode and part of the constrictor of the Aerodynamics heating facility (AHF). Fig. 2 shows the velocity field distribution along (x-y) plane and the streamline in (z-y) plane.
Analysis and selection of optimal function implementations in massively parallel computer
Archer, Charles Jens [Rochester, MN; Peters, Amanda [Rochester, MN; Ratterman, Joseph D [Rochester, MN
2011-05-31
An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snyder, L.; Notkin, D.; Adams, L.
1990-03-31
This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels; Composition of machine instruction, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph D. thesis was completed, onemore » book chapter, and six conference proceedings were published.« less
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
cellGPU: Massively parallel simulations of dynamic vertex models
NASA Astrophysics Data System (ADS)
Sussman, Daniel M.
2017-10-01
Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
Nagai, Moeto; Oohara, Kiyotaka; Kato, Keita; Kawashima, Takahiro; Shibata, Takayuki
2015-04-01
Parallel manipulation of single cells is important for reconstructing in vivo cellular microenvironments and studying cell functions. To manipulate single cells and reconstruct their environments, development of a versatile manipulation tool is necessary. In this study, we developed an array of hollow probes using microelectromechanical systems fabrication technology and demonstrated the manipulation of single cells. We conducted a cell aspiration experiment with a glass pipette and modeled a cell using a standard linear solid model, which provided information for designing hollow stepped probes for minimally invasive single-cell manipulation. We etched a silicon wafer on both sides and formed through holes with stepped structures. The inner diameters of the holes were reduced by SiO2 deposition of plasma-enhanced chemical vapor deposition to trap cells on the tips. This fabrication process makes it possible to control the wall thickness, inner diameter, and outer diameter of the probes. With the fabricated probes, single cells were manipulated and placed in microwells at a single-cell level in a parallel manner. We studied the capture, release, and survival rates of cells at different suction and release pressures and found that the cell trapping rate was directly proportional to the suction pressure, whereas the release rate and viability decreased with increasing the suction pressure. The proposed manipulation system makes it possible to place cells in a well array and observe the adherence, spreading, culture, and death of the cells. This system has potential as a tool for massively parallel manipulation and for three-dimensional hetero cellular assays.
GPU: the biggest key processor for AI and parallel processing
NASA Astrophysics Data System (ADS)
Baji, Toru
2017-07-01
Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.
Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S
2017-10-01
An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT, or to design OCT systems with improved performance. Copyright © 2017 Elsevier B.V. All rights reserved.
High-Performance High-Order Simulation of Wave and Plasma Phenomena
NASA Astrophysics Data System (ADS)
Klockner, Andreas
This thesis presents results aiming to enhance and broaden the applicability of the discontinuous Galerkin ("DG") method in a variety of ways. DG was chosen as a foundation for this work because it yields high-order finite element discretizations with very favorable numerical properties for the treatment of hyperbolic conservation laws. In a first part, I examine progress that can be made on implementation aspects of DG. In adapting the method to mass-market massively parallel computation hardware in the form of graphics processors ("GPUs"), I obtain an increase in computation performance per unit of cost by more than an order of magnitude over conventional processor architectures. Key to this advance is a recipe that adapts DG to a variety of hardware through automated self-tuning. I discuss new parallel programming tools supporting GPU run-time code generation which are instrumental in the DG self-tuning process and contribute to its reaching application floating point throughput greater than 200 GFlops/s on a single GPU and greater than 3 TFlops/s on a 16-GPU cluster in simulations of electromagnetics problems in three dimensions. I further briefly discuss the solver infrastructure that makes this possible. In the second part of the thesis, I introduce a number of new numerical methods whose motivation is partly rooted in the opportunity created by GPU-DG: First, I construct and examine a novel GPU-capable shock detector, which, when used to control an artificial viscosity, helps stabilize DG computations in gas dynamics and a number of other fields. Second, I describe my pursuit of a method that allows the simulation of rarefied plasmas using a DG discretization of the electromagnetic field. Finally, I introduce new explicit multi-rate time integrators for ordinary differential equations with multiple time scales, with a focus on applicability to DG discretizations of time-dependent problems.
NASA Technical Reports Server (NTRS)
Wang, P.; Li, P.
1998-01-01
A high-resolution numerical study on parallel systems is reported on three-dimensional, time-dependent, thermal convective flows. A parallel implentation on the finite volume method with a multigrid scheme is discussed, and a parallel visualization systemm is developed on distributed systems for visualizing the flow.