Science.gov

Sample records for 3-d massively parallel

  1. Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

    PubMed

    Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

    2014-07-05

    A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048(3) grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective to analyze the hydration properties of the large biomolecular systems.

  2. 3D seismic imaging on massively parallel computers

    SciTech Connect

    Womble, D.E.; Ober, C.C.; Oldfield, R.

    1997-02-01

    The ability to image complex geologies such as salt domes in the Gulf of Mexico and thrusts in mountainous regions is a key to reducing the risk and cost associated with oil and gas exploration. Imaging these structures, however, is computationally expensive. Datasets can be terabytes in size, and the processing time required for the multiple iterations needed to produce a velocity model can take months, even with the massively parallel computers available today. Some algorithms, such as 3D, finite-difference, prestack, depth migration remain beyond the capacity of production seismic processing. Massively parallel processors (MPPs) and algorithms research are the tools that will enable this project to provide new seismic processing capabilities to the oil and gas industry. The goals of this work are to (1) develop finite-difference algorithms for 3D, prestack, depth migration; (2) develop efficient computational approaches for seismic imaging and for processing terabyte datasets on massively parallel computers; and (3) develop a modular, portable, seismic imaging code.

  3. 3-D readout-electronics packaging for high-bandwidth massively paralleled imager

    DOEpatents

    Kwiatkowski, Kris; Lyke, James

    2007-12-18

    Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells are disposed on a stacked CMOS chip's surface at a 45.degree. angle from light reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections for distributing power and signals to components associated with each stacked CSMO chip layer.

  4. PORTA: A Massively Parallel Code for 3D Non-LTE Polarized Radiative Transfer

    NASA Astrophysics Data System (ADS)

    Štěpán, J.

    2014-10-01

    The interpretation of the Stokes profiles of the solar (stellar) spectral line radiation requires solving a non-LTE radiative transfer problem that can be very complex, especially when the main interest lies in modeling the linear polarization signals produced by scattering processes and their modification by the Hanle effect. One of the main difficulties is due to the fact that the plasma of a stellar atmosphere can be highly inhomogeneous and dynamic, which implies the need to solve the non-equilibrium problem of generation and transfer of polarized radiation in realistic three-dimensional stellar atmospheric models. Here we present PORTA, a computer program we have developed for solving, in three-dimensional (3D) models of stellar atmospheres, the problem of the generation and transfer of spectral line polarization taking into account anisotropic radiation pumping and the Hanle and Zeeman effects in multilevel atoms. The numerical method of solution is based on a highly convergent iterative algorithm, whose convergence rate is insensitive to the grid size, and on an accurate short-characteristics formal solver of the Stokes-vector transfer equation which uses monotonic Bezier interpolation. In addition to the iterative method and the 3D formal solver, another important feature of PORTA is a novel parallelization strategy suitable for taking advantage of massively parallel computers. Linear scaling of the solution with the number of processors allows to reduce the solution time by several orders of magnitude. We present useful benchmarks and a few illustrations of applications using a 3D model of the solar chromosphere resulting from MHD simulations. Finally, we present our conclusions with a view to future research. For more details see Štěpán & Trujillo Bueno (2013).

  5. Massively parallel computation of 3D flow and reactions in chemical vapor deposition reactors

    SciTech Connect

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.; Hennigan, G.L.; Devine, K.D.; Moffat, H.K.

    1997-12-01

    Computer modeling of Chemical Vapor Deposition (CVD) reactors can greatly aid in the understanding, design, and optimization of these complex systems. Modeling is particularly attractive in these systems since the costs of experimentally evaluating many design alternatives can be prohibitively expensive, time consuming, and even dangerous, when working with toxic chemicals like Arsine (AsH{sub 3}): until now, predictive modeling has not been possible for most systems since the behavior is three-dimensional and governed by complex reaction mechanisms. In addition, CVD reactors often exhibit large thermal gradients, large changes in physical properties over regions of the domain, and significant thermal diffusion for gas mixtures with widely varying molecular weights. As a result, significant simplifications in the models have been made which erode the accuracy of the models` predictions. In this paper, the authors will demonstrate how the vast computational resources of massively parallel computers can be exploited to make possible the analysis of models that include coupled fluid flow and detailed chemistry in three-dimensional domains. For the most part, models have either simplified the reaction mechanisms and concentrated on the fluid flow, or have simplified the fluid flow and concentrated on rigorous reactions. An important CVD research thrust has been in detailed modeling of fluid flow and heat transfer in the reactor vessel, treating transport and reaction of chemical species either very simply or as a totally decoupled problem. Using the analogy between heat transfer and mass transfer, and the fact that deposition is often diffusion limited, much can be learned from these calculations; however, the effects of thermal diffusion, the change in physical properties with composition, and the incorporation of surface reaction mechanisms are not included in this model, nor can transitions to three-dimensional flows be detected.

  6. 3-D prestack Kirchhoff depth migration: From prototype to production in a massively parallel processor environment

    SciTech Connect

    Chang, H.; Solano, M.; VanDyke, J.P.; McMechan, G.A.; Epili, D.

    1998-03-01

    Portable, production-scale 3-D prestack Kirchhoff depth migration software capable of full-volume imaging has been successfully implemented and applied to a six-million trace (46.9 Gbyte) marine data set from a salt/subsalt play in the Gulf of Mexico. Velocity model building and updates use an image-driven strategy and were performed in a Sun Sparc environment. Images obtained by 3-D prestack migration after three velocity iterations are substantially better focused and reveal drilling targets that were not visible in images obtained from conventional 3-D poststack time migration. Amplitudes are well preserved, so anomalies associated with known reservoirs conform to the petrophysical predictions. Prototype development was on an 8-node Intel iPSC860 computer; the production version was run on an 1824-node Intel Paragon computer. The code has been successfully ported to CRAY (T3D) and Unix workstation (PVM) environments.

  7. Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

    NASA Astrophysics Data System (ADS)

    Schultz, A.

    2010-12-01

    3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We

  8. Minimal massive 3D gravity

    NASA Astrophysics Data System (ADS)

    Bergshoeff, Eric; Hohm, Olaf; Merbis, Wout; Routh, Alasdair J.; Townsend, Paul K.

    2014-07-01

    We present an alternative to topologically massive gravity (TMG) with the same ‘minimal’ bulk properties; i.e. a single local degree of freedom that is realized as a massive graviton in linearization about an anti-de Sitter (AdS) vacuum. However, in contrast to TMG, the new ‘minimal massive gravity’ has both a positive energy graviton and positive central charges for the asymptotic AdS-boundary conformal algebra.

  9. Massively parallel patterning of complex 2D and 3D functional polymer brushes by polymer pen lithography.

    PubMed

    Xie, Zhuang; Chen, Chaojian; Zhou, Xuechang; Gao, Tingting; Liu, Danqing; Miao, Qian; Zheng, Zijian

    2014-08-13

    We report the first demonstration of centimeter-area serial patterning of complex 2D and 3D functional polymer brushes by high-throughput polymer pen lithography. Arbitrary 2D and 3D structures of poly(glycidyl methacrylate) (PGMA) brushes are fabricated over areas as large as 2 cm × 1 cm, with a remarkable throughput being 3 orders of magnitudes higher than the state-of-the-arts. Patterned PGMA brushes are further employed as resist for fabricating Au micro/nanostructures and hard molds for the subsequent replica molding of soft stamps. On the other hand, these 2D and 3D PGMA brushes are also utilized as robust and versatile platforms for the immobilization of bioactive molecules to form 2D and 3D patterned DNA oligonucleotide and protein chips. Therefore, this low-cost, yet high-throughput "bench-top" serial fabrication method can be readily applied to a wide range of fields including micro/nanofabrication, optics and electronics, smart surfaces, and biorelated studies.

  10. A fully coupled method for massively parallel simulation of hydraulically driven fractures in 3-dimensions: FULLY COUPLED PARALLEL SIMULATION OF HYDRAULIC FRACTURES IN 3-D

    DOE PAGES

    Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...

    2016-09-18

    This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.

  11. N-Body Classical Systems and Neural Networks on a 3d SIMD Massive Parallel Processor:. APE100/QUADRICS

    NASA Astrophysics Data System (ADS)

    Paolucci, P. S.

    A number of physical systems (e.g., N body Newtonian, Coulombian or Lennard-Jones systems) can be described by N2 interaction terms. Completely connected neural networks are characterised by the same kind of connections: Each neuron sends signals to all the other neurons via synapses. The APE100/Quadricsmassive parallel architecture, with processing power in excess of 100 Gigaflops and a central memory of 8 Gigabytes seems to have processing power and memory adequate to simulate systems formed by more than 1 billion synapses or interaction terms. On the other hand the processing nodes of APE100/Quadrics are organised in a tridimensional cubic lattice; each processing node has a direct communication path only toward the first neighboring nodes. Here we describe a convenient way to map systems with global connectivity onto the first-neighbors connectivity of the APE100/Quadrics architecture. Some numeric criteria, which are useful for matching SIMD tridimensional architectures with globally connected simulations, are introduced.

  12. Massively parallel mathematical sieves

    SciTech Connect

    Montry, G.R.

    1989-01-01

    The Sieve of Eratosthenes is a well-known algorithm for finding all prime numbers in a given subset of integers. A parallel version of the Sieve is described that produces computational speedups over 800 on a hypercube with 1,024 processing elements for problems of fixed size. Computational speedups as high as 980 are achieved when the problem size per processor is fixed. The method of parallelization generalizes to other sieves and will be efficient on any ensemble architecture. We investigate two highly parallel sieves using scattered decomposition and compare their performance on a hypercube multiprocessor. A comparison of different parallelization techniques for the sieve illustrates the trade-offs necessary in the design and implementation of massively parallel algorithms for large ensemble computers.

  13. Massively Parallel Genetics.

    PubMed

    Shendure, Jay; Fields, Stanley

    2016-06-01

    Human genetics has historically depended on the identification of individuals whose natural genetic variation underlies an observable trait or disease risk. Here we argue that new technologies now augment this historical approach by allowing the use of massively parallel assays in model systems to measure the functional effects of genetic variation in many human genes. These studies will help establish the disease risk of both observed and potential genetic variants and to overcome the problem of "variants of uncertain significance."

  14. Massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Fung, L. W. (Inventor)

    1983-01-01

    An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.

  15. Parallel CARLOS-3D code development

    SciTech Connect

    Putnam, J.M.; Kotulski, J.D.

    1996-02-01

    CARLOS-3D is a three-dimensional scattering code which was developed under the sponsorship of the Electromagnetic Code Consortium, and is currently used by over 80 aerospace companies and government agencies. The code has been extensively validated and runs on both serial workstations and parallel super computers such as the Intel Paragon. CARLOS-3D is a three-dimensional surface integral equation scattering code based on a Galerkin method of moments formulation employing Rao- Wilton-Glisson roof-top basis for triangular faceted surfaces. Fully arbitrary 3D geometries composed of multiple conducting and homogeneous bulk dielectric materials can be modeled. This presentation describes some of the extensions to the CARLOS-3D code, and how the operator structure of the code facilitated these improvements. Body of revolution (BOR) and two-dimensional geometries were incorporated by simply including new input routines, and the appropriate Galerkin matrix operator routines. Some additional modifications were required in the combined field integral equation matrix generation routine due to the symmetric nature of the BOR and 2D operators. Quadrilateral patched surfaces with linear roof-top basis functions were also implemented in the same manner. Quadrilateral facets and triangular facets can be used in combination to more efficiently model geometries with both large smooth surfaces and surfaces with fine detail such as gaps and cracks. Since the parallel implementation in CARLOS-3D is at high level, these changes were independent of the computer platform being used. This approach minimizes code maintenance, while providing capabilities with little additional effort. Results are presented showing the performance and accuracy of the code for some large scattering problems. Comparisons between triangular faceted and quadrilateral faceted geometry representations will be shown for some complex scatterers.

  16. On 3D minimal massive gravity

    NASA Astrophysics Data System (ADS)

    Alishahiha, Mohsen; Qaemmaqami, Mohammad M.; Naseh, Ali; Shirzad, Ahmad

    2014-12-01

    We study linearized equations of motion of the newly proposed three dimensional gravity, known as minimal massive gravity, using its metric formulation. By making use of a redefinition of the parameters of the model, we observe that the resulting linearized equations are exactly the same as that of TMG. In particular the model admits logarithmic modes at critical points. We also study several vacuum solutions of the model, specially at a certain limit where the contribution of Chern-Simons term vanishes.

  17. Parallel 3-D viscoelastic finite difference seismic modelling

    NASA Astrophysics Data System (ADS)

    Bohlen, Thomas

    2002-10-01

    Computational power has advanced to a state where we can begin to perform wavefield simulations for realistic (complex) 3-D earth models at frequencies of interest to both seismologists and engineers. On serial platforms, however, 3-D calculations are still limited to small grid sizes and short seismic wave traveltimes. To make use of the efficiency of network computers a parallel 3-D viscoelastic finite difference (FD) code is implemented which allows to distribute the work on several PCs or workstations connected via standard ethernet in an in-house network. By using the portable message passing interface standard (MPI) for the communication between processors, running times can be reduced and grid sizes can be increased significantly. Furthermore, the code shows good performance on massive parallel supercomputers which makes the computation of very large grids feasible. This implementation greatly expands the applicability of the 3-D elastic/viscoelastic finite-difference modelling technique by providing an efficient, portable and practical C-program.

  18. Parallel rendering techniques for massively parallel visualization

    SciTech Connect

    Hansen, C.; Krogh, M.; Painter, J.

    1995-07-01

    As the resolution of simulation models increases, scientific visualization algorithms which take advantage of the large memory. and parallelism of Massively Parallel Processors (MPPs) are becoming increasingly important. For large applications rendering on the MPP tends to be preferable to rendering on a graphics workstation due to the MPP`s abundant resources: memory, disk, and numerous processors. The challenge becomes developing algorithms that can exploit these resources while minimizing overhead, typically communication costs. This paper will describe recent efforts in parallel rendering for polygonal primitives as well as parallel volumetric techniques. This paper presents rendering algorithms, developed for massively parallel processors (MPPs), for polygonal, spheres, and volumetric data. The polygon algorithm uses a data parallel approach whereas the sphere and volume render use a MIMD approach. Implementations for these algorithms are presented for the Thinking Ma.chines Corporation CM-5 MPP.

  19. Fast, Massively Parallel Data Processors

    NASA Technical Reports Server (NTRS)

    Heaton, Robert A.; Blevins, Donald W.; Davis, ED

    1994-01-01

    Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.

  20. Massively Parallel MRI Detector Arrays

    PubMed Central

    Keil, Boris; Wald, Lawrence L

    2013-01-01

    Originally proposed as a method to increase sensitivity by extending the locally high-sensitivity of small surface coil elements to larger areas, the term parallel imaging now includes the use of array coils to perform image encoding. This methodology has impacted clinical imaging to the point where many examinations are performed with an array comprising multiple smaller surface coil elements as the detector of the MR signal. This article reviews the theoretical and experimental basis for the trend towards higher channel counts relying on insights gained from modeling and experimental studies as well as the theoretical analysis of the so-called “ultimate” SNR and g-factor. We also review the methods for optimally combining array data and changes in RF methodology needed to construct massively parallel MRI detector arrays and show some examples of state-of-the-art for highly accelerated imaging with the resulting highly parallel arrays. PMID:23453758

  1. Seismic imaging on massively parallel computers

    SciTech Connect

    Ober, C.C.; Oldfield, R.A.; Womble, D.E.; Mosher, C.C.

    1997-07-01

    A key to reducing the risks and costs associated with oil and gas exploration is the fast, accurate imaging of complex geologies, such as salt domes in the Gulf of Mexico and overthrust regions in US onshore regions. Pre-stack depth migration generally yields the most accurate images, and one approach to this is to solve the scalar-wave equation using finite differences. Current industry computational capabilities are insufficient for the application of finite-difference, 3-D, prestack, depth-migration algorithms. High performance computers and state-of-the-art algorithms and software are required to meet this need. As part of an ongoing ACTI project funded by the US Department of Energy, the authors have developed a finite-difference, 3-D prestack, depth-migration code for massively parallel computer systems. The goal of this work is to demonstrate that massively parallel computers (thousands of processors) can be used efficiently for seismic imaging, and that sufficient computing power exists (or soon will exist) to make finite-difference, prestack, depth migration practical for oil and gas exploration.

  2. Merlin - Massively parallel heterogeneous computing

    NASA Technical Reports Server (NTRS)

    Wittie, Larry; Maples, Creve

    1989-01-01

    Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.

  3. Massive-Star Magnetospheres: Now in 3-D!

    NASA Astrophysics Data System (ADS)

    Townsend, Richard

    Magnetic fields are unexpected in massive stars, due to the absence of a dynamo convection zone beneath their surface layers. Nevertheless, kilogauss-strength, ordered fields were detected in a small subset of these stars over three decades ago, and the intervening years have witnessed the steady expansion of this subset. A distinctive feature of magnetic massive stars is that they harbor magnetospheres --- circumstellar environments where the magnetic field interacts strongly with the star's radiation-driven wind, confining it and channelling it into energetic shocks. A wide range of observational signatures are associated with these magnetospheres, in diagnostics ranging from X-rays all the way through to radio emission. Moreover, these magnetospheres can play an important role in massive-star evolution, by amplifying angular momentum loss in the wind. Recent progress in understanding massive-star magnetospheres has largely been driven by magnetohydrodynamical (MHD) simulations. However, these have been restricted to two- dimensional axisymmetric configurations, with three-dimensional configurations possible only in certain special cases. These restrictions are limiting further progress; we therefore propose to develop completely general three-dimensional models for the magnetospheres of massive stars, on the one hand to understand their observational properties and exploit them as plasma-physics laboratories, and on the other to gain a comprehensive understanding of how they influence the evolution of their host star. For weak- and intermediate-field stars, the models will be based on 3-D MHD simulations using a modified version of the ZEUS-MP code. For strong-field stars, we will extend our existing Rigid Field Hydrodynamics (RFHD) code to handle completely arbitrary field topologies. To explore a putative 'photoionization-moderated mass loss' mechanism for massive-star magnetospheres, we will also further develop a photoionization code we have recently

  4. CALTRANS: A parallel, deterministic, 3D neutronics code

    SciTech Connect

    Carson, L.; Ferguson, J.; Rogers, J.

    1994-04-01

    Our efforts to parallelize the deterministic solution of the neutron transport equation has culminated in a new neutronics code CALTRANS, which has full 3D capability. In this article, we describe the layout and algorithms of CALTRANS and present performance measurements of the code on a variety of platforms. Explicit implementation of the parallel algorithms of CALTRANS using both the function calls of the Parallel Virtual Machine software package (PVM 3.2) and the Meiko CS-2 tagged message passing library (based on the Intel NX/2 interface) are provided in appendices.

  5. 3-D Visualization on Workspace of Parallel Manipulators

    NASA Astrophysics Data System (ADS)

    Tanaka, Yoshito; Yokomichi, Isao; Ishii, Junko; Makino, Toshiaki

    In parallel mechanisms, the form and volume of workspace also change variously with the attitude of a platform. This paper presents a method to search for the workspace of parallel mechanisms with 6-DOF and 3D visualization of the workspace. Workspace is a search for the movable range of the central point of a platform when it moves with a given orientation. In order to search workspace, geometric analysis based on inverse kinematics is considered. Plots of 2D of calculations are compared with those measured by position sensors. The test results are shown to have good agreement with simulation results. The workspace variations are demonstrated in terms of 3D and 2D plots for prototype mechanisms. The workspace plots are created with OpenGL and Visual C++ by implementation of the algorithm. An application module is developed, which displays workspace of the mechanism in 3D images. The effectiveness and practicability of 3D visualization on workspace are successfully demonstrated by 6-DOF parallel mechanisms.

  6. Implementation of parallel matrix decomposition for NIKE3D on the KSR1 system

    SciTech Connect

    Su, Philip S.; Fulton, R.E.; Zacharia, T.

    1995-06-01

    New massively parallel computer architecture has revolutionized the design of computer algorithms and promises to have significant influence on algorithms for engineering computations. Realistic engineering problems using finite element analysis typically imply excessively large computational requirements. Parallel supercomputers that have the potential for significantly increasing calculation speeds can meet these computational requirements. This report explores the potential for the parallel Cholesky (U{sup T}DU) matrix decomposition algorithm on NIKE3D through actual computations. The examples of two- and three-dimensional nonlinear dynamic finite element problems are presented on the Kendall Square Research (KSR1) multiprocessor system, with 64 processors, at Oak Ridge National Laboratory. The numerical results indicate that the parallel Cholesky (U{sup T}DU) matrix decomposition algorithm is attractive for NIKE3D under multi-processor system environments.

  7. A parallel algorithm for solving the 3d Schroedinger equation

    SciTech Connect

    Strickland, Michael; Yager-Elorriaga, David

    2010-08-20

    We describe a parallel algorithm for solving the time-independent 3d Schroedinger equation using the finite difference time domain (FDTD) method. We introduce an optimized parallelization scheme that reduces communication overhead between computational nodes. We demonstrate that the compute time, t, scales inversely with the number of computational nodes as t {proportional_to} (N{sub nodes}){sup -0.95} {sup {+-} 0.04}. This makes it possible to solve the 3d Schroedinger equation on extremely large spatial lattices using a small computing cluster. In addition, we present a new method for precisely determining the energy eigenvalues and wavefunctions of quantum states based on a symmetry constraint on the FDTD initial condition. Finally, we discuss the usage of multi-resolution techniques in order to speed up convergence on extremely large lattices.

  8. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU.

    PubMed

    Xia, Yong; Wang, Kuanquan; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations.

  9. Parallel Optimization of 3D Cardiac Electrophysiological Model Using GPU

    PubMed Central

    Xia, Yong; Wang, Kuanquan; Zhang, Henggui

    2015-01-01

    Large-scale 3D virtual heart model simulations are highly demanding in computational resources. This imposes a big challenge to the traditional computation resources based on CPU environment, which already cannot meet the requirement of the whole computation demands or are not easily available due to expensive costs. GPU as a parallel computing environment therefore provides an alternative to solve the large-scale computational problems of whole heart modeling. In this study, using a 3D sheep atrial model as a test bed, we developed a GPU-based simulation algorithm to simulate the conduction of electrical excitation waves in the 3D atria. In the GPU algorithm, a multicellular tissue model was split into two components: one is the single cell model (ordinary differential equation) and the other is the diffusion term of the monodomain model (partial differential equation). Such a decoupling enabled realization of the GPU parallel algorithm. Furthermore, several optimization strategies were proposed based on the features of the virtual heart model, which enabled a 200-fold speedup as compared to a CPU implementation. In conclusion, an optimized GPU algorithm has been developed that provides an economic and powerful platform for 3D whole heart simulations. PMID:26581957

  10. Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver

    NASA Astrophysics Data System (ADS)

    Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre

    2014-06-01

    This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle very large simulations, that usually require large HPC clusters, using a single computing node. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 106 spatial cells and 1 × 1012 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40:74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-nodes nuclear simulation tool.

  11. Parallel tempering and 3D spin glass models

    NASA Astrophysics Data System (ADS)

    Papakonstantinou, T.; Malakis, A.

    2014-03-01

    We review parallel tempering schemes and examine their main ingredients for accuracy and efficiency. We discuss two selection methods of temperatures and some alternatives for the exchange of replicas, including all-pair exchange methods. We measure specific heat errors and round-trip efficiency using the two-dimensional (2D) Ising model, and also test the efficiency for the ground state production in 3D spin glass models. We find that the optimization of the GS problem is highly influenced by the choice of the temperature range of the PT process. Finally, we present numerical evidence concerning the universality aspects of an anisotropic case of the 3D spin-glass model.

  12. Parallel PAB3D: Experiences with a Prototype in MPI

    NASA Technical Reports Server (NTRS)

    Guerinoni, Fabio; Abdol-Hamid, Khaled S.; Pao, S. Paul

    1998-01-01

    PAB3D is a three-dimensional Navier Stokes solver that has gained acceptance in the research and industrial communities. It takes as computational domain, a set disjoint blocks covering the physical domain. This is the first report on the implementation of PAB3D using the Message Passing Interface (MPI), a standard for parallel processing. We discuss briefly the characteristics of tile code and define a prototype for testing. The principal data structure used for communication is derived from preprocessing "patching". We describe a simple interface (COMMSYS) for MPI communication, and some general techniques likely to be encountered when working on problems of this nature. Last, we identify levels of improvement from the current version and outline future work.

  13. 3D finite-difference seismic migration with parallel computers

    SciTech Connect

    Ober, C.C.; Gjertsen, R.; Minkoff, S.; Womble, D.E.

    1998-11-01

    The ability to image complex geologies such as salt domes in the Gulf of Mexico and thrusts in mountainous regions is essential for reducing the risk associated with oil exploration. Imaging these structures, however, is computationally expensive as datasets can be terabytes in size. Traditional ray-tracing migration methods cannot handle complex velocity variations commonly found near such salt structures. Instead the authors use the full 3D acoustic wave equation, discretized via a finite difference algorithm. They reduce the cost of solving the apraxial wave equation by a number of numerical techniques including the method of fractional steps and pipelining the tridiagonal solves. The imaging code, Salvo, uses both frequency parallelism (generally 90% efficient) and spatial parallelism (65% efficient). Salvo has been tested on synthetic and real data and produces clear images of the subsurface even beneath complicated salt structures.

  14. Exact solutions in 3D new massive gravity.

    PubMed

    Ahmedov, Haji; Aliev, Alikram N

    2011-01-14

    We show that the field equations of new massive gravity (NMG) consist of a massive (tensorial) Klein-Gordon-type equation with a curvature-squared source term and a constraint equation. We also show that, for algebraic type D and N spacetimes, the field equations of topologically massive gravity (TMG) can be thought of as the "square root" of the massive Klein-Gordon-type equation. Using this fact, we establish a simple framework for mapping all types D and N solutions of TMG into NMG. Finally, we present new examples of types D and N solutions to NMG.

  15. Exact Solutions in 3D New Massive Gravity

    SciTech Connect

    Ahmedov, Haji; Aliev, Alikram N.

    2011-01-14

    We show that the field equations of new massive gravity (NMG) consist of a massive (tensorial) Klein-Gordon-type equation with a curvature-squared source term and a constraint equation. We also show that, for algebraic type D and N spacetimes, the field equations of topologically massive gravity (TMG) can be thought of as the 'square root' of the massive Klein-Gordon-type equation. Using this fact, we establish a simple framework for mapping all types D and N solutions of TMG into NMG. Finally, we present new examples of types D and N solutions to NMG.

  16. Exact Solutions in 3D New Massive Gravity

    NASA Astrophysics Data System (ADS)

    Ahmedov, Haji; Aliev, Alikram N.

    2011-01-01

    We show that the field equations of new massive gravity (NMG) consist of a massive (tensorial) Klein-Gordon-type equation with a curvature-squared source term and a constraint equation. We also show that, for algebraic type D and N spacetimes, the field equations of topologically massive gravity (TMG) can be thought of as the “square root” of the massive Klein-Gordon-type equation. Using this fact, we establish a simple framework for mapping all types D and N solutions of TMG into NMG. Finally, we present new examples of types D and N solutions to NMG.

  17. Parallel 3D Mortar Element Method for Adaptive Nonconforming Meshes

    NASA Technical Reports Server (NTRS)

    Feng, Huiyu; Mavriplis, Catherine; VanderWijngaart, Rob; Biswas, Rupak

    2004-01-01

    High order methods are frequently used in computational simulation for their high accuracy. An efficient way to avoid unnecessary computation in smooth regions of the solution is to use adaptive meshes which employ fine grids only in areas where they are needed. Nonconforming spectral elements allow the grid to be flexibly adjusted to satisfy the computational accuracy requirements. The method is suitable for computational simulations of unsteady problems with very disparate length scales or unsteady moving features, such as heat transfer, fluid dynamics or flame combustion. In this work, we select the Mark Element Method (MEM) to handle the non-conforming interfaces between elements. A new technique is introduced to efficiently implement MEM in 3-D nonconforming meshes. By introducing an "intermediate mortar", the proposed method decomposes the projection between 3-D elements and mortars into two steps. In each step, projection matrices derived in 2-D are used. The two-step method avoids explicitly forming/deriving large projection matrices for 3-D meshes, and also helps to simplify the implementation. This new technique can be used for both h- and p-type adaptation. This method is applied to an unsteady 3-D moving heat source problem. With our new MEM implementation, mesh adaptation is able to efficiently refine the grid near the heat source and coarsen the grid once the heat source passes. The savings in computational work resulting from the dynamic mesh adaptation is demonstrated by the reduction of the the number of elements used and CPU time spent. MEM and mesh adaptation, respectively, bring irregularity and dynamics to the computer memory access pattern. Hence, they provide a good way to gauge the performance of computer systems when running scientific applications whose memory access patterns are irregular and unpredictable. We select a 3-D moving heat source problem as the Unstructured Adaptive (UA) grid benchmark, a new component of the NAS Parallel

  18. IM3D: A parallel Monte Carlo code for efficient simulations of primary radiation displacements and damage in 3D geometry

    PubMed Central

    Li, Yong Gang; Yang, Yang; Short, Michael P.; Ding, Ze Jun; Zeng, Zhi; Li, Ju

    2015-01-01

    SRIM-like codes have limitations in describing general 3D geometries, for modeling radiation displacements and damage in nanostructured materials. A universal, computationally efficient and massively parallel 3D Monte Carlo code, IM3D, has been developed with excellent parallel scaling performance. IM3D is based on fast indexing of scattering integrals and the SRIM stopping power database, and allows the user a choice of Constructive Solid Geometry (CSG) or Finite Element Triangle Mesh (FETM) method for constructing 3D shapes and microstructures. For 2D films and multilayers, IM3D perfectly reproduces SRIM results, and can be ∼102 times faster in serial execution and > 104 times faster using parallel computation. For 3D problems, it provides a fast approach for analyzing the spatial distributions of primary displacements and defect generation under ion irradiation. Herein we also provide a detailed discussion of our open-source collision cascade physics engine, revealing the true meaning and limitations of the “Quick Kinchin-Pease” and “Full Cascades” options. The issues of femtosecond to picosecond timescales in defining displacement versus damage, the limitation of the displacements per atom (DPA) unit in quantifying radiation damage (such as inadequacy in quantifying degree of chemical mixing), are discussed. PMID:26658477

  19. Multigrid on massively parallel architectures

    SciTech Connect

    Falgout, R D; Jones, J E

    1999-09-17

    The scalable implementation of multigrid methods for machines with several thousands of processors is investigated. Parallel performance models are presented for three different structured-grid multigrid algorithms, and a description is given of how these models can be used to guide implementation. Potential pitfalls are illustrated when moving from moderate-sized parallelism to large-scale parallelism, and results are given from existing multigrid codes to support the discussion. Finally, the use of mixed programming models is investigated for multigrid codes on clusters of SMPs.

  20. Kundt solutions of minimal massive 3D gravity

    NASA Astrophysics Data System (ADS)

    Deger, Nihat Sadik; Sarıoǧlu, Ã.-zgür

    2015-11-01

    We construct Kundt solutions of minimal massive gravity theory and show that, similar to topologically massive gravity (TMG), most of them are constant scalar invariant (CSI) spacetimes that correspond to deformations of round and warped (A)dS. We also find an explicit non-CSI Kundt solution at the merger point. Finally, we give their algebraic classification with respect to the traceless Ricci tensor (Segre classification) and show that their Segre types match with the types of their counterparts in TMG.

  1. Massively Parallel Computing: A Sandia Perspective

    SciTech Connect

    Dosanjh, Sudip S.; Greenberg, David S.; Hendrickson, Bruce; Heroux, Michael A.; Plimpton, Steve J.; Tomkins, James L.; Womble, David E.

    1999-05-06

    The computing power available to scientists and engineers has increased dramatically in the past decade, due in part to progress in making massively parallel computing practical and available. The expectation for these machines has been great. The reality is that progress has been slower than expected. Nevertheless, massively parallel computing is beginning to realize its potential for enabling significant break-throughs in science and engineering. This paper provides a perspective on the state of the field, colored by the authors' experiences using large scale parallel machines at Sandia National Laboratories. We address trends in hardware, system software and algorithms, and we also offer our view of the forces shaping the parallel computing industry.

  2. Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

    SciTech Connect

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Syratchev, I.; /CERN

    2009-06-19

    In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).

  3. Performance analysis of high quality parallel preconditioners applied to 3D finite element structural analysis

    SciTech Connect

    Kolotilina, L.; Nikishin, A.; Yeremin, A.

    1994-12-31

    The solution of large systems of linear equations is a crucial bottleneck when performing 3D finite element analysis of structures. Also, in many cases the reliability and robustness of iterative solution strategies, and their efficiency when exploiting hardware resources, fully determine the scope of industrial applications which can be solved on a particular computer platform. This is especially true for modern vector/parallel supercomputers with large vector length and for modern massively parallel supercomputers. Preconditioned iterative methods have been successfully applied to industrial class finite element analysis of structures. The construction and application of high quality preconditioners constitutes a high percentage of the total solution time. Parallel implementation of high quality preconditioners on such architectures is a formidable challenge. Two common types of existing preconditioners are the implicit preconditioners and the explicit preconditioners. The implicit preconditioners (e.g. incomplete factorizations of several types) are generally high quality but require solution of lower and upper triangular systems of equations per iteration which are difficult to parallelize without deteriorating the convergence rate. The explicit type of preconditionings (e.g. polynomial preconditioners or Jacobi-like preconditioners) require sparse matrix-vector multiplications and can be parallelized but their preconditioning qualities are less than desirable. The authors present results of numerical experiments with Factorized Sparse Approximate Inverses (FSAI) for symmetric positive definite linear systems. These are high quality preconditioners that possess a large resource of parallelism by construction without increasing the serial complexity.

  4. Parallel 3D Finite Element Particle-in-Cell Simulations with Pic3P

    SciTech Connect

    Candel, A.; Kabel, A.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; Ko, K.; Ben-Zvi, I.; Kewisch, J.; /Brookhaven

    2009-06-19

    SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic Particle-In-Cell code Pic3P. Designed for simulations of beam-cavity interactions dominated by space charge effects, Pic3P solves the complete set of Maxwell-Lorentz equations self-consistently and includes space-charge, retardation and boundary effects from first principles. Higher-order Finite Element methods with adaptive refinement on conformal unstructured meshes lead to highly efficient use of computational resources. Massively parallel processing with dynamic load balancing enables large-scale modeling of photoinjectors with unprecedented accuracy, aiding the design and operation of next-generation accelerator facilities. Applications include the LCLS RF gun and the BNL polarized SRF gun.

  5. Parallel ALLSPD-3D: Speeding Up Combustor Analysis Via Parallel Processing

    NASA Technical Reports Server (NTRS)

    Fricker, David M.

    1997-01-01

    The ALLSPD-3D Computational Fluid Dynamics code for reacting flow simulation was run on a set of benchmark test cases to determine its parallel efficiency. These test cases included non-reacting and reacting flow simulations with varying numbers of processors. Also, the tests explored the effects of scaling the simulation with the number of processors in addition to distributing a constant size problem over an increasing number of processors. The test cases were run on a cluster of IBM RS/6000 Model 590 workstations with ethernet and ATM networking plus a shared memory SGI Power Challenge L workstation. The results indicate that the network capabilities significantly influence the parallel efficiency, i.e., a shared memory machine is fastest and ATM networking provides acceptable performance. The limitations of ethernet greatly hamper the rapid calculation of flows using ALLSPD-3D.

  6. Template based parallel checkpointing in a massively parallel computer system

    DOEpatents

    Archer, Charles Jens; Inglett, Todd Alan

    2009-01-13

    A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.

  7. Massively Parallel Algorithms for Solution of Schrodinger Equation

    NASA Technical Reports Server (NTRS)

    Fijany, Amir; Barhen, Jacob; Toomerian, Nikzad

    1994-01-01

    In this paper massively parallel algorithms for solution of Schrodinger equation are developed. Our results clearly indicate that the Crank-Nicolson method, in addition to its excellent numerical properties, is also highly suitable for massively parallel computation.

  8. Aerodynamic simulation on massively parallel systems

    NASA Technical Reports Server (NTRS)

    Haeuser, Jochem; Simon, Horst D.

    1992-01-01

    This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but

  9. 3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

    SciTech Connect

    Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

    1997-12-31

    The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle. The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.

  10. New 3D parallel GILD electromagnetic modeling and nonlinear inversion using global magnetic integral and local differential equation

    SciTech Connect

    Xie, G.; Li, J.; Majer, E.; Zuo, D.

    1998-07-01

    This paper describes a new 3D parallel GILD electromagnetic (EM) modeling and nonlinear inversion algorithm. The algorithm consists of: (a) a new magnetic integral equation instead of the electric integral equation to solve the electromagnetic forward modeling and inverse problem; (b) a collocation finite element method for solving the magnetic integral and a Galerkin finite element method for the magnetic differential equations; (c) a nonlinear regularizing optimization method to make the inversion stable and of high resolution; and (d) a new parallel 3D modeling and inversion using a global integral and local differential domain decomposition technique (GILD). The new 3D nonlinear electromagnetic inversion has been tested with synthetic data and field data. The authors obtained very good imaging for the synthetic data and reasonable subsurface EM imaging for the field data. The parallel algorithm has high parallel efficiency over 90% and can be a parallel solver for elliptic, parabolic, and hyperbolic modeling and inversion. The parallel GILD algorithm can be extended to develop a high resolution and large scale seismic and hydrology modeling and inversion in the massively parallel computer.

  11. Massively-Parallel Dislocation Dynamics Simulations

    SciTech Connect

    Cai, W; Bulatov, V V; Pierce, T G; Hiratani, M; Rhee, M; Bartelt, M; Tang, M

    2003-06-18

    Prediction of the plastic strength of single crystals based on the collective dynamics of dislocations has been a challenge for computational materials science for a number of years. The difficulty lies in the inability of the existing dislocation dynamics (DD) codes to handle a sufficiently large number of dislocation lines, in order to be statistically representative and to reproduce experimentally observed microstructures. A new massively-parallel DD code is developed that is capable of modeling million-dislocation systems by employing thousands of processors. We discuss the general aspects of this code that make such large scale simulations possible, as well as a few initial simulation results.

  12. Parallel implementation of 3D protein structure similarity searches using a GPU and the CUDA.

    PubMed

    Mrozek, Dariusz; Brożek, Miłosz; Małysiak-Mrozek, Bożena

    2014-02-01

    Searching for similar 3D protein structures is one of the primary processes employed in the field of structural bioinformatics. However, the computational complexity of this process means that it is constantly necessary to search for new methods that can perform such a process faster and more efficiently. Finding molecular substructures that complex protein structures have in common is still a challenging task, especially when entire databases containing tens or even hundreds of thousands of protein structures must be scanned. Graphics processing units (GPUs) and general purpose graphics processing units (GPGPUs) can perform many time-consuming and computationally demanding processes much more quickly than a classical CPU can. In this paper, we describe the GPU-based implementation of the CASSERT algorithm for 3D protein structure similarity searching. This algorithm is based on the two-phase alignment of protein structures when matching fragments of the compared proteins. The GPU (GeForce GTX 560Ti: 384 cores, 2GB RAM) implementation of CASSERT ("GPU-CASSERT") parallelizes both alignment phases and yields an average 180-fold increase in speed over its CPU-based, single-core implementation on an Intel Xeon E5620 (2.40GHz, 4 cores). In this paper, we show that massive parallelization of the 3D structure similarity search process on many-core GPU devices can reduce the execution time of the process, allowing it to be performed in real time. GPU-CASSERT is available at: http://zti.polsl.pl/dmrozek/science/gpucassert/cassert.htm.

  13. Plasma simulation using the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Lin, C. S.; Thring, A. L.; Koga, J.; Janetzke, R. W.

    1987-01-01

    Two dimensional electrostatic simulation codes using the particle-in-cell model are developed on the Massively Parallel Processor (MPP). The conventional plasma simulation procedure that computes electric fields at particle positions by means of a gridded system is found inefficient on the MPP. The MPP simulation code is thus based on the gridless system in which particles are assigned to processing elements and electric fields are computed directly via Discrete Fourier Transform. Currently, the gridless model on the MPP in two dimensions is about nine times slower that the gridded system on the CRAY X-MP without considering I/O time. However, the gridless system on the MPP can be improved by incorporating a faster I/O between the staging memory and Array Unit and a more efficient procedure for taking floating point sums over processing elements. The initial results suggest that the parallel processors have the potential for performing large scale plasma simulations.

  14. Parallel deterministic neutronics with AMR in 3D

    SciTech Connect

    Clouse, C.; Ferguson, J.; Hendrickson, C.

    1997-12-31

    AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR) has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straight forward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.

  15. MASSIVE HYBRID PARALLELISM FOR FULLY IMPLICIT MULTIPHYSICS

    SciTech Connect

    Cody J. Permann; David Andrs; John W. Peterson; Derek R. Gaston

    2013-05-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided.

  16. Massive hybrid parallelism for fully implicit multiphysics

    SciTech Connect

    Gaston, D. R.; Permann, C. J.; Andrs, D.; Peterson, J. W.

    2013-07-01

    As hardware advances continue to modify the supercomputing landscape, traditional scientific software development practices will become more outdated, ineffective, and inefficient. The process of rewriting/retooling existing software for new architectures is a Sisyphean task, and results in substantial hours of development time, effort, and money. Software libraries which provide an abstraction of the resources provided by such architectures are therefore essential if the computational engineering and science communities are to continue to flourish in this modern computing environment. The Multiphysics Object Oriented Simulation Environment (MOOSE) framework enables complex multiphysics analysis tools to be built rapidly by scientists, engineers, and domain specialists, while also allowing them to both take advantage of current HPC architectures, and efficiently prepare for future supercomputer designs. MOOSE employs a hybrid shared-memory and distributed-memory parallel model and provides a complete and consistent interface for creating multiphysics analysis tools. In this paper, a brief discussion of the mathematical algorithms underlying the framework and the internal object-oriented hybrid parallel design are given. Representative massively parallel results from several applications areas are presented, and a brief discussion of future areas of research for the framework are provided. (authors)

  17. Parallel beam optical tomography apparatus for 3D radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Krstajic, Nikola; Doran, Simon J.

    2005-06-01

    Since the discovery of X rays radiotherapy has had the same aim - to deliver a precisely measured dose of radiation to a defined tumour volume with minimal damage to surrounding healthy tissue. Recent developments in radiotherapy such as intensity modulated radiotherapy (IMRT) can generate complex shapes of dose distributions. Until recently it has not been possible to verify that the delivered dose matches the planned dose. However, one often wants to know the real three-dimensional dose distribution. Three-dimensional radiation dosimeters have been developed since the early 1980s. Most chemical formulations involve a radiosensitive species immobilised in space by gelling agent. Magnetic Resonance Imaging (MRI) and optical techniques have been the most successful gel scanning techniques so far. Optical techniques rely on gels changing colour once irradiated. Parallel beam optical tomography has been developed at the University of Surrey since the late 1990s. The apparatus involves light emitting diode light source collimated to a wide (12cm) parallel beam. The beam is attenuated or scattered (depending on the chemical formulation) as it passes through the gel. Focusing optics projects the beam onto a CCD chip. The dosimeter sits on a rotation stage. The tomography scan involves continuously rotating the dosimeter and taking CCD images. Once the dosimeter has been rotated over 180 degrees the images are processed by filtered back projection. The work presented discusses the optics of the apparatus in more detail.

  18. Parallel Cartesian grid refinement for 3D complex flow simulations

    NASA Astrophysics Data System (ADS)

    Angelidis, Dionysios; Sotiropoulos, Fotis

    2013-11-01

    A second order accurate method for discretizing the Navier-Stokes equations on 3D unstructured Cartesian grids is presented. Although the grid generator is based on the oct-tree hierarchical method, fully unstructured data-structure is adopted enabling robust calculations for incompressible flows, avoiding both the need of synchronization of the solution between different levels of refinement and usage of prolongation/restriction operators. The current solver implements a hybrid staggered/non-staggered grid layout, employing the implicit fractional step method to satisfy the continuity equation. The pressure-Poisson equation is discretized by using a novel second order fully implicit scheme for unstructured Cartesian grids and solved using an efficient Krylov subspace solver. The momentum equation is also discretized with second order accuracy and the high performance Newton-Krylov method is used for integrating them in time. Neumann and Dirichlet conditions are used to validate the Poisson solver against analytical functions and grid refinement results to a significant reduction of the solution error. The effectiveness of the fractional step method results in the stability of the overall algorithm and enables the performance of accurate multi-resolution real life simulations. This material is based upon work supported by the Department of Energy under Award Number DE-EE0005482.

  19. Parallel Impurity Spreading During Massive Gas Injection

    NASA Astrophysics Data System (ADS)

    Izzo, V. A.

    2016-10-01

    Extended-MHD simulations of disruption mitigation in DIII-D demonstrate that both pre-existing islands (locked-modes) and plasma rotation can significantly influence toroidal spreading of impurities following massive gas injection (MGI). Given the importance of successful disruption mitigation in ITER and the large disparity in device parameters, empirical demonstrations of disruption mitigation strategies in present tokamaks are insufficient to inspire unreserved confidence for ITER. Here, MHD simulations elucidate how impurities injected as a localized jet spread toroidally and poloidally. Simulations with large pre-existing islands at the q = 2 surface reveal that the magnetic topology strongly influences the rate of impurity spreading parallel to the field lines. Parallel spreading is largely driven by rapid parallel heat conduction, and is much faster at low order rational surfaces, where a short parallel connection length leads to faster thermal equilibration. Consequently, the presence of large islands, which alter the connection length, can slow impurity transport; but the simulations also show that the appearance of a 4/2 harmonic of the 2/1 mode, which breaks up the large islands, can increase the rate of spreading. This effect is seen both for simulations with spontaneously growing and directly imposed 4/2 modes. Given the prevalence of locked-modes as a cause of disruptions, understanding the effect of large islands is of particular importance. Simulations with and without islands also show that rotation can alter impurity spreading, even reversing the predominant direction of spreading, which is toward the high-field-side in the absence of rotation. Given expected differences in rotation for ITER vs. DIII-D, rotation effects are another important consideration when extrapolating experimental results. Work supported by US DOE under DE-FG02-95ER54309.

  20. Computational chaos in massively parallel neural networks

    NASA Technical Reports Server (NTRS)

    Barhen, Jacob; Gulati, Sandeep

    1989-01-01

    A fundamental issue which directly impacts the scalability of current theoretical neural network models to massively parallel embodiments, in both software as well as hardware, is the inherent and unavoidable concurrent asynchronicity of emerging fine-grained computational ensembles and the possible emergence of chaotic manifestations. Previous analyses attributed dynamical instability to the topology of the interconnection matrix, to parasitic components or to propagation delays. However, researchers have observed the existence of emergent computational chaos in a concurrently asynchronous framework, independent of the network topology. Researcher present a methodology enabling the effective asynchronous operation of large-scale neural networks. Necessary and sufficient conditions guaranteeing concurrent asynchronous convergence are established in terms of contracting operators. Lyapunov exponents are computed formally to characterize the underlying nonlinear dynamics. Simulation results are presented to illustrate network convergence to the correct results, even in the presence of large delays.

  1. A nanofluidic system for massively parallel PCR

    NASA Astrophysics Data System (ADS)

    Brenan, Colin; Morrison, Tom; Roberts, Douglas; Hurley, James

    2008-02-01

    Massively parallel nanofluidic systems are lab-on-a-chip devices where solution phase biochemical and biological analyses are implemented in high density arrays of nanoliter holes micro-machined in a thin platen. Polymer coatings make the interior surfaces of the holes hydrophilic and the exterior surface of the platen hydrophobic for precise and accurate self-metered loading of liquids into each hole without cross-contamination. We have created a "nanoplate" based on this concept, equivalent in performance to standard microtiter plates, having 3072 thirty-three nanoliter holes in a stainless steel platen the dimensions of a microscope slide. We report on the performance of this device for PCR-based single nucleotide polymorphism (SNP) genotyping or quantitative measurement of gene expression by real-time PCR in applications ranging from plant and animal diagnostics, agricultural genetics and human disease research.

  2. Broadband monitoring simulation with massively parallel processors

    NASA Astrophysics Data System (ADS)

    Trubetskov, Mikhail; Amotchkina, Tatiana; Tikhonravov, Alexander

    2011-09-01

    Modern efficient optimization techniques, namely needle optimization and gradual evolution, enable one to design optical coatings of any type. Even more, these techniques allow obtaining multiple solutions with close spectral characteristics. It is important, therefore, to develop software tools that can allow one to choose a practically optimal solution from a wide variety of possible theoretical designs. A practically optimal solution provides the highest production yield when optical coating is manufactured. Computational manufacturing is a low-cost tool for choosing a practically optimal solution. The theory of probability predicts that reliable production yield estimations require many hundreds or even thousands of computational manufacturing experiments. As a result reliable estimation of the production yield may require too much computational time. The most time-consuming operation is calculation of the discrepancy function used by a broadband monitoring algorithm. This function is formed by a sum of terms over wavelength grid. These terms can be computed simultaneously in different threads of computations which opens great opportunities for parallelization of computations. Multi-core and multi-processor systems can provide accelerations up to several times. Additional potential for further acceleration of computations is connected with using Graphics Processing Units (GPU). A modern GPU consists of hundreds of massively parallel processors and is capable to perform floating-point operations efficiently.

  3. Parallel molecular dynamics: Communication requirements for massively parallel machines

    NASA Astrophysics Data System (ADS)

    Taylor, Valerie E.; Stevens, Rick L.; Arnold, Kathryn E.

    1995-05-01

    Molecular mechanics and dynamics are becoming widely used to perform simulations of molecular systems from large-scale computations of materials to the design and modeling of drug compounds. In this paper we address two major issues: a good decomposition method that can take advantage of future massively parallel processing systems for modest-sized problems in the range of 50,000 atoms and the communication requirements needed to achieve 30 to 40% efficiency on MPPs. We analyzed a scalable benchmark molecular dynamics program executing on the Intel Touchstone Deleta parallelized with an interaction decomposition method. Using a validated analytical performance model of the code, we determined that for an MPP with a four-dimensional mesh topology and 400 MHz processors the communication startup time must be at most 30 clock cycles and the network bandwidth must be at least 2.3 GB/s. This configuration results in 30 to 40% efficiency of the MPP for a problem with 50,000 atoms executing on 50,000 processors.

  4. Parallel OSEM Reconstruction Algorithm for Fully 3-D SPECT on a Beowulf Cluster.

    PubMed

    Rong, Zhou; Tianyu, Ma; Yongjie, Jin

    2005-01-01

    In order to improve the computation speed of ordered subset expectation maximization (OSEM) algorithm for fully 3-D single photon emission computed tomography (SPECT) reconstruction, an experimental beowulf-type cluster was built and several parallel reconstruction schemes were described. We implemented a single-program-multiple-data (SPMD) parallel 3-D OSEM reconstruction algorithm based on message passing interface (MPI) and tested it with combinations of different number of calculating processors and different size of voxel grid in reconstruction (64×64×64 and 128×128×128). Performance of parallelization was evaluated in terms of the speedup factor and parallel efficiency. This parallel implementation methodology is expected to be helpful to make fully 3-D OSEM algorithms more feasible in clinical SPECT studies.

  5. Prediction of parallel NIKE3D performance on the KSR1 system

    SciTech Connect

    Su, P.S.; Zacharia, T.; Fulton, R.E.

    1995-05-01

    Finite element method is one of the bases for numerical solutions to engineering problems. Complex engineering problems using finite element analysis typically imply excessively large computational time. Parallel supercomputers have the potential for significantly increasing calculation speeds in order to meet these computational requirements. This paper predicts parallel NIKE3D performance on the Kendall Square Research (KSR1) system. The first part of the prediction is based on the implementation of parallel Cholesky (U{sup T}DU) matrix decomposition algorithm through actual computations on the KSRI multiprocessor system, with 64 processors, at Oak Ridge National Laboratory. The other predictions are based on actual computations for parallel element matrix generation, parallel global stiffness matrix assembly, and parallel forward/backward substitution on the BBN TC2000 multiprocessor system at Lawrence Livermore National Laboratory. The preliminary results indicate that parallel NIKE3D performance can be attractive under local/shared-memory multiprocessor system environments.

  6. Multiplexed microsatellite recovery using massively parallel sequencing.

    PubMed

    Jennings, T N; Knaus, B J; Mullins, T D; Haig, S M; Cronn, R C

    2011-11-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356,958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5 M (USD).

  7. Multiplexed microsatellite recovery using massively parallel sequencing

    USGS Publications Warehouse

    Jennings, T.N.; Knaus, B.J.; Mullins, T.D.; Haig, S.M.; Cronn, R.C.

    2011-01-01

    Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of massively parallel sequencing (MPS), it is now possible to sequence microsatellite-enriched genomic libraries in multiplex pools. To test this approach, we prepared seven microsatellite-enriched, barcoded genomic libraries from diverse taxa (two conifer trees, five birds) and sequenced these on one lane of the Illumina Genome Analyzer using paired-end 80-bp reads. In this experiment, we screened 6.1 million sequences and identified 356958 unique microreads that contained di- or trinucleotide microsatellites. Examination of four species shows that our conversion rate from raw sequences to polymorphic markers compares favourably to Sanger- and 454-based methods. The advantage of multiplexed MPS is that the staggering capacity of modern microread sequencing is spread across many libraries; this reduces sample preparation and sequencing costs to less than $400 (USD) per species. This price is sufficiently low that microsatellite libraries could be prepared and sequenced for all 1373 organisms listed as 'threatened' and 'endangered' in the United States for under $0.5M (USD).

  8. Parallel computing simulation of electrical excitation and conduction in the 3D human heart.

    PubMed

    Di Yu; Dongping Du; Hui Yang; Yicheng Tu

    2014-01-01

    A correctly beating heart is important to ensure adequate circulation of blood throughout the body. Normal heart rhythm is produced by the orchestrated conduction of electrical signals throughout the heart. Cardiac electrical activity is the resulted function of a series of complex biochemical-mechanical reactions, which involves transportation and bio-distribution of ionic flows through a variety of biological ion channels. Cardiac arrhythmias are caused by the direct alteration of ion channel activity that results in changes in the AP waveform. In this work, we developed a whole-heart simulation model with the use of massive parallel computing with GPGPU and OpenGL. The simulation algorithm was implemented under several different versions for the purpose of comparisons, including one conventional CPU version and two GPU versions based on Nvidia CUDA platform. OpenGL was utilized for the visualization / interaction platform because it is open source, light weight and universally supported by various operating systems. The experimental results show that the GPU-based simulation outperforms the conventional CPU-based approach and significantly improves the speed of simulation. By adopting modern computer architecture, this present investigation enables real-time simulation and visualization of electrical excitation and conduction in the large and complicated 3D geometry of a real-world human heart.

  9. A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

    NASA Astrophysics Data System (ADS)

    Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

    2016-10-01

    Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.

  10. 3D parallel inversion of time-domain airborne EM data

    NASA Astrophysics Data System (ADS)

    Liu, Yun-He; Yin, Chang-Chun; Ren, Xiu-Yan; Qiu, Chang-Kai

    2016-12-01

    To improve the inversion accuracy of time-domain airborne electromagnetic data, we propose a parallel 3D inversion algorithm for airborne EM data based on the direct Gauss-Newton optimization. Forward modeling is performed in the frequency domain based on the scattered secondary electrical field. Then, the inverse Fourier transform and convolution of the transmitting waveform are used to calculate the EM responses and the sensitivity matrix in the time domain for arbitrary transmitting waves. To optimize the computational time and memory requirements, we use the EM "footprint" concept to reduce the model size and obtain the sparse sensitivity matrix. To improve the 3D inversion, we use the OpenMP library and parallel computing. We test the proposed 3D parallel inversion code using two synthetic datasets and a field dataset. The time-domain airborne EM inversion results suggest that the proposed algorithm is effective, efficient, and practical.

  11. A massively parallel solution strategy for efficient thermal radiation simulation

    NASA Astrophysics Data System (ADS)

    Nguyen, P. D.; Moureau, V.; Vervisch, L.; Perret, N.

    2012-06-01

    A novel and efficient methodology to solve the Radiative Transfer Equations (RTE) in thermal radiation is discussed. The BiCGStab(2) iterative solution method, as designed for the non-symmetric linear equation systems, is used to solve the discretized RTE. The numerical upwind and central schemes are blended to provide a stable numerical scheme (MUCS) for interpolation of the cell facial radiation intensities in finite volume formulation. The combination of the BiCGStab(2) and MUCS methods proved to be very efficient when coupling with the DOM approach to solve the RTE. A cost-effective tabulation technique for the gaseous radiative property model SNB-FSCK using 7-point Gauss-Labatto quadrature scheme is also introduced. The whole methodology is implemented into a massively parallel unstructured CFD code where the radiative and fluid flow solutions share the same domain decomposition, which is the bottleneck in current radiative solvers. The dual mesh decomposition at the cell groups level and processors level is adopted to optimize the CFD code for massively parallel computing. The whole method is applied to simulate the radiation heat-transfer in a 3D rectangular enclosure containing non-isothermal CO2 and H2O mixtures. Two test cases are studied for homogeneous and inhomogeneous distributions of CO2 and H2O in the enclosure. The result is reported for the heat flux and radiation energy source and the comparison is also made between the present methodology BiCGStab(2)/MUCS/tabulated SNB-FSCK, the benchmark method SNB-CK (implemented at 25cm-1 narrow-band) and some other methods available in the literature. The present method (BiCGStab(2)/MUCS/tabulated SNB-FSCK) yields more accurate predictions particularly for the radiation source term. When comparing with the benchmark solution, the relative error of the radiation source term is remarkably reduced to less than 4% and the CPU time is drastically diminished.

  12. Parallel implementation of the FETI-DPEM algorithm for general 3D EM simulations

    NASA Astrophysics Data System (ADS)

    Li, Yu-Jia; Jin, Jian-Ming

    2009-05-01

    A parallel implementation of the electromagnetic dual-primal finite element tearing and interconnecting algorithm (FETI-DPEM) is designed for general three-dimensional (3D) electromagnetic large-scale simulations. As a domain decomposition implementation of the finite element method, the FETI-DPEM algorithm provides fully decoupled subdomain problems and an excellent numerical scalability, and thus is well suited for parallel computation. The parallel implementation of the FETI-DPEM algorithm on a distributed-memory system using the message passing interface (MPI) is discussed in detail along with a few practical guidelines obtained from numerical experiments. Numerical examples are provided to demonstrate the efficiency of the parallel implementation.

  13. Parallel Finite Element Solution of 3D Rayleigh-Benard-Marangoni Flows

    NASA Technical Reports Server (NTRS)

    Carey, G. F.; McLay, R.; Bicken, G.; Barth, B.; Pehlivanov, A.

    1999-01-01

    A domain decomposition strategy and parallel gradient-type iterative solution scheme have been developed and implemented for computation of complex 3D viscous flow problems involving heat transfer and surface tension effects. Details of the implementation issues are described together with associated performance and scalability studies. Representative Rayleigh-Benard and microgravity Marangoni flow calculations and performance results on the Cray T3D and T3E are presented. The work is currently being extended to tightly-coupled parallel "Beowulf-type" PC clusters and we present some preliminary performance results on this platform. We also describe progress on related work on hierarchic data extraction for visualization.

  14. The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

    NASA Technical Reports Server (NTRS)

    Woo, Alex C.; Hill, Kueichien C.

    1996-01-01

    The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Program Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.

  15. Linking 1D evolutionary to 3D hydrodynamical simulations of massive stars

    NASA Astrophysics Data System (ADS)

    Cristini, A.; Meakin, C.; Hirschi, R.; Arnett, D.; Georgy, C.; Viallet, M.

    2016-03-01

    Stellar evolution models of massive stars are important for many areas of astrophysics, for example nucleosynthesis yields, supernova progenitor models and understanding physics under extreme conditions. Turbulence occurs in stars primarily due to nuclear burning at different mass coordinates within the star. The understanding and correct treatment of turbulence and turbulent mixing at convective boundaries in stellar models has been studied for decades but still lacks a definitive solution. This paper presents initial results of a study on convective boundary mixing (CBM) in massive stars. The ‘stiffness’ of a convective boundary can be quantified using the bulk Richardson number ({{Ri}}{{B}}), the ratio of the potential energy for restoration of the boundary to the kinetic energy of turbulent eddies. A ‘stiff’ boundary ({{Ri}}{{B}}˜ {10}4) will suppress CBM, whereas in the opposite case a ‘soft’ boundary ({{Ri}}{{B}}˜ 10) will be more susceptible to CBM. One of the key results obtained so far is that lower convective boundaries (closer to the centre) of nuclear burning shells are ‘stiffer’ than the corresponding upper boundaries, implying limited CBM at lower shell boundaries. This is in agreement with 3D hydrodynamic simulations carried out by Meakin and Arnett (2007 Astrophys. J. 667 448-75). This result also has implications for new CBM prescriptions in massive stars as well as for nuclear burning flame front propagation in super-asymptotic giant branch stars and also the onset of novae.

  16. Experimental free-space optical network for massively parallel computers

    NASA Astrophysics Data System (ADS)

    Araki, S.; Kajita, M.; Kasahara, K.; Kubota, K.; Kurihara, K.; Redmond, I.; Schenfeld, E.; Suzaki, T.

    1996-03-01

    A free-space optical interconnection scheme is described for massively parallel processors based on the interconnection-cached network architecture. The optical network operates in a circuit-switching mode. Combined with a packet-switching operation among the circuit-switched optical channels, a high-bandwidth, low-latency network for massively parallel processing results. The design and assembly of a 64-channel experimental prototype is discussed, and operational results are presented.

  17. A Portable 3D FFT Package for Distributed-Memory Parallel Architectures

    NASA Technical Reports Server (NTRS)

    Ding, H. Q.; Ferraro, R. D.; Gennery, D. B.

    1995-01-01

    A parallel algorithm for 3D FFTs is implemented as a series of local 1D FFTs combined with data transposes. This allows the use of vendor supplied (often fully optimized) sequential 1D FFTs. The FFTs are carried out in-place by using an in-place data transpose across the processors.

  18. Three-dimensional radiative transfer on a massively parallel computer

    NASA Technical Reports Server (NTRS)

    Vath, H. M.

    1994-01-01

    We perform 3D radiative transfer calculations in non-local thermodynamic equilibrium (NLTE) in the simple two-level atom approximation on the Mas-Par MP-1, which contains 8192 processors and is a single instruction multiple data (SIMD) machine, an example of the new generation of massively parallel computers. On such a machine, all processors execute the same command at a given time, but on different data. To make radiative transfer calculations efficient, we must re-consider the numerical methods and storage of data. To solve the transfer equation, we adopt the short characteristic method and examine different acceleration methods to obtain the source function. We use the ALI method and test local and non-local operators. Furthermore, we compare the Ng and the orthomin methods of acceleration. We also investigate the use of multi-grid methods to get fast solutions for the NLTE case. In order to test these numerical methods, we apply them to two problems with and without periodic boundary conditions.

  19. Scan line graphics generation on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Dorband, John E.

    1988-01-01

    Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.

  20. Advances in Domain Mapping of Massively Parallel Scientific Computations

    SciTech Connect

    Leland, Robert W.; Hendrickson, Bruce A.

    2015-10-01

    One of the most important concerns in parallel computing is the proper distribution of workload across processors. For most scientific applications on massively parallel machines, the best approach to this distribution is to employ data parallelism; that is, to break the datastructures supporting a computation into pieces and then to assign those pieces to different processors. Collectively, these partitioning and assignment tasks comprise the domain mapping problem.

  1. IMPAIR: massively parallel deconvolution on the GPU

    NASA Astrophysics Data System (ADS)

    Sherry, Michael; Shearer, Andy

    2013-02-01

    The IMPAIR software is a high throughput image deconvolution tool for processing large out-of-core datasets of images, varying from large images with spatially varying PSFs to large numbers of images with spatially invariant PSFs. IMPAIR implements a parallel version of the tried and tested Richardson-Lucy deconvolution algorithm regularised via a custom wavelet thresholding library. It exploits the inherently parallel nature of the convolution operation to achieve quality results on consumer grade hardware: through the NVIDIA Tesla GPU implementation, the multi-core OpenMP implementation, and the cluster computing MPI implementation of the software. IMPAIR aims to address the problem of parallel processing in both top-down and bottom-up approaches: by managing the input data at the image level, and by managing the execution at the instruction level. These combined techniques will lead to a scalable solution with minimal resource consumption and maximal load balancing. IMPAIR is being developed as both a stand-alone tool for image processing, and as a library which can be embedded into non-parallel code to transparently provide parallel high throughput deconvolution.

  2. EFFICIENT SCHEDULING OF PARALLEL JOBS ON MASSIVELY PARALLEL SYSTEMS

    SciTech Connect

    F. PETRINI; W. FENG

    1999-09-01

    We present buffered coscheduling, a new methodology to multitask parallel jobs in a message-passing environment and to develop parallel programs that can pave the way to the efficient implementation of a distributed operating system. Buffered coscheduling is based on three innovative techniques: communication buffering, strobing, and non-blocking communication. By leveraging these techniques, we can perform effective optimizations based on the global status of the parallel machine rather than on the limited knowledge available locally to each processor. The advantages of buffered coscheduling include higher resource utilization, reduced communication overhead, efficient implementation of low-control strategies and fault-tolerant protocols, accurate performance modeling, and a simplified yet still expressive parallel programming model. Preliminary experimental results show that buffered coscheduling is very effective in increasing the overall performance in the presence of load imbalance and communication-intensive workloads.

  3. Massively parallel sequencing: the next big thing in genetic medicine.

    PubMed

    Tucker, Tracy; Marra, Marco; Friedman, Jan M

    2009-08-01

    Massively parallel sequencing has reduced the cost and increased the throughput of genomic sequencing by more than three orders of magnitude, and it seems likely that costs will fall and throughput improve even more in the next few years. Clinical use of massively parallel sequencing will provide a way to identify the cause of many diseases of unknown etiology through simultaneous screening of thousands of loci for pathogenic mutations and by sequencing biological specimens for the genomic signatures of novel infectious agents. In addition to providing these entirely new diagnostic capabilities, massively parallel sequencing may also replace arrays and Sanger sequencing in clinical applications where they are currently being used. Routine clinical use of massively parallel sequencing will require higher accuracy, better ways to select genomic subsets of interest, and improvements in the functionality, speed, and ease of use of data analysis software. In addition, substantial enhancements in laboratory computer infrastructure, data storage, and data transfer capacity will be needed to handle the extremely large data sets produced. Clinicians and laboratory personnel will require training to use the sequence data effectively, and appropriate methods will need to be developed to deal with the incidental discovery of pathogenic mutations and variants of uncertain clinical significance. Massively parallel sequencing has the potential to transform the practice of medical genetics and related fields, but the vast amount of personal genomic data produced will increase the responsibility of geneticists to ensure that the information obtained is used in a medically and socially responsible manner.

  4. High-speed 3D imaging by parallel phase-shifting digital holography

    NASA Astrophysics Data System (ADS)

    Awatsuji, Yasuhiro; Xia, Peng; Matoba, Osamu

    2015-07-01

    As a high-speed three-dimensional (3D) imaging technique, parallel phase-shifting digital holography is presented. This technique records a single hologram of an object with an image sensor having a phase-shift array device and reconstructs the instantaneous 3D image of the object with a computer. In this technique, a single hologram in which the multiple holograms required for phase-shifting digital holography are multiplexed by using space-division multiplexing technique pixel by pixel. Also, we present a high-speed parallel phase-shifting digital holography system. The system consists of an interferometer, a continuous-wave laser, and a high-speed polarization imaging camera. Motion pictures of dynamic phenomena at the rate of up to 1,000,000 frames per second have been achieved by the high-speed system.

  5. Gust Acoustics Computation with a Space-Time CE/SE Parallel 3D Solver

    NASA Technical Reports Server (NTRS)

    Wang, X. Y.; Himansu, A.; Chang, S. C.; Jorgenson, P. C. E.; Reddy, D. R. (Technical Monitor)

    2002-01-01

    The benchmark Problem 2 in Category 3 of the Third Computational Aero-Acoustics (CAA) Workshop is solved using the space-time conservation element and solution element (CE/SE) method. This problem concerns the unsteady response of an isolated finite-span swept flat-plate airfoil bounded by two parallel walls to an incident gust. The acoustic field generated by the interaction of the gust with the flat-plate airfoil is computed by solving the 3D (three-dimensional) Euler equations in the time domain using a parallel version of a 3D CE/SE solver. The effect of the gust orientation on the far-field directivity is studied. Numerical solutions are presented and compared with analytical solutions, showing a reasonable agreement.

  6. DANTSYS/MPI: a system for 3-D deterministic transport on parallel architectures

    SciTech Connect

    Baker, R.S.; Alcouffe, R.E.

    1996-12-31

    Since 1994, we have been using a data parallel form of our deterministic transport code DANTSYS to perform time-independent fixed source and eigenvalue calculations on the CM-200`s at Los Alamos National Laboratory (LANL). Parallelization of the transport sweep is obtained by using a 2-D spatial decomposition which retains the ability to invert the source iteration equation in a single iteration (i.e., the diagonal plane sweep). We have now implemented a message passing version of DANTSYS, referred to as DANTSYS/MPI, on the Cray T3D installed at Los Alamos in 1995. By taking advantage of the SPMD (Single Program, Multiple Data) architecture of the Cray T3D, as well as its low latency communications network, we have managed to achieve grind times (time to solve a single cell in phase space) of less than 10 nanoseconds on the 512 PE (Processing Element) T3D, as opposed to typical grind times of 150-200 nanoseconds on a 2048 PE CM-200, or 300-400 nanoseconds on a single PE of a Cray Y-MP. In addition, we have also parallelized the Diffusion Synthetic Accelerator (DSA) equations which are used to accelerate the convergence of the transport equation. DANTSYS/MPI currently runs on traditional Cray PVP`s and the Cray T3D, and it`s computational kernel (Sweep3D) has been ported to and tested on an array of SGI SMP`s (Symmetric Memory Processors), a network of IBM 590 workstations, an IBM SP2, and the Intel TFLOPs machine at Sandia National Laboratory. This paper describes the implementation of DANTSYS/MPI on the Cray T3D, and presents a simple performance model which accurately predicts the grind time as a function of the number of PE`s and problem size, or scalability. This paper also describes the parallel implementation and performance of the elliptic solver used in DANTSYS/MPI for solving the synthetic acceleration equations.

  7. Contact-impact simulations on massively parallel SIMD supercomputers

    SciTech Connect

    Plaskacz, E.J. ); Belytscko, T.; Chiang, H.Y. )

    1992-01-01

    The implementation of explicit finite element methods with contact-impact on massively parallel SIMD computers is described. The basic parallel finite element algorithm employs an exchange process which minimizes interprocessor communication at the expense of redundant computations and storage. The contact-impact algorithm is based on the pinball method in which compatibility is enforced by preventing interpenetration on spheres embedded in elements adjacent to surfaces. The enhancements to the pinball algorithm include a parallel assembled surface normal algorithm and a parallel detection of interpenetrating pairs. Some timings with and without contact-impact are given.

  8. Staging memory for massively parallel processor

    NASA Technical Reports Server (NTRS)

    Batcher, Kenneth E. (Inventor)

    1988-01-01

    The invention herein relates to a computer organization capable of rapidly processing extremely large volumes of data. A staging memory is provided having a main stager portion consisting of a large number of memory banks which are accessed in parallel to receive, store, and transfer data words simultaneous with each other. Substager portions interconnect with the main stager portion to match input and output data formats with the data format of the main stager portion. An address generator is coded for accessing the data banks for receiving or transferring the appropriate words. Input and output permutation networks arrange the lineal order of data into and out of the memory banks.

  9. Developing a Massively Parallel Forward Projection Radiography Model for Large-Scale Industrial Applications

    SciTech Connect

    Bauerle, Matthew

    2014-08-01

    This project utilizes Graphics Processing Units (GPUs) to compute radiograph simulations for arbitrary objects. The generation of radiographs, also known as the forward projection imaging model, is computationally intensive and not widely utilized. The goal of this research is to develop a massively parallel algorithm that can compute forward projections for objects with a trillion voxels (3D pixels). To achieve this end, the data are divided into blocks that can each t into GPU memory. The forward projected image is also divided into segments to allow for future parallelization and to avoid needless computations.

  10. Visualizing Network Traffic to Understand the Performance of Massively Parallel Simulations.

    PubMed

    Landge, A G; Levine, J A; Bhatele, A; Isaacs, K E; Gamblin, T; Schulz, M; Langer, S H; Bremer, Peer-Timo; Pascucci, V

    2012-12-01

    The performance of massively parallel applications is often heavily impacted by the cost of communication among compute nodes. However, determining how to best use the network is a formidable task, made challenging by the ever increasing size and complexity of modern supercomputers. This paper applies visualization techniques to aid parallel application developers in understanding the network activity by enabling a detailed exploration of the flow of packets through the hardware interconnect. In order to visualize this large and complex data, we employ two linked views of the hardware network. The first is a 2D view, that represents the network structure as one of several simplified planar projections. This view is designed to allow a user to easily identify trends and patterns in the network traffic. The second is a 3D view that augments the 2D view by preserving the physical network topology and providing a context that is familiar to the application developers. Using the massively parallel multi-physics code pF3D as a case study, we demonstrate that our tool provides valuable insight that we use to explain and optimize pF3D's performance on an IBM Blue Gene/P system.

  11. Parallel Imaging of 3D Surface Profile with Space-Division Multiplexing

    PubMed Central

    Lee, Hyung Seok; Cho, Soon-Woo; Kim, Gyeong Hun; Jeong, Myung Yung; Won, Young Jae; Kim, Chang-Seok

    2016-01-01

    We have developed a modified optical frequency domain imaging (OFDI) system that performs parallel imaging of three-dimensional (3D) surface profiles by using the space division multiplexing (SDM) method with dual-area swept sourced beams. We have also demonstrated that 3D surface information for two different areas could be well obtained in a same time with only one camera by our method. In this study, double field of views (FOVs) of 11.16 mm × 5.92 mm were achieved within 0.5 s. Height range for each FOV was 460 µm and axial and transverse resolutions were 3.6 and 5.52 µm, respectively. PMID:26805840

  12. Squire's transformation and 3D Optimal Perturbations in Bounded Parallel Shear Flows

    NASA Astrophysics Data System (ADS)

    Chomaz, Jean-Marc; Soundar Jerome, J. John

    2011-11-01

    The aim of this short communication is to present the implication of Squire's transformation on the optimal transient growth of arbitrary 3D disturbances in parallel shear flow bounded in the cross-stream direction. To our best knowledge this simple property has never been discussed before. In particular it allows to express the long-time optimal growth for perturbations of arbitrary wavenumbers as the product of the gains from the 2D optimal at a lower Reynolds number itself due to the Orr-mechanism by a term that may be identified as due to the lift-up mechanism. This property predict scalings for the 3D optimal perturbation well verified by direct computation. It may be extended to take into account buoyancy effect.

  13. Parallel graph search: application to intraretinal layer segmentation of 3D macular OCT scans

    NASA Astrophysics Data System (ADS)

    Lee, Kyungmoo; Abràmoff, Michael D.; Garvin, Mona K.; Sonka, Milan

    2012-02-01

    Image segmentation is of paramount importance for quantitative analysis of medical image data. Recently, a 3-D graph search method which can detect globally optimal interacting surfaces with respect to the cost function of volumetric images has been introduced, and its utility demonstrated in several application areas. Although the method provides excellent segmentation accuracy, its limitation is a slow processing speed when many surfaces are simultaneously segmented in large volumetric datasets. Here, we propose a novel method of parallel graph search, which overcomes the limitation and allows the quick detection of multiple surfaces. To demonstrate the obtained performance with respect to segmentation accuracy and processing speedup, the new approach was applied to retinal optical coherence tomography (OCT) image data and compared with the performance of the former non-parallel method. Our parallel graph search methods for single and double surface detection are approximately 267 and 181 times faster than the original graph search approach in 5 macular OCT volumes (200 x 5 x 1024 voxels) acquired from the right eyes of 5 normal subjects. The resulting segmentation differences were small as demonstrated by the mean unsigned differences between the non-parallel and parallel methods of 0.0 +/- 0.0 voxels (0.0 +/- 0.0 μm) and 0.27 +/- 0.34 voxels (0.53 +/- 0.66 μm) for the single- and dual-surface approaches, respectively.

  14. 3-D Hybrid Simulation of Quasi-Parallel Bow Shock and Its Effects on the Magnetosphere

    SciTech Connect

    Lin, Y.; Wang, X.Y.

    2005-08-01

    A three-dimensional (3-D) global-scale hybrid simulation is carried out for the structure of the quasi-parallel bow shock, in particular the foreshock waves and pressure pulses. The wave evolution and interaction with the dayside magnetosphere are discussed. It is shown that diamagnetic cavities are generated in the turbulent foreshock due to the ion beam plasma interaction, and these compressional pulses lead to strong surface perturbations at the magnetopause and Alfven waves/field line resonance in the magnetosphere.

  15. The development of a scalable parallel 3-D CFD algorithm for turbomachinery. M.S. Thesis Final Report

    NASA Technical Reports Server (NTRS)

    Luke, Edward Allen

    1993-01-01

    Two algorithms capable of computing a transonic 3-D inviscid flow field about rotating machines are considered for parallel implementation. During the study of these algorithms, a significant new method of measuring the performance of parallel algorithms is developed. The theory that supports this new method creates an empirical definition of scalable parallel algorithms that is used to produce quantifiable evidence that a scalable parallel application was developed. The implementation of the parallel application and an automated domain decomposition tool are also discussed.

  16. Parallel implementation of 3D FFT with volumetric decomposition schemes for efficient molecular dynamics simulations

    NASA Astrophysics Data System (ADS)

    Jung, Jaewoon; Kobayashi, Chigusa; Imamura, Toshiyuki; Sugita, Yuji

    2016-03-01

    Three-dimensional Fast Fourier Transform (3D FFT) plays an important role in a wide variety of computer simulations and data analyses, including molecular dynamics (MD) simulations. In this study, we develop hybrid (MPI+OpenMP) parallelization schemes of 3D FFT based on two new volumetric decompositions, mainly for the particle mesh Ewald (PME) calculation in MD simulations. In one scheme, (1d_Alltoall), five all-to-all communications in one dimension are carried out, and in the other, (2d_Alltoall), one two-dimensional all-to-all communication is combined with two all-to-all communications in one dimension. 2d_Alltoall is similar to the conventional volumetric decomposition scheme. We performed benchmark tests of 3D FFT for the systems with different grid sizes using a large number of processors on the K computer in RIKEN AICS. The two schemes show comparable performances, and are better than existing 3D FFTs. The performances of 1d_Alltoall and 2d_Alltoall depend on the supercomputer network system and number of processors in each dimension. There is enough leeway for users to optimize performance for their conditions. In the PME method, short-range real-space interactions as well as long-range reciprocal-space interactions are calculated. Our volumetric decomposition schemes are particularly useful when used in conjunction with the recently developed midpoint cell method for short-range interactions, due to the same decompositions of real and reciprocal spaces. The 1d_Alltoall scheme of 3D FFT takes 4.7 ms to simulate one MD cycle for a virus system containing more than 1 million atoms using 32,768 cores on the K computer.

  17. Solving unstructured grid problems on massively parallel computers

    NASA Technical Reports Server (NTRS)

    Hammond, Steven W.; Schreiber, Robert

    1990-01-01

    A highly parallel graph mapping technique that enables one to efficiently solve unstructured grid problems on massively parallel computers is presented. Many implicit and explicit methods for solving discretized partial differential equations require each point in the discretization to exchange data with its neighboring points every time step or iteration. The cost of this communication can negate the high performance promised by massively parallel computing. To eliminate this bottleneck, the graph of the irregular problem is mapped into the graph representing the interconnection topology of the computer such that the sum of the distances that the messages travel is minimized. It is shown that using the heuristic mapping algorithm significantly reduces the communication time compared to a naive assignment of processes to processors.

  18. Parallel 3D Finite Element Numerical Modelling of DC Electron Guns

    SciTech Connect

    Prudencio, E.; Candel, A.; Ge, L.; Kabel, A.; Ko, K.; Lee, L.; Li, Z.; Ng, C.; Schussman, G.; /SLAC

    2008-02-04

    In this paper we present Gun3P, a parallel 3D finite element application that the Advanced Computations Department at the Stanford Linear Accelerator Center is developing for the analysis of beam formation in DC guns and beam transport in klystrons. Gun3P is targeted specially to complex geometries that cannot be described by 2D models and cannot be easily handled by finite difference discretizations. Its parallel capability allows simulations with more accuracy and less processing time than packages currently available. We present simulation results for the L-band Sheet Beam Klystron DC gun, in which case Gun3P is able to reduce simulation time from days to some hours.

  19. Billion-atom synchronous parallel kinetic Monte Carlo simulations of critical 3D Ising systems

    SciTech Connect

    Martinez, E.; Monasterio, P.R.; Marian, J.

    2011-02-20

    An extension of the synchronous parallel kinetic Monte Carlo (spkMC) algorithm developed by Martinez et al. [J. Comp. Phys. 227 (2008) 3804] to discrete lattices is presented. The method solves the master equation synchronously by recourse to null events that keep all processors' time clocks current in a global sense. Boundary conflicts are resolved by adopting a chessboard decomposition into non-interacting sublattices. We find that the bias introduced by the spatial correlations attendant to the sublattice decomposition is within the standard deviation of serial calculations, which confirms the statistical validity of our algorithm. We have analyzed the parallel efficiency of spkMC and find that it scales consistently with problem size and sublattice partition. We apply the method to the calculation of scale-dependent critical exponents in billion-atom 3D Ising systems, with very good agreement with state-of-the-art multispin simulations.

  20. Assessing the performance of a parallel MATLAB-based 3D convection code

    NASA Astrophysics Data System (ADS)

    Kirkpatrick, G. J.; Hasenclever, J.; Phipps Morgan, J.; Shi, C.

    2008-12-01

    We are currently building 2D and 3D MATLAB-based parallel finite element codes for mantle convection and melting. The codes use the MATLAB implementation of core MPI commands (eg. Send, Receive, Broadcast) for message passing between computational subdomains. We have found that code development and algorithm testing are much faster in MATLAB than in our previous work coding in C or FORTRAN, this code was built from scratch with only 12 man-months of effort. The one extra cost w.r.t. C coding on a Beowulf cluster is the cost of the parallel MATLAB license for a >4core cluster. Here we present some preliminary results on the efficiency of MPI messaging in MATLAB on a small 4 machine, 16core, 32Gb RAM Intel Q6600 processor-based cluster. Our code implements fully parallelized preconditioned conjugate gradients with a multigrid preconditioner. Our parallel viscous flow solver is currently 20% slower for a 1,000,000 DOF problem on a single core in 2D as the direct solve MILAMIN MATLAB viscous flow solver. We have tested both continuous and discontinuous pressure formulations. We test with various configurations of network hardware, CPU speeds, and memory using our own and MATLAB's built in cluster profiler. So far we have only explored relatively small (up to 1.6GB RAM) test problems. We find that with our current code and Intel memory controller bandwidth limitations we can only get ~2.3 times performance out of 4 cores than 1 core per machine. Even for these small problems the code runs faster with message passing between 4 machines with one core each than 1 machine with 4 cores and internal messaging (1.29x slower), or 1 core (2.15x slower). It surprised us that for 2D ~1GB-sized problems with only 3 multigrid levels, the direct- solve on the coarsest mesh consumes comparable time to the iterative solve on the finest mesh - a penalty that is greatly reduced either by using a 4th multigrid level or by using an iterative solve at the coarsest grid level. We plan to

  1. A 3-D liver segmentation method with parallel computing for selective internal radiation therapy.

    PubMed

    Goryawala, Mohammed; Guillen, Magno R; Cabrerizo, Mercedes; Barreto, Armando; Gulec, Seza; Barot, Tushar C; Suthar, Rekha R; Bhatt, Ruchir N; Mcgoron, Anthony; Adjouadi, Malek

    2012-01-01

    This study describes a new 3-D liver segmentation method in support of the selective internal radiation treatment as a treatment for liver tumors. This 3-D segmentation is based on coupling a modified k-means segmentation method with a special localized contouring algorithm. In the segmentation process, five separate regions are identified on the computerized tomography image frames. The merit of the proposed method lays in its potential to provide fast and accurate liver segmentation and 3-D rendering as well as in delineating tumor region(s), all with minimal user interaction. Leveraging of multicore platforms is shown to speed up the processing of medical images considerably, making this method more suitable in clinical settings. Experiments were performed to assess the effect of parallelization using up to 442 slices. Empirical results, using a single workstation, show a reduction in processing time from 4.5 h to almost 1 h for a 78% gain. Most important is the accuracy achieved in estimating the volumes of the liver and tumor region(s), yielding an average error of less than 2% in volume estimation over volumes generated on the basis of the current manually guided segmentation processes. Results were assessed using the analysis of variance statistical analysis.

  2. BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations

    PubMed Central

    Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul

    2016-01-01

    Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C ++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933

  3. The language parallel Pascal and other aspects of the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Reeves, A. P.; Bruner, J. D.

    1982-01-01

    A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.

  4. Development of massively parallel quantum chemistry program SMASH

    SciTech Connect

    Ishimura, Kazuya

    2015-12-31

    A massively parallel program for quantum chemistry calculations SMASH was released under the Apache License 2.0 in September 2014. The SMASH program is written in the Fortran90/95 language with MPI and OpenMP standards for parallelization. Frequently used routines, such as one- and two-electron integral calculations, are modularized to make program developments simple. The speed-up of the B3LYP energy calculation for (C{sub 150}H{sub 30}){sub 2} with the cc-pVDZ basis set (4500 basis functions) was 50,499 on 98,304 cores of the K computer.

  5. Solving mazes with memristors: A massively parallel approach

    NASA Astrophysics Data System (ADS)

    Pershin, Yuriy V.; di Ventra, Massimiliano

    2011-10-01

    Solving mazes is not just a fun pastime: They are prototype models in several areas of science and technology. However, when maze complexity increases, their solution becomes cumbersome and very time consuming. Here, we show that a network of memristors—resistors with memory—can solve such a nontrivial problem quite easily. In particular, maze solving by the network of memristors occurs in a massively parallel fashion since all memristors in the network participate simultaneously in the calculation. The result of the calculation is then recorded into the memristors’ states and can be used and/or recovered at a later time. Furthermore, the network of memristors finds all possible solutions in multiple-solution mazes and sorts out the solution paths according to their length. Our results demonstrate not only the application of memristive networks to the field of massively parallel computing, but also an algorithm to solve mazes, which could find applications in different fields.

  6. Massively parallel sequencing, a new method for detecting adventitious agents.

    PubMed

    Onions, David; Kolman, John

    2010-05-01

    There has been an upsurge of interest in developing new veterinary and human vaccines and, in turn, this has involved the development of new mammalian and insect cell substrates. Excluding adventitious agents from these cells can be problematic, particularly for cells derived from species with limited virological investigation. Massively parallel sequencing is a powerful new method for the identification of viruses and other adventitious agents, without prior knowledge of the nature of the agent. We have developed methods using random priming to detect viruses in the supernatants from cell substrates or in virus seed stocks. Using these methods we have recently discovered a new parvovirus in bovine serum. When applied to sequencing the transcriptome, massively parallel sequencing can reveal latent or silent infections. Enormous amounts of data are developed in this process usually between 100 and 400 Mbp. Consequently, sophisticated bioinformatic algorithms are required to analyse and verify virus targets.

  7. Nearest Neighbor Search Applications for the Terasys Massively Parallel Workstation.

    DTIC Science & Technology

    1996-08-01

    ANALYSES Nearest Neighbor Search Applications for the Terasys Massively Parallel Workstation Eric W. Johnson 00 ^«fciasn» ^ «RBOüSQ...thank Harold E. Conn for help with Tera- sys programming and for running the Terasys tests described in this report. The author would like to thank...USING THE TERASYS TO FIND NEAREST NEIGHBORS 9 4.1 Placing Sample Records in the Terasys 9 4.2 Placing Training Records in the Terasys 10 4.3

  8. Massively parallel Wang-Landau sampling on multiple GPUs

    NASA Astrophysics Data System (ADS)

    Yin, Junqi; Landau, D. P.

    2012-08-01

    Wang-Landau sampling is implemented on the Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA). Performances on three different GPU cards, including the new generation Fermi architecture card, are compared with that on a Central Processing Unit (CPU). The parameters for massively parallel Wang-Landau sampling are tuned in order to achieve fast convergence. For simulations of the water cluster systems, we obtain an average of over 50 times speedup for a given workload.

  9. A massively parallel corpus: the Bible in 100 languages.

    PubMed

    Christodouloupoulos, Christos; Steedman, Mark

    We describe the creation of a massively parallel corpus based on 100 translations of the Bible. We discuss some of the difficulties in acquiring and processing the raw material as well as the potential of the Bible as a corpus for natural language processing. Finally we present a statistical analysis of the corpora collected and a detailed comparison between the English translation and other English corpora.

  10. Massively parallel Wang Landau sampling on multiple GPUs

    SciTech Connect

    Yin, Junqi; Landau, D. P.

    2012-01-01

    Wang Landau sampling is implemented on the Graphics Processing Unit (GPU) with the Compute Unified Device Architecture (CUDA). Performances on three different GPU cards, including the new generation Fermi architecture card, are compared with that on a Central Processing Unit (CPU). The parameters for massively parallel Wang Landau sampling are tuned in order to achieve fast convergence. For simulations of the water cluster systems, we obtain an average of over 50 times speedup for a given workload.

  11. Development of a Massively Parallel NOGAPS Forecast Model

    DTIC Science & Technology

    2016-06-07

    to exploit massively parallel processor (MPP), distributed memory computer architectures. Future increases in computer power from MPP’s will allow...substantial increases in model resolution, more realistic physical processes, and more sophisticated data assimilation methods, all of which will...immediate objective of the project is to redesign the model’s numerical algorithms and data structures to allow efficient execution on MPP

  12. TSE computers - A means for massively parallel computations

    NASA Technical Reports Server (NTRS)

    Strong, J. P., III

    1976-01-01

    A description is presented of hardware concepts for building a massively parallel processing system for two-dimensional data. The processing system is to use logic arrays of 128 x 128 elements which perform over 16 thousand operations simultaneously. Attention is given to image data, logic arrays, basic image logic functions, a prototype negator, an interleaver device, image logic circuits, and an image memory circuit.

  13. Interviews3D: a platform for interactive handling of massive data sets.

    PubMed

    Brüderlin, Beat; Heyer, Mathias; Pfützner, Sebastian

    2007-01-01

    Interviews3D confronts the CAD data explosion problem with new approaches that let users process fully detailed 3D models in real time, even on standard PCs or 32-bit laptops. It has demonstrated efficiency in real-time rendering and clash detection for automotive, aerospace, and construction models too large for previous systems.

  14. Massive parallel simulation of phenomena in condensed matter at high energy density

    NASA Astrophysics Data System (ADS)

    Fortov, Vladimir

    2005-03-01

    This talk deals with computational hydrodynamics, advanced material properties and phenomena at high energy density. New results of massive parallel 3D simulation done with method of individual particles in cells have been obtained. The gas dynamic code includes advanced physical models of matter such as multi-phase equations of state, elastic-plastic, spallation, optic properties and ion-beams stopping. Investigated are the influence on hypervelocity impact processes effects of equation of state, elastic-plastic and spallation. We also report results of numerical modeling of the action of intense heavy ion beams on metallic targets in comparison with new experimental data.

  15. Requirements for supercomputing in energy research: The transition to massively parallel computing

    SciTech Connect

    Not Available

    1993-02-01

    This report discusses: The emergence of a practical path to TeraFlop computing and beyond; requirements of energy research programs at DOE; implementation: supercomputer production computing environment on massively parallel computers; and implementation: user transition to massively parallel computing.

  16. The 2nd Symposium on the Frontiers of Massively Parallel Computations

    NASA Technical Reports Server (NTRS)

    Mills, Ronnie (Editor)

    1988-01-01

    Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.

  17. A parallel 3-D discrete wavelet transform architecture using pipelined lifting scheme approach for video coding

    NASA Astrophysics Data System (ADS)

    Hegde, Ganapathi; Vaya, Pukhraj

    2013-10-01

    This article presents a parallel architecture for 3-D discrete wavelet transform (3-DDWT). The proposed design is based on the 1-D pipelined lifting scheme. The architecture is fully scalable beyond the present coherent Daubechies filter bank (9, 7). This 3-DDWT architecture has advantages such as no group of pictures restriction and reduced memory referencing. It offers low power consumption, low latency and high throughput. The computing technique is based on the concept that lifting scheme minimises the storage requirement. The application specific integrated circuit implementation of the proposed architecture is done by synthesising it using 65 nm Taiwan Semiconductor Manufacturing Company standard cell library. It offers a speed of 486 MHz with a power consumption of 2.56 mW. This architecture is suitable for real-time video compression even with large frame dimensions.

  18. Massively parallel processing of remotely sensed hyperspectral images

    NASA Astrophysics Data System (ADS)

    Plaza, Javier; Plaza, Antonio; Valencia, David; Paz, Abel

    2009-08-01

    In this paper, we develop several parallel techniques for hyperspectral image processing that have been specifically designed to be run on massively parallel systems. The techniques developed cover the three relevant areas of hyperspectral image processing: 1) spectral mixture analysis, a popular approach to characterize mixed pixels in hyperspectral data addressed in this work via efficient implementation of a morphological algorithm for automatic identification of pure spectral signatures or endmembers from the input data; 2) supervised classification of hyperspectral data using multi-layer perceptron neural networks with back-propagation learning; and 3) automatic target detection in the hyperspectral data using orthogonal subspace projection concepts. The scalability of the proposed parallel techniques is investigated using Barcelona Supercomputing Center's MareNostrum facility, one of the most powerful supercomputers in Europe.

  19. Frame-like gauge-invariant description of massive fermionic higher spins in 3D

    NASA Astrophysics Data System (ADS)

    Permiakova, M. Yu.; Snegirev, T. V.

    2017-03-01

    We give the frame-like gauge-invariant Lagrangian description for massive fermionic arbitrary spin fields in three-dimensional AdS space. The Lagrangian, complete set of gauge transformations and gauge-invariant curvatures are obtained.

  20. 3-D inversion of airborne electromagnetic data parallelized and accelerated by local mesh and adaptive soundings

    NASA Astrophysics Data System (ADS)

    Yang, Dikun; Oldenburg, Douglas W.; Haber, Eldad

    2014-03-01

    Airborne electromagnetic (AEM) methods are highly efficient tools for assessing the Earth's conductivity structures in a large area at low cost. However, the configuration of AEM measurements, which typically have widely distributed transmitter-receiver pairs, makes the rigorous modelling and interpretation extremely time-consuming in 3-D. Excessive overcomputing can occur when working on a large mesh covering the entire survey area and inverting all soundings in the data set. We propose two improvements. The first is to use a locally optimized mesh for each AEM sounding for the forward modelling and calculation of sensitivity. This dedicated local mesh is small with fine cells near the sounding location and coarse cells far away in accordance with EM diffusion and the geometric decay of the signals. Once the forward problem is solved on the local meshes, the sensitivity for the inversion on the global mesh is available through quick interpolation. Using local meshes for AEM forward modelling avoids unnecessary computing on fine cells on a global mesh that are far away from the sounding location. Since local meshes are highly independent, the forward modelling can be efficiently parallelized over an array of processors. The second improvement is random and dynamic down-sampling of the soundings. Each inversion iteration only uses a random subset of the soundings, and the subset is reselected for every iteration. The number of soundings in the random subset, determined by an adaptive algorithm, is tied to the degree of model regularization. This minimizes the overcomputing caused by working with redundant soundings. Our methods are compared against conventional methods and tested with a synthetic example. We also invert a field data set that was previously considered to be too large to be practically inverted in 3-D. These examples show that our methodology can dramatically reduce the processing time of 3-D inversion to a practical level without losing resolution

  1. Parallel robot for micro assembly with integrated innovative optical 3D-sensor

    NASA Astrophysics Data System (ADS)

    Hesselbach, Juergen; Ispas, Diana; Pokar, Gero; Soetebier, Sven; Tutsch, Rainer

    2002-10-01

    Recent advances in the fields of MEMS and MOEMS often require precise assembly of very small parts with an accuracy of a few microns. In order to meet this demand, a new approach using a robot based on parallel mechanisms in combination with a novel 3D-vision system has been chosen. The planar parallel robot structure with 2 DOF provides a high resolution in the XY-plane. It carries two additional serial axes for linear and rotational movement in/about z direction. In order to achieve high precision as well as good dynamic capabilities, the drive concept for the parallel (main) axes incorporates air bearings in combination with a linear electric servo motors. High accuracy position feedback is provided by optical encoders with a resolution of 0.1 μm. To allow for visualization and visual control of assembly processes, a camera module fits into the hollow tool head. It consists of a miniature CCD camera and a light source. In addition a modular gripper support is integrated into the tool head. To increase the accuracy a control loop based on an optoelectronic sensor will be implemented. As a result of an in-depth analysis of different approaches a photogrammetric system using one single camera and special beam-splitting optics was chosen. A pattern of elliptical marks is applied to the surfaces of workpiece and gripper. Using a model-based recognition algorithm the image processing software identifies the gripper and the workpiece and determines their relative position. A deviation vector is calculated and fed into the robot control to guide the gripper.

  2. Bit-parallel arithmetic in a massively-parallel associative processor

    NASA Technical Reports Server (NTRS)

    Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

    1992-01-01

    A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.

  3. 3D Spectroscopy Unveils Massive Galaxy Formation Modes at High-z

    NASA Astrophysics Data System (ADS)

    Buitrago, F.; Conselice, C. J.; Epinat, B.; Bedregal, A. G.; Trujillo, I.; Grützbauch, R.

    Massive (stellar mass ≥ 1011 M ⊙) galaxies at high redshift (z ≥ 1. 5) remain mysterious objects. Their extremely small sizes (effective radii of 1 - 2 kpc) make them as dense as globular clusters, whereas in the present day Universe similar mass systems are large with old and metal-rich stellar populations. In order to explore this development, we present near-IR IFU observations with SINFONI@VLT for ten massive galaxies at z ˜ 1. 4 solely selected by their high stellar mass which allows us to retrieve velocity dispersions, kinematic maps and dynamical masses. We join this with imaging from the GOODS NICMOS Survey (GNS), which was carried out by our group, and which is the largest sample of massive galaxies (80 objects) at high redshift (1. 7 < z < 3) to date. With these data we show how their morphology changes, possibly as a result of minor merging events also seen in the kinematics.

  4. Comparison of 3-D synthetic aperture phased-array ultrasound imaging and parallel beamforming.

    PubMed

    Rasmussen, Morten Fischer; Jensen, Jørgen Arendt

    2014-10-01

    This paper demonstrates that synthetic aperture imaging (SAI) can be used to achieve real-time 3-D ultrasound phased-array imaging. It investigates whether SAI increases the image quality compared with the parallel beamforming (PB) technique for real-time 3-D imaging. Data are obtained using both simulations and measurements with an ultrasound research scanner and a commercially available 3.5- MHz 1024-element 2-D transducer array. To limit the probe cable thickness, 256 active elements are used in transmit and receive for both techniques. The two imaging techniques were designed for cardiac imaging, which requires sequences designed for imaging down to 15 cm of depth and a frame rate of at least 20 Hz. The imaging quality of the two techniques is investigated through simulations as a function of depth and angle. SAI improved the full-width at half-maximum (FWHM) at low steering angles by 35%, and the 20-dB cystic resolution by up to 62%. The FWHM of the measured line spread function (LSF) at 80 mm depth showed a difference of 20% in favor of SAI. SAI reduced the cyst radius at 60 mm depth by 39% in measurements. SAI improved the contrast-to-noise ratio measured on anechoic cysts embedded in a tissue-mimicking material by 29% at 70 mm depth. The estimated penetration depth on the same tissue-mimicking phantom shows that SAI increased the penetration by 24% compared with PB. Neither SAI nor PB achieved the design goal of 15 cm penetration depth. This is likely due to the limited transducer surface area and a low SNR of the experimental scanner used.

  5. 3D Kirchhoff depth migration algorithm: A new scalable approach for parallelization on multicore CPU based cluster

    NASA Astrophysics Data System (ADS)

    Rastogi, Richa; Londhe, Ashutosh; Srivastava, Abhishek; Sirasala, Kirannmayi M.; Khonde, Kiran

    2017-03-01

    In this article, a new scalable 3D Kirchhoff depth migration algorithm is presented on state of the art multicore CPU based cluster. Parallelization of 3D Kirchhoff depth migration is challenging due to its high demand of compute time, memory, storage and I/O along with the need of their effective management. The most resource intensive modules of the algorithm are traveltime calculations and migration summation which exhibit an inherent trade off between compute time and other resources. The parallelization strategy of the algorithm largely depends on the storage of calculated traveltimes and its feeding mechanism to the migration process. The presented work is an extension of our previous work, wherein a 3D Kirchhoff depth migration application for multicore CPU based parallel system had been developed. Recently, we have worked on improving parallel performance of this application by re-designing the parallelization approach. The new algorithm is capable to efficiently migrate both prestack and poststack 3D data. It exhibits flexibility for migrating large number of traces within the available node memory and with minimal requirement of storage, I/O and inter-node communication. The resultant application is tested using 3D Overthrust data on PARAM Yuva II, which is a Xeon E5-2670 based multicore CPU cluster with 16 cores/node and 64 GB shared memory. Parallel performance of the algorithm is studied using different numerical experiments and the scalability results show striking improvement over its previous version. An impressive 49.05X speedup with 76.64% efficiency is achieved for 3D prestack data and 32.00X speedup with 50.00% efficiency for 3D poststack data, using 64 nodes. The results also demonstrate the effectiveness and robustness of the improved algorithm with high scalability and efficiency on a multicore CPU cluster.

  6. Routing performance analysis and optimization within a massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen

    2013-04-16

    An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.

  7. A Massively Parallel Solver for the Mechanical Harmonic Analysis of Accelerator Cavities

    SciTech Connect

    O. Kononenko

    2015-02-17

    ACE3P is a 3D massively parallel simulation suite that developed at SLAC National Accelerator Laboratory that can perform coupled electromagnetic, thermal and mechanical study. Effectively utilizing supercomputer resources, ACE3P has become a key simulation tool for particle accelerator R and D. A new frequency domain solver to perform mechanical harmonic response analysis of accelerator components is developed within the existing parallel framework. This solver is designed to determine the frequency response of the mechanical system to external harmonic excitations for time-efficient accurate analysis of the large-scale problems. Coupled with the ACE3P electromagnetic modules, this capability complements a set of multi-physics tools for a comprehensive study of microphonics in superconducting accelerating cavities in order to understand the RF response and feedback requirements for the operational reliability of a particle accelerator. (auth)

  8. Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Tong, Charles; Swarztrauber, Paul N.

    1991-01-01

    The present evaluation of alternative, massively parallel hypercube processor-applicable designs for ordered radix-2 decimation-in-frequency FFT algorithms gives attention to the reduction of computation time-dominating communication. A combination of the order and computational phases of the FFT is accordingly employed, in conjunction with sequence-to-processor maps which reduce communication. Two orderings, 'standard' and 'cyclic', in which the order of the transform is the same as that of the input sequence, can be implemented with ease on the Connection Machine (where orderings are determined by geometries and priorities. A parallel method for trigonometric coefficient computation is presented which does not employ trigonometric functions or interprocessor communication.

  9. A biconjugate gradient type algorithm on massively parallel architectures

    NASA Technical Reports Server (NTRS)

    Freund, Roland W.; Hochbruck, Marlis

    1991-01-01

    The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.

  10. 3D magnetospheric parallel hybrid multi-grid method applied to planet–plasma interactions

    SciTech Connect

    Leclercq, L.; Mancini, M.

    2016-03-15

    We present a new method to exploit multiple refinement levels within a 3D parallel hybrid model, developed to study planet–plasma interactions. This model is based on the hybrid formalism: ions are kinetically treated whereas electrons are considered as a inertia-less fluid. Generally, ions are represented by numerical particles whose size equals the volume of the cells. Particles that leave a coarse grid subsequently entering a refined region are split into particles whose volume corresponds to the volume of the refined cells. The number of refined particles created from a coarse particle depends on the grid refinement rate. In order to conserve velocity distribution functions and to avoid calculations of average velocities, particles are not coalesced. Moreover, to ensure the constancy of particles' shape function sizes, the hybrid method is adapted to allow refined particles to move within a coarse region. Another innovation of this approach is the method developed to compute grid moments at interfaces between two refinement levels. Indeed, the hybrid method is adapted to accurately account for the special grid structure at the interfaces, avoiding any overlapping grid considerations. Some fundamental test runs were performed to validate our approach (e.g. quiet plasma flow, Alfven wave propagation). Lastly, we also show a planetary application of the model, simulating the interaction between Jupiter's moon Ganymede and the Jovian plasma.

  11. A massively parallel fractional step solver for incompressible flows

    SciTech Connect

    Houzeaux, G. Vazquez, M. Aubry, R. Cela, J.M.

    2009-09-20

    This paper presents a parallel implementation of fractional solvers for the incompressible Navier-Stokes equations using an algebraic approach. Under this framework, predictor-corrector and incremental projection schemes are seen as sub-classes of the same class, making apparent its differences and similarities. An additional advantage of this approach is to set a common basis for a parallelization strategy, which can be extended to other split techniques or to compressible flows. The predictor-corrector scheme consists in solving the momentum equation and a modified 'continuity' equation (namely a simple iteration for the pressure Schur complement) consecutively in order to converge to the monolithic solution, thus avoiding fractional errors. On the other hand, the incremental projection scheme solves only one iteration of the predictor-corrector per time step and adds a correction equation to fulfill the mass conservation. As shown in the paper, these two schemes are very well suited for massively parallel implementation. In fact, when compared with monolithic schemes, simpler solvers and preconditioners can be used to solve the non-symmetric momentum equations (GMRES, Bi-CGSTAB) and to solve the symmetric continuity equation (CG, Deflated CG). This gives good speedup properties of the algorithm. The implementation of the mesh partitioning technique is presented, as well as the parallel performances and speedups for thousands of processors.

  12. Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

    NASA Technical Reports Server (NTRS)

    Tong, Charles; Swarztrauber, Paul N.

    1989-01-01

    Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.

  13. Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

    NASA Astrophysics Data System (ADS)

    Atwal, Gurinder S.; Kinney, Justin B.

    2016-03-01

    A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.

  14. A HIGHLY COLLIMATED WATER MASER BIPOLAR OUTFLOW IN THE CEPHEUS A HW3d MASSIVE YOUNG STELLAR OBJECT

    SciTech Connect

    Chibueze, James O.; Imai, Hiroshi; Tafoya, Daniel; Omodaka, Toshihiro; Chong, Sze-Ning; Kameya, Osamu; Hirota, Tomoya; Torrelles, Jose M.

    2012-04-01

    We present the results of multi-epoch very long baseline interferometry (VLBI) water (H{sub 2}O) maser observations carried out with the VLBI Exploration of Radio Astrometry toward the Cepheus A HW3d object. We measured for the first time relative proper motions of the H{sub 2}O maser features, whose spatio-kinematics traces a compact bipolar outflow. This outflow looks highly collimated and expanding through {approx}280 AU (400 mas) at a mean velocity of {approx}21 km s{sup -1} ({approx}6 mas yr{sup -1}) without taking into account the turbulent central maser cluster. The opening angle of the outflow is estimated to be {approx}30 Degree-Sign . The dynamical timescale of the outflow is estimated to be {approx}100 years. Our results provide strong support that HW3d harbors an internal massive young star, and the observed outflow could be tracing a very early phase of star formation. We also have analyzed Very Large Array archive data of 1.3 cm continuum emission obtained in 1995 and 2006 toward Cepheus A. The comparative result of the HW3d continuum emission suggests the possibility of the existence of distinct young stellar objects in HW3d and/or strong variability in one of their radio continuum emission components.

  15. Massively parallel density functional calculations for thousands of atoms: KKRnano

    NASA Astrophysics Data System (ADS)

    Thiess, A.; Zeller, R.; Bolten, M.; Dederichs, P. H.; Blügel, S.

    2012-06-01

    Applications of existing precise electronic-structure methods based on density functional theory are typically limited to the treatment of about 1000 inequivalent atoms, which leaves unresolved many open questions in material science, e.g., on complex defects, interfaces, dislocations, and nanostructures. KKRnano is a new massively parallel linear scaling all-electron density functional algorithm in the framework of the Korringa-Kohn-Rostoker (KKR) Green's-function method. We conceptualized, developed, and optimized KKRnano for large-scale applications of many thousands of atoms without compromising on the precision of a full-potential all-electron method, i.e., it is a method without any shape approximation of the charge density or potential. A key element of the new method is the iterative solution of the sparse linear Dyson equation, which we parallelized atom by atom, across energy points in the complex plane and for each spin degree of freedom using the message passing interface standard, followed by a lower-level OpenMP parallelization. This hybrid four-level parallelization allows for an efficient use of up to 100000 processors on the latest generation of supercomputers. The iterative solution of the Dyson equation is significantly accelerated, employing preconditioning techniques making use of coarse-graining principles expressed in a block-circulant preconditioner. In this paper, we will describe the important elements of this new algorithm, focusing on the parallelization and preconditioning and showing scaling results for NiPd alloys up to 8192 atoms and 65536 processors. At the end, we present an order-N algorithm for large-scale simulations of metallic systems, making use of the nearsighted principle of the KKR Green's-function approach by introducing a truncation of the electron scattering to a local cluster of atoms, the size of which is determined by the requested accuracy. By exploiting this algorithm, we show linear scaling calculations of more

  16. Representing and computing regular languages on massively parallel networks

    SciTech Connect

    Miller, M.I.; O'Sullivan, J.A. ); Boysam, B. ); Smith, K.R. )

    1991-01-01

    This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first established the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibb's representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grow. These representations are coupled to stochastic diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs' probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determines the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.

  17. Reconstruction for Time-Domain In Vivo EPR 3D Multigradient Oximetric Imaging—A Parallel Processing Perspective

    PubMed Central

    Dharmaraj, Christopher D.; Thadikonda, Kishan; Fletcher, Anthony R.; Doan, Phuc N.; Devasahayam, Nallathamby; Matsumoto, Shingo; Johnson, Calvin A.; Cook, John A.; Mitchell, James B.; Subramanian, Sankaran; Krishna, Murali C.

    2009-01-01

    Three-dimensional Oximetric Electron Paramagnetic Resonance Imaging using the Single Point Imaging modality generates unpaired spin density and oxygen images that can readily distinguish between normal and tumor tissues in small animals. It is also possible with fast imaging to track the changes in tissue oxygenation in response to the oxygen content in the breathing air. However, this involves dealing with gigabytes of data for each 3D oximetric imaging experiment involving digital band pass filtering and background noise subtraction, followed by 3D Fourier reconstruction. This process is rather slow in a conventional uniprocessor system. This paper presents a parallelization framework using OpenMP runtime support and parallel MATLAB to execute such computationally intensive programs. The Intel compiler is used to develop a parallel C++ code based on OpenMP. The code is executed on four Dual-Core AMD Opteron shared memory processors, to reduce the computational burden of the filtration task significantly. The results show that the parallel code for filtration has achieved a speed up factor of 46.66 as against the equivalent serial MATLAB code. In addition, a parallel MATLAB code has been developed to perform 3D Fourier reconstruction. Speedup factors of 4.57 and 4.25 have been achieved during the reconstruction process and oximetry computation, for a data set with 23 × 23 × 23 gradient steps. The execution time has been computed for both the serial and parallel implementations using different dimensions of the data and presented for comparison. The reported system has been designed to be easily accessible even from low-cost personal computers through local internet (NIHnet). The experimental results demonstrate that the parallel computing provides a source of high computational power to obtain biophysical parameters from 3D EPR oximetric imaging, almost in real-time. PMID:19672315

  18. FIRST RESULTS FROM THE 3D-HST SURVEY: THE STRIKING DIVERSITY OF MASSIVE GALAXIES AT z > 1

    SciTech Connect

    Van Dokkum, Pieter G.; Nelson, Erica; Skelton, Rosalind E.; Bezanson, Rachel; Lundgren, Britt; Brammer, Gabriel; Fumagalli, Mattia; Franx, Marijn; Patel, Shannon; Labbe, Ivo; Rix, Hans-Walter; Schmidt, Kasper B.; Da Cunha, Elisabete; Kriek, Mariska; Bian Fuyan; Fan Xiaohui; Erb, Dawn K.; Foerster Schreiber, Natascha; Illingworth, Garth D.; Magee, Dan; and others

    2011-12-10

    We present first results from the 3D-HST program, a near-IR spectroscopic survey performed with the Wide Field Camera 3 (WFC3) on the HST. We have used 3D-HST spectra to measure redshifts and H{alpha} equivalent widths (EW{sub H{alpha}}) for a complete, stellar mass-limited sample of 34 galaxies at 1 < z < 1.5 with M{sub star} > 10{sup 11} M{sub Sun} in the COSMOS, GOODS, and AEGIS fields. We find that a substantial fraction of massive galaxies at this epoch are forming stars at a high rate: the fraction of galaxies with EW{sub H{alpha}} >10 A is 59%, compared to 10% among Sloan Digital Sky Survey galaxies of similar masses at z = 0.1. Galaxies with weak H{alpha} emission show absorption lines typical of 2-4 Gyr old stellar populations. The structural parameters of the galaxies, derived from the associated WFC3 F140W imaging data, correlate with the presence of H{alpha}; quiescent galaxies are compact with high Sersic index and high inferred velocity dispersion, whereas star-forming galaxies are typically large two-armed spiral galaxies, with low Sersic index. Some of these star-forming galaxies might be progenitors of the most massive S0 and Sa galaxies. Our results challenge the idea that galaxies at fixed mass form a homogeneous population with small scatter in their properties. Instead, we find that massive galaxies form a highly diverse population at z > 1, in marked contrast to the local universe.

  19. Direct stereo radargrammetric processing using massively parallel processing

    NASA Astrophysics Data System (ADS)

    Balz, Timo; Zhang, Lu; Liao, Mingsheng

    2013-05-01

    Synthetic Aperture Radar (SAR) offers many ways to reconstruct digital surface models (DSMs). The two most commonly used methods are SAR interferometry (InSAR) and stereo radargrammetry. Stereo radargrammetry is a very stable and reliable process and is far less affected by temporal decorrelation compared with InSAR. It is therefore often used for DSM generation in heavily vegetated areas. However, stereo radargrammetry often produces rather noisy DSMs, sometimes containing large outliers. In this manuscript, we present a new approach for stereo radargrammetric processing, where the homologous points between the images are found by geocoding large amount of points. This offers a very flexible approach, allowing the simultaneous processing of multiple images and of cross-heading image pairs. Our approach relies on a good initial geocoding accuracy of the data and on very fast processing using a massively parallel implementation. The approach is demonstrated using TerraSAR-X images from Mount Song, China, and from Trento, Italy.

  20. Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing

    PubMed Central

    2015-01-01

    A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in massive parallel using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982

  1. A Computational Fluid Dynamics Algorithm on a Massively Parallel Computer

    NASA Technical Reports Server (NTRS)

    Jespersen, Dennis C.; Levit, Creon

    1989-01-01

    The discipline of computational fluid dynamics is demanding ever-increasing computational power to deal with complex fluid flow problems. We investigate the performance of a finite-difference computational fluid dynamics algorithm on a massively parallel computer, the Connection Machine. Of special interest is an implicit time-stepping algorithm; to obtain maximum performance from the Connection Machine, it is necessary to use a nonstandard algorithm to solve the linear systems that arise in the implicit algorithm. We find that the Connection Machine ran achieve very high computation rates on both explicit and implicit algorithms. The performance of the Connection Machine puts it in the same class as today's most powerful conventional supercomputers.

  2. Integration of IR focal plane arrays with massively parallel processor

    NASA Astrophysics Data System (ADS)

    Esfandiari, P.; Koskey, P.; Vaccaro, K.; Buchwald, W.; Clark, F.; Krejca, B.; Rekeczky, C.; Zarandy, A.

    2008-04-01

    The intent of this investigation is to replace the low fill factor visible sensor of a Cellular Neural Network (CNN) processor with an InGaAs Focal Plane Array (FPA) using both bump bonding and epitaxial layer transfer techniques for use in the Ballistic Missile Defense System (BMDS) interceptor seekers. The goal is to fabricate a massively parallel digital processor with a local as well as a global interconnect architecture. Currently, this unique CNN processor is capable of processing a target scene in excess of 10,000 frames per second with its visible sensor. What makes the CNN processor so unique is that each processing element includes memory, local data storage, local and global communication devices and a visible sensor supported by a programmable analog or digital computer program.

  3. Development of a massively parallel parachute performance prediction code

    SciTech Connect

    Peterson, C.W.; Strickland, J.H.; Wolfe, W.P.; Sundberg, W.D.; McBride, D.D.

    1997-04-01

    The Department of Energy has given Sandia full responsibility for the complete life cycle (cradle to grave) of all nuclear weapon parachutes. Sandia National Laboratories is initiating development of a complete numerical simulation of parachute performance, beginning with parachute deployment and continuing through inflation and steady state descent. The purpose of the parachute performance code is to predict the performance of stockpile weapon parachutes as these parachutes continue to age well beyond their intended service life. A new massively parallel computer will provide unprecedented speed and memory for solving this complex problem, and new software will be written to treat the coupled fluid, structure and trajectory calculations as part of a single code. Verification and validation experiments have been proposed to provide the necessary confidence in the computations.

  4. Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

    SciTech Connect

    Faulon, Jean-Loup, Wilcox, R.T. , Hobbs, J.D. , Ford, D.M.

    1997-11-01

    An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in the center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.

  5. TOMO3D: 3-D joint refraction and reflection traveltime tomography parallel code for active-source seismic data—synthetic test

    NASA Astrophysics Data System (ADS)

    Meléndez, A.; Korenaga, J.; Sallarès, V.; Miniussi, A.; Ranero, C. R.

    2015-10-01

    We present a new 3-D traveltime tomography code (TOMO3D) for the modelling of active-source seismic data that uses the arrival times of both refracted and reflected seismic phases to derive the velocity distribution and the geometry of reflecting boundaries in the subsurface. This code is based on its popular 2-D version TOMO2D from which it inherited the methods to solve the forward and inverse problems. The traveltime calculations are done using a hybrid ray-tracing technique combining the graph and bending methods. The LSQR algorithm is used to perform the iterative regularized inversion to improve the initial velocity and depth models. In order to cope with an increased computational demand due to the incorporation of the third dimension, the forward problem solver, which takes most of the run time (˜90 per cent in the test presented here), has been parallelized with a combination of multi-processing and message passing interface standards. This parallelization distributes the ray-tracing and traveltime calculations among available computational resources. The code's performance is illustrated with a realistic synthetic example, including a checkerboard anomaly and two reflectors, which simulates the geometry of a subduction zone. The code is designed to invert for a single reflector at a time. A data-driven layer-stripping strategy is proposed for cases involving multiple reflectors, and it is tested for the successive inversion of the two reflectors. Layers are bound by consecutive reflectors, and an initial velocity model for each inversion step incorporates the results from previous steps. This strategy poses simpler inversion problems at each step, allowing the recovery of strong velocity discontinuities that would otherwise be smoothened.

  6. Three-dimensional electromagnetic modeling and inversion on massively parallel computers

    SciTech Connect

    Newman, G.A.; Alumbaugh, D.L.

    1996-03-01

    This report has demonstrated techniques that can be used to construct solutions to the 3-D electromagnetic inverse problem using full wave equation modeling. To this point great progress has been made in developing an inverse solution using the method of conjugate gradients which employs a 3-D finite difference solver to construct model sensitivities and predicted data. The forward modeling code has been developed to incorporate absorbing boundary conditions for high frequency solutions (radar), as well as complex electrical properties, including electrical conductivity, dielectric permittivity and magnetic permeability. In addition both forward and inverse codes have been ported to a massively parallel computer architecture which allows for more realistic solutions that can be achieved with serial machines. While the inversion code has been demonstrated on field data collected at the Richmond field site, techniques for appraising the quality of the reconstructions still need to be developed. Here it is suggested that rather than employing direct matrix inversion to construct the model covariance matrix which would be impossible because of the size of the problem, one can linearize about the 3-D model achieved in the inverse and use Monte-Carlo simulations to construct it. Using these appraisal and construction tools, it is now necessary to demonstrate 3-D inversion for a variety of EM data sets that span the frequency range from induction sounding to radar: below 100 kHz to 100 MHz. Appraised 3-D images of the earth`s electrical properties can provide researchers opportunities to infer the flow paths, flow rates and perhaps the chemistry of fluids in geologic mediums. It also offers a means to study the frequency dependence behavior of the properties in situ. This is of significant relevance to the Department of Energy, paramount to characterizing and monitoring of environmental waste sites and oil and gas exploration.

  7. Digital relief 3D model of the Khibiny massive (Kola peninsula)

    NASA Astrophysics Data System (ADS)

    Chesalova, Elena; Asavin, Alex

    2015-04-01

    On the basis of maps of 1: 50,000 and 1: 200,000 3D model Khibiny massif developed. We used software ARC / INFO v10.2 ESRI. This project will be organised to build background for gas pollution monitoring network. We planned to use the model to estimate local heterogeneities in the composition of the atmosphere at the emanation of greenhouse gases in the area, the construction of models of vertical distribution of the content of trace gases in the rock mass. In addition to the project GIS digital elevation model contains layers of geological and tectonic map that allows us to estimate the area of the output of certain petrographic rock groups characterized by different ratios of emitted hydrocarbons (CH4/ H2). The model allows to construct a classification of fault in the array. At first glance, there are two groups of faults - the ancient associated with the formation of the intrusive phases sequence, and the young - due to recent tectonic shifts. Ancient faults form a common semicircular structure of the pluton cause overall asymmetry Khibin heights with the transition to the border area between the Khibiny and Lovoozero. Modern tectonics mainly represented by radial and chord faults which are formed narrow mountain valleys and troughs. It remains an open question as to which system fault (old or young) is more productive to gas emanations? On the one hand the system characterized by a large old depth, on the other hand a young more active faults. Address these issues require further detailed observations. The essential question is to assess the possibility of maintaining a constant concentration gradient of these impurities in the atmosphere due to gas emanations of fracture zones and areas enriched occluded gases. In the simulation of these processes can be used initially set parameters: 1 the flow rate of the gas impurities 2 the value of wind flows in closed and open valley 3 Assessment of thermal diffusion coefficients determined by the temperature gradient

  8. Particle simulation of plasmas on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Gledhill, I. M. A.; Storey, L. R. O.

    1987-01-01

    Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.

  9. Massively Parallel Single-Molecule Manipulation Using Centrifugal Force

    NASA Astrophysics Data System (ADS)

    Wong, Wesley; Halvorsen, Ken

    2011-03-01

    Precise manipulation of single molecules has led to remarkable insights in physics, chemistry, biology, and medicine. However, two issues that have impeded the widespread adoption of these techniques are equipment cost and the laborious nature of making measurements one molecule at a time. To meet these challenges, we have developed an approach that enables massively parallel single- molecule force measurements using centrifugal force. This approach is realized in the centrifuge force microscope, an instrument in which objects in an orbiting sample are subjected to a calibration-free, macroscopically uniform force- field while their micro-to-nanoscopic motions are observed. We demonstrate high- throughput single-molecule force spectroscopy with this technique by performing thousands of rupture experiments in parallel, characterizing force-dependent unbinding kinetics of an antibody-antigen pair in minutes rather than days. Currently, we are taking steps to integrate high-resolution detection, fluorescence, temperature control and a greater dynamic range in force. With significant benefits in efficiency, cost, simplicity, and versatility, single-molecule centrifugation has the potential to expand single-molecule experimentation to a wider range of researchers and experimental systems.

  10. Massively Parallel Atomic Force Microscope with Digital Holographic Readout

    NASA Astrophysics Data System (ADS)

    Sache, L.; Kawakatsu, H.; Emery, Y.; Bleuler, H.

    2007-03-01

    Massively Parallel Scanning Probe Microscopy is an obvious path for data storage (E Grochowski, R F Hoyt, Future Trends in Hard disc Drives, IEEE Trans. Magn. 1996, 32, 1850- 1854; J L Griffin, S W Schlosser, G R Ganger and D F Nagle, Modeling and Performance of MEMS-Based Storage Devices, Proc. ACM SIGMETRICS, 2000). Current experimental systems still lay far behind Hard Disc Drive (HDD) or Digital Video Disk (DVD), be it in access speed, data throughput, storage density or cost per bit. This paper presents an entirely new approach with the promise to break several of these barriers. The key idea is readout of a Scanning Probes Microscope (SPM) array by Digital Holographic Microscopy (DHM). This technology directly gives phase information at each pixel of a CCD array. This means that no contact line to each individual SPM probes is needed. The data is directly available in parallel form. Moreover, the optical setup needs in principle no expensive components, optical (or, to a large extent, mechanical) imperfections being compensated in the signal processing, i.e. in electronics. This gives the system the potential for a low cost device with fast Terabit readout capability.

  11. Efficiently modeling neural networks on massively parallel computers

    NASA Technical Reports Server (NTRS)

    Farber, Robert M.

    1993-01-01

    Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.

  12. CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation

    NASA Astrophysics Data System (ADS)

    Schneider, Evan E.; Robertson, Brant E.

    2015-04-01

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳2563) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  13. CHOLLA: A NEW MASSIVELY PARALLEL HYDRODYNAMICS CODE FOR ASTROPHYSICAL SIMULATION

    SciTech Connect

    Schneider, Evan E.; Robertson, Brant E.

    2015-04-15

    We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳256{sup 3}) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.

  14. Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

    SciTech Connect

    Newman, G.A.; Commer, M.

    2009-06-01

    Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.

  15. Development of a Stereo Vision Measurement System for a 3D Three-Axial Pneumatic Parallel Mechanism Robot Arm

    PubMed Central

    Chiang, Mao-Hsiung; Lin, Hao-Ting; Hou, Chien-Lun

    2011-01-01

    In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The stereo vision 3D position measurement system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector of the robot arm, the circle detection algorithm is used to detect the desired target and the SAD algorithm is used to track the moving target and to search the corresponding target location along the conjugate epipolar line in the stereo pair. After camera calibration, both intrinsic and extrinsic parameters of the stereo rig can be obtained, so images can be rectified according to the camera parameters. Thus, through the epipolar rectification, the stereo matching process is reduced to a horizontal search along the conjugate epipolar line. Finally, 3D trajectories of the end-effector are computed by stereo triangulation. The experimental results show that the stereo vision 3D position measurement system proposed in this paper can successfully track and measure the fifth-order polynomial trajectory and sinusoidal trajectory of the end-effector of the three- axial pneumatic parallel mechanism robot arm. PMID:22319408

  16. Development of a stereo vision measurement system for a 3D three-axial pneumatic parallel mechanism robot arm.

    PubMed

    Chiang, Mao-Hsiung; Lin, Hao-Ting; Hou, Chien-Lun

    2011-01-01

    In this paper, a stereo vision 3D position measurement system for a three-axial pneumatic parallel mechanism robot arm is presented. The stereo vision 3D position measurement system aims to measure the 3D trajectories of the end-effector of the robot arm. To track the end-effector of the robot arm, the circle detection algorithm is used to detect the desired target and the SAD algorithm is used to track the moving target and to search the corresponding target location along the conjugate epipolar line in the stereo pair. After camera calibration, both intrinsic and extrinsic parameters of the stereo rig can be obtained, so images can be rectified according to the camera parameters. Thus, through the epipolar rectification, the stereo matching process is reduced to a horizontal search along the conjugate epipolar line. Finally, 3D trajectories of the end-effector are computed by stereo triangulation. The experimental results show that the stereo vision 3D position measurement system proposed in this paper can successfully track and measure the fifth-order polynomial trajectory and sinusoidal trajectory of the end-effector of the three- axial pneumatic parallel mechanism robot arm.

  17. Massively Parallel Interrogation of Aptamer Sequence, Structure and Function

    SciTech Connect

    Fischer, N O; Tok, J B; Tarasow, T M

    2008-02-08

    Optimization of high affinity reagents is a significant bottleneck in medicine and the life sciences. The ability to synthetically create thousands of permutations of a lead high-affinity reagent and survey the properties of individual permutations in parallel could potentially relieve this bottleneck. Aptamers are single stranded oligonucleotides affinity reagents isolated by in vitro selection processes and as a class have been shown to bind a wide variety of target molecules. Methodology/Principal Findings. High density DNA microarray technology was used to synthesize, in situ, arrays of approximately 3,900 aptamer sequence permutations in triplicate. These sequences were interrogated on-chip for their ability to bind the fluorescently-labeled cognate target, immunoglobulin E, resulting in the parallel execution of thousands of experiments. Fluorescence intensity at each array feature was well resolved and shown to be a function of the sequence present. The data demonstrated high intra- and interchip correlation between the same features as well as among the sequence triplicates within a single array. Consistent with aptamer mediated IgE binding, fluorescence intensity correlated strongly with specific aptamer sequences and the concentration of IgE applied to the array. The massively parallel sequence-function analyses provided by this approach confirmed the importance of a consensus sequence found in all 21 of the original IgE aptamer sequences and support a common stem:loop structure as being the secondary structure underlying IgE binding. The microarray application, data and results presented illustrate an efficient, high information content approach to optimizing aptamer function. It also provides a foundation from which to better understand and manipulate this important class of high affinity biomolecules.

  18. Large-Scale Eigenvalue Calculations for Stability Analysis of Steady Flows on Massively Parallel Computers

    SciTech Connect

    Lehoucq, Richard B.; Salinger, Andrew G.

    1999-08-01

    We present an approach for determining the linear stability of steady states of PDEs on massively parallel computers. Linearizing the transient behavior around a steady state leads to a generalized eigenvalue problem. The eigenvalues with largest real part are calculated using Arnoldi's iteration driven by a novel implementation of the Cayley transformation to recast the problem as an ordinary eigenvalue problem. The Cayley transformation requires the solution of a linear system at each Arnoldi iteration, which must be done iteratively for the algorithm to scale with problem size. A representative model problem of 3D incompressible flow and heat transfer in a rotating disk reactor is used to analyze the effect of algorithmic parameters on the performance of the eigenvalue algorithm. Successful calculations of leading eigenvalues for matrix systems of order up to 4 million were performed, identifying the critical Grashof number for a Hopf bifurcation.

  19. A scalable approach to modeling groundwater flow on massively parallel computers

    SciTech Connect

    Ashby, S.F.; Falgout, R.D.; Tompson, A.F.B.

    1995-12-01

    We describe a fully scalable approach to the simulation of groundwater flow on a hierarchy of computing platforms, ranging from workstations to massively parallel computers. Specifically, we advocate the use of scalable conceptual models in which the subsurface model is defined independently of the computational grid on which the simulation takes place. We also describe a scalable multigrid algorithm for computing the groundwater flow velocities. We axe thus able to leverage both the engineer`s time spent developing the conceptual model and the computing resources used in the numerical simulation. We have successfully employed this approach at the LLNL site, where we have run simulations ranging in size from just a few thousand spatial zones (on workstations) to more than eight million spatial zones (on the CRAY T3D)-all using the same conceptual model.

  20. Seismic waves modeling with the Fourier pseudo-spectral method on massively parallel machines.

    NASA Astrophysics Data System (ADS)

    Klin, Peter

    2015-04-01

    The Fourier pseudo-spectral method (FPSM) is an approach for the 3D numerical modeling of the wave propagation, which is based on the discretization of the spatial domain in a structured grid and relies on global spatial differential operators for the solution of the wave equation. This last peculiarity is advantageous from the accuracy point of view but poses difficulties for an efficient implementation of the method to be run on parallel computers with distributed memory architecture. The 1D spatial domain decomposition approach has been so far commonly adopted in the parallel implementations of the FPSM, but it implies an intensive data exchange among all the processors involved in the computation, which can degrade the performance because of communication latencies. Moreover, the scalability of the 1D domain decomposition is limited, since the number of processors can not exceed the number of grid points along the directions in which the domain is partitioned. This limitation inhibits an efficient exploitation of the computational environments with a very large number of processors. In order to overcome the limitations of the 1D domain decomposition we implemented a parallel version of the FPSM based on a 2D domain decomposition, which allows to achieve a higher degree of parallelism and scalability on massively parallel machines with several thousands of processing elements. The parallel programming is essentially achieved using the MPI protocol but OpenMP parts are also included in order to exploit the single processor multi - threading capabilities, when available. The developed tool is aimed at the numerical simulation of the seismic waves propagation and in particular is intended for earthquake ground motion research. We show the scalability tests performed up to 16k processing elements on the IBM Blue Gene/Q computer at CINECA (Italy), as well as the application to the simulation of the earthquake ground motion in the alluvial plain of the Po river (Italy).

  1. Scalable, massively parallel approaches to upstream drainage area computation

    NASA Astrophysics Data System (ADS)

    Richardson, A.; Hill, C. N.; Perron, T.

    2011-12-01

    Accumulated drainage area maps of large regions are required for several applications. Among these are assessments of regional patterns of flow and sediment routing, high-resolution landscape evolution models in which drainage basin geometry evolves with time, and surveys of the characteristics of river basins that drain to continental margins. The computation of accumulated drainage areas is accomplished by inferring the vector field of drainage flow directions from a two-dimensional digital elevation map, and then computing the area that drains to each tile. From this map of elevations we can compute the integrated, upstream area that drains to each tile of the map. Generally this last step is done with a recursive algorithm, that accumulates upstream areas sequentially. The inherently serial nature of this restricts the number of tiles that can be included, thereby limiting the resolution of continental-size domains. This is because of the requirements of both memory, which will rise proportionally to the number of tiles, N, and computing time, which is O(N2). The fundamental sequential property of this approach prohibits effective use of large scale parallelism. An alternate method of calculating accumulated drainage area from drainage direction data can be arrived at by reformulating the problem as the solution of a system of simultaneous linear equations. The equations define the relation that the total upslope area of a particular tile is the sum of all the upslope areas for tiles immediately adjacent to that tile that drain to it, and the tile's own area. Solving these equations amounts to finding the solution of a sparse, nine-diagonal matrix operating on a vector for a right-hand-side that is simply the individual tile areas and where the diagonals of the matrix are determined by the landscape geometry. We show how an iterative method, Bi-CGSTAB, can be used to solve this problem in a scalable, massively parallel manner. However, this introduces

  2. Scalable Iterative Solvers Applied to 3D Parallel Simulation of Advanced Semiconductor Devices

    NASA Astrophysics Data System (ADS)

    García-Loureiro, A. J.; Aldegunde, M.; Seoane, N.

    2009-08-01

    We have studied the performance of a preconditioned iterative solver to speed up a 3D semiconductor device simulator. Since 3D simulations necessitate large computing resources, the choice of algorithms and their parameters become of utmost importance. This code uses a density gradient drift-diffusion semiconductor transport model based on the finite element method which is one of the most general and complex discretisation techniques. It has been implemented for a distributed memory multiprocessor environment using the Message Passing Interface (MPI) library. We have applied this simulator to a 67 nm effective gate length Si MOSFET.

  3. Analysis of composite ablators using massively parallel computation

    NASA Technical Reports Server (NTRS)

    Shia, David

    1995-01-01

    In this work, the feasibility of using massively parallel computation to study the response of ablative materials is investigated. Explicit and implicit finite difference methods are used on a massively parallel computer, the Thinking Machines CM-5. The governing equations are a set of nonlinear partial differential equations. The governing equations are developed for three sample problems: (1) transpiration cooling, (2) ablative composite plate, and (3) restrained thermal growth testing. The transpiration cooling problem is solved using a solution scheme based solely on the explicit finite difference method. The results are compared with available analytical steady-state through-thickness temperature and pressure distributions and good agreement between the numerical and analytical solutions is found. It is also found that a solution scheme based on the explicit finite difference method has the following advantages: incorporates complex physics easily, results in a simple algorithm, and is easily parallelizable. However, a solution scheme of this kind needs very small time steps to maintain stability. A solution scheme based on the implicit finite difference method has the advantage that it does not require very small times steps to maintain stability. However, this kind of solution scheme has the disadvantages that complex physics cannot be easily incorporated into the algorithm and that the solution scheme is difficult to parallelize. A hybrid solution scheme is then developed to combine the strengths of the explicit and implicit finite difference methods and minimize their weaknesses. This is achieved by identifying the critical time scale associated with the governing equations and applying the appropriate finite difference method according to this critical time scale. The hybrid solution scheme is then applied to the ablative composite plate and restrained thermal growth problems. The gas storage term is included in the explicit pressure calculation of both

  4. Cloud identification using genetic algorithms and massively parallel computation

    NASA Technical Reports Server (NTRS)

    Buckles, Bill P.; Petry, Frederick E.

    1996-01-01

    As a Guest Computational Investigator under the NASA administered component of the High Performance Computing and Communication Program, we implemented a massively parallel genetic algorithm on the MasPar SIMD computer. Experiments were conducted using Earth Science data in the domains of meteorology and oceanography. Results obtained in these domains are competitive with, and in most cases better than, similar problems solved using other methods. In the meteorological domain, we chose to identify clouds using AVHRR spectral data. Four cloud speciations were used although most researchers settle for three. Results were remarkedly consistent across all tests (91% accuracy). Refinements of this method may lead to more timely and complete information for Global Circulation Models (GCMS) that are prevalent in weather forecasting and global environment studies. In the oceanographic domain, we chose to identify ocean currents from a spectrometer having similar characteristics to AVHRR. Here the results were mixed (60% to 80% accuracy). Given that one is willing to run the experiment several times (say 10), then it is acceptable to claim the higher accuracy rating. This problem has never been successfully automated. Therefore, these results are encouraging even though less impressive than the cloud experiment. Successful conclusion of an automated ocean current detection system would impact coastal fishing, naval tactics, and the study of micro-climates. Finally we contributed to the basic knowledge of GA (genetic algorithm) behavior in parallel environments. We developed better knowledge of the use of subpopulations in the context of shared breeding pools and the migration of individuals. Rigorous experiments were conducted based on quantifiable performance criteria. While much of the work confirmed current wisdom, for the first time we were able to submit conclusive evidence. The software developed under this grant was placed in the public domain. An extensive user

  5. Massively Parallel Processing for Fast and Accurate Stamping Simulations

    NASA Astrophysics Data System (ADS)

    Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

    2005-08-01

    The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.

  6. Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

    NASA Technical Reports Server (NTRS)

    Morgan, Philip E.

    2004-01-01

    This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.

  7. Simulations of high current wire array Z-pinches using a parallel 3D resistive MHD

    NASA Astrophysics Data System (ADS)

    Chittenden, J. P.; Jennings, C. A.; Ciardi, A.

    2006-10-01

    We present calculations of the implosion and stagnation phases of wire array Z-pinches at Sandia National Laboratory which model the full 3D plasma volume. Modelling the full volume in 3D is found to be necessary in order to accommodate all possible mechanisms for broadening the width of the imploding plasma and for modelling all modes of instability in the stagnated pinch. The width of the imploding plasma is shown to arise from the evolution of the uncorrelated modulations present on each wire in the array early in time into a globally correlated 3D instability structure. The 3D nature of the collision of two nested arrays is highlighted and the implications for radiation pulse shaping are discussed. The addition of a simple circuit model to model the Z generator allows the pinch energetics during stagnation to be treated more accurately and provides another point of comparison to experimental data. The implications of these results for improved X-ray production are discussed both for the keV range and for soft X-ray radiation sources used in inertial confinement fusion research. This work was partially supported by the U.S. Department of Energy through cooperative agreement DE-FC03-02NA00057.

  8. gEMfitter: a highly parallel FFT-based 3D density fitting tool with GPU texture memory acceleration.

    PubMed

    Hoang, Thai V; Cavin, Xavier; Ritchie, David W

    2013-11-01

    Fitting high resolution protein structures into low resolution cryo-electron microscopy (cryo-EM) density maps is an important technique for modeling the atomic structures of very large macromolecular assemblies. This article presents "gEMfitter", a highly parallel fast Fourier transform (FFT) EM density fitting program which can exploit the special hardware properties of modern graphics processor units (GPUs) to accelerate both the translational and rotational parts of the correlation search. In particular, by using the GPU's special texture memory hardware to rotate 3D voxel grids, the cost of rotating large 3D density maps is almost completely eliminated. Compared to performing 3D correlations on one core of a contemporary central processor unit (CPU), running gEMfitter on a modern GPU gives up to 26-fold speed-up. Furthermore, using our parallel processing framework, this speed-up increases linearly with the number of CPUs or GPUs used. Thus, it is now possible to use routinely more robust but more expensive 3D correlation techniques. When tested on low resolution experimental cryo-EM data for the GroEL-GroES complex, we demonstrate the satisfactory fitting results that may be achieved by using a locally normalised cross-correlation with a Laplacian pre-filter, while still being up to three orders of magnitude faster than the well-known COLORES program.

  9. High resolution finite volume parallel simulations of mould filling and binary alloy solidification on unstructured 3-D meshes

    SciTech Connect

    Reddy, A.V.; Kothe, D.B.; Lam, K.L.

    1997-06-01

    The Los Alamos National Laboratory (LANL) is currently developing a new casting simulation tool (known as Telluride) that employs robust, high-resolution finite volume algorithms for incompressible fluid flow, volume tracking of interfaces, and solidification physics on three-dimensional (3-D) unstructured meshes. Their finite volume algorithms are based on colocated cell-centered schemes that are formally second order in time and space. The flow algorithm is a 3-D extension of recent work on projection method solutions of the Navier-Stokes (NS) equations. Their volume tracking algorithm can accurately track topologically complex interfaces by approximating the interface geometry as piecewise planar. Coupled to their fluid flow algorithm is a comprehensive binary alloy solidification model that incorporates macroscopic descriptions of heat transfer, solute redistribution, and melt convection as well as a microscopic description of segregation. The finite volume algorithms, which are efficient, parallel, and robust, can yield high-fidelity solutions on a variety of meshes, ranging from those that are structured orthogonal to fully unstructured (finite element). The authors discuss key computer science issues that have enabled them to efficiently parallelize their unstructured mesh algorithms on both distributed and shared memory computing platforms. These include their functionally object-oriented use of Fortran 90 and new parallel libraries for gather/scatter functions (PGSLib) and solutions of linear systems of equations (JTpack90). Examples of their current capabilities are illustrated with simulations of mold filling and solidification of complex 3-D components currently being poured in LANL foundries.

  10. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, Robert J.; Brooks, III, Eugene D.; Haigh, Ronald E.; DeGroot, Anthony J.

    1999-01-01

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination.

  11. Massively parallel processor networks with optical express channels

    DOEpatents

    Deri, R.J.; Brooks, E.D. III; Haigh, R.E.; DeGroot, A.J.

    1999-08-24

    An optical method for separating and routing local and express channel data comprises interconnecting the nodes in a network with fiber optic cables. A single fiber optic cable carries both express channel traffic and local channel traffic, e.g., in a massively parallel processor (MPP) network. Express channel traffic is placed on, or filtered from, the fiber optic cable at a light frequency or a color different from that of the local channel traffic. The express channel traffic is thus placed on a light carrier that skips over the local intermediate nodes one-by-one by reflecting off of selective mirrors placed at each local node. The local-channel-traffic light carriers pass through the selective mirrors and are not reflected. A single fiber optic cable can thus be threaded throughout a three-dimensional matrix of nodes with the x,y,z directions of propagation encoded by the color of the respective light carriers for both local and express channel traffic. Thus frequency division multiple access is used to hierarchically separate the local and express channels to eliminate the bucket brigade latencies that would otherwise result if the express traffic had to hop between every local node to reach its ultimate destination. 3 figs.

  12. Massively parallel interferometry: towards the all-integrated lambdameter

    NASA Astrophysics Data System (ADS)

    Fodor, Jozsua; Garcia-Marquez, Jorge; Surrel, Yves

    2004-08-01

    In this paper we present recent work about the application of digital phase detection for accurate wavelength measurement using two beam interferometry (lambdametry). The advantage of two beam interferometry is the sinusoidal fringe signal for which precise phase detection algorithms exist. Modern algorithms can cope with different sources of errors, and correct them. We recall the principle of the Michelson-type lambdameter using temporal interference and we introduce the Young-type lambdameter using spatial interference. The Young-type lambdameter is based on the acquisition of the interference pattern from two point sources (e.g. two ends of monomode optical fibers) projected onto a CCD camera. The measurement of an unknown wavelength can be achieved by comparison with a reference wavelength. Accurate interference phase maps can be calculated using spatial phase-shifting. In this way, each small group of contiguous pixels acts as a single interferometer, and the whole set of pixels corresponds to a massively parallel interferometric measurement system (up to many hundreds of thousands units). The major advantage of our method is its structural simplicity and the possibility of full optical integration. The final goal is to achieve a relative uncertainty of the order of some 10-8 with a measurement duration of the order of some minutes. Preliminary results are presented.

  13. Comparing current cluster, massively parallel, and accelerated systems

    SciTech Connect

    Barker, Kevin J; Davis, Kei; Hoisie, Adolfy; Kerbyson, Darren J; Pakin, Scott; Lang, Mike; Sancho Pitarch, Jose C

    2010-01-01

    Currently there is large architectural diversity in high perfonnance computing systems. They include 'commodity' cluster systems that optimize per-node performance for small jobs, massively parallel processors (MPPs) that optimize aggregate perfonnance for large jobs, and accelerated systems that optimize both per-node and aggregate performance but only for applications custom-designed to take advantage of such systems. Because of these dissimilarities, meaningful comparisons of achievable performance are not straightforward. In this work we utilize a methodology that combines both empirical analysis and performance modeling to compare clusters (represented by a 4,352-core IB cluster), MPPs (represented by a 147,456-core BG/P), and accelerated systems (represented by the 129,600-core Roadrunner) across a workload of four applications. Strengths of our approach include the ability to compare architectures - as opposed to specific implementations of an architecture - attribute each application's performance bottlenecks to characteristics unique to each system, and to explore performance scenarios in advance of their availability for measurement. Our analysis illustrates that application performance is essentially unrelated to relative peak performance but that application performance can be both predicted and explained using modeling.

  14. Massively parallel support for a case-based planning system

    NASA Technical Reports Server (NTRS)

    Kettler, Brian P.; Hendler, James A.; Anderson, William A.

    1993-01-01

    Case-based planning (CBP), a kind of case-based reasoning, is a technique in which previously generated plans (cases) are stored in memory and can be reused to solve similar planning problems in the future. CBP can save considerable time over generative planning, in which a new plan is produced from scratch. CBP thus offers a potential (heuristic) mechanism for handling intractable problems. One drawback of CBP systems has been the need for a highly structured memory to reduce retrieval times. This approach requires significant domain engineering and complex memory indexing schemes to make these planners efficient. In contrast, our CBP system, CaPER, uses a massively parallel frame-based AI language (PARKA) and can do extremely fast retrieval of complex cases from a large, unindexed memory. The ability to do fast, frequent retrievals has many advantages: indexing is unnecessary; very large case bases can be used; memory can be probed in numerous alternate ways; and queries can be made at several levels, allowing more specific retrieval of stored plans that better fit the target problem with less adaptation. In this paper we describe CaPER's case retrieval techniques and some experimental results showing its good performance, even on large case bases.

  15. Programming a massively parallel, computation universal system: Static behavior

    NASA Astrophysics Data System (ADS)

    Lapedes, Alan; Farber, Robert

    1986-08-01

    Massively parallel systems are presently the focus of intense interest for a variety of reasons. A key problem is how to control, or ``program'' these systems. In previous work by the authors, the ``optimum finding'' properties of Hopfield neural nets were applied to the nets themselves to create a ``neural compiler.'' This was done in such a way that the problem of programming the attractors of one neural net (called the Slave net) was expressed as an optimization problem that was in turn solved by a second neural net (the Master net). The procedure is effective and efficient. In this series of papers we extend that approach to programming nets that contain interneurons (sometimes called ``hidden neurons''), and thus we deal with nets capable of universal computation. Our work is closely related to recent work of Rummelhart et al. (also Parker, and LeChun), which may be viewed as a special case of this formalism and therefore of ``computing with attractors.'' In later papers in this series, we present the theory for programming time dependent behavior, and consider practical implementations. One may expect numerous applications in view of the computation universality of these networks.

  16. Parallel computation of the SAR distribution in a 3D human head model

    NASA Astrophysics Data System (ADS)

    Walendziuk, Wojciech

    2008-01-01

    This work presents a way of parallel computation of the Specific Absorption Rate distribution. The parallel program used in the computation was based on the FDTD (Finite-Difference Time-Domain) method [1,2,3]. In order to establish communication among the computational nodes, the MPI (Message Passing Interface) standard was used [4,5,6]. The presented example of a human head numerical model was built with the use of MRI (Magnetic Resonance Image) pictures.

  17. Focusing optics of a parallel beam CCD optical tomography apparatus for 3D radiation gel dosimetry.

    PubMed

    Krstajić, Nikola; Doran, Simon J

    2006-04-21

    Optical tomography of gel dosimeters is a promising and cost-effective avenue for quality control of radiotherapy treatments such as intensity-modulated radiotherapy (IMRT). Systems based on a laser coupled to a photodiode have so far shown the best results within the context of optical scanning of radiosensitive gels, but are very slow ( approximately 9 min per slice) and poorly suited to measurements that require many slices. Here, we describe a fast, three-dimensional (3D) optical computed tomography (optical-CT) apparatus, based on a broad, collimated beam, obtained from a high power LED and detected by a charged coupled detector (CCD). The main advantages of such a system are (i) an acquisition speed approximately two orders of magnitude higher than a laser-based system when 3D data are required, and (ii) a greater simplicity of design. This paper advances our previous work by introducing a new design of focusing optics, which take information from a suitably positioned focal plane and project an image onto the CCD. An analysis of the ray optics is presented, which explains the roles of telecentricity, focusing, acceptance angle and depth-of-field (DOF) in the formation of projections. A discussion of the approximation involved in measuring the line integrals required for filtered backprojection reconstruction is given. Experimental results demonstrate (i) the effect on projections of changing the position of the focal plane of the apparatus, (ii) how to measure the acceptance angle of the optics, and (iii) the ability of the new scanner to image both absorbing and scattering gel phantoms. The quality of reconstructed images is very promising and suggests that the new apparatus may be useful in a clinical setting for fast and accurate 3D dosimetry.

  18. Characterization of a parallel beam CCD optical-CT apparatus for 3D radiation dosimetry

    NASA Astrophysics Data System (ADS)

    Krstajić, Nikola; Doran, Simon J.

    2006-12-01

    This paper describes the initial steps we have taken in establishing CCD based optical-CT as a viable alternative for 3-D radiation dosimetry. First, we compare the optical density (OD) measurements from a high quality test target and variable neutral density filter (VNDF). A modulation transfer function (MTF) of individual projections is derived for three positions of the sinusoidal test target within the scanning tank. Our CCD is then characterized in terms of its signal-to-noise ratio (SNR). Finally, a sample reconstruction of a scan of a PRESAGETM (registered trademark of Heuris Pharma, NJ, Skillman, USA.) dosimeter is given, demonstrating the capabilities of the apparatus.

  19. Massive Parallel Sequencing Provides New Perspectives on Bacterial Brain Abscesses

    PubMed Central

    Wilhelmsen, Marianne Thulin; Skrede, Steinar; Meisal, Roger; Jakovljev, Aleksandra; Gaustad, Peter; Hermansen, Nils Olav; Vik-Mo, Einar; Solheim, Ole; Ambur, Ole Herman; Sæbø, Øystein; Høstmælingen, Christina Teisner; Helland, Christian

    2014-01-01

    Rapid development within the field of massive parallel sequencing (MPS) is about to bring this technology within reach for diagnostic microbiology laboratories. We wanted to explore its potential for improving diagnosis and understanding of polymicrobial infections, using bacterial brain abscesses as an example. We conducted a prospective nationwide study on bacterial brain abscesses. Fifty-two surgical samples were included over a 2-year period. The samples were categorized as either spontaneous intracerebral, spontaneous subdural, or postoperative. Bacterial 16S rRNA genes were amplified directly from the specimens and sequenced using Ion Torrent technology, with an average of 500,000 reads per sample. The results were compared to those from culture- and Sanger sequencing-based diagnostics. Compared to culture, MPS allowed for triple the number of bacterial identifications. Aggregatibacter aphrophilus, Fusobacterium nucleatum, and Streptococcus intermedius or combinations of them were found in all spontaneous polymicrobial abscesses. F. nucleatum was systematically detected in samples with anaerobic flora. The increased detection rate for Actinomyces spp. and facultative Gram-negative rods further revealed several species associations. We suggest that A. aphrophilus, F. nucleatum, and S. intermedius are key pathogens for the establishment of spontaneous polymicrobial brain abscesses. In addition, F. nucleatum seems to be important for the development of anaerobic flora. MPS can accurately describe polymicrobial specimens when a sufficient number of reads is used to compensate for unequal species concentrations and principles are defined to discard contaminant bacterial DNA in the subsequent data analysis. This will contribute to our understanding of how different types of polymicrobial infections develop. PMID:24671797

  20. A parallel 3D poisson solver for space charge simulation in cylindrical coordinates.

    SciTech Connect

    Xu, J.; Ostroumov, P. N.; Nolen, J.; Physics

    2008-02-01

    This paper presents the development of a parallel three-dimensional Poisson solver in cylindrical coordinate system for the electrostatic potential of a charged particle beam in a circular tube. The Poisson solver uses Fourier expansions in the longitudinal and azimuthal directions, and Spectral Element discretization in the radial direction. A Dirichlet boundary condition is used on the cylinder wall, a natural boundary condition is used on the cylinder axis and a Dirichlet or periodic boundary condition is used in the longitudinal direction. A parallel 2D domain decomposition was implemented in the (r,{theta}) plane. This solver was incorporated into the parallel code PTRACK for beam dynamics simulations. Detailed benchmark results for the parallel solver and a beam dynamics simulation in a high-intensity proton LINAC are presented. When the transverse beam size is small relative to the aperture of the accelerator line, using the Poisson solver in a Cartesian coordinate system and a Cylindrical coordinate system produced similar results. When the transverse beam size is large or beam center located off-axis, the result from Poisson solver in Cartesian coordinate system is not accurate because different boundary condition used. While using the new solver, we can apply circular boundary condition easily and accurately for beam dynamic simulations in accelerator devices.

  1. Simulation of the 3D viscoelastic free surface flow by a parallel corrected particle scheme

    NASA Astrophysics Data System (ADS)

    Jin-Lian, Ren; Tao, Jiang

    2016-02-01

    In this work, the behavior of the three-dimensional (3D) jet coiling based on the viscoelastic Oldroyd-B model is investigated by a corrected particle scheme, which is named the smoothed particle hydrodynamics with corrected symmetric kernel gradient and shifting particle technique (SPH_CS_SP) method. The accuracy and stability of SPH_CS_SP method is first tested by solving Poiseuille flow and Taylor-Green flow. Then the capacity for the SPH_CS_SP method to solve the viscoelastic fluid is verified by the polymer flow through a periodic array of cylinders. Moreover, the convergence of the SPH_CS_SP method is also investigated. Finally, the proposed method is further applied to the 3D viscoelastic jet coiling problem, and the influences of macroscopic parameters on the jet coiling are discussed. The numerical results show that the SPH_CS_SP method has higher accuracy and better stability than the traditional SPH method and other corrected SPH method, and can improve the tensile instability. Project supported by the Natural Science Foundation of Jiangsu Province, China (Grant Nos. BK20130436 and BK20150436) and the Natural Science Foundation of the Higher Education Institutions of Jiangsu Province, China (Grant No. 15KJB110025).

  2. User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

    SciTech Connect

    Earth Sciences Division; Zhang, Keni; Zhang, Keni; Wu, Yu-Shu; Pruess, Karsten

    2008-05-27

    TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator is to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used

  3. Massively parallel computational fluid dynamics calculations for aerodynamics and aerothermodynamics applications

    SciTech Connect

    Payne, J.L.; Hassan, B.

    1998-09-01

    Massively parallel computers have enabled the analyst to solve complicated flow fields (turbulent, chemically reacting) that were previously intractable. Calculations are presented using a massively parallel CFD code called SACCARA (Sandia Advanced Code for Compressible Aerothermodynamics Research and Analysis) currently under development at Sandia National Laboratories as part of the Department of Energy (DOE) Accelerated Strategic Computing Initiative (ASCI). Computations were made on a generic reentry vehicle in a hypersonic flowfield utilizing three different distributed parallel computers to assess the parallel efficiency of the code with increasing numbers of processors. The parallel efficiencies for the SACCARA code will be presented for cases using 1, 150, 100 and 500 processors. Computations were also made on a subsonic/transonic vehicle using both 236 and 521 processors on a grid containing approximately 14.7 million grid points. Ongoing and future plans to implement a parallel overset grid capability and couple SACCARA with other mechanics codes in a massively parallel environment are discussed.

  4. Progress in understanding disruptions triggered by massive gas injection via 3D non-linear MHD modelling with JOREK

    NASA Astrophysics Data System (ADS)

    Nardon, E.; Fil, A.; Hoelzl, M.; Huijsmans, G.; contributors, JET

    2017-01-01

    3D non-linear MHD simulations of a D 2 massive gas injection (MGI) triggered disruption in JET with the JOREK code provide results which are qualitatively consistent with experimental observations and shed light on the physics at play. In particular, it is observed that the gas destabilizes a large m/n  =  2/1 tearing mode, with the island O-point coinciding with the gas deposition region, by enhancing the plasma resistivity via cooling. When the 2/1 island gets so large that its inner side reaches the q  =  3/2 surface, a 3/2 tearing mode grows. Simulations suggest that this is due to a steepening of the current profile right inside q  =  3/2. Magnetic field stochastization over a large fraction of the minor radius as well as the growth of higher n modes ensue rapidly, leading to the thermal quench (TQ). The role of the 1/1 internal kink mode is discussed. An I p spike at the TQ is obtained in the simulations but with a smaller amplitude than in the experiment. Possible reasons are discussed.

  5. High-Performance Computation of Distributed-Memory Parallel 3D Voronoi and Delaunay Tessellation

    SciTech Connect

    Peterka, Tom; Morozov, Dmitriy; Phillips, Carolyn

    2014-11-14

    Computing a Voronoi or Delaunay tessellation from a set of points is a core part of the analysis of many simulated and measured datasets: N-body simulations, molecular dynamics codes, and LIDAR point clouds are just a few examples. Such computational geometry methods are common in data analysis and visualization; but as the scale of simulations and observations surpasses billions of particles, the existing serial and shared-memory algorithms no longer suffice. A distributed-memory scalable parallel algorithm is the only feasible approach. The primary contribution of this paper is a new parallel Delaunay and Voronoi tessellation algorithm that automatically determines which neighbor points need to be exchanged among the subdomains of a spatial decomposition. Other contributions include periodic and wall boundary conditions, comparison of our method using two popular serial libraries, and application to numerous science datasets.

  6. Large-Scale Parallel Unstructured Mesh Computations for 3D High-Lift Analysis

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Pirzadeh, S.

    1999-01-01

    A complete "geometry to drag-polar" analysis capability for three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries which arise in high-lift con gurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.

  7. Large-scale Parallel Unstructured Mesh Computations for 3D High-lift Analysis

    NASA Technical Reports Server (NTRS)

    Mavriplis, Dimitri J.; Pirzadeh, S.

    1999-01-01

    A complete "geometry to drag-polar" analysis capability for the three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries that arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.

  8. Large-Scale Parallel Unstructured Mesh Computations for 3D High-Lift Analysis

    NASA Technical Reports Server (NTRS)

    Mavriplis, D. J.; Pirzadeh, S.

    1999-01-01

    A complete "geometry to drag-polar" analysis capability for three-dimensional high-lift configurations is described. The approach is based on the use of unstructured meshes in order to enable rapid turnaround for complicated geometries which arise in high-lift configurations. Special attention is devoted to creating a capability for enabling analyses on highly resolved grids. Unstructured meshes of several million vertices are initially generated on a work-station, and subsequently refined on a supercomputer. The flow is solved on these refined meshes on large parallel computers using an unstructured agglomeration multigrid algorithm. Good prediction of lift and drag throughout the range of incidences is demonstrated on a transport take-off configuration using up to 24.7 million grid points. The feasibility of using this approach in a production environment on existing parallel machines is demonstrated, as well as the scalability of the solver on machines using up to 1450 processors.

  9. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    DOE PAGES

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on themore » performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.« less

  10. Genetic algorithm based task reordering to improve the performance of batch scheduled massively parallel scientific applications

    SciTech Connect

    Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael

    2015-04-08

    The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on the performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.

  11. Parallel microfluidic synthesis of size-tunable polymeric nanoparticles using 3D flow focusing towards in vivo study

    PubMed Central

    Lim, Jong-Min; Bertrand, Nicolas; Valencia, Pedro M.; Rhee, Minsoung; Langer, Robert; Jon, Sangyong; Farokhzad, Omid C.; Karnik, Rohit

    2014-01-01

    Microfluidic synthesis of nanoparticles (NPs) can enhance the controllability and reproducibility in physicochemical properties of NPs compared to bulk synthesis methods. However, applications of microfluidic synthesis are typically limited to in vitro studies due to low production rates. Herein, we report the parallelization of NP synthesis by 3D hydrodynamic flow focusing (HFF) using a multilayer microfluidic system to enhance the production rate without losing the advantages of reproducibility, controllability, and robustness. Using parallel 3D HFF, polymeric poly(lactide-co-glycolide)-b-polyethyleneglycol (PLGA-PEG) NPs with sizes tunable in the range of 13–150 nm could be synthesized reproducibly with high production rate. As a proof of concept, we used this system to perform in vivo pharmacokinetic and biodistribution study of small (20 nm diameter) PLGA-PEG NPs that are otherwise difficult to synthesize. Microfluidic parallelization thus enables synthesis of NPs with tunable properties with production rates suitable for both in vitro and in vivo studies. PMID:23969105

  12. Waveform inversion for 3-D earth structure using the Direct Solution Method implemented on vector-parallel supercomputer

    NASA Astrophysics Data System (ADS)

    Hara, Tatsuhiko

    2004-08-01

    We implement the Direct Solution Method (DSM) on a vector-parallel supercomputer and show that it is possible to significantly improve its computational efficiency through parallel computing. We apply the parallel DSM calculation to waveform inversion of long period (250-500 s) surface wave data for three-dimensional (3-D) S-wave velocity structure in the upper and uppermost lower mantle. We use a spherical harmonic expansion to represent lateral variation with the maximum angular degree 16. We find significant low velocities under south Pacific hot spots in the transition zone. This is consistent with other seismological studies conducted in the Superplume project, which suggests deep roots of these hot spots. We also perform simultaneous waveform inversion for 3-D S-wave velocity and Q structure. Since resolution for Q is not good, we develop a new technique in which power spectra are used as data for inversion. We find good correlation between long wavelength patterns of Vs and Q in the transition zone such as high Vs and high Q under the western Pacific.

  13. Multi-thread parallel algorithm for reconstructing 3D large-scale porous structures

    NASA Astrophysics Data System (ADS)

    Ju, Yang; Huang, Yaohui; Zheng, Jiangtao; Qian, Xu; Xie, Heping; Zhao, Xi

    2017-04-01

    Geomaterials inherently contain many discontinuous, multi-scale, geometrically irregular pores, forming a complex porous structure that governs their mechanical and transport properties. The development of an efficient reconstruction method for representing porous structures can significantly contribute toward providing a better understanding of the governing effects of porous structures on the properties of porous materials. In order to improve the efficiency of reconstructing large-scale porous structures, a multi-thread parallel scheme was incorporated into the simulated annealing reconstruction method. In the method, four correlation functions, which include the two-point probability function, the linear-path functions for the pore phase and the solid phase, and the fractal system function for the solid phase, were employed for better reproduction of the complex well-connected porous structures. In addition, a random sphere packing method and a self-developed pre-conditioning method were incorporated to cast the initial reconstructed model and select independent interchanging pairs for parallel multi-thread calculation, respectively. The accuracy of the proposed algorithm was evaluated by examining the similarity between the reconstructed structure and a prototype in terms of their geometrical, topological, and mechanical properties. Comparisons of the reconstruction efficiency of porous models with various scales indicated that the parallel multi-thread scheme significantly shortened the execution time for reconstruction of a large-scale well-connected porous model compared to a sequential single-thread procedure.

  14. Design and Sensitivity Analysis Simulation of a Novel 3D Force Sensor Based on a Parallel Mechanism

    PubMed Central

    Yang, Eileen Chih-Ying

    2016-01-01

    Automated force measurement is one of the most important technologies in realizing intelligent automation systems. However, while many methods are available for micro-force sensing, measuring large three-dimensional (3D) forces and loads remains a significant challenge. Accordingly, the present study proposes a novel 3D force sensor based on a parallel mechanism. The transformation function and sensitivity index of the proposed sensor are analytically derived. The simulation results show that the sensor has a larger effective measuring capability than traditional force sensors. Moreover, the sensor has a greater measurement sensitivity for horizontal forces than for vertical forces over most of the measurable force region. In other words, compared to traditional force sensors, the proposed sensor is more sensitive to shear forces than normal forces. PMID:27999246

  15. A massively parallel algorithm for the collision probability calculations in the Apollo-II code using the PVM library

    SciTech Connect

    Stankovski, Z.

    1995-12-31

    The collision probability method in neutron transport, as applied to 2D geometries, consume a great amount of computer time, for a typical 2D assembly calculation about 90% of the computing time is consumed in the collision probability evaluations. Consequently RZ or 3D calculations became prohibitive. In this paper the author presents a simple but efficient parallel algorithm based on the message passing host/node programmation model. Parallelization was applied to the energy group treatment. Such approach permits parallelization of the existing code, requiring only limited modifications. Sequential/parallel computer portability is preserved, which is a necessary condition for a industrial code. Sequential performances are also preserved. The algorithm is implemented on a CRAY 90 coupled to a 128 processor T3D computer, a 16 processor IBM SPI and a network of workstations, using the Public Domain PVM library. The tests were executed for a 2D geometry with the standard 99-group library. All results were very satisfactory, the best ones with IBM SPI. Because of heterogeneity of the workstation network, the author did not ask high performances for this architecture. The same source code was used for all computers. A more impressive advantage of this algorithm will appear in the calculations of the SAPHYR project (with the future fine multigroup library of about 8000 groups) with a massively parallel computer, using several hundreds of processors.

  16. 3D parallel-detection microwave tomography for clinical breast imaging

    NASA Astrophysics Data System (ADS)

    Epstein, N. R.; Meaney, P. M.; Paulsen, K. D.

    2014-12-01

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to -130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500-2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate recovery

  17. 3D parallel-detection microwave tomography for clinical breast imaging.

    PubMed

    Epstein, N R; Meaney, P M; Paulsen, K D

    2014-12-01

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to -130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500-2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate recovery

  18. 3D parallel-detection microwave tomography for clinical breast imaging

    SciTech Connect

    Epstein, N. R.; Meaney, P. M.; Paulsen, K. D.

    2014-12-15

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate

  19. 3D parallel-detection microwave tomography for clinical breast imaging

    PubMed Central

    Meaney, P. M.; Paulsen, K. D.

    2014-01-01

    A biomedical microwave tomography system with 3D-imaging capabilities has been constructed and translated to the clinic. Updates to the hardware and reconfiguration of the electronic-network layouts in a more compartmentalized construct have streamlined system packaging. Upgrades to the data acquisition and microwave components have increased data-acquisition speeds and improved system performance. By incorporating analog-to-digital boards that accommodate the linear amplification and dynamic-range coverage our system requires, a complete set of data (for a fixed array position at a single frequency) is now acquired in 5.8 s. Replacement of key components (e.g., switches and power dividers) by devices with improved operational bandwidths has enhanced system response over a wider frequency range. High-integrity, low-power signals are routinely measured down to −130 dBm for frequencies ranging from 500 to 2300 MHz. Adequate inter-channel isolation has been maintained, and a dynamic range >110 dB has been achieved for the full operating frequency range (500–2900 MHz). For our primary band of interest, the associated measurement deviations are less than 0.33% and 0.5° for signal amplitude and phase values, respectively. A modified monopole antenna array (composed of two interwoven eight-element sub-arrays), in conjunction with an updated motion-control system capable of independently moving the sub-arrays to various in-plane and cross-plane positions within the illumination chamber, has been configured in the new design for full volumetric data acquisition. Signal-to-noise ratios (SNRs) are more than adequate for all transmit/receive antenna pairs over the full frequency range and for the variety of in-plane and cross-plane configurations. For proximal receivers, in-plane SNRs greater than 80 dB are observed up to 2900 MHz, while cross-plane SNRs greater than 80 dB are seen for 6 cm sub-array spacing (for frequencies up to 1500 MHz). We demonstrate accurate

  20. Parallel phase-shifting digital holography and its application to high-speed 3D imaging of dynamic object

    NASA Astrophysics Data System (ADS)

    Awatsuji, Yasuhiro; Xia, Peng; Wang, Yexin; Matoba, Osamu

    2016-03-01

    Digital holography is a technique of 3D measurement of object. The technique uses an image sensor to record the interference fringe image containing the complex amplitude of object, and numerically reconstructs the complex amplitude by computer. Parallel phase-shifting digital holography is capable of accurate 3D measurement of dynamic object. This is because this technique can reconstruct the complex amplitude of object, on which the undesired images are not superimposed, form a single hologram. The undesired images are the non-diffraction wave and the conjugate image which are associated with holography. In parallel phase-shifting digital holography, a hologram, whose phase of the reference wave is spatially and periodically shifted every other pixel, is recorded to obtain complex amplitude of object by single-shot exposure. The recorded hologram is decomposed into multiple holograms required for phase-shifting digital holography. The complex amplitude of the object is free from the undesired images is reconstructed from the multiple holograms. To validate parallel phase-shifting digital holography, a high-speed parallel phase-shifting digital holography system was constructed. The system consists of a Mach-Zehnder interferometer, a continuous-wave laser, and a high-speed polarization imaging camera. Phase motion picture of dynamic air flow sprayed from a nozzle was recorded at 180,000 frames per second (FPS) have been recorded by the system. Also phase motion picture of dynamic air induced by discharge between two electrodes has been recorded at 1,000,000 FPS, when high voltage was applied between the electrodes.

  1. Massively parallel implementation of a high order domain decomposition equatorial ocean model

    SciTech Connect

    Ma, H.; McCaffrey, J.W.; Piacsek, S.

    1999-06-01

    The present work is about the algorithms and parallel constructs of a spectral element equatorial ocean model. It shows that high order domain decomposition ocean models can be efficiently implemented on massively parallel architectures, such as the Connection Machine Model CM5. The optimized computational efficiency of the parallel spectral element ocean model comes not only from the exponential convergence of the numerical solution, but also from the work-intensive, medium-grained, geometry-based data parallelism. The data parallelism is created to efficiently implement the spectral element ocean model on the distributed-memory massively parallel computer, which minimizes communication among processing nodes. Computational complexity analysis is given for the parallel algorithm of the spectral element ocean model, and the model's parallel performance on the CM5 is evaluated. Lastly, results from a simulation of wind-driven circulation in low-latitude Atlantic Ocean are described.

  2. MASSIVELY PARALLEL IMPLEMENTATION OF A HIGH ORDER DOMAIN DECOMPOSITION EQUATORIAL OCEAN MODEL

    SciTech Connect

    MA,H.; MCCAFFREY,J.W.; PIACSEK,S.

    1998-07-15

    The present work is about the algorithms and parallel constructs of a spectral element equatorial ocean model. It shows that high order domain decomposition ocean models can be efficiently implemented on massively parallel architectures, such as the Connection Machine Model CM5. The optimized computational efficiency of the parallel spectral element ocean model comes not only from the exponential convergence of the numerical solution, but also from the work-intensive, medium-grained, geometry-based data parallelism. The data parallelism is created to efficiently implement the spectral element ocean model on the distributed-memory massively parallel computer, which minimizes communication among processing nodes. Computational complexity analysis is given for the parallel algorithm of the spectral element ocean model, and the model's parallel performance on the CM5 is evaluated. Lastly, results from a simulation of wind-driven circulation in low-latitude Atlantic ocean are described.

  3. Parallel Implementation of an Adaptive Scheme for 3D Unstructured Grids on the SP2

    NASA Technical Reports Server (NTRS)

    Oliker, Leonid; Biswas, Rupak; Strawn, Roger C.

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.OX speedup on 64 processors when 10% of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

  4. Parallel deconvolution of large 3D images obtained by confocal laser scanning microscopy.

    PubMed

    Pawliczek, Piotr; Romanowska-Pawliczek, Anna; Soltys, Zbigniew

    2010-03-01

    Various deconvolution algorithms are often used for restoration of digital images. Image deconvolution is especially needed for the correction of three-dimensional images obtained by confocal laser scanning microscopy. Such images suffer from distortions, particularly in the Z dimension. As a result, reliable automatic segmentation of these images may be difficult or even impossible. Effective deconvolution algorithms are memory-intensive and time-consuming. In this work, we propose a parallel version of the well-known Richardson-Lucy deconvolution algorithm developed for a system with distributed memory and implemented with the use of Message Passing Interface (MPI). It enables significantly more rapid deconvolution of two-dimensional and three-dimensional images by efficiently splitting the computation across multiple computers. The implementation of this algorithm can be used on professional clusters provided by computing centers as well as on simple networks of ordinary PC machines.

  5. Parallel implementation of an adaptive scheme for 3D unstructured grids on the SP2

    NASA Technical Reports Server (NTRS)

    Strawn, Roger C.; Oliker, Leonid; Biswas, Rupak

    1996-01-01

    Dynamic mesh adaption on unstructured grids is a powerful tool for computing unsteady flows that require local grid modifications to efficiently resolve solution features. For this work, we consider an edge-based adaption scheme that has shown good single-processor performance on the C90. We report on our experience parallelizing this code for the SP2. Results show a 47.0X speedup on 64 processors when 10 percent of the mesh is randomly refined. Performance deteriorates to 7.7X when the same number of edges are refined in a highly-localized region. This is because almost all the mesh adaption is confined to a single processor. However, this problem can be remedied by repartitioning the mesh immediately after targeting edges for refinement but before the actual adaption takes place. With this change, the speedup improves dramatically to 43.6X.

  6. Massively parallel computing at Sandia and its application to national defense

    SciTech Connect

    Dosanjh, S.S.

    1991-01-01

    Two years ago, researchers at Sandia National Laboratories showed that a massively parallel computer with 1024 processors could solve scientific problems more than 1000 times faster than a single processor. Since then, interest in massively parallel processing has increased dramatically. This review paper discusses some of the applications of this emerging technology to important problems at Sandia. Particular attention is given here to the impact of massively parallel systems on applications related to national defense. New concepts in heterogenous programming and load balancing for MIMD computers are drastically increasing synthetic aperture radar (SAR) and SDI modeling capabilities. Also, researchers are showing that the current generation of massively parallel MIMD and SIMD computers are highly competitive with a CRAY on hydrodynamic and structural mechanics codes that are optimized for vector processors. 9 refs., 5 figs., 1 tab.

  7. Master-slave interferometry for parallel spectral domain interferometry sensing and versatile 3D optical coherence tomography.

    PubMed

    Podoleanu, Adrian Gh; Bradu, Adrian

    2013-08-12

    Conventional spectral domain interferometry (SDI) methods suffer from the need of data linearization. When applied to optical coherence tomography (OCT), conventional SDI methods are limited in their 3D capability, as they cannot deliver direct en-face cuts. Here we introduce a novel SDI method, which eliminates these disadvantages. We denote this method as Master - Slave Interferometry (MSI), because a signal is acquired by a slave interferometer for an optical path difference (OPD) value determined by a master interferometer. The MSI method radically changes the main building block of an SDI sensor and of a spectral domain OCT set-up. The serially provided signal in conventional technology is replaced by multiple signals, a signal for each OPD point in the object investigated. This opens novel avenues in parallel sensing and in parallelization of signal processing in 3D-OCT, with applications in high- resolution medical imaging and microscopy investigation of biosamples. Eliminating the need of linearization leads to lower cost OCT systems and opens potential avenues in increasing the speed of production of en-face OCT images in comparison with conventional SDI.

  8. A 3D MPI-Parallel GPU-accelerated framework for simulating ocean wave energy converters

    NASA Astrophysics Data System (ADS)

    Pathak, Ashish; Raessi, Mehdi

    2015-11-01

    We present an MPI-parallel GPU-accelerated computational framework for studying the interaction between ocean waves and wave energy converters (WECs). The computational framework captures the viscous effects, nonlinear fluid-structure interaction (FSI), and breaking of waves around the structure, which cannot be captured in many potential flow solvers commonly used for WEC simulations. The full Navier-Stokes equations are solved using the two-step projection method, which is accelerated by porting the pressure Poisson equation to GPUs. The FSI is captured using the numerically stable fictitious domain method. A novel three-phase interface reconstruction algorithm is used to resolve three phases in a VOF-PLIC context. A consistent mass and momentum transport approach enables simulations at high density ratios. The accuracy of the overall framework is demonstrated via an array of test cases. Numerical simulations of the interaction between ocean waves and WECs are presented. Funding from the National Science Foundation CBET-1236462 grant is gratefully acknowledged.

  9. Improvements to the Pool Critical Assembly Pressure Vessel Benchmark with 3-D Parallel SN PENTRAN

    NASA Astrophysics Data System (ADS)

    Edgar, Christopher A.; Sjoden, Glenn E.; Yi, Ce

    2014-06-01

    The internationally circulated Pool Critical Assembly (PCA) Pressure Vessel Benchmark was analyzed using the PENTRAN Parallel SN code system for the geometry, material, and source specifications as described in the PCA Benchmark documentation. Improvements to the benchmark are proposed here through the application of more representative flux and volume weighted homogenized cross sections for the PCA reactor core, which were obtained from a rigorous heterogeneous modeling of all fuel assembly types in the core. A new source term definition is also proposed based on calculated relative power in each core fuel assembly with a spectrum based on the Uranium-235 fission spectra. This research focused on utilizing the BUGLE-96 cross section library and accompanying reaction rates, while also examining PENTRAN's adaptive differencing implemented on a coarse mesh basis, as well as fixed use of Directional Theta-Weighted (DTW) SN differencing scheme in order to compare the calculated PENTRAN results to measured data. The results show good comparison with the measured benchmark data, which suggests PENTRAN is a viable, reliable code system for calculation of light water reactor neutron shielding and pressure vessel dosimetry calculations. Furthermore, the improvements to the benchmark methodology resulting from this work provide a 6 percent increase in accuracy of the calculation (based on the average of all calculation points), when compared with experimentally measured results at the same spatial locations in the PCA pressure vessel simulator.

  10. A parallel overset-curvilinear-immersed boundary framework for simulating complex 3D incompressible flows

    PubMed Central

    Borazjani, Iman; Ge, Liang; Le, Trung; Sotiropoulos, Fotis

    2013-01-01

    We develop an overset-curvilinear immersed boundary (overset-CURVIB) method in a general non-inertial frame of reference to simulate a wide range of challenging biological flow problems. The method incorporates overset-curvilinear grids to efficiently handle multi-connected geometries and increase the resolution locally near immersed boundaries. Complex bodies undergoing arbitrarily large deformations may be embedded within the overset-curvilinear background grid and treated as sharp interfaces using the curvilinear immersed boundary (CURVIB) method (Ge and Sotiropoulos, Journal of Computational Physics, 2007). The incompressible flow equations are formulated in a general non-inertial frame of reference to enhance the overall versatility and efficiency of the numerical approach. Efficient search algorithms to identify areas requiring blanking, donor cells, and interpolation coefficients for constructing the boundary conditions at grid interfaces of the overset grid are developed and implemented using efficient parallel computing communication strategies to transfer information among sub-domains. The governing equations are discretized using a second-order accurate finite-volume approach and integrated in time via an efficient fractional-step method. Various strategies for ensuring globally conservative interpolation at grid interfaces suitable for incompressible flow fractional step methods are implemented and evaluated. The method is verified and validated against experimental data, and its capabilities are demonstrated by simulating the flow past multiple aquatic swimmers and the systolic flow in an anatomic left ventricle with a mechanical heart valve implanted in the aortic position. PMID:23833331

  11. Algebraic multigrid preconditioning within parallel finite-element solvers for 3-D electromagnetic modelling problems in geophysics

    NASA Astrophysics Data System (ADS)

    Koldan, Jelena; Puzyrev, Vladimir; de la Puente, Josep; Houzeaux, Guillaume; Cela, José María

    2014-06-01

    We present an elaborate preconditioning scheme for Krylov subspace methods which has been developed to improve the performance and reduce the execution time of parallel node-based finite-element (FE) solvers for 3-D electromagnetic (EM) numerical modelling in exploration geophysics. This new preconditioner is based on algebraic multigrid (AMG) that uses different basic relaxation methods, such as Jacobi, symmetric successive over-relaxation (SSOR) and Gauss-Seidel, as smoothers and the wave front algorithm to create groups, which are used for a coarse-level generation. We have implemented and tested this new preconditioner within our parallel nodal FE solver for 3-D forward problems in EM induction geophysics. We have performed series of experiments for several models with different conductivity structures and characteristics to test the performance of our AMG preconditioning technique when combined with biconjugate gradient stabilized method. The results have shown that, the more challenging the problem is in terms of conductivity contrasts, ratio between the sizes of grid elements and/or frequency, the more benefit is obtained by using this preconditioner. Compared to other preconditioning schemes, such as diagonal, SSOR and truncated approximate inverse, the AMG preconditioner greatly improves the convergence of the iterative solver for all tested models. Also, when it comes to cases in which other preconditioners succeed to converge to a desired precision, AMG is able to considerably reduce the total execution time of the forward-problem code-up to an order of magnitude. Furthermore, the tests have confirmed that our AMG scheme ensures grid-independent rate of convergence, as well as improvement in convergence regardless of how big local mesh refinements are. In addition, AMG is designed to be a black-box preconditioner, which makes it easy to use and combine with different iterative methods. Finally, it has proved to be very practical and efficient in the

  12. SWAMP+: multiple subsequence alignment using associative massive parallelism

    SciTech Connect

    Steinfadt, Shannon Irene; Baker, Johnnie W

    2010-10-18

    A new parallel algorithm SWAMP+ incorporates the Smith-Waterman sequence alignment on an associative parallel model known as ASC. It is a highly sensitive parallel approach that expands traditional pairwise sequence alignment. This is the first parallel algorithm to provide multiple non-overlapping, non-intersecting subsequence alignments with the accuracy of Smith-Waterman. The efficient algorithm provides multiple alignments similar to BLAST while creating a better workflow for the end users. The parallel portions of the code run in O(m+n) time using m processors. When m = n, the algorithmic analysis becomes O(n) with a coefficient of two, yielding a linear speedup. Implementation of the algorithm on the SIMD ClearSpeed CSX620 confirms this theoretical linear speedup with real timings.

  13. High-performance parallel solver for 3D time-dependent Schrodinger equation for large-scale nanosystems

    NASA Astrophysics Data System (ADS)

    Gainullin, I. K.; Sonkin, M. A.

    2015-03-01

    A parallelized three-dimensional (3D) time-dependent Schrodinger equation (TDSE) solver for one-electron systems is presented in this paper. The TDSE Solver is based on the finite-difference method (FDM) in Cartesian coordinates and uses a simple and explicit leap-frog numerical scheme. The simplicity of the numerical method provides very efficient parallelization and high performance of calculations using Graphics Processing Units (GPUs). For example, calculation of 106 time-steps on the 1000ṡ1000ṡ1000 numerical grid (109 points) takes only 16 hours on 16 Tesla M2090 GPUs. The TDSE Solver demonstrates scalability (parallel efficiency) close to 100% with some limitations on the problem size. The TDSE Solver is validated by calculation of energy eigenstates of the hydrogen atom (13.55 eV) and affinity level of H- ion (0.75 eV). The comparison with other TDSE solvers shows that a GPU-based TDSE Solver is 3 times faster for the problems of the same size and with the same cost of computational resources. The usage of a non-regular Cartesian grid or problem-specific non-Cartesian coordinates increases this benefit up to 10 times. The TDSE Solver was applied to the calculation of the resonant charge transfer (RCT) in nanosystems, including several related physical problems, such as electron capture during H+-H0 collision and electron tunneling between H- ion and thin metallic island film.

  14. Climate systems modeling on massively parallel processing computers at Lawrence Livermore National Laboratory

    SciTech Connect

    Wehner, W.F.; Mirin, A.A.; Bolstad, J.H.

    1996-09-01

    A comprehensive climate system model is under development at Lawrence Livermore National Laboratory. The basis for this model is a consistent coupling of multiple complex subsystem models, each describing a major component of the Earth`s climate. Among these are general circulation models of the atmosphere and ocean, a dynamic and thermodynamic sea ice model, and models of the chemical processes occurring in the air, sea water, and near-surface land. The computational resources necessary to carry out simulations at adequate spatial resolutions for durations of climatic time scales exceed those currently available. Distributed memory massively parallel processing (MPP) computers promise to affordably scale to the computational rates required by directing large numbers of relatively inexpensive processors onto a single problem. We have developed a suite of routines designed to exploit current generation MPP architectures via domain and functional decomposition strategies. These message passing techniques have been implemented in each of the component models and in their coupling interfaces. Production runs of the atmospheric and oceanic components performed on the National Environmental Supercomputing Center (NESC) Cray T3D are described.

  15. Radiation hydrodynamics using characteristics on adaptive decomposed domains for massively parallel star formation simulations

    NASA Astrophysics Data System (ADS)

    Buntemeyer, Lars; Banerjee, Robi; Peters, Thomas; Klassen, Mikhail; Pudritz, Ralph E.

    2016-02-01

    We present an algorithm for solving the radiative transfer problem on massively parallel computers using adaptive mesh refinement and domain decomposition. The solver is based on the method of characteristics which requires an adaptive raytracer that integrates the equation of radiative transfer. The radiation field is split into local and global components which are handled separately to overcome the non-locality problem. The solver is implemented in the framework of the magneto-hydrodynamics code FLASH and is coupled by an operator splitting step. The goal is the study of radiation in the context of star formation simulations with a focus on early disc formation and evolution. This requires a proper treatment of radiation physics that covers both the optically thin as well as the optically thick regimes and the transition region in particular. We successfully show the accuracy and feasibility of our method in a series of standard radiative transfer problems and two 3D collapse simulations resembling the early stages of protostar and disc formation.

  16. ALEGRA -- A massively parallel h-adaptive code for solid dynamics

    SciTech Connect

    Summers, R.M.; Wong, M.K.; Boucheron, E.A.; Weatherby, J.R.

    1997-12-31

    ALEGRA is a multi-material, arbitrary-Lagrangian-Eulerian (ALE) code for solid dynamics designed to run on massively parallel (MP) computers. It combines the features of modern Eulerian shock codes, such as CTH, with modern Lagrangian structural analysis codes using an unstructured grid. ALEGRA is being developed for use on the teraflop supercomputers to conduct advanced three-dimensional (3D) simulations of shock phenomena important to a variety of systems. ALEGRA was designed with the Single Program Multiple Data (SPMD) paradigm, in which the mesh is decomposed into sub-meshes so that each processor gets a single sub-mesh with approximately the same number of elements. Using this approach the authors have been able to produce a single code that can scale from one processor to thousands of processors. A current major effort is to develop efficient, high precision simulation capabilities for ALEGRA, without the computational cost of using a global highly resolved mesh, through flexible, robust h-adaptivity of finite elements. H-adaptivity is the dynamic refinement of the mesh by subdividing elements, thus changing the characteristic element size and reducing numerical error. The authors are working on several major technical challenges that must be met to make effective use of HAMMER on MP computers.

  17. Design of a massively parallel computer using bit serial processing elements

    NASA Technical Reports Server (NTRS)

    Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

    1995-01-01

    A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.

  18. Solution of large, sparse systems of linear equations in massively parallel applications

    SciTech Connect

    Jones, M.T.; Plassmann, P.E.

    1992-01-01

    We present a general-purpose parallel iterative solver for large, sparse systems of linear equations. This solver is used in two applications, a piezoelectric crystal vibration problem and a superconductor model, that could be solved only on the largest available massively parallel machine. Results obtained on the Intel DELTA show computational rates of up to 3.25 gigaflops for these applications.

  19. Development of a 3D parallel mechanism robot arm with three vertical-axial pneumatic actuators combined with a stereo vision system.

    PubMed

    Chiang, Mao-Hsiung; Lin, Hao-Ting

    2011-01-01

    This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot's end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H(∞) tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measuring position of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot consider the manufacturing and assembly tolerance of the joints and the parallel mechanism so that errors between the actual position and the calculated 3D position of the end-effector exist. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system combining two CCD serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. Furthermore, to

  20. QCD on the Massively Parallel Computer AP1000

    NASA Astrophysics Data System (ADS)

    Akemi, K.; Fujisaki, M.; Okuda, M.; Tago, Y.; Hashimoto, T.; Hioki, S.; Miyamura, O.; Takaishi, T.; Nakamura, A.; de Forcrand, Ph.; Hege, C.; Stamatescu, I. O.

    We present the QCD-TARO program of calculations which uses the parallel computer AP1000 of Fujitsu. We discuss the results on scaling, correlation times and hadronic spectrum, some aspects of the implementation and the future prospects.

  1. Parallelized 3D CSEM modeling using edge-based finite element with total field formulation and unstructured mesh

    NASA Astrophysics Data System (ADS)

    Cai, Hongzhu; Hu, Xiangyun; Li, Jianhui; Endo, Masashi; Xiong, Bin

    2017-02-01

    We solve the 3D controlled-source electromagnetic (CSEM) problem using the edge-based finite element method. The modeling domain is discretized using unstructured tetrahedral mesh. We adopt the total field formulation for the quasi-static variant of Maxwell's equation and the computation cost to calculate the primary field can be saved. We adopt a new boundary condition which approximate the total field on the boundary by the primary field corresponding to the layered earth approximation of the complicated conductivity model. The primary field on the modeling boundary is calculated using fast Hankel transform. By using this new type of boundary condition, the computation cost can be reduced significantly and the modeling accuracy can be improved. We consider that the conductivity can be anisotropic. We solve the finite element system of equations using a parallelized multifrontal solver which works efficiently for multiple source and large scale electromagnetic modeling.

  2. A set of parallel, implicit methods for a reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids

    SciTech Connect

    Xia, Yidong; Luo, Hong; Frisbey, Megan; Nourgaliev, Robert

    2014-07-01

    A set of implicit methods are proposed for a third-order hierarchical WENO reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids. An attractive feature in these methods are the application of the Jacobian matrix based on the P1 element approximation, resulting in a huge reduction of memory requirement compared with DG (P2). Also, three approaches -- analytical derivation, divided differencing, and automatic differentiation (AD) are presented to construct the Jacobian matrix respectively, where the AD approach shows the best robustness. A variety of compressible flow problems are computed to demonstrate the fast convergence property of the implemented flow solver. Furthermore, an SPMD (single program, multiple data) programming paradigm based on MPI is proposed to achieve parallelism. The numerical results on complex geometries indicate that this low-storage implicit method can provide a viable and attractive DG solution for complicated flows of practical importance.

  3. A set of parallel, implicit methods for a reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids

    DOE PAGES

    Xia, Yidong; Luo, Hong; Frisbey, Megan; ...

    2014-07-01

    A set of implicit methods are proposed for a third-order hierarchical WENO reconstructed discontinuous Galerkin method for compressible flows on 3D hybrid grids. An attractive feature in these methods are the application of the Jacobian matrix based on the P1 element approximation, resulting in a huge reduction of memory requirement compared with DG (P2). Also, three approaches -- analytical derivation, divided differencing, and automatic differentiation (AD) are presented to construct the Jacobian matrix respectively, where the AD approach shows the best robustness. A variety of compressible flow problems are computed to demonstrate the fast convergence property of the implemented flowmore » solver. Furthermore, an SPMD (single program, multiple data) programming paradigm based on MPI is proposed to achieve parallelism. The numerical results on complex geometries indicate that this low-storage implicit method can provide a viable and attractive DG solution for complicated flows of practical importance.« less

  4. A Massively Parallel Adaptive Fast Multipole Method on Heterogeneous Architectures

    SciTech Connect

    Lashuk, Ilya; Chandramowlishwaran, Aparna; Langston, Harper; Nguyen, Tuan-Anh; Sampath, Rahul S; Shringarpure, Aashay; Vuduc, Richard; Ying, Lexing; Zorin, Denis; Biros, George

    2012-01-01

    We describe a parallel fast multipole method (FMM) for highly nonuniform distributions of particles. We employ both distributed memory parallelism (via MPI) and shared memory parallelism (via OpenMP and GPU acceleration) to rapidly evaluate two-body nonoscillatory potentials in three dimensions on heterogeneous high performance computing architectures. We have performed scalability tests with up to 30 billion particles on 196,608 cores on the AMD/CRAY-based Jaguar system at ORNL. On a GPU-enabled system (NSF's Keeneland at Georgia Tech/ORNL), we observed 30x speedup over a single core CPU and 7x speedup over a multicore CPU implementation. By combining GPUs with MPI, we achieve less than 10 ns/particle and six digits of accuracy for a run with 48 million nonuniformly distributed particles on 192 GPUs.

  5. Parametric Study of CO2 Sequestration in Geologic Media Using the Massively Parallel Computer Code PFLOTRAN

    NASA Astrophysics Data System (ADS)

    Lu, C.; Lichtner, P. C.; Tsimpanogiannis, I. N.

    2005-12-01

    Uncontrolled release of CO2 to the atmosphere has been identified as a major contributing source to the global warming problem. Significant research efforts from the international scientific community are targeted towards stabilization/reduction of CO2 concentrations in the atmosphere while attempting to satisfy our continuously increasing needs for energy. CO2 sequestration (capture, separation, and long term storage) in various media (e.g. geologic such as depleted oil reservoirs, saline aquifers, etc.; oceanic at different depths) has been considered as a possible solution to reduce green house gas emissions. In this study we utilize the PFLOTRAN simulator to investigate geologic sequestration of CO2. PFLOTRAN is a massively parallel 3-D reservoir simulator for modeling supercritical CO2 sequestration in geologic formations based on continuum scale mass and energy conservations. The mass and energy equations are sequentially coupled to reactive transport equations describing multi-component chemical reactions within the formation including aqueous speciation, and precipitation and dissolution of minerals to describe aqueous and mineral CO2 sequestration. The effect of the injected CO2 on pH, CO2 concentration within the aqueous phase, mineral stability, and other factors can be evaluated with this model. Parallelization is carried out using the PETSc parallel library package based on MPI providing a high parallel efficiency and allowing simulations with several tens of millions of degrees of freedom to be carried out-ideal for large-scale field applications involving multi-component chemistry. In this work, our main focus is a parametrical examination on the effects of reservoir and fluid properties on the sequestration process, such as permeability and capillary pressure functions (e.g. linear, van Genuchten, etc.), diffusion coefficients in a multiphase system, the sensitivity of component solubility on pressure, temperature and mole fractions etc. Several

  6. High density packaging and interconnect of massively parallel image processors

    NASA Technical Reports Server (NTRS)

    Carson, John C.; Indin, Ronald J.

    1991-01-01

    This paper presents conceptual designs for high density packaging of parallel processing systems. The systems fall into two categories: global memory systems where many processors are packaged into a stack, and distributed memory systems where a single processor and many memory chips are packaged into a stack. Thermal behavior and performance are discussed.

  7. Generic, hierarchical framework for massively parallel Wang-Landau sampling.

    PubMed

    Vogel, Thomas; Li, Ying Wai; Wüst, Thomas; Landau, David P

    2013-05-24

    We introduce a parallel Wang-Landau method based on the replica-exchange framework for Monte Carlo simulations. To demonstrate its advantages and general applicability for simulations of complex systems, we apply it to different spin models including spin glasses, the Ising model, and the Potts model, lattice protein adsorption, and the self-assembly process in amphiphilic solutions. Without loss of accuracy, the method gives significant speed-up and potentially scales up to petaflop machines.

  8. Generic, Hierarchical Framework for Massively Parallel Wang-Landau Sampling

    NASA Astrophysics Data System (ADS)

    Vogel, Thomas; Li, Ying Wai; Wüst, Thomas; Landau, David P.

    2013-05-01

    We introduce a parallel Wang-Landau method based on the replica-exchange framework for Monte Carlo simulations. To demonstrate its advantages and general applicability for simulations of complex systems, we apply it to different spin models including spin glasses, the Ising model, and the Potts model, lattice protein adsorption, and the self-assembly process in amphiphilic solutions. Without loss of accuracy, the method gives significant speed-up and potentially scales up to petaflop machines.

  9. A generic, hierarchical framework for massively parallel Wang Landau sampling

    SciTech Connect

    Vogel, Thomas; Li, Ying Wai; Wuest, Thomas; Landau, David P

    2013-01-01

    We introduce a parallel Wang Landau method based on the replica-exchange framework for Monte Carlo simulations. To demonstrate its advantages and general applicability for simulations of com- plex systems, we apply it to the self-assembly process in amphiphilic solutions and to lattice protein adsorption. Without loss of accuracy, the method gives significant speed-up on small architectures like multi-core processors, and should be beneficial for petaflop machines.

  10. Performance Evaluation Methodologies and Tools for Massively Parallel Programs

    NASA Technical Reports Server (NTRS)

    Yan, Jerry C.; Sarukkai, Sekhar; Tucker, Deanne (Technical Monitor)

    1994-01-01

    The need for computing power has forced a migration from serial computation on a single processor to parallel processing on multiprocessors. However, without effective means to monitor (and analyze) program execution, tuning the performance of parallel programs becomes exponentially difficult as program complexity and machine size increase. The recent introduction of performance tuning tools from various supercomputer vendors (Intel's ParAide, TMC's PRISM, CSI'S Apprentice, and Convex's CXtrace) seems to indicate the maturity of performance tool technologies and vendors'/customers' recognition of their importance. However, a few important questions remain: What kind of performance bottlenecks can these tools detect (or correct)? How time consuming is the performance tuning process? What are some important technical issues that remain to be tackled in this area? This workshop reviews the fundamental concepts involved in analyzing and improving the performance of parallel and heterogeneous message-passing programs. Several alternative strategies will be contrasted, and for each we will describe how currently available tuning tools (e.g., AIMS, ParAide, PRISM, Apprentice, CXtrace, ATExpert, Pablo, IPS-2)) can be used to facilitate the process. We will characterize the effectiveness of the tools and methodologies based on actual user experiences at NASA Ames Research Center. Finally, we will discuss their limitations and outline recent approaches taken by vendors and the research community to address them.

  11. Fast parallel Markov clustering in bioinformatics using massively parallel computing on GPU with CUDA and ELLPACK-R sparse format.

    PubMed

    Bustamam, Alhadi; Burrage, Kevin; Hamilton, Nicholas A

    2012-01-01

    Markov clustering (MCL) is becoming a key algorithm within bioinformatics for determining clusters in networks. However,with increasing vast amount of data on biological networks, performance and scalability issues are becoming a critical limiting factor in applications. Meanwhile, GPU computing, which uses CUDA tool for implementing a massively parallel computing environment in the GPU card, is becoming a very powerful, efficient, and low-cost option to achieve substantial performance gains over CPU approaches. The use of on-chip memory on the GPU is efficiently lowering the latency time, thus, circumventing a major issue in other parallel computing environments, such as MPI. We introduce a very fast Markov clustering algorithm using CUDA (CUDA-MCL) to perform parallel sparse matrix-matrix computations and parallel sparse Markov matrix normalizations, which are at the heart of MCL. We utilized ELLPACK-R sparse format to allow the effective and fine-grain massively parallel processing to cope with the sparse nature of interaction networks data sets in bioinformatics applications. As the results show, CUDA-MCL is significantly faster than the original MCL running on CPU. Thus, large-scale parallel computation on off-the-shelf desktop-machines, that were previously only possible on supercomputing architectures, can significantly change the way bioinformaticians and biologists deal with their data.

  12. Performance effects of irregular communications patterns on massively parallel multiprocessors

    NASA Technical Reports Server (NTRS)

    Saltz, Joel; Petiton, Serge; Berryman, Harry; Rifkin, Adam

    1991-01-01

    A detailed study of the performance effects of irregular communications patterns on the CM-2 was conducted. The communications capabilities of the CM-2 were characterized under a variety of controlled conditions. In the process of carrying out the performance evaluation, extensive use was made of a parameterized synthetic mesh. In addition, timings with unstructured meshes generated for aerodynamic codes and a set of sparse matrices with banded patterns on non-zeroes were performed. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.

  13. GPU-based, parallel-line, omni-directional integration of measured acceleration field to obtain the 3D pressure distribution

    NASA Astrophysics Data System (ADS)

    Wang, Jin; Zhang, Cao; Katz, Joseph

    2016-11-01

    A PIV based method to reconstruct the volumetric pressure field by direct integration of the 3D material acceleration directions has been developed. Extending the 2D virtual-boundary omni-directional method (Omni2D, Liu & Katz, 2013), the new 3D parallel-line omni-directional method (Omni3D) integrates the material acceleration along parallel lines aligned in multiple directions. Their angles are set by a spherical virtual grid. The integration is parallelized on a Tesla K40c GPU, which reduced the computing time from three hours to one minute for a single realization. To validate its performance, this method is utilized to calculate the 3D pressure fields in isotropic turbulence and channel flow using the JHU DNS Databases (http://turbulence.pha.jhu.edu). Both integration of the DNS acceleration as well as acceleration from synthetic 3D particles are tested. Results are compared to other method, e.g. solution to the Pressure Poisson Equation (e.g. PPE, Ghaemi et al., 2012) with Bernoulli based Dirichlet boundary conditions, and the Omni2D method. The error in Omni3D prediction is uniformly low, and its sensitivity to acceleration errors is local. It agrees with the PPE/Bernoulli prediction away from the Dirichlet boundary. The Omni3D method is also applied to experimental data obtained using tomographic PIV, and results are correlated with deformation of a compliant wall. ONR.

  14. A sweep algorithm for massively parallel simulation of circuit-switched networks

    NASA Technical Reports Server (NTRS)

    Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

    1992-01-01

    A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.

  15. Massively parallel finite element computation of three dimensional flow problems

    NASA Astrophysics Data System (ADS)

    Tezduyar, T.; Aliabadi, S.; Behr, M.; Johnson, A.; Mittal, S.

    1992-12-01

    The parallel finite element computation of three-dimensional compressible, and incompressible flows, with emphasis on the space-time formulations, mesh moving schemes and implementations on the Connection Machines CM-200 and CM-5 are presented. For computation of unsteady compressible and incompressible flows involving moving boundaries and interfaces, the Deformable-Spatial-Domain/Stabilized-Space-Time (DSD/SST) formulation that previously developed are employed. In this approach, the stabilized finite element formulations of the governing equations are written over the space-time domain of the problem; therefore, the deformation of the spatial domain with respect to time is taken into account automatically. This approach gives the capability to solve a large class of problems involving free surfaces, moving interfaces, and fluid-structure and fluid-particle interactions. By using special mesh moving schemes, the frequency of remeshing is minimized to reduce the projection errors involved in remeshing and also to increase the parallelization ease of the computations. The implicit equation systems arising from the finite element discretizations are solved iteratively by using the GMRES update technique with the diagonal and nodal-block-diagonal preconditioners. These formulations have all been implemented on the CM-200 and CM-5, and have been applied to several large-scale problems. The three-dimensional problems in this report were all computed on the CM-200 and CM-5.

  16. Performance of the Wavelet Decomposition on Massively Parallel Architectures

    NASA Technical Reports Server (NTRS)

    El-Ghazawi, Tarek A.; LeMoigne, Jacqueline; Zukor, Dorothy (Technical Monitor)

    2001-01-01

    Traditionally, Fourier Transforms have been utilized for performing signal analysis and representation. But although it is straightforward to reconstruct a signal from its Fourier transform, no local description of the signal is included in its Fourier representation. To alleviate this problem, Windowed Fourier transforms and then wavelet transforms have been introduced, and it has been proven that wavelets give a better localization than traditional Fourier transforms, as well as a better division of the time- or space-frequency plane than Windowed Fourier transforms. Because of these properties and after the development of several fast algorithms for computing the wavelet representation of any signal, in particular the Multi-Resolution Analysis (MRA) developed by Mallat, wavelet transforms have increasingly been applied to signal analysis problems, especially real-life problems, in which speed is critical. In this paper we present and compare efficient wavelet decomposition algorithms on different parallel architectures. We report and analyze experimental measurements, using NASA remotely sensed images. Results show that our algorithms achieve significant performance gains on current high performance parallel systems, and meet scientific applications and multimedia requirements. The extensive performance measurements collected over a number of high-performance computer systems have revealed important architectural characteristics of these systems, in relation to the processing demands of the wavelet decomposition of digital images.

  17. Signal processing applications of massively parallel charge domain computing devices

    NASA Technical Reports Server (NTRS)

    Fijany, Amir (Inventor); Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor)

    1999-01-01

    The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array.

  18. Repartitioning Strategies for Massively Parallel Simulation of Reacting Flow

    NASA Astrophysics Data System (ADS)

    Pisciuneri, Patrick; Zheng, Angen; Givi, Peyman; Labrinidis, Alexandros; Chrysanthis, Panos

    2015-11-01

    The majority of parallel CFD simulators partition the domain into equal regions and assign the calculations for a particular region to a unique processor. This type of domain decomposition is vital to the efficiency of the solver. However, as the simulation develops, the workload among the partitions often become uneven (e.g. by adaptive mesh refinement, or chemically reacting regions) and a new partition should be considered. The process of repartitioning adjusts the current partition to evenly distribute the load again. We compare two repartitioning tools: Zoltan, an architecture-agnostic graph repartitioner developed at the Sandia National Laboratories; and Paragon, an architecture-aware graph repartitioner developed at the University of Pittsburgh. The comparative assessment is conducted via simulation of the Taylor-Green vortex flow with chemical reaction.

  19. A massively parallel computational approach to coupled thermoelastic/porous gas flow problems

    NASA Technical Reports Server (NTRS)

    Shia, David; Mcmanus, Hugh L.

    1995-01-01

    A new computational scheme for coupled thermoelastic/porous gas flow problems is presented. Heat transfer, gas flow, and dynamic thermoelastic governing equations are expressed in fully explicit form, and solved on a massively parallel computer. The transpiration cooling problem is used as an example problem. The numerical solutions have been verified by comparison to available analytical solutions. Transient temperature, pressure, and stress distributions have been obtained. Small spatial oscillations in pressure and stress have been observed, which would be impractical to predict with previously available schemes. Comparisons between serial and massively parallel versions of the scheme have also been made. The results indicate that for small scale problems the serial and parallel versions use practically the same amount of CPU time. However, as the problem size increases the parallel version becomes more efficient than the serial version.

  20. Massively parallel computation of RCS with finite elements

    NASA Technical Reports Server (NTRS)

    Parker, Jay

    1993-01-01

    One of the promising combinations of finite element approaches for scattering problems uses Whitney edge elements, spherical vector wave-absorbing boundary conditions, and bi-conjugate gradient solution for the frequency-domain near field. Each of these approaches may be criticized. Low-order elements require high mesh density, but also result in fast, reliable iterative convergence. Spherical wave-absorbing boundary conditions require additional space to be meshed beyond the most minimal near-space region, but result in fully sparse, symmetric matrices which keep storage and solution times low. Iterative solution is somewhat unpredictable and unfriendly to multiple right-hand sides, yet we find it to be uniformly fast on large problems to date, given the other two approaches. Implementation of these approaches on a distributed memory, message passing machine yields huge dividends, as full scalability to the largest machines appears assured and iterative solution times are well-behaved for large problems. We present times and solutions for computed RCS for a conducting cube and composite permeability/conducting sphere on the Intel ipsc860 with up to 16 processors solving over 200,000 unknowns. We estimate problems of approximately 10 million unknowns, encompassing 1000 cubic wavelengths, may be attempted on a currently available 512 processor machine, but would be exceedingly tedious to prepare. The most severe bottlenecks are due to the slow rate of mesh generation on non-parallel machines and the large transfer time from such a machine to the parallel processor. One solution, in progress, is to create and then distribute a coarse mesh among the processors, followed by systematic refinement within each processor. Elimination of redundant node definitions at the mesh-partition surfaces, snap-to-surface post processing of the resulting mesh for good modelling of curved surfaces, and load-balancing redistribution of new elements after the refinement are auxiliary

  1. MASSIVELY PARALLEL LATENT SEMANTIC ANALYSES USING A GRAPHICS PROCESSING UNIT

    SciTech Connect

    Cavanagh, J.; Cui, S.

    2009-01-01

    Latent Semantic Analysis (LSA) aims to reduce the dimensions of large term-document datasets using Singular Value Decomposition. However, with the ever-expanding size of datasets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. A graphics processing unit (GPU) can solve some highly parallel problems much faster than a traditional sequential processor or central processing unit (CPU). Thus, a deployable system using a GPU to speed up large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a PC cluster. Due to the GPU’s application-specifi c architecture, harnessing the GPU’s computational prowess for LSA is a great challenge. We presented a parallel LSA implementation on the GPU, using NVIDIA® Compute Unifi ed Device Architecture and Compute Unifi ed Basic Linear Algebra Subprograms software. The performance of this implementation is compared to traditional LSA implementation on a CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1 000x1 000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran fi ve to six times faster than the CPU version. The large variation is due to architectural benefi ts of the GPU for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  2. Massively Parallel Latent Semantic Analyzes using a Graphics Processing Unit

    SciTech Connect

    Cavanagh, Joseph M; Cui, Xiaohui

    2009-01-01

    Latent Semantic Indexing (LSA) aims to reduce the dimensions of large Term-Document datasets using Singular Value Decomposition. However, with the ever expanding size of data sets, current implementations are not fast enough to quickly and easily compute the results on a standard PC. The Graphics Processing Unit (GPU) can solve some highly parallel problems much faster than the traditional sequential processor (CPU). Thus, a deployable system using a GPU to speedup large-scale LSA processes would be a much more effective choice (in terms of cost/performance ratio) than using a computer cluster. Due to the GPU s application-specific architecture, harnessing the GPU s computational prowess for LSA is a great challenge. We present a parallel LSA implementation on the GPU, using NVIDIA Compute Unified Device Architecture and Compute Unified Basic Linear Algebra Subprograms. The performance of this implementation is compared to traditional LSA implementation on CPU using an optimized Basic Linear Algebra Subprograms library. After implementation, we discovered that the GPU version of the algorithm was twice as fast for large matrices (1000x1000 and above) that had dimensions not divisible by 16. For large matrices that did have dimensions divisible by 16, the GPU algorithm ran five to six times faster than the CPU version. The large variation is due to architectural benefits the GPU has for matrices divisible by 16. It should be noted that the overall speeds for the CPU version did not vary from relative normal when the matrix dimensions were divisible by 16. Further research is needed in order to produce a fully implementable version of LSA. With that in mind, the research we presented shows that the GPU is a viable option for increasing the speed of LSA, in terms of cost/performance ratio.

  3. Massively parallel algorithms for trace-driven cache simulations

    NASA Technical Reports Server (NTRS)

    Nicol, David M.; Greenberg, Albert G.; Lubachevsky, Boris D.

    1991-01-01

    Trace driven cache simulation is central to computer design. A trace is a very long sequence of reference lines from main memory. At the t(exp th) instant, reference x sub t is hashed into a set of cache locations, the contents of which are then compared with x sub t. If at the t sup th instant x sub t is not present in the cache, then it is said to be a miss, and is loaded into the cache set, possibly forcing the replacement of some other memory line, and making x sub t present for the (t+1) sup st instant. The problem of parallel simulation of a subtrace of N references directed to a C line cache set is considered, with the aim of determining which references are misses and related statistics. A simulation method is presented for the Least Recently Used (LRU) policy, which regradless of the set size C runs in time O(log N) using N processors on the exclusive read, exclusive write (EREW) parallel model. A simpler LRU simulation algorithm is given that runs in O(C log N) time using N/log N processors. Timings are presented of the second algorithm's implementation on the MasPar MP-1, a machine with 16384 processors. A broad class of reference based line replacement policies are considered, which includes LRU as well as the Least Frequently Used and Random replacement policies. A simulation method is presented for any such policy that on any trace of length N directed to a C line set runs in the O(C log N) time with high probability using N processors on the EREW model. The algorithms are simple, have very little space overhead, and are well suited for SIMD implementation.

  4. Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures

    NASA Technical Reports Server (NTRS)

    Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.

    1998-01-01

    In this paper a new algorithm, designated as Fast Invariant Imbedding algorithm, for solution of Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other Fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate size problems.

  5. A massively parallel adaptive scheme for melt migration in geodynamics computations

    NASA Astrophysics Data System (ADS)

    Dannberg, Juliane; Heister, Timo; Grove, Ryan

    2016-04-01

    Melt generation and migration are important processes for the evolution of the Earth's interior and impact the global convection of the mantle. While they have been the subject of numerous investigations, the typical time and length-scales of melt transport are vastly different from global mantle convection, which determines where melt is generated. This makes it difficult to study mantle convection and melt migration in a unified framework. In addition, modelling magma dynamics poses the challenge of highly non-linear and spatially variable material properties, in particular the viscosity. We describe our extension of the community mantle convection code ASPECT that adds equations describing the behaviour of silicate melt percolating through and interacting with a viscously deforming host rock. We use the original compressible formulation of the McKenzie equations, augmented by an equation for the conservation of energy. This approach includes both melt migration and melt generation with the accompanying latent heat effects, and it incorporates the individual compressibilities of the solid and the fluid phase. For this, we derive an accurate and stable Finite Element scheme that can be combined with adaptive mesh refinement. This is particularly advantageous for this type of problem, as the resolution can be increased in mesh cells where melt is present and viscosity gradients are high, whereas a lower resolution is sufficient in regions without melt. Together with a high-performance, massively parallel implementation, this allows for high resolution, 3d, compressible, global mantle convection simulations coupled with melt migration. Furthermore, scalable iterative linear solvers are required to solve the large linear systems arising from the discretized system. Finally, we present benchmarks and scaling tests of our solver up to tens of thousands of cores, show the effectiveness of adaptive mesh refinement when applied to melt migration and compare the

  6. Massively parallel computation of three-dimensional scramjet combustor

    NASA Astrophysics Data System (ADS)

    Zheng, Z. H.; Le, J. L.

    Recent progress of computational study of scramjet combustor has been described in Refs 1-3. However, detailed flow properties, especially the lateral properties and the sidewall effects are not considered. In this paper, a parallel simulation of an experimental dual-mode scramjet combustor configuration is presented, considering the jet-to-jet symmetry and the full-duct modeling. Turbulence is modeled with the k-ɛ two-equation turbulence model and a 7-specie, 8-equation kinetics model is used to model hydrogen/air combustion. The conservation form of the Navier-Stokes equations with finite-rate chemistry reactions is solved using a diagonal implicit finite-volume method. For the two cases, the three-dimension flow-fields with equivalence ratio Φ=0.0 and 0.35 have been respectively simulated on the COW and MPP. Wall pressure comparisons between CFD and experiments (CARDC and NAL) show fair agreement for the jet-to-jet case. For the full-duct modeling, more detailed flow properties are obtained. The fuelpenetrating heights of the injectors are different because of the effects of the sidewall boundary layer and the shock wave in the combustor. According to numerical results, if adjusting the locations of the injectors, the combustion efficiency could be improved.

  7. Large-eddy simulation of the Rayleigh-Taylor instability on a massively parallel computer

    SciTech Connect

    Amala, P.A.K.

    1995-03-01

    A computational model for the solution of the three-dimensional Navier-Stokes equations is developed. This model includes a turbulence model: a modified Smagorinsky eddy-viscosity with a stochastic backscatter extension. The resultant equations are solved using finite difference techniques: the second-order explicit Lax-Wendroff schemes. This computational model is implemented on a massively parallel computer. Programming models on massively parallel computers are next studied. It is desired to determine the best programming model for the developed computational model. To this end, three different codes are tested on a current massively parallel computer: the CM-5 at Los Alamos. Each code uses a different programming model: one is a data parallel code; the other two are message passing codes. Timing studies are done to determine which method is the fastest. The data parallel approach turns out to be the fastest method on the CM-5 by at least an order of magnitude. The resultant code is then used to study a current problem of interest to the computational fluid dynamics community. This is the Rayleigh-Taylor instability. The Lax-Wendroff methods handle shocks and sharp interfaces poorly. To this end, the Rayleigh-Taylor linear analysis is modified to include a smoothed interface. The linear growth rate problem is then investigated. Finally, the problem of the randomly perturbed interface is examined. Stochastic backscatter breaks the symmetry of the stationary unstable interface and generates a mixing layer growing at the experimentally observed rate. 115 refs., 51 figs., 19 tabs.

  8. Parallel optimization of pixel purity index algorithm for massive hyperspectral images in cloud computing environment

    NASA Astrophysics Data System (ADS)

    Chen, Yufeng; Wu, Zebin; Sun, Le; Wei, Zhihui; Li, Yonglong

    2016-04-01

    With the gradual increase in the spatial and spectral resolution of hyperspectral images, the size of image data becomes larger and larger, and the complexity of processing algorithms is growing, which poses a big challenge to efficient massive hyperspectral image processing. Cloud computing technologies distribute computing tasks to a large number of computing resources for handling large data sets without the limitation of memory and computing resource of a single machine. This paper proposes a parallel pixel purity index (PPI) algorithm for unmixing massive hyperspectral images based on a MapReduce programming model for the first time in the literature. According to the characteristics of hyperspectral images, we describe the design principle of the algorithm, illustrate the main cloud unmixing processes of PPI, and analyze the time complexity of serial and parallel algorithms. Experimental results demonstrate that the parallel implementation of the PPI algorithm on the cloud can effectively process big hyperspectral data and accelerate the algorithm.

  9. Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software

    DTIC Science & Technology

    2015-08-01

    Atomic /Molecular Massively Parallel Simulator (LAMMPS) Software by N Scott Weingarten and James P Larentzos Approved for...0687 ● AUG 2015 US Army Research Laboratory Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic /Molecular...Shifted Periodic Boundary Conditions in the Large-Scale Atomic /Molecular Massively Parallel Simulator (LAMMPS) Software 5a. CONTRACT NUMBER 5b

  10. CODE BLUE: Three dimensional massively-parallel simulation of multi-scale configurations

    NASA Astrophysics Data System (ADS)

    Juric, Damir; Kahouadji, Lyes; Chergui, Jalel; Shin, Seungwon; Craster, Richard; Matar, Omar

    2016-11-01

    We present recent progress on BLUE, a solver for massively parallel simulations of fully three-dimensional multiphase flows which runs on a variety of computer architectures from laptops to supercomputers and on 131072 threads or more (limited only by the availability to us of more threads). The code is wholly written in Fortran 2003 and uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid Front Tracking/Level Set method designed to handle highly deforming interfaces with complex topology changes. We developed parallel GMRES and multigrid iterative solvers suited to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. Particular attention is drawn to the details and performance of the parallel Multigrid solver. EPSRC UK Programme Grant MEMPHIS (EP/K003976/1).

  11. A highly scalable massively parallel fast marching method for the Eikonal equation

    NASA Astrophysics Data System (ADS)

    Yang, Jianming; Stern, Frederick

    2017-03-01

    The fast marching method is a widely used numerical method for solving the Eikonal equation arising from a variety of scientific and engineering fields. It is long deemed inherently sequential and an efficient parallel algorithm applicable to large-scale practical applications is not available in the literature. In this study, we present a highly scalable massively parallel implementation of the fast marching method using a domain decomposition approach. Central to this algorithm is a novel restarted narrow band approach that coordinates the frequency of communications and the amount of computations extra to a sequential run for achieving an unprecedented parallel performance. Within each restart, the narrow band fast marching method is executed; simple synchronous local exchanges and global reductions are adopted for communicating updated data in the overlapping regions between neighboring subdomains and getting the latest front status, respectively. The independence of front characteristics is exploited through special data structures and augmented status tags to extract the masked parallelism within the fast marching method. The efficiency, flexibility, and applicability of the parallel algorithm are demonstrated through several examples. These problems are extensively tested on six grids with up to 1 billion points using different numbers of processes ranging from 1 to 65536. Remarkable parallel speedups are achieved using tens of thousands of processes. Detailed pseudo-codes for both the sequential and parallel algorithms are provided to illustrate the simplicity of the parallel implementation and its similarity to the sequential narrow band fast marching algorithm.

  12. Salinas - An implicit finite element structural dynamics code developed for massively parallel platforms

    SciTech Connect

    BHARDWAJ, MANLJ K.; REESE,GARTH M.; DRIESSEN,BRIAN; ALVIN,KENNETH F.; DAY,DAVID M.

    2000-04-06

    As computational needs for structural finite element analysis increase, a robust implicit structural dynamics code is needed which can handle millions of degrees of freedom in the model and produce results with quick turn around time. A parallel code is needed to avoid limitations of serial platforms. Salinas is an implicit structural dynamics code specifically designed for massively parallel platforms. It computes the structural response of very large complex structures and provides solutions faster than any existing serial machine. This paper gives a current status of Salinas and uses demonstration problems to show Salinas' performance.

  13. Analysis of gallium arsenide deposition in a horizontal chemical vapor deposition reactor using massively parallel computations

    SciTech Connect

    Salinger, A.G.; Shadid, J.N.; Hutchinson, S.A.

    1998-01-01

    A numerical analysis of the deposition of gallium from trimethylgallium (TMG) and arsine in a horizontal CVD reactor with tilted susceptor and a three inch diameter rotating substrate is performed. The three-dimensional model includes complete coupling between fluid mechanics, heat transfer, and species transport, and is solved using an unstructured finite element discretization on a massively parallel computer. The effects of three operating parameters (the disk rotation rate, inlet TMG fraction, and inlet velocity) and two design parameters (the tilt angle of the reactor base and the reactor width) on the growth rate and uniformity are presented. The nonlinear dependence of the growth rate uniformity on the key operating parameters is discussed in detail. Efficient and robust algorithms for massively parallel reacting flow simulations, as incorporated into our analysis code MPSalsa, make detailed analysis of this complicated system feasible.

  14. Chemical network problems solved on NASA/Goddard's massively parallel processor computer

    NASA Technical Reports Server (NTRS)

    Cho, Seog Y.; Carmichael, Gregory R.

    1987-01-01

    The single instruction stream, multiple data stream Massively Parallel Processor (MPP) unit consists of 16,384 bit serial arithmetic processors configured as a 128 x 128 array whose speed can exceed that of current supercomputers (Cyber 205). The applicability of the MPP for solving reaction network problems is presented and discussed, including the mapping of the calculation to the architecture, and CPU timing comparisons.

  15. Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

    PubMed Central

    2009-01-01

    Background Molecular evolutionary studies share the common goal of elucidating historical relationships, and the common challenge of adequately sampling taxa and characters. Particularly at low taxonomic levels, recent divergence, rapid radiations, and conservative genome evolution yield limited sequence variation, and dense taxon sampling is often desirable. Recent advances in massively parallel sequencing make it possible to rapidly obtain large amounts of sequence data, and multiplexing makes extensive sampling of megabase sequences feasible. Is it possible to efficiently apply massively parallel sequencing to increase phylogenetic resolution at low taxonomic levels? Results We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. 30/33 ingroup nodes resolved with ≥ 95% bootstrap support; this is a substantial improvement relative to prior studies, and shows massively parallel sequencing-based strategies can produce sufficient high quality sequence to reach support levels originally proposed for the phylogenetic bootstrap. Resampling simulations show that at least the entire plastome is necessary to fully resolve Pinus, particularly in rapidly radiating clades. Meta-analysis of 99 published infrageneric phylogenies shows that whole plastome analysis should provide similar gains across a range of plant genera. A disproportionate amount of phylogenetic information resides in two loci (ycf1, ycf2), highlighting their unusual evolutionary properties. Conclusion Plastome sequencing is now an efficient option for increasing phylogenetic resolution at lower taxonomic levels in plant phylogenetic and population genetic analyses. With continuing improvements in sequencing capacity, the strategies herein should revolutionize efforts requiring dense taxon and character sampling, such as phylogeographic

  16. Implementation of (omega)-k synthetic aperture radar imaging algorithm on a massively parallel supercomputer

    NASA Astrophysics Data System (ADS)

    Yerkes, Christopher R.; Webster, Eric D.

    1994-06-01

    Advanced algorithms for synthetic aperture radar (SAR) imaging have in the past required computing capabilities only available from high performance special purpose hardware. Such architectures have tended to have short life cycles with respect to development expense. Current generation Massively Parallel Processors (MPP) are offering high performance capabilities necessary for such applications with both a scalable architecture and a longer projected life cycle. In this paper we explore issues associated with implementation of a SAR imaging algorithm on a mesh configured MPP architecture.

  17. Using CLIPS in the domain of knowledge-based massively parallel programming

    NASA Technical Reports Server (NTRS)

    Dvorak, Jiri J.

    1994-01-01

    The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.

  18. Massive sulfide exploration models of the Iberian Pyrite Belt Neves Corvo mine region, based in a 3D geological, geophysical and geochemical ProMine study

    NASA Astrophysics Data System (ADS)

    Inverno, Carlos; Matos, João Xavier; Rosa, Carlos; Mário Castelo-Branco, José; Granado, Isabel; Carvalho, João; João Baptista, Maria; Represas, Patrícia; Pereira, Zélia; Oliveira, Tomás; Araujo, Vitor

    2013-04-01

    The Iberian Pyrite Belt (IPB) hosts one of the largest concentrations of massive sulfides in the Earth's crust. This highly productive VMS belt contains more than 85 massive sulfide deposits, totalling an estimate of 1600 Mt of massive ore and about 250 Mt of stockwork ore (Leistel et al., 1998; Oliveira et al., 2005; Tornos, 2006). Included in the South Portuguese Zone the IPB is represented by the Phyllite-Quartzite Group (PQG) composed of shales and quartzites of late Devonian age followed by the Volcanic-Sedimentary Complex (VSC) a submarine succession of sediments and felsic and basic volcanic rocks (late Famennian-late Viséan age). Above the IPB a turbidite sedimentary unit occurs being represented by the Baixo Alentejo Flysch Group (BAFG). The ore deposits are hosted by felsic volcanic rocks and sediments that are dominant in the lower part of the VSC succession. The Neves Corvo (ProMine, EU FP7) project area is focused on the Neves Corvo deposit, an active copper mine. The project area is located between the Messejana Fault and the Portuguese/Spanish border which has been selected for the 3D geological and geophysical modelling study, based on high exploration potential of the Neves Corvo area (Oliveira et al. 2006, Relvas et al. 2006, Pereira et al. 2008, Rosa et al. 2008, Matos et al. 2011, Oliveira et al. 2013). In this study existing LNEG and AGC geological, geophysical and geochemistry databases were considered. New surveys were done: i) - A physical volcanology and palynostratigraphic age data study and log of the Cotovio drill-hole core (1,888 m, drilled by AGC). ii) - Interpretation of 280 km of Squid TEM performed by AGC. Based on the TEM data, significant conductors have been identified related with: shallow conductive cover, graphitic shale, black shale and sulphide mineralizations. The most important TEM conductors are related with the Neves Corvo massive sulphides lenses (1-10 Ωm). iii) - Ground and residual gravimetry studies including

  19. ASCI Red -- Experiences and lessons learned with a massively parallel teraFLOP supercomputer

    SciTech Connect

    Christon, M.A.; Crawford, D.A.; Hertel, E.S.; Peery, J.S.; Robinson, A.C.

    1997-06-01

    The Accelerated Strategic Computing Initiative (ASCI) program involves Sandia, Los Alamos and Lawrence Livermore National Laboratories. At Sandia National Laboratories, ASCI applications include large deformation transient dynamics, shock propagation, electromechanics, and abnormal thermal environments. In order to resolve important physical phenomena in these problems, it is estimated that meshes ranging from 10{sup 6} to 10{sup 9} grid points will be required. The ASCI program is relying on the use of massively parallel supercomputers initially capable of delivering over 1 TFLOPs to perform such demanding computations. The ASCI Red machine at Sandia National Laboratories consists of over 4,500 computational nodes with a peak computational rate of 1.8 TFLOPs, 567 GBytes of memory, and 2 TBytes of disk storage. Regardless of the peak FLOP rate, there are many issues surrounding the use of massively parallel supercomputers in a production environment. These issues include parallel I/O, mesh generation, visualization, archival storage, high-bandwidth networking and the development of parallel algorithms. In order to illustrate these issues and their solution with respect to ASCI Red, demonstration calculations of time-dependent buoyancy-dominated plumes, electromechanics, and shock propagation will be presented.

  20. Massively parallel Monte Carlo for many-particle simulations on GPUs

    SciTech Connect

    Anderson, Joshua A.; Jankowski, Eric; Grubb, Thomas L.; Engel, Michael; Glotzer, Sharon C.

    2013-12-01

    Current trends in parallel processors call for the design of efficient massively parallel algorithms for scientific computing. Parallel algorithms for Monte Carlo simulations of thermodynamic ensembles of particles have received little attention because of the inherent serial nature of the statistical sampling. In this paper, we present a massively parallel method that obeys detailed balance and implement it for a system of hard disks on the GPU. We reproduce results of serial high-precision Monte Carlo runs to verify the method. This is a good test case because the hard disk equation of state over the range where the liquid transforms into the solid is particularly sensitive to small deviations away from the balance conditions. On a Tesla K20, our GPU implementation executes over one billion trial moves per second, which is 148 times faster than on a single Intel Xeon E5540 CPU core, enables 27 times better performance per dollar, and cuts energy usage by a factor of 13. With this improved performance we are able to calculate the equation of state for systems of up to one million hard disks. These large system sizes are required in order to probe the nature of the melting transition, which has been debated for the last forty years. In this paper we present the details of our computational method, and discuss the thermodynamics of hard disks separately in a companion paper.

  1. Molecular Dynamics Simulations from SNL's Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS)

    DOE Data Explorer

    Plimpton, Steve; Thompson, Aidan; Crozier, Paul

    LAMMPS (http://lammps.sandia.gov/index.html) stands for Large-scale Atomic/Molecular Massively Parallel Simulator and is a code that can be used to model atoms or, as the LAMMPS website says, as a parallel particle simulator at the atomic, meso, or continuum scale. This Sandia-based website provides a long list of animations from large simulations. These were created using different visualization packages to read LAMMPS output, and each one provides the name of the PI and a brief description of the work done or visualization package used. See also the static images produced from simulations at http://lammps.sandia.gov/pictures.html The foundation paper for LAMMPS is: S. Plimpton, Fast Parallel Algorithms for Short-Range Molecular Dynamics, J Comp Phys, 117, 1-19 (1995), but the website also lists other papers describing contributions to LAMMPS over the years.

  2. SPECT reconstruction using a backpropagation neural network implemented on a massively parallel SIMD computer

    SciTech Connect

    Kerr, J.P.; Bartlett, E.B.

    1992-12-31

    In this paper, the feasibility of reconstructing a single photon emission computed tomography (SPECT) image via the parallel implementation of a backpropagation neural network is shown. The MasPar, MP-1 is a single instruction multiple data (SIMD) massively parallel machine. It is composed of a 128 x 128 array of 4-bit processors. The neural network is distributed on the array by dedicating a processor to each node and each interconnection of the network. An 8 x 8 SPECT image slice section is projected into eight planes. It is shown that based on the projections, the neural network can produce the original SPECT slice image exactly. Likewise, when trained on two parallel slices, separated by one slice, the neural network is able to reproduce the center, untrained image to an RMS error of 0.001928.

  3. Fast structural design and analysis via hybrid domain decomposition on massively parallel processors

    NASA Technical Reports Server (NTRS)

    Farhat, Charbel

    1993-01-01

    A hybrid domain decomposition framework for static, transient and eigen finite element analyses of structural mechanics problems is presented. Its basic ingredients include physical substructuring and /or automatic mesh partitioning, mapping algorithms, 'gluing' approximations for fast design modifications and evaluations, and fast direct and preconditioned iterative solvers for local and interface subproblems. The overall methodology is illustrated with the structural design of a solar viewing payload that is scheduled to fly in March 1993. This payload has been entirely designed and validated by a group of undergraduate students at the University of Colorado using the proposed hybrid domain decomposition approach on a massively parallel processor. Performance results are reported on the CRAY Y-MP/8 and the iPSC-860/64 Touchstone systems, which represent both extreme parallel architectures. The hybrid domain decomposition methodology is shown to outperform leading solution algorithms and to exhibit an excellent parallel scalability.

  4. LDRD final report on massively-parallel linear programming : the parPCx system.

    SciTech Connect

    Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

    2005-02-01

    This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runs on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We

  5. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

    PubMed Central

    Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.

    2011-01-01

    We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276

  6. On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods.

    PubMed

    Lee, Anthony; Yau, Christopher; Giles, Michael B; Doucet, Arnaud; Holmes, Christopher C

    2010-12-01

    We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design.

  7. Overcoming rule-based rigidity and connectionist limitations through massively-parallel case-based reasoning

    NASA Technical Reports Server (NTRS)

    Barnden, John; Srinivas, Kankanahalli

    1990-01-01

    Symbol manipulation as used in traditional Artificial Intelligence has been criticized by neural net researchers for being excessively inflexible and sequential. On the other hand, the application of neural net techniques to the types of high-level cognitive processing studied in traditional artificial intelligence presents major problems as well. A promising way out of this impasse is to build neural net models that accomplish massively parallel case-based reasoning. Case-based reasoning, which has received much attention recently, is essentially the same as analogy-based reasoning, and avoids many of the problems leveled at traditional artificial intelligence. Further problems are avoided by doing many strands of case-based reasoning in parallel, and by implementing the whole system as a neural net. In addition, such a system provides an approach to some aspects of the problems of noise, uncertainty and novelty in reasoning systems. The current neural net system (Conposit), which performs standard rule-based reasoning, is being modified into a massively parallel case-based reasoning version.

  8. Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

    PubMed

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

    2012-06-13

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  9. Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

    NASA Astrophysics Data System (ADS)

    Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

    2012-06-01

    Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

  10. Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

    NASA Astrophysics Data System (ADS)

    Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

    2006-05-01

    The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.

  11. A Two Colorable Fourth Order Compact Difference Scheme and Parallel Iterative Solution of the 3D Convection Diffusion Equation

    NASA Technical Reports Server (NTRS)

    Zhang, Jun; Ge, Lixin; Kouatchou, Jules

    2000-01-01

    A new fourth order compact difference scheme for the three dimensional convection diffusion equation with variable coefficients is presented. The novelty of this new difference scheme is that it Only requires 15 grid points and that it can be decoupled with two colors. The entire computational grid can be updated in two parallel subsweeps with the Gauss-Seidel type iterative method. This is compared with the known 19 point fourth order compact differenCe scheme which requires four colors to decouple the computational grid. Numerical results, with multigrid methods implemented on a shared memory parallel computer, are presented to compare the 15 point and the 19 point fourth order compact schemes.

  12. Scaling and performance of a 3-D radiation hydrodynamics code on message-passing parallel computers: final report

    SciTech Connect

    Hayes, J C; Norman, M

    1999-10-28

    This report details an investigation into the efficacy of two approaches to solving the radiation diffusion equation within a radiation hydrodynamic simulation. Because leading-edge scientific computing platforms have evolved from large single-node vector processors to parallel aggregates containing tens to thousands of individual CPU's, the ability of an algorithm to maintain high compute efficiency when distributed over a large array of nodes is critically important. The viability of an algorithm thus hinges upon the tripartite question of numerical accuracy, total time to solution, and parallel efficiency.

  13. Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.

    PubMed

    Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-Ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-Ichi

    2014-01-01

    Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.

  14. Stochastic simulation of charged particle transport on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Earl, James A.

    1988-01-01

    Computations of cosmic-ray transport based upon finite-difference methods are afflicted by instabilities, inaccuracies, and artifacts. To avoid these problems, researchers developed a Monte Carlo formulation which is closely related not only to the finite-difference formulation, but also to the underlying physics of transport phenomena. Implementations of this approach are currently running on the Massively Parallel Processor at Goddard Space Flight Center, whose enormous computing power overcomes the poor statistical accuracy that usually limits the use of stochastic methods. These simulations have progressed to a stage where they provide a useful and realistic picture of solar energetic particle propagation in interplanetary space.

  15. Scalable load balancing for massively parallel distributed Monte Carlo particle transport

    SciTech Connect

    O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

    2013-07-01

    In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrence Livermore National Laboratory. (authors)

  16. A fully coupled method for massively parallel simulation of hydraulically driven fractures in 3-dimensions

    DOE PAGES

    Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...

    2016-09-18

    This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.

  17. Estimating water flow through a hillslope using the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.

    1988-01-01

    A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.

  18. Degradation in forensic trace DNA samples explored by massively parallel sequencing.

    PubMed

    Hanssen, Eirik Nataas; Lyle, Robert; Egeland, Thore; Gill, Peter

    2017-03-01

    Routine forensic analysis using STRs will fail if the DNA is too degraded. The DNA degradation process in biological stain material is not well understood. In this study we sequenced old semen and blood stains by massively parallel sequencing. The sequence data coverage was used to measure degradation across the genome. The results supported the contention that degradation is uniform across the genome, showing no evidence of regions with increased or decreased resistance towards degradation. Thus the lack of genetic regions robust to degradation removes the possibility of using such regions to further optimize analysis performance for degraded DNA.

  19. Block iterative restoration of astronomical images with the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Heap, Sara R.; Lindler, Don J.

    1987-01-01

    A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.

  20. Genome-wide DNA methylation analysis using massively parallel sequencing technologies.

    PubMed

    Suzuki, Masako; Greally, John M

    2013-01-01

    "Epigenetics" refers to a heritable change in transcriptional status without alteration in the primary nucleotide sequence. Epigenetics provides an extra layer of transcriptional control and plays a crucial role in normal development, as well as in pathological conditions. DNA methylation is one of the best known and well-studied epigenetic modifications. Genome-wide DNA methylation profiling has become recognized as a biologically and clinically important epigenomic assay. In this review, we discuss the strengths and weaknesses of the protocols for genome-wide DNA methylation profiling using massively parallel sequencing (MPS) techniques. We will also describe recently discovered DNA modifications, and the protocols to detect these modifications.

  1. Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

    NASA Technical Reports Server (NTRS)

    Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

    1987-01-01

    The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.

  2. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    DOE PAGES

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Somemore » specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.« less

  3. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    SciTech Connect

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2015-12-21

    This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Some specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000® problems. These benchmark and scaling studies show promising results.

  4. Massively Parallel Computation of Soil Surface Roughness Parameters on A Fermi GPU

    NASA Astrophysics Data System (ADS)

    Li, Xiaojie; Song, Changhe

    2016-06-01

    Surface roughness is description of the surface micro topography of randomness or irregular. The standard deviation of surface height and the surface correlation length describe the statistical variation for the random component of a surface height relative to a reference surface. When the number of data points is large, calculation of surface roughness parameters is time-consuming. With the advent of Graphics Processing Unit (GPU) architectures, inherently parallel problem can be effectively solved using GPUs. In this paper we propose a GPU-based massively parallel computing method for 2D bare soil surface roughness estimation. This method was applied to the data collected by the surface roughness tester based on the laser triangulation principle during the field experiment in April 2012. The total number of data points was 52,040. It took 47 seconds on a Fermi GTX 590 GPU whereas its serial CPU version took 5422 seconds, leading to a significant 115x speedup.

  5. Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

    NASA Astrophysics Data System (ADS)

    Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; Davidson, Gregory G.; Hamilton, Steven P.; Godfrey, Andrew T.

    2016-03-01

    This work discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package authored at Oak Ridge National Laboratory. Shift has been developed to scale well from laptops to small computing clusters to advanced supercomputers and includes features such as support for multiple geometry and physics engines, hybrid capabilities for variance reduction methods such as the Consistent Adjoint-Driven Importance Sampling methodology, advanced parallel decompositions, and tally methods optimized for scalability on supercomputing architectures. The scaling studies presented in this paper demonstrate good weak and strong scaling behavior for the implemented algorithms. Shift has also been validated and verified against various reactor physics benchmarks, including the Consortium for Advanced Simulation of Light Water Reactors' Virtual Environment for Reactor Analysis criticality test suite and several Westinghouse AP1000® problems presented in this paper. These benchmark results compare well to those from other contemporary Monte Carlo codes such as MCNP5 and KENO.

  6. "Multipoint Force Feedback" Leveling of Massively Parallel Tip Arrays in Scanning Probe Lithography.

    PubMed

    Noh, Hanaul; Jung, Goo-Eun; Kim, Sukhyun; Yun, Seong-Hun; Jo, Ahjin; Kahng, Se-Jong; Cho, Nam-Joon; Cho, Sang-Joon

    2015-09-16

    Nanoscale patterning with massively parallel 2D array tips is of significant interest in scanning probe lithography. A challenging task for tip-based large area nanolithography is maintaining parallel tip arrays at the same contact point with a sample substrate in order to pattern a uniform array. Here, polymer pen lithography is demonstrated with a novel leveling method to account for the magnitude and direction of the total applied force of tip arrays by a multipoint force sensing structure integrated into the tip holder. This high-precision approach results in a 0.001° slope of feature edge length variation over 1 cm wide tip arrays. The position sensitive leveling operates in a fully automated manner and is applicable to recently developed scanning probe lithography techniques of various kinds which can enable "desktop nanofabrication."

  7. Progress in the Simulation of Steady and Time-Dependent Flows with 3D Parallel Unstructured Cartesian Methods

    NASA Technical Reports Server (NTRS)

    Aftosmis, M. J.; Berger, M. J.; Murman, S. M.; Kwak, Dochan (Technical Monitor)

    2002-01-01

    The proposed paper will present recent extensions in the development of an efficient Euler solver for adaptively-refined Cartesian meshes with embedded boundaries. The paper will focus on extensions of the basic method to include solution adaptation, time-dependent flow simulation, and arbitrary rigid domain motion. The parallel multilevel method makes use of on-the-fly parallel domain decomposition to achieve extremely good scalability on large numbers of processors, and is coupled with an automatic coarse mesh generation algorithm for efficient processing by a multigrid smoother. Numerical results are presented demonstrating parallel speed-ups of up to 435 on 512 processors. Solution-based adaptation may be keyed off truncation error estimates using tau-extrapolation or a variety of feature detection based refinement parameters. The multigrid method is extended to for time-dependent flows through the use of a dual-time approach. The extension to rigid domain motion uses an Arbitrary Lagrangian-Eulerlarian (ALE) formulation, and results will be presented for a variety of two- and three-dimensional example problems with both simple and complex geometry.

  8. O( N) tight-binding molecular dynamics on massively parallel computers: an orbital decomposition approach

    NASA Astrophysics Data System (ADS)

    Canning, A.; Galli, G.; Mauri, F.; De Vita, A.; Car, R.

    1996-04-01

    The implementation of an O( N) tight-binding molecular dynamics code on the Cray T3D parallel computer is discussed. The O( N) energy functional depends on non-orthogonal, localised orbitals and a chemical potential parameter which determines the number of electrons in the system. The localisation introduces a sparse nature to the orbital data and Hamiltonian matrix, greatly changing the coding on parallel machines compared to non-localised systems. The data distribution, communication routines and dynamic load-balancing scheme of the program are presented in detail together with the speed and scaling of the code on various homogeneous and inhomogeneous physical systems. Performance results will be presented for systems of 2048 to 32768 atoms on 32 to 512 processors. We discuss the relevance to quantum molecular dynamics simulations with localised orbitals, of techniques used for programming short-range classical molecular dynamics simulations on parallel machines. The absence of global communications and the localised nature of the orbitals makes these algorithms extremely scalable in terms of memory and speed on parallel systems with fast communications. The main aim of this article is to present in detail all the new concepts and programming techniques that localisation of the orbitals introduces which scientists, coming from a background in non-localised quantum molecular dynamics simulations, may be unfamiliar with.

  9. Massively-parallel electrical-conductivity imaging of hydrocarbonsusing the Blue Gene/L supercomputer

    SciTech Connect

    Commer, M.; Newman, G.A.; Carazzone, J.J.; Dickens, T.A.; Green,K.E.; Wahrmund, L.A.; Willen, D.E.; Shiu, J.

    2007-05-16

    Large-scale controlled source electromagnetic (CSEM)three-dimensional (3D) geophysical imaging is now receiving considerableattention for electrical conductivity mapping of potential offshore oiland gas reservoirs. To cope with the typically large computationalrequirements of the 3D CSEM imaging problem, our strategies exploitcomputational parallelism and optimized finite-difference meshing. Wereport on an imaging experiment, utilizing 32,768 tasks/processors on theIBM Watson Research Blue Gene/L (BG/L) supercomputer. Over a 24-hourperiod, we were able to image a large scale marine CSEM field data setthat previously required over four months of computing time ondistributed clusters utilizing 1024 tasks on an Infiniband fabric. Thetotal initial data misfit could be decreased by 67 percent within 72completed inversion iterations, indicating an electrically resistiveregion in the southern survey area below a depth of 1500 m below theseafloor. The major part of the residual misfit stems from transmitterparallel receiver components that have an offset from the transmittersail line (broadside configuration). Modeling confirms that improvedbroadside data fits can be achieved by considering anisotropic electricalconductivities. While delivering a satisfactory gross scale image for thedepths of interest, the experiment provides important evidence for thenecessity of discriminating between horizontal and verticalconductivities for maximally consistent 3D CSEM inversions.

  10. Massively parallel simulation of flow and transport in variably saturated porous and fractured media

    SciTech Connect

    Wu, Yu-Shu; Zhang, Keni; Pruess, Karsten

    2002-01-15

    This paper describes a massively parallel simulation method and its application for modeling multiphase flow and multicomponent transport in porous and fractured reservoirs. The parallel-computing method has been implemented into the TOUGH2 code and its numerical performance is tested on a Cray T3E-900 and IBM SP. The efficiency and robustness of the parallel-computing algorithm are demonstrated by completing two simulations with more than one million gridblocks, using site-specific data obtained from a site-characterization study. The first application involves the development of a three-dimensional numerical model for flow in the unsaturated zone of Yucca Mountain, Nevada. The second application is the study of tracer/radionuclide transport through fracture-matrix rocks for the same site. The parallel-computing technique enhances modeling capabilities by achieving several-orders-of-magnitude speedup for large-scale and high resolution modeling studies. The resulting modeling results provide many new insights into flow and transport processes that could not be obtained from simulations using the single-CPU simulator.

  11. DGDFT: A massively parallel method for large scale density functional theory calculations

    SciTech Connect

    Hu, Wei Yang, Chao; Lin, Lin

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10{sup −4} Hartree/atom in terms of the error of energy and 6.2 × 10{sup −4} Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  12. DGDFT: A massively parallel method for large scale density functional theory calculations.

    PubMed

    Hu, Wei; Lin, Lin; Yang, Chao

    2015-09-28

    We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.

  13. Regional 3D Numerical Modeling of the Lithosphere-Mantle System: Implications for Continental Rift-Parallel Surface Velocities

    NASA Astrophysics Data System (ADS)

    Stamps, S.; Bangerth, W.; Hager, B. H.

    2014-12-01

    The East African Rift System (EARS) is an active divergent plate boundary with slow, approximately E-W extension rates ranging from <1-6 mm/yr. Previous work using thin-sheet modeling indicates lithospheric buoyancy dominates the force balance driving large-scale Nubia-Somalia divergence, however GPS observations within the Western Branch of the EARS show along-rift motions that contradict this simple model. Here, we test the role of mantle flow at the rift-scale using our new, regional 3D numerical model based on the open-source code ASPECT. We define a thermal lithosphere with thicknesses that are systematically changed for generic models or based on geophysical constraints in the Western branch (e.g. melting depths, xenoliths, seismic tomography). Preliminary results suggest existing variations in lithospheric thicknesses along-rift in the Western Branch can drive upper mantle flow that is consistent with geodetic observations.

  14. A massively parallel semi-Lagrangian algorithm for solving the transport equation

    SciTech Connect

    Manson, Russell; Wang, Dali

    2010-01-01

    The scalar transport equation underpins many models employed in science, engineering, technology and business. Application areas include, but are not restricted to, pollution transport, weather forecasting, video analysis and encoding (the optical flow equation), options and stock pricing (the Black-Scholes equation) and spatially explicit ecological models. Unfortunately finding numerical solutions to this equation which are fast and accurate is not trivial. Moreover, finding such numerical algorithms that can be implemented on high performance computer architectures efficiently is challenging. In this paper the authors describe a massively parallel algorithm for solving the advection portion of the transport equation. We present an approach here which is different to that used in most transport models and which we have tried and tested for various scenarios. The approach employs an intelligent domain decomposition based on the vector field of the system equations and thus automatically partitions the computational domain into algorithmically autonomous regions. The solution of a classic pure advection transport problem is shown to be conservative, monotonic and highly accurate at large time steps. Additionally we demonstrate that the algorithm is highly efficient for high performance computer architectures and thus offers a route towards massively parallel application.

  15. On distributed memory MPI-based parallelization of SPH codes in massive HPC context

    NASA Astrophysics Data System (ADS)

    Oger, G.; Le Touzé, D.; Guibert, D.; de Leffe, M.; Biddiscombe, J.; Soumagne, J.; Piccinali, J.-G.

    2016-03-01

    Most of particle methods share the problem of high computational cost and in order to satisfy the demands of solvers, currently available hardware technologies must be fully exploited. Two complementary technologies are now accessible. On the one hand, CPUs which can be structured into a multi-node framework, allowing massive data exchanges through a high speed network. In this case, each node is usually comprised of several cores available to perform multithreaded computations. On the other hand, GPUs which are derived from the graphics computing technologies, able to perform highly multi-threaded calculations with hundreds of independent threads connected together through a common shared memory. This paper is primarily dedicated to the distributed memory parallelization of particle methods, targeting several thousands of CPU cores. The experience gained clearly shows that parallelizing a particle-based code on moderate numbers of cores can easily lead to an acceptable scalability, whilst a scalable speedup on thousands of cores is much more difficult to obtain. The discussion revolves around speeding up particle methods as a whole, in a massive HPC context by making use of the MPI library. We focus on one particular particle method which is Smoothed Particle Hydrodynamics (SPH), one of the most widespread today in the literature as well as in engineering.

  16. The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

    DOE PAGES

    O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...

    1995-01-01

    Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. Wemore » have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.« less

  17. Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling

    NASA Technical Reports Server (NTRS)

    Rios, Joseph Lucio; Ross, Kevin

    2009-01-01

    Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.

  18. A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

    SciTech Connect

    Madduri, Kamesh; Ediger, David; Jiang, Karl; Bader, David A.; Chavarria-Miranda, Daniel

    2009-02-15

    We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.

  19. Transcriptional analysis of endocrine disruption using zebrafish and massively parallel sequencing

    PubMed Central

    Baker, Michael E.; Hardiman, Gary

    2014-01-01

    Endocrine disrupting chemicals (EDCs) including plasticizers, pesticides, detergents and pharmaceuticals, affect a variety of hormone-regulated physiological pathways in humans and wildlife. Many EDCs are lipophilic molecules and bind to hydrophobic pockets in steroid receptors, such as the estrogen receptor and androgen receptor, which are important in vertebrate reproduction and development. Indeed, health effects attributed to EDCs include reproductive dysfunction (e.g., reduced fertility, reproductive tract abnormalities and skewed male/female sex ratios in fish), early puberty, various cancers and obesity. A major concern is the effects of exposure to low concentrations of endocrine disruptors in utero and post partum, which may increase the incidence of cancer and diabetes in adults. EDCs affect transcription of hundreds and even thousands of genes, which has created the need for new tools to monitor the global effects of EDCs. The emergence of massive parallel sequencing for investigating gene transcription provides a sensitive tool for monitoring the effects of EDCs on humans and other vertebrates as well as elucidating the mechanism of action of EDCs. Zebrafish conserve many developmental pathways found in humans, which makes zebrafish a valuable model system for studying EDCs especially on early organ development because their embryos are translucent. In this article we review recent advances in massive parallel sequencing approaches with a focus on zebrafish. We make the case that zebrafish exposed to EDCs at different stages of development, can provide important insights on EDC effects on human health. PMID:24850832

  20. High-quality draft assemblies of mammalian genomes from massively parallel sequence data

    PubMed Central

    Gnerre, Sante; MacCallum, Iain; Przybylski, Dariusz; Ribeiro, Filipe J.; Burton, Joshua N.; Walker, Bruce J.; Sharpe, Ted; Hall, Giles; Shea, Terrance P.; Sykes, Sean; Berlin, Aaron M.; Aird, Daniel; Costello, Maura; Daza, Riza; Williams, Louise; Nicol, Robert; Gnirke, Andreas; Nusbaum, Chad; Lander, Eric S.; Jaffe, David B.

    2011-01-01

    Massively parallel DNA sequencing technologies are revolutionizing genomics by making it possible to generate billions of relatively short (~100-base) sequence reads at very low cost. Whereas such data can be readily used for a wide range of biomedical applications, it has proven difficult to use them to generate high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. To date, the genome assemblies generated from such data have fallen far short of those obtained with the older (but much more expensive) capillary-based sequencing approach. Here, we report the development of an algorithm for genome assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence data from the human and mouse genomes, generated on the Illumina platform. The resulting draft genome assemblies have good accuracy, short-range contiguity, long-range connectivity, and coverage of the genome. In particular, the base accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing. The combination of improved sequencing technology and improved computational methods should now make it possible to increase dramatically the de novo sequencing of large genomes. The ALLPATHS-LG program is available at http://www.broadinstitute.org/science/programs/genome-biology/crd. PMID:21187386

  1. Slab-Dip Variability and Trench-Parallel Flow beneath Non-Uniform Overriding Plates: Insights form 3D Numerical Models

    NASA Astrophysics Data System (ADS)

    Rodríguez-González, J.; Billen, M. I.; Negredo, A. M.

    2012-12-01

    Forces driving plate tectonics are reasonably well known but some factors controlling the dynamics and the geometry of subduction processes are still poorly understood. The effect of the thermal state of the subducting and overriding plates on the slab dip have been systematically studied in previous works by means of 2D and 3D numerical modeling. These models showed that kinematically-driven slabs subducting under a cold overriding plate are affected by an increased hydrodynamic suction, due to the lower temperature of the mantle wedge, which leads to a lower subduction angle, and eventually to the formation of flat slab segments. In these models the subduction is achieved by imposing a constant velocity at the top of the overriding plate, which may lead to unrealistic results. Here we present the results of 3D non-Newtonian thermo-mechanical numerical models, considering a dynamically-driven self-sustained subduction, to test the influence of a non-uniform overriding plate. Variations of the thermal state of the overriding plate along the trench cause variation in the hydrodynamic suction, which lead to variations of the slab dip along strike (Fig. 1) and a significant trench-parallel flow. When the material can flow around the edges of the slab, through the addition of lateral plates, the trench parallel flow is enhanced (Fig. 2), whereas the variations on the slab dip are diminished.; Effect of a non-uniform overriding plate on slab-dip. 3D view of the 1000 C isosurface. ; Effect of a non-uniform overriding plate on trench-parallel flow. Map view of the slab at different depths and times, showing the viscosity (colormap) and the velocity (arrows).

  2. MADmap: A Massively Parallel Maximum-Likelihood Cosmic Microwave Background Map-Maker

    SciTech Connect

    Cantalupo, Christopher; Borrill, Julian; Jaffe, Andrew; Kisner, Theodore; Stompor, Radoslaw

    2009-06-09

    MADmap is a software application used to produce maximum-likelihood images of the sky from time-ordered data which include correlated noise, such as those gathered by Cosmic Microwave Background (CMB) experiments. It works efficiently on platforms ranging from small workstations to the most massively parallel supercomputers. Map-making is a critical step in the analysis of all CMB data sets, and the maximum-likelihood approach is the most accurate and widely applicable algorithm; however, it is a computationally challenging task. This challenge will only increase with the next generation of ground-based, balloon-borne and satellite CMB polarization experiments. The faintness of the B-mode signal that these experiments seek to measure requires them to gather enormous data sets. MADmap is already being run on up to O(1011) time samples, O(108) pixels and O(104) cores, with ongoing work to scale to the next generation of data sets and supercomputers. We describe MADmap's algorithm based around a preconditioned conjugate gradient solver, fast Fourier transforms and sparse matrix operations. We highlight MADmap's ability to address problems typically encountered in the analysis of realistic CMB data sets and describe its application to simulations of the Planck and EBEX experiments. The massively parallel and distributed implementation is detailed and scaling complexities are given for the resources required. MADmap is capable of analysing the largest data sets now being collected on computing resources currently available, and we argue that, given Moore's Law, MADmap will be capable of reducing the most massive projected data sets.

  3. MPI/OpenMP Hybrid Parallel Algorithm of Resolution of Identity Second-Order Møller-Plesset Perturbation Calculation for Massively Parallel Multicore Supercomputers.

    PubMed

    Katouda, Michio; Nakajima, Takahito

    2013-12-10

    A new algorithm for massively parallel calculations of electron correlation energy of large molecules based on the resolution of identity second-order Møller-Plesset perturbation (RI-MP2) technique is developed and implemented into the quantum chemistry software NTChem. In this algorithm, a Message Passing Interface (MPI) and Open Multi-Processing (OpenMP) hybrid parallel programming model is applied to attain efficient parallel performance on massively parallel supercomputers. An in-core storage scheme of intermediate data of three-center electron repulsion integrals utilizing the distributed memory is developed to eliminate input/output (I/O) overhead. The parallel performance of the algorithm is tested on massively parallel supercomputers such as the K computer (using up to 45 992 central processing unit (CPU) cores) and a commodity Intel Xeon cluster (using up to 8192 CPU cores). The parallel RI-MP2/cc-pVTZ calculation of two-layer nanographene sheets (C150H30)2 (number of atomic orbitals is 9640) is performed using 8991 node and 71 288 CPU cores of the K computer.

  4. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

    PubMed Central

    Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

    2016-01-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734

  5. ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains.

    PubMed

    Torre, Emiliano; Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz; Grün, Sonja

    2016-07-01

    With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity.

  6. Longitudinal imaging of HIV-1 spread in humanized mice with parallel 3D immunofluorescence and electron tomography

    PubMed Central

    Kieffer, Collin; Ladinsky, Mark S; Ninh, Allen; Galimidi, Rachel P; Bjorkman, Pamela J

    2017-01-01

    Dissemination of HIV-1 throughout lymphoid tissues leads to systemic virus spread following infection. We combined tissue clearing, 3D-immunofluorescence, and electron tomography (ET) to longitudinally assess early HIV-1 spread in lymphoid tissues in humanized mice. Immunofluorescence revealed peak infection density in gut at 10–12 days post-infection when blood viral loads were low. Human CD4+ T-cells and HIV-1–infected cells localized predominantly to crypts and the lower third of intestinal villi. Free virions and infected cells were not readily detectable by ET at 5-days post-infection, whereas HIV-1–infected cells surrounded by pools of free virions were present in ~10% of intestinal crypts by 10–12 days. ET of spleen revealed thousands of virions released by individual cells and discreet cytoplasmic densities near sites of prolific virus production. These studies highlight the importance of multiscale imaging of HIV-1–infected tissues and are adaptable to other animal models and human patient samples. DOI: http://dx.doi.org/10.7554/eLife.23282.001 PMID:28198699

  7. Computational modeling of pitching cylinder-type ocean wave energy converters using 3D MPI-parallel simulations

    NASA Astrophysics Data System (ADS)

    Freniere, Cole; Pathak, Ashish; Raessi, Mehdi

    2016-11-01

    Ocean Wave Energy Converters (WECs) are devices that convert energy from ocean waves into electricity. To aid in the design of WECs, an advanced computational framework has been developed which has advantages over conventional methods. The computational framework simulates the performance of WECs in a virtual wave tank by solving the full Navier-Stokes equations in 3D, capturing the fluid-structure interaction, nonlinear and viscous effects. In this work, we present simulations of the performance of pitching cylinder-type WECs and compare against experimental data. WECs are simulated at both model and full scales. The results are used to determine the role of the Keulegan-Carpenter (KC) number. The KC number is representative of viscous drag behavior on a bluff body in an oscillating flow, and is considered an important indicator of the dynamics of a WEC. Studying the effects of the KC number is important for determining the validity of the Froude scaling and the inviscid potential flow theory, which are heavily relied on in the conventional approaches to modeling WECs. Support from the National Science Foundation is gratefully acknowledged.

  8. Verification and validation of a parallel 3D direct simulation Monte Carlo solver for atmospheric entry applications

    NASA Astrophysics Data System (ADS)

    Nizenkov, Paul; Noeding, Peter; Konopka, Martin; Fasoulas, Stefanos

    2017-03-01

    The in-house direct simulation Monte Carlo solver PICLas, which enables parallel, three-dimensional simulations of rarefied gas flows, is verified and validated. Theoretical aspects of the method and the employed schemes are briefly discussed. Considered cases include simple reservoir simulations and complex re-entry geometries, which were selected from literature and simulated with PICLas. First, the chemistry module is verified using simple numerical and analytical solutions. Second, simulation results of the rarefied gas flow around a 70° blunted-cone, the REX Free-Flyer as well as multiple points of the re-entry trajectory of the Orion capsule are presented in terms of drag and heat flux. A comparison to experimental measurements as well as other numerical results shows an excellent agreement across the different simulation cases. An outlook on future code development and applications is given.

  9. Verification and validation of a parallel 3D direct simulation Monte Carlo solver for atmospheric entry applications

    NASA Astrophysics Data System (ADS)

    Nizenkov, Paul; Noeding, Peter; Konopka, Martin; Fasoulas, Stefanos

    2016-07-01

    The in-house direct simulation Monte Carlo solver PICLas, which enables parallel, three-dimensional simulations of rarefied gas flows, is verified and validated. Theoretical aspects of the method and the employed schemes are briefly discussed. Considered cases include simple reservoir simulations and complex re-entry geometries, which were selected from literature and simulated with PICLas. First, the chemistry module is verified using simple numerical and analytical solutions. Second, simulation results of the rarefied gas flow around a 70° blunted-cone, the REX Free-Flyer as well as multiple points of the re-entry trajectory of the Orion capsule are presented in terms of drag and heat flux. A comparison to experimental measurements as well as other numerical results shows an excellent agreement across the different simulation cases. An outlook on future code development and applications is given.

  10. A Diffusion-Based and Dynamic 3D-Printed Device That Enables Parallel in Vitro Pharmacokinetic Profiling of Molecules

    PubMed Central

    Lockwood, Sarah Y.; Meisel, Jayda E.; Monsma, Frederick J.; Spence, Dana M.

    2016-01-01

    The process of bringing a drug to market involves many steps, including the preclinical stage, where various properties of the drug candidate molecule are determined. These properties, which include drug absorption, distribution, metabolism, and excretion, are often displayed in a pharmacokinetic (PK) profile. While PK profiles are determined in animal models, in vitro systems that model in vivo processes are available, although each possesses shortcomings. Here, we present a 3D-printed, diffusion-based, and dynamic in vitro PK device. The device contains six flow channels, each with integrated porous membrane-based insert wells. The pores of these membranes enable drugs to freely diffuse back and forth between the flow channels and the inserts, thus enabling both loading and clearance portions of a standard PK curve to be generated. The device is designed to work with 96-well plate technology and consumes single-digit milliliter volumes to generate multiple PK profiles, simultaneously. Generation of PK profiles by use of the device was initially performed with fluorescein as a test molecule. Effects of such parameters as flow rate, loading time, volume in the insert well, and initial concentration of the test molecule were investigated. A prediction model was generated from this data, enabling the user to predict the concentration of the test molecule at any point along the PK profile within a coefficient of variation of ~5%. Depletion of the analyte from the well was characterized and was determined to follow first-order rate kinetics, indicated by statistically equivalent (p > 0.05) depletion half-lives that were independent of the starting concentration. A PK curve for an approved antibiotic, levofloxacin, was generated to show utility beyond the fluorescein test molecule. PMID:26727249

  11. Compact Graph Representations and Parallel Connectivity Algorithms for Massive Dynamic Network Analysis

    SciTech Connect

    Madduri, Kamesh; Bader, David A.

    2009-02-15

    Graph-theoretic abstractions are extensively used to analyze massive data sets. Temporal data streams from socioeconomic interactions, social networking web sites, communication traffic, and scientific computing can be intuitively modeled as graphs. We present the first study of novel high-performance combinatorial techniques for analyzing large-scale information networks, encapsulating dynamic interaction data in the order of billions of entities. We present new data structures to represent dynamic interaction networks, and discuss algorithms for processing parallel insertions and deletions of edges in small-world networks. With these new approaches, we achieve an average performance rate of 25 million structural updates per second and a parallel speedup of nearly28 on a 64-way Sun UltraSPARC T2 multicore processor, for insertions and deletions to a small-world network of 33.5 million vertices and 268 million edges. We also design parallel implementations of fundamental dynamic graph kernels related to connectivity and centrality queries. Our implementations are freely distributed as part of the open-source SNAP (Small-world Network Analysis and Partitioning) complex network analysis framework.

  12. Mars-solar wind interaction: LatHyS, an improved parallel 3-D multispecies hybrid model

    NASA Astrophysics Data System (ADS)

    Modolo, Ronan; Hess, Sebastien; Mancini, Marco; Leblanc, Francois; Chaufray, Jean-Yves; Brain, David; Leclercq, Ludivine; Esteban-Hernández, Rosa; Chanteur, Gerard; Weill, Philippe; González-Galindo, Francisco; Forget, Francois; Yagi, Manabu; Mazelle, Christian

    2016-07-01

    In order to better represent Mars-solar wind interaction, we present an unprecedented model achieving spatial resolution down to 50 km, a so far unexplored resolution for global kinetic models of the Martian ionized environment. Such resolution approaches the ionospheric plasma scale height. In practice, the model is derived from a first version described in Modolo et al. (2005). An important effort of parallelization has been conducted and is presented here. A better description of the ionosphere was also implemented including ionospheric chemistry, electrical conductivities, and a drag force modeling the ion-neutral collisions in the ionosphere. This new version of the code, named LatHyS (Latmos Hybrid Simulation), is here used to characterize the impact of various spatial resolutions on simulation results. In addition, and following a global model challenge effort, we present the results of simulation run for three cases which allow addressing the effect of the suprathermal corona and of the solar EUV activity on the magnetospheric plasma boundaries and on the global escape. Simulation results showed that global patterns are relatively similar for the different spatial resolution runs, but finest grid runs provide a better representation of the ionosphere and display more details of the planetary plasma dynamic. Simulation results suggest that a significant fraction of escaping O+ ions is originated from below 1200 km altitude.

  13. Efficient Massively-Parallel Approach for Soving the Time-Dependent Schrodinger Equation

    NASA Astrophysics Data System (ADS)

    Schneider, B. I.; Hu, S. X.; Collins, L. A.

    2006-05-01

    A variety of problems in physics and chemistry require the solution of the time-dependent Schr"odinger equation (TDSE), including atoms and molecules in oscillating electromagnetic fields, atomic collisions, ultracold systems, and materials subjected to external forces. We describe an approach in which the Finite Element Discrete Variable Representation (FEDVR) is combined with the Real-Space Product (RSP) algorithm to generate an efficient and highly accurate method for the solution of both the linear and nonlinear TDSE. The FEDVR provides a highly-accurate spatial representation using a minimum number of grid points (N) while the RSP algorithm propagates the wavefunction in O(N) operations per time step. Parallelization of the method is transparent and is implemented by distributing one or two spatial dimension across the available processors within the Message-Passing-Interface (MPI) scheme. The complete formalism and a number of three-dimensional (3D) examples are given.

  14. Simulating massively parallel electron beam inspection for sub-20 nm defects

    NASA Astrophysics Data System (ADS)

    Bunday, Benjamin D.; Mukhtar, Maseeh; Quoi, Kathy; Thiel, Brad; Malloy, Matt

    2015-03-01

    SEMATECH has initiated a program to develop massively-parallel electron beam defect inspection (MPEBI). Here we use JMONSEL simulations to generate expected imaging responses of chosen test cases of patterns and defects with ability to vary parameters for beam energy, spot size, pixel size, and/or defect material and form factor. The patterns are representative of the design rules for an aggressively-scaled FinFET-type design. With these simulated images and resulting shot noise, a signal-to-noise framework is developed, which relates to defect detection probabilities. Additionally, with this infrastructure the effect of detection chain noise and frequency dependent system response can be made, allowing for targeting of best recipe parameters for MPEBI validation experiments, ultimately leading to insights into how such parameters will impact MPEBI tool design, including necessary doses for defect detection and estimations of scanning speeds for achieving high throughput for HVM.

  15. Guiding the design of synthetic DNA-binding molecules with massively parallel sequencing.

    PubMed

    Meier, Jordan L; Yu, Abigail S; Korf, Ian; Segal, David J; Dervan, Peter B

    2012-10-24

    Genomic applications of DNA-binding molecules require an unbiased knowledge of their high affinity sites. We report the high-throughput analysis of pyrrole-imidazole polyamide DNA-binding specificity in a 10(12)-member DNA sequence library using affinity purification coupled with massively parallel sequencing. We find that even within this broad context, the canonical pairing rules are remarkably predictive of polyamide DNA-binding specificity. However, this approach also allows identification of unanticipated high affinity DNA-binding sites in the reverse orientation for polyamides containing β/Im pairs. These insights allow the redesign of hairpin polyamides with different turn units capable of distinguishing 5'-WCGCGW-3' from 5'-WGCGCW-3'. Overall, this study displays the power of high-throughput methods to aid the optimal targeting of sequence-specific minor groove binding molecules, an essential underpinning for biological and nanotechnological applications.

  16. Computations on the massively parallel processor at the Goddard Space Flight Center

    NASA Technical Reports Server (NTRS)

    Strong, James P.

    1991-01-01

    Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.

  17. Demonstration of EDA flow for massively parallel e-beam lithography

    NASA Astrophysics Data System (ADS)

    Brandt, P.; Belledent, J.; Tranquillin, C.; Figueiro, T.; Meunier, S.; Bayle, S.; Fay, A.; Milléquant, M.; Icard, B.; Wieland, M.

    2014-03-01

    Today's soaring complexity in pushing the limits of 193nm immersion lithography drives the development of other technologies. One of these alternatives is mask-less massively parallel electron beam lithography, (MP-EBL), a promising candidate in which future resolution needs can be fulfilled at competitive cost. MAPPER Lithography's MATRIX MP-EBL platform has currently entered an advanced stage of development. The first tool in this platform, the FLX 1200, will operate using more than 1,300 beams, each one writing a stripe 2.2μm wide. 0.2μm overlap from stripe to stripe is allocated for stitching. Each beam is composed of 49 individual sub-beams that can be blanked independently in order to write in a raster scan pixels onto the wafer.

  18. Massively parallel enzyme kinetics reveals the substrate recognition landscape of the metalloprotease ADAMTS13.

    PubMed

    Kretz, Colin A; Dai, Manhong; Soylemez, Onuralp; Yee, Andrew; Desch, Karl C; Siemieniak, David; Tomberg, Kärt; Kondrashov, Fyodor A; Meng, Fan; Ginsburg, David

    2015-07-28

    Proteases play important roles in many biologic processes and are key mediators of cancer, inflammation, and thrombosis. However, comprehensive and quantitative techniques to define the substrate specificity profile of proteases are lacking. The metalloprotease ADAMTS13 regulates blood coagulation by cleaving von Willebrand factor (VWF), reducing its procoagulant activity. A mutagenized substrate phage display library based on a 73-amino acid fragment of VWF was constructed, and the ADAMTS13-dependent change in library complexity was evaluated over reaction time points, using high-throughput sequencing. Reaction rate constants (kcat/KM) were calculated for nearly every possible single amino acid substitution within this fragment. This massively parallel enzyme kinetics analysis detailed the specificity of ADAMTS13 and demonstrated the critical importance of the P1-P1' substrate residues while defining exosite binding domains. These data provided empirical evidence for the propensity for epistasis within VWF and showed strong correlation to conservation across orthologs, highlighting evolutionary selective pressures for VWF.

  19. Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

    PubMed

    Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

    2013-08-01

    Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.

  20. Phase space simulation of collisionless stellar systems on the massively parallel processor

    NASA Technical Reports Server (NTRS)

    White, Richard L.

    1987-01-01

    A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.

  1. Massively parallel cis-regulatory analysis in the mammalian central nervous system.

    PubMed

    Shen, Susan Q; Myers, Connie A; Hughes, Andrew E O; Byrne, Leah C; Flannery, John G; Corbo, Joseph C

    2016-02-01

    Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell-derived organoids.

  2. Architecture for next-generation massively parallel maskless lithography system (MPML2)

    NASA Astrophysics Data System (ADS)

    Su, Ming-Shing; Tsai, Kuen-Yu; Lu, Yi-Chang; Kuo, Yu-Hsuan; Pei, Ting-Hang; Yen, Jia-Yush

    2010-03-01

    Electron-beam lithography is promising for future manufacturing technology because it does not suffer from wavelength limits set by light sources. Since single electron-beam lithography systems have a common problem in throughput, a multi-electron-beam lithography (MEBL) system should be a feasible alternative using the concept of massive parallelism. In this paper, we evaluate the advantages and the disadvantages of different MEBL system architectures, and propose our novel Massively Parallel MaskLess Lithography System, MPML2. MPML2 system is targeting for cost-effective manufacturing at the 32nm node and beyond. The key structure of the proposed system is its beamlet array cells (BACs). Hundreds of BACs are uniformly arranged over the whole wafer area in the proposed system. Each BAC has a data processor and an array of beamlets, and each beamlet consists of an electron-beam source, a source controller, a set of electron lenses, a blanker, a deflector, and an electron detector. These essential parts of beamlets are integrated using MEMS technology, which increases the density of beamlets and reduces the system cost. The data processor in the BAC processes layout information coming off-chamber and dispatches them to the corresponding beamlet to control its ON/OFF status. High manufacturing cost of masks is saved in maskless lithography systems, however, immense mask data are needed to be handled and transmitted. Therefore, data compression technique is applied to reduce required transmission bandwidth. The compression algorithm is fast and efficient so that the real-time decoder can be implemented on-chip. Consequently, the proposed MPML2 can achieve 10 wafers per hour (WPH) throughput for 300mm-wafer systems.

  3. Massively parallel simulation with DOE's ASCI supercomputers : an overview of the Los Alamos Crestone project

    SciTech Connect

    Weaver, R. P.; Gittings, M. L.

    2004-01-01

    The Los Alamos Crestone Project is part of the Department of Energy's (DOE) Accelerated Strategic Computing Initiative, or ASCI Program. The main goal of this software development project is to investigate the use of continuous adaptive mesh refinement (CAMR) techniques for application to problems of interest to the Laboratory. There are many code development efforts in the Crestone Project, both unclassified and classified codes. In this overview I will discuss the unclassified SAGE and the RAGE codes. The SAGE (SAIC adaptive grid Eulerian) code is a one-, two-, and three-dimensional multimaterial Eulerian massively parallel hydrodynamics code for use in solving a variety of high-deformation flow problems. The RAGE CAMR code is built from the SAGE code by adding various radiation packages, improved setup utilities and graphics packages and is used for problems in which radiation transport of energy is important. The goal of these massively-parallel versions of the codes is to run extremely large problems in a reasonable amount of calendar time. Our target is scalable performance to {approx}10,000 processors on a 1 billion CAMR computational cell problem that requires hundreds of variables per cell, multiple physics packages (e.g. radiation and hydrodynamics), and implicit matrix solves for each cycle. A general description of the RAGE code has been published in [l],[ 2], [3] and [4]. Currently, the largest simulations we do are three-dimensional, using around 500 million computation cells and running for literally months of calendar time using {approx}2000 processors. Current ASCI platforms range from several 3-teraOPS supercomputers to one 12-teraOPS machine at Lawrence Livermore National Laboratory, the White machine, and one 20-teraOPS machine installed at Los Alamos, the Q machine. Each machine is a system comprised of many component parts that must perform in unity for the successful run of these simulations. Key features of any massively parallel system

  4. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    SciTech Connect

    Qiang, J.; Leitner, D.; Todd, D.S.; Ryne, R.D.

    2005-03-15

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV.For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.

  5. Development of the 3D Parallel Particle-In-Cell Code IMPACT to Simulate the Ion Beam Transport System of VENUS (Abstract)

    NASA Astrophysics Data System (ADS)

    Qiang, J.; Leitner, D.; Todd, D. S.; Ryne, R. D.

    2005-03-01

    The superconducting ECR ion source VENUS serves as the prototype injector ion source for the Rare Isotope Accelerator (RIA) driver linac. The RIA driver linac requires a great variety of high charge state ion beams with up to an order of magnitude higher intensity than currently achievable with conventional ECR ion sources. In order to design the beam line optics of the low energy beam line for the RIA front end for the wide parameter range required for the RIA driver accelerator, reliable simulations of the ion beam extraction from the ECR ion source through the ion mass analyzing system are essential. The RIA low energy beam transport line must be able to transport intense beams (up to 10 mA) of light and heavy ions at 30 keV. For this purpose, LBNL is developing the parallel 3D particle-in-cell code IMPACT to simulate the ion beam transport from the ECR extraction aperture through the analyzing section of the low energy transport system. IMPACT, a parallel, particle-in-cell code, is currently used to model the superconducting RF linac section of RIA and is being modified in order to simulate DC beams from the ECR ion source extraction. By using the high performance of parallel supercomputing we will be able to account consistently for the changing space charge in the extraction region and the analyzing section. A progress report and early results in the modeling of the VENUS source will be presented.

  6. Integration Architecture of Content Addressable Memory and Massive-Parallel Memory-Embedded SIMD Matrix for Versatile Multimedia Processor

    NASA Astrophysics Data System (ADS)

    Kumaki, Takeshi; Ishizaki, Masakatsu; Koide, Tetsushi; Mattausch, Hans Jürgen; Kuroda, Yasuto; Gyohten, Takayuki; Noda, Hideyuki; Dosaka, Katsumi; Arimoto, Kazutami; Saito, Kazunori

    This paper presents an integration architecture of content addressable memory (CAM) and a massive-parallel memory-embedded SIMD matrix for constructing a versatile multimedia processor. The massive-parallel memory-embedded SIMD matrix has 2,048 2-bit processing elements, which are connected by a flexible switching network, and supports 2-bit 2,048-way bit-serial and word-parallel operations with a single command. The SIMD matrix architecture is verified to be a better way for processing the repeated arithmetic operation types in multimedia applications. The proposed architecture, reported in this paper, exploits in addition CAM technology and enables therefore fast pipelined table-lookup coding operations. Since both arithmetic and table-lookup operations execute extremely fast, the proposed novel architecture can realize consequently efficient and versatile multimedia data processing. Evaluation results of the proposed CAM-enhanced massive-parallel SIMD matrix processor for the example of the frequently used JPEG image-compression application show that the necessary clock cycle number can be reduced by 86% in comparison to a conventional mobile DSP architecture. The determined performances in Mpixel/mm2 are factors 3.3 and 4.4 better than with a CAM-less massive-parallel memory-embedded SIMD matrix processor and a conventional mobile DSP, respectively.

  7. The divide-expand-consolidate MP2 scheme goes massively parallel

    NASA Astrophysics Data System (ADS)

    Kristensen, Kasper; Kjærgaard, Thomas; Høyvik, Ida-Marie; Ettenhuber, Patrick; Jørgensen, Poul; Jansik, Branislav; Reine, Simen; Jakowski, Jacek

    2013-07-01

    For large molecular systems conventional implementations of second order Møller-Plesset (MP2) theory encounter a scaling wall, both memory- and time-wise. We describe how this scaling wall can be removed. We present a massively parallel algorithm for calculating MP2 energies and densities using the divide-expand-consolidate scheme where a calculation on a large system is divided into many small fragment calculations employing local orbital spaces. The resulting algorithm is linear-scaling with system size, exhibits near perfect parallel scalability, removes memory bottlenecks and does not involve any I/O. The algorithm employs three levels of parallelisation combined via a dynamic job distribution scheme. Results on two molecular systems containing 528 and 1056 atoms (4278 and 8556 basis functions) using 47,120 and 94,240 cores are presented. The results demonstrate the scalability of the algorithm both with respect to the number of cores and with respect to system size. The presented algorithm is thus highly suited for large super computer architectures and allows MP2 calculations on large molecular systems to be carried out within a few hours - for example, the correlated calculation on the molecular system containing 1056 atoms took 2.37 hours using 94240 cores.

  8. CRISPR–Cas9-targeted fragmentation and selective sequencing enable massively parallel microsatellite analysis

    PubMed Central

    Shin, GiWon; Grimes, Susan M.; Lee, HoJoon; Lau, Billy T.; Xia, Li C.; Ji, Hanlee P.

    2017-01-01

    Microsatellites are multi-allelic and composed of short tandem repeats (STRs) with individual motifs composed of mononucleotides, dinucleotides or higher including hexamers. Next-generation sequencing approaches and other STR assays rely on a limited number of PCR amplicons, typically in the tens. Here, we demonstrate STR-Seq, a next-generation sequencing technology that analyses over 2,000 STRs in parallel, and provides the accurate genotyping of microsatellites. STR-Seq employs in vitro CRISPR–Cas9-targeted fragmentation to produce specific DNA molecules covering the complete microsatellite sequence. Amplification-free library preparation provides single molecule sequences without unique molecular barcodes. STR-selective primers enable massively parallel, targeted sequencing of large STR sets. Overall, STR-Seq has higher throughput, improved accuracy and provides a greater number of informative haplotypes compared with other microsatellite analysis approaches. With these new features, STR-Seq can identify a 0.1% minor genome fraction in a DNA mixture composed of different, unrelated samples. PMID:28169275

  9. GPAW - massively parallel electronic structure calculations with Python-based software.

    SciTech Connect

    Enkovaara, J.; Romero, N.; Shende, S.; Mortensen, J.

    2011-01-01

    Electronic structure calculations are a widely used tool in materials science and large consumer of supercomputing resources. Traditionally, the software packages for these kind of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages, such as Python, can increase the effciency of programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity enhancing features together with a good numerical performance. We have used this approach in implementing an electronic structure simulation software GPAW using the combination of Python and C programming languages. While the chosen approach works well in standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python based software.

  10. Automation of Molecular-Based Analyses: A Primer on Massively Parallel Sequencing

    PubMed Central

    Nguyen, Lan; Burnett, Leslie

    2014-01-01

    Recent advances in genetics have been enabled by new genetic sequencing techniques called massively parallel sequencing (MPS) or next-generation sequencing. Through the ability to sequence in parallel hundreds of thousands to millions of DNA fragments, the cost and time required for sequencing has dramatically decreased. There are a number of different MPS platforms currently available and being used in Australia. Although they differ in the underlying technology involved, their overall processes are very similar: DNA fragmentation, adaptor ligation, immobilisation, amplification, sequencing reaction and data analysis. MPS is being used in research, translational and increasingly now also in clinical settings. Common applications include sequencing of whole genomes, whole exomes or targeted genes for disease-causing gene discovery, genetic diagnosis and targeted cancer therapy. Even though the revolution that is occurring with MPS is exciting due to its increasing use, improving and emerging technologies and new applications, significant challenges still exist. Particularly challenging issues are the bioinformatics required for data analysis, interpretation of results and the ethical dilemma of ‘incidental findings’. PMID:25336762

  11. Rigid body constraints realized in massively-parallel molecular dynamics on graphics processing units

    NASA Astrophysics Data System (ADS)

    Nguyen, Trung Dac; Phillips, Carolyn L.; Anderson, Joshua A.; Glotzer, Sharon C.

    2011-11-01

    Molecular dynamics (MD) methods compute the trajectory of a system of point particles in response to a potential function by numerically integrating Newton's equations of motion. Extending these basic methods with rigid body constraints enables composite particles with complex shapes such as anisotropic nanoparticles, grains, molecules, and rigid proteins to be modeled. Rigid body constraints are added to the GPU-accelerated MD package, HOOMD-blue, version 0.10.0. The software can now simulate systems of particles, rigid bodies, or mixed systems in microcanonical (NVE), canonical (NVT), and isothermal-isobaric (NPT) ensembles. It can also apply the FIRE energy minimization technique to these systems. In this paper, we detail the massively parallel scheme that implements these algorithms and discuss how our design is tuned for the maximum possible performance. Two different case studies are included to demonstrate the performance attained, patchy spheres and tethered nanorods. In typical cases, HOOMD-blue on a single GTX 480 executes 2.5-3.6 times faster than LAMMPS executing the same simulation on any number of CPU cores in parallel. Simulations with rigid bodies may now be run with larger systems and for longer time scales on a single workstation than was previously even possible on large clusters.

  12. GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations

    NASA Astrophysics Data System (ADS)

    Nguyen, Trung Dac

    2017-03-01

    The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.

  13. A Fast Parallel Simulation Code for Interaction between Proto-Planetary Disk and Embedded Proto-Planets: Implementation for 3D Code

    SciTech Connect

    Li, Shengtai; Li, Hui

    2012-06-14

    the position of the planet, we adopt the corotating frame that allows the planet moving only in radial direction if only one planet is present. This code has been extensively tested on a number of problems. For the earthmass planet with constant aspect ratio h = 0.05, the torque calculated using our code matches quite well with the the 3D linear theory results by Tanaka et al. (2002). The code is fully parallelized via message-passing interface (MPI) and has very high parallel efficiency. Several numerical examples for both fixed planet and moving planet are provided to demonstrate the efficacy of the numerical method and code.

  14. A task-based parallelism and vectorized approach to 3D Method of Characteristics (MOC) reactor simulation for high performance computing architectures

    NASA Astrophysics Data System (ADS)

    Tramm, John R.; Gunow, Geoffrey; He, Tim; Smith, Kord S.; Forget, Benoit; Siegel, Andrew R.

    2016-05-01

    In this study we present and analyze a formulation of the 3D Method of Characteristics (MOC) technique applied to the simulation of full core nuclear reactors. Key features of the algorithm include a task-based parallelism model that allows independent MOC tracks to be assigned to threads dynamically, ensuring load balancing, and a wide vectorizable inner loop that takes advantage of modern SIMD computer architectures. The algorithm is implemented in a set of highly optimized proxy applications in order to investigate its performance characteristics on CPU, GPU, and Intel Xeon Phi architectures. Speed, power, and hardware cost efficiencies are compared. Additionally, performance bottlenecks are identified for each architecture in order to determine the prospects for continued scalability of the algorithm on next generation HPC architectures.

  15. A Parallel 3D Spectral Difference Method for Solutions of Compressible Navier Stokes Equations on Deforming Grids and Simulations of Vortex Induced Vibration

    NASA Astrophysics Data System (ADS)

    DeJong, Andrew

    Numerical models of fluid-structure interaction have grown in importance due to increasing interest in environmental energy harvesting, airfoil-gust interactions, and bio-inspired formation flying. Powered by increasingly powerful parallel computers, such models seek to explain the fundamental physics behind the complex, unsteady fluid-structure phenomena. To this end, a high-fidelity computational model based on the high-order spectral difference method on 3D unstructured, dynamic meshes has been developed. The spectral difference method constructs continuous solution fields within each element with a Riemann solver to compute the inviscid fluxes at the element interfaces and an averaging mechanism to compute the viscous fluxes. This method has shown promise in the past as a highly accurate, yet sufficiently fast method for solving unsteady viscous compressible flows. The solver is monolithically coupled to the equations of motion of an elastically mounted 3-degree of freedom rigid bluff body undergoing flow-induced lift, drag, and torque. The mesh is deformed using 4 methods: an analytic function, Laplace equation, biharmonic equation, and a bi-elliptic equation with variable diffusivity. This single system of equations -- fluid and structure -- is advanced through time using a 5-stage, 4th-order Runge-Kutta scheme. Message Passing Interface is used to run the coupled system in parallel on up to 240 processors. The solver is validated against previously published numerical and experimental data for an elastically mounted cylinder. The effect of adding an upstream body and inducing wake galloping is observed.

  16. Switching dynamics of thin film ferroelectric devices - a massively parallel phase field study

    NASA Astrophysics Data System (ADS)

    Ashraf, Md. Khalid

    In this thesis, we investigate the switching dynamics in thin film ferroelectrics. Ferroelectric materials are of inherent interest for low power and multi-functional devices. However, possible device applications of these materials have been limited due to the poorly understood electromagnetic and mechanical response at the nanoscale in arbitrary device structures. The difficulty in understanding switching dynamics mainly arises from the presence of features at multiple length scales and the nonlinearity associated with the strongly coupled states. For example, in a ferroelectric material, the domain walls are of nm size whereas the domain pattern forms at micron scale. The switching is determined by coupled chemical, electrostatic, mechanical and thermal interactions. Thus computational understanding of switching dynamics in thin film ferroelectrics and a direct comparison with experiment poses a significant numerical challenge. We have developed a phase field model that describes the physics of polarization dynamics at the microscopic scale. A number of efficient numerical methods have been applied for achieving massive parallelization of all the calculation steps. Conformally mapped elements, node wise assembly and prevention of dynamic loading minimized the communication between processors and increased the parallelization efficiency. With these improvements, we have reached the experimental scale - a significant step forward compared to the state of the art thin film ferroelectric switching dynamics models. Using this model, we elucidated the switching dynamics on multiple surfaces of the multiferroic material BFO. We also calculated the switching energy of scaled BFO islands. Finally, we studied the interaction of domain wall propagation with misfit dislocations in the thin film. We believe that the model will be useful in understanding the switching dynamics in many different experimental setups incorporating thin film ferroelectrics.

  17. Hierarchical Image Segmentation of Remotely Sensed Data using Massively Parallel GNU-LINUX Software

    NASA Technical Reports Server (NTRS)

    Tilton, James C.

    2003-01-01

    A hierarchical set of image segmentations is a set of several image segmentations of the same image at different levels of detail in which the segmentations at coarser levels of detail can be produced from simple merges of regions at finer levels of detail. In [1], Tilton, et a1 describes an approach for producing hierarchical segmentations (called HSEG) and gave a progress report on exploiting these hierarchical segmentations for image information mining. The HSEG algorithm is a hybrid of region growing and constrained spectral clustering that produces a hierarchical set of image segmentations based on detected convergence points. In the main, HSEG employs the hierarchical stepwise optimization (HSWO) approach to region growing, which was described as early as 1989 by Beaulieu and Goldberg. The HSWO approach seeks to produce segmentations that are more optimized than those produced by more classic approaches to region growing (e.g. Horowitz and T. Pavlidis, [3]). In addition, HSEG optionally interjects between HSWO region growing iterations, merges between spatially non-adjacent regions (i.e., spectrally based merging or clustering) constrained by a threshold derived from the previous HSWO region growing iteration. While the addition of constrained spectral clustering improves the utility of the segmentation results, especially for larger images, it also significantly increases HSEG s computational requirements. To counteract this, a computationally efficient recursive, divide-and-conquer, implementation of HSEG (RHSEG) was devised, which includes special code to avoid processing artifacts caused by RHSEG s recursive subdivision of the image data. The recursive nature of RHSEG makes for a straightforward parallel implementation. This paper describes the HSEG algorithm, its recursive formulation (referred to as RHSEG), and the implementation of RHSEG using massively parallel GNU-LINUX software. Results with Landsat TM data are included comparing RHSEG with classic

  18. High-order accurate solution of the incompressible Navier-Stokes equations on massively parallel computers

    NASA Astrophysics Data System (ADS)

    Henniger, R.; Obrist, D.; Kleiser, L.

    2010-05-01

    The emergence of "petascale" supercomputers requires us to develop today's simulation codes for (incompressible) flows by codes which are using numerical schemes and methods that are better able to exploit the offered computational power. In that spirit, we present a massively parallel high-order Navier-Stokes solver for large incompressible flow problems in three dimensions. The governing equations are discretized with finite differences in space and a semi-implicit time integration scheme. This discretization leads to a large linear system of equations which is solved with a cascade of iterative solvers. The iterative solver for the pressure uses a highly efficient commutation-based preconditioner which is robust with respect to grid stretching. The efficiency of the implementation is further enhanced by carefully setting the (adaptive) termination criteria for the different iterative solvers. The computational work is distributed to different processing units by a geometric data decomposition in all three dimensions. This decomposition scheme ensures a low communication overhead and excellent scaling capabilities. The discretization is thoroughly validated. First, we verify the convergence orders of the spatial and temporal discretizations for a forced channel flow. Second, we analyze the iterative solution technique by investigating the absolute accuracy of the implementation with respect to the different termination criteria. Third, Orr-Sommerfeld and Squire eigenmodes for plane Poiseuille flow are simulated and compared to analytical results. Fourth, the practical applicability of the implementation is tested for transitional and turbulent channel flow. The results are compared to solutions from a pseudospectral solver. Subsequently, the performance of the commutation-based preconditioner for the pressure iteration is demonstrated. Finally, the excellent parallel scalability of the proposed method is demonstrated with a weak and a strong scaling test on up to

  19. Massively parallel computation of absolute binding free energy with well-equilibrated states

    NASA Astrophysics Data System (ADS)

    Fujitani, Hideaki; Tanida, Yoshiaki; Matsuura, Azuma

    2009-02-01

    A force field formulator for organic molecules (FF-FOM) was developed to assign bond, angle, and dihedral parameters to arbitrary organic molecules in a unified manner including proteins and nucleic acids. With the unified force field parametrization we performed massively parallel computations of absolute binding free energies for pharmaceutical target proteins and ligands. Compared with the previous calculation with the ff99 force field in the Amber simulation package (Amber99) and the ligand charges produced by the Austin Model 1 bond charge correction (AM1-BCC), the unified parametrization gave better absolute binding energies for the FK506 binding protein (FKBP) and ligand system. Our method is based on extensive work measurement between thermodynamic states to calculate the free energy difference and it is also the same as the traditional free energy perturbation. There are important requirements for accurate calculations. The first is a well-equilibrated bound structure including the conformational change of the protein induced by the binding of the ligand. The second requirement is the convergence of the work distribution with a sufficient number of trajectories and dense spacing of the coupling constant between the ligand and the rest of the system. Finally, the most important requirement is the force field parametrization.

  20. Unique archaeal assemblages in the Arctic Ocean unveiled by massively parallel tag sequencing.

    PubMed

    Galand, Pierre E; Casamayor, Emilio O; Kirchman, David L; Potvin, Marianne; Lovejoy, Connie

    2009-07-01

    The Arctic Ocean plays a critical role in controlling nutrient budgets between the Pacific and Atlantic Ocean. Archaea are key players in the nitrogen cycle and in cycling nutrients, but their community composition has been little studied in the Arctic Ocean. Here, we characterize archaeal assemblages from surface and deep Arctic water masses using massively parallel tag sequencing of the V6 region of the 16S rRNA gene. This approach gave a very high coverage of the natural communities, allowing a precise description of archaeal assemblages. This first taxonomic description of archaeal communities by tag sequencing reported so far shows that it is possible to assign an identity below phylum level to most (95%) of the archaeal V6 tags, and shows that tag sequencing is a powerful tool for resolving the diversity and distribution of specific microbes in the environment. Marine group I Crenarchaeota was overall the most abundant group in the Arctic Ocean and comprised between 27% and 63% of all tags. Group III Euryarchaeota were more abundant in deep-water masses and represented the largest archaeal group in the deep Atlantic layer of the central Arctic Ocean. Coastal surface waters, in turn, harbored more group II Euryarchaeota. Moreover, group II sequences that dominated surface waters were different from the group II sequences detected in deep waters, suggesting functional differences in closely related groups. Our results unveiled for the first time an archaeal community dominated by group III Euryarchaeota and show biogeographical traits for marine Arctic Archaea.

  1. Photo-patterned free-standing hydrogel microarrays for massively parallel protein analysis

    NASA Astrophysics Data System (ADS)

    Duncombe, Todd A.; Herr, Amy E.

    2015-03-01

    Microfluidic technologies have largely been realized within enclosed microchannels. While powerful, a principle limitation of closed-channel microfluidics is the difficulty for sample extraction and downstream processing. To address this limitation and expand the utility of microfluidic analytical separation tools, we developed an openchannel hydrogel architecture for rapid protein analysis. Designed for compatibility with slab-gel polyacrylamide gel electrophoresis (PAGE) reagents and instruments, we detail the development of free-standing polyacrylamide gel (fsPAG) microstructures supporting electrophoretic performance rivalling that of microfluidic platforms. Owing to its open architecture - the platform can be easily interfaced with automated robotic controllers and downstream processing (e.g., sample spotters, immunological probing, mass spectroscopy). The fsPAG devices are directly photopatterened atop of and covalently attached to planar polymer or glass surfaces. Due to the fast < 1 hr design-prototype-test cycle - significantly faster than mold based fabrication techniques - rapid prototyping devices with fsPAG microstructures provides researchers a powerful tool for developing custom analytical assays. Leveraging the rapid prototyping benefits - we up-scale from a unit separation to an array of 96 concurrent fsPAGE assays in 10 min run time driven by one electrode pair. The fsPAGE platform is uniquely well-suited for massively parallelized proteomics, a major unrealized goal from bioanalytical technology.

  2. Mitochondrial DNA heteroplasmy in the emerging field of massively parallel sequencing

    PubMed Central

    Just, Rebecca S.; Irwin, Jodi A.; Parson, Walther

    2015-01-01

    Long an important and useful tool in forensic genetic investigations, mitochondrial DNA (mtDNA) typing continues to mature. Research in the last few years has demonstrated both that data from the entire molecule will have practical benefits in forensic DNA casework, and that massively parallel sequencing (MPS) methods will make full mitochondrial genome (mtGenome) sequencing of forensic specimens feasible and cost-effective. A spate of recent studies has employed these new technologies to assess intraindividual mtDNA variation. However, in several instances, contamination and other sources of mixed mtDNA data have been erroneously identified as heteroplasmy. Well vetted mtGenome datasets based on both Sanger and MPS sequences have found authentic point heteroplasmy in approximately 25% of individuals when minor component detection thresholds are in the range of 10–20%, along with positional distribution patterns in the coding region that differ from patterns of point heteroplasmy in the well-studied control region. A few recent studies that examined very low-level heteroplasmy are concordant with these observations when the data are examined at a common level of resolution. In this review we provide an overview of considerations related to the use of MPS technologies to detect mtDNA heteroplasmy. In addition, we examine published reports on point heteroplasmy to characterize features of the data that will assist in the evaluation of future mtGenome data developed by any typing method. PMID:26009256

  3. Breast cancer genomics from microarrays to massively parallel sequencing: paradigms and new insights.

    PubMed

    Ng, Charlotte K Y; Schultheis, Anne M; Bidard, Francois-Clement; Weigelt, Britta; Reis-Filho, Jorge S

    2015-02-23

    Rapid advancements in massively parallel sequencing methods have enabled the analysis of breast cancer genomes at an unprecedented resolution, which have revealed the remarkable heterogeneity of the disease. As a result, we now accept that despite originating in the breast, estrogen receptor (ER)-positive and ER-negative breast cancers are completely different diseases at the molecular level. It has become apparent that there are very few highly recurrently mutated genes such as TP53, PIK3CA, and GATA3, that no two breast cancers display an identical repertoire of somatic genetic alterations at base-pair resolution and that there might not be a single highly recurrently mutated gene that defines each of the "intrinsic" subtypes of breast cancer (ie, basal-like, HER2-enriched, luminal A, and luminal B). Breast cancer heterogeneity, however, extends beyond the diversity between tumors. There is burgeoning evidence to demonstrate that at least some primary breast cancers are composed of multiple, genetically diverse clones at diagnosis and that metastatic lesions may differ in their repertoire of somatic genetic alterations when compared with their respective primary tumors. Several biological phenomena may shape the reported intratumor genetic heterogeneity observed in breast cancers, including the different mutational processes and multiple types of genomic instability. Harnessing the emerging concepts of the diversity of breast cancer genomes and the phenomenon of intratumor genetic heterogeneity will be essential for the development of optimal methods for diagnosis, disease monitoring, and the matching of patients to the drugs that would benefit them the most.

  4. Massively parallel network architectures for automatic recognition of visual speech signals. Final technical report

    SciTech Connect

    Sejnowski, T.J.; Goldstein, M.

    1990-01-01

    This research sought to produce a massively-parallel network architecture that could interpret speech signals from video recordings of human talkers. This report summarizes the project's results: (1) A corpus of video recordings from two human speakers was analyzed with image processing techniques and used as the data for this study; (2) We demonstrated that a feed forward network could be trained to categorize vowels from these talkers. The performance was comparable to that of the nearest neighbors techniques and to trained humans on the same data; (3) We developed a novel approach to sensory fusion by training a network to transform from facial images to short-time spectral amplitude envelopes. This information can be used to increase the signal-to-noise ratio and hence the performance of acoustic speech recognition systems in noisy environments; (4) We explored the use of recurrent networks to perform the same mapping for continuous speech. Results of this project demonstrate the feasibility of adding a visual speech recognition component to enhance existing speech recognition systems. Such a combined system could be used in noisy environments, such as cockpits, where improved communication is needed. This demonstration of presymbolic fusion of visual and acoustic speech signals is consistent with our current understanding of human speech perception.

  5. GRay: A MASSIVELY PARALLEL GPU-BASED CODE FOR RAY TRACING IN RELATIVISTIC SPACETIMES

    SciTech Connect

    Chan, Chi-kwan; Psaltis, Dimitrios; Özel, Feryal

    2013-11-01

    We introduce GRay, a massively parallel integrator designed to trace the trajectories of billions of photons in a curved spacetime. This graphics-processing-unit (GPU)-based integrator employs the stream processing paradigm, is implemented in CUDA C/C++, and runs on nVidia graphics cards. The peak performance of GRay using single-precision floating-point arithmetic on a single GPU exceeds 300 GFLOP (or 1 ns per photon per time step). For a realistic problem, where the peak performance cannot be reached, GRay is two orders of magnitude faster than existing central-processing-unit-based ray-tracing codes. This performance enhancement allows more effective searches of large parameter spaces when comparing theoretical predictions of images, spectra, and light curves from the vicinities of compact objects to observations. GRay can also perform on-the-fly ray tracing within general relativistic magnetohydrodynamic algorithms that simulate accretion flows around compact objects. Making use of this algorithm, we calculate the properties of the shadows of Kerr black holes and the photon rings that surround them. We also provide accurate fitting formulae of their dependencies on black hole spin and observer inclination, which can be used to interpret upcoming observations of the black holes at the center of the Milky Way, as well as M87, with the Event Horizon Telescope.

  6. Targeted massively parallel sequencing provides comprehensive genetic diagnosis for patients with disorders of sex development.

    PubMed

    Arboleda, V A; Lee, H; Sánchez, F J; Délot, E C; Sandberg, D E; Grody, W W; Nelson, S F; Vilain, E

    2013-01-01

    Disorders of sex development (DSD) are rare disorders in which there is discordance between chromosomal, gonadal, and phenotypic sex. Only a minority of patients clinically diagnosed with DSD obtains a molecular diagnosis, leaving a large gap in our understanding of the prevalence, management, and outcomes in affected patients. We created a novel DSD-genetic diagnostic tool, in which sex development genes are captured using RNA probes and undergo massively parallel sequencing. In the pilot group of 14 patients, we determined sex chromosome dosage, copy number variation, and gene mutations. In the patients with a known genetic diagnosis (obtained either on a clinical or research basis), this test identified the molecular cause in 100% (7/7) of patients. In patients in whom no molecular diagnosis had been made, this tool identified a genetic diagnosis in two of seven patients. Targeted sequencing of genes representing a specific spectrum of disorders can result in a higher rate of genetic diagnoses than current diagnostic approaches. Our DSD diagnostic tool provides for first time, in a single blood test, a comprehensive genetic diagnosis in patients presenting with a wide range of urogenital anomalies.

  7. CuBIC: cumulant based inference of higher-order correlations in massively parallel spike trains

    PubMed Central

    Rotter, Stefan; Grün, Sonja

    2009-01-01

    Recent developments in electrophysiological and optical recording techniques enable the simultaneous observation of large numbers of neurons. A meaningful interpretation of the resulting multivariate data, however, presents a serious challenge. In particular, the estimation of higher-order correlations that characterize the cooperative dynamics of groups of neurons is impeded by the combinatorial explosion of the parameter space. The resulting requirements with respect to sample size and recording time has rendered the detection of coordinated neuronal groups exceedingly difficult. Here we describe a novel approach to infer higher-order correlations in massively parallel spike trains that is less susceptible to these problems. Based on the superimposed activity of all recorded neurons, the cumulant-based inference of higher-order correlations (CuBIC) presented here exploits the fact that the absence of higher-order correlations imposes also strong constraints on correlations of lower order. Thus, estimates of only few lower-order cumulants suffice to infer higher-order correlations in the population. As a consequence, CuBIC is much better compatible with the constraints of in vivo recordings than previous approaches, which is shown by a systematic analysis of its parameter dependence. PMID:19862611

  8. An atlas of human gene expression from massively parallel signature sequencing (MPSS)

    PubMed Central

    Jongeneel, C. Victor; Delorenzi, Mauro; Iseli, Christian; Zhou, Daixing; Haudenschild, Christian D.; Khrebtukova, Irina; Kuznetsov, Dmitry; Stevenson, Brian J.; Strausberg, Robert L.; Simpson, Andrew J.G.; Vasicek, Thomas J.

    2005-01-01

    We have used massively parallel signature sequencing (MPSS) to sample the transcriptomes of 32 normal human tissues to an unprecedented depth, thus documenting the patterns of expression of almost 20,000 genes with high sensitivity and specificity. The data confirm the widely held belief that differences in gene expression between cell and tissue types are largely determined by transcripts derived from a limited number of tissue-specific genes, rather than by combinations of more promiscuously expressed genes. Expression of a little more than half of all known human genes seems to account for both the common requirements and the specific functions of the tissues sampled. A classification of tissues based on patterns of gene expression largely reproduces classifications based on anatomical and biochemical properties. The unbiased sampling of the human transcriptome achieved by MPSS supports the idea that most human genes have been mapped, if not functionally characterized. This data set should prove useful for the identification of tissue-specific genes, for the study of global changes induced by pathological conditions, and for the definition of a minimal set of genes necessary for basic cell maintenance. The data are available on the Web at http://mpss.licr.org and http://sgb.lynxgen.com. PMID:15998913

  9. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing

    SciTech Connect

    Le Crom, Stphane; Schackwitz, Wendy; Pennacchiod, Len; Magnuson, Jon K.; Culley, David E.; Collett, James R.; Martin, Joel X.; Druzhinina, Irina S.; Mathis, Hugues; Monot, Frdric; Seiboth, Bernhard; Cherry, Barbara; Rey, Michael; Berka, Randy; Kubicek, Christian P.; Baker, Scott E.; Margeot, Antoine

    2009-09-22

    Trichoderma reesei (teleomorph Hypocrea jecorina) is the main industrial source of cellulases and hemicellulases harnessed for the hydrolysis of biomass to simple sugars, which can then be converted to biofuels, such as ethanol, and other chemicals. The highly productive strains in use today were generated by classical mutagenesis. To learn how cellulase production was improved by these techniques, we performed massively parallel sequencing to identify mutations in the genomes of two hyperproducing strains (NG14, and its direct improved descendant, RUT C30). We detected a surprisingly high number of mutagenic events: 223 single nucleotides variants, 15 small deletions or insertions and 18 larger deletions leading to the loss of more than 100 kb of genomic DNA. From these events we report previously undocumented non-synonymous mutations in 43 genes that are mainly involved in nuclear transport, mRNA stability, transcription, secretion/vacuolar targeting, and metabolism. This homogeneity of functional categories suggests that multiple changes are necessary to improve cellulase production and not simply a few clear-cut mutagenic events. Phenotype microarrays show that some of these mutations result in strong changes in the carbon assimilation pattern of the two mutants with respect to the wild type strain QM6a. Our analysis provides the first genome-wide insights into the changes induced by classical mutagenesis in a filamentous fungus, and suggests new areas for the generation of enhanced T. reesei strains for industrial applications such as biofuel production.

  10. Tracking the roots of cellulase hyperproduction by the fungus Trichoderma reesei using massively parallel DNA sequencing.

    PubMed

    Le Crom, Stéphane; Schackwitz, Wendy; Pennacchio, Len; Magnuson, Jon K; Culley, David E; Collett, James R; Martin, Joel; Druzhinina, Irina S; Mathis, Hugues; Monot, Frédéric; Seiboth, Bernhard; Cherry, Barbara; Rey, Michael; Berka, Randy; Kubicek, Christian P; Baker, Scott E; Margeot, Antoine

    2009-09-22

    Trichoderma reesei (teleomorph Hypocrea jecorina) is the main industrial source of cellulases and hemicellulases harnessed for the hydrolysis of biomass to simple sugars, which can then be converted to biofuels such as ethanol and other chemicals. The highly productive strains in use today were generated by classical mutagenesis. To learn how cellulase production was improved by these techniques, we performed massively parallel sequencing to identify mutations in the genomes of two hyperproducing strains (NG14, and its direct improved descendant, RUT C30). We detected a surprisingly high number of mutagenic events: 223 single nucleotides variants, 15 small deletions or insertions, and 18 larger deletions, leading to the loss of more than 100 kb of genomic DNA. From these events, we report previously undocumented non-synonymous mutations in 43 genes that are mainly involved in nuclear transport, mRNA stability, transcription, secretion/vacuolar targeting, and metabolism. This homogeneity of functional categories suggests that multiple changes are necessary to improve cellulase production and not simply a few clear-cut mutagenic events. Phenotype microarrays show that some of these mutations result in strong changes in the carbon assimilation pattern of the two mutants with respect to the wild-type strain QM6a. Our analysis provides genome-wide insights into the changes induced by classical mutagenesis in a filamentous fungus and suggests areas for the generation of enhanced T. reesei strains for industrial applications such as biofuel production.

  11. Massively parallel sequencing-based survey of eukaryotic community structures in Hiroshima Bay and Ishigaki Island.

    PubMed

    Nagai, Satoshi; Hida, Kohsuke; Urusizaki, Shingo; Takano, Yoshihito; Hongo, Yuki; Kameda, Takahiko; Abe, Kazuo

    2016-02-01

    In this study, we compared the eukaryote biodiversity between Hiroshima Bay and Ishigaki Island in Japanese coastal waters by using the massively parallel sequencing (MPS)-based technique to collect preliminary data. The relative abundance of Alveolata was highest in both localities, and the second highest groups were Stramenopiles, Opisthokonta, or Hacrobia, which varied depending on the samples considered. For microalgal phyla, the relative abundance of operational taxonomic units (OTUs) and the number of MPS were highest for Dinophyceae in both localities, followed by Bacillariophyceae in Hiroshima Bay, and by Bacillariophyceae or Chlorophyceae in Ishigaki Island. The number of detected OTUs in Hiroshima Bay and Ishigaki Island was 645 and 791, respectively, and 15.3% and 12.5% of the OTUs were common between the two localities. In the non-metric multidimensional scaling analysis, the samples from the two localities were plotted in different positions. In the dendrogram developed using similarity indices, the samples were clustered into different nodes based on localities with high multiscale bootstrap values, reflecting geographic differences in biodiversity. Thus, we succeeded in demonstrating biodiversity differences between the two localities, although the read numbers of the MPSs were not high enough. The corresponding analysis showed a clear seasonal change in the biodiversity of Hiroshima Bay but it was not clear in Ishigaki Island. Thus, the MPS-based technique shows a great advantage of high performance by detecting several hundreds of OTUs from a single sample, strongly suggesting the effectiveness to apply this technique to routine monitoring programs.

  12. Clinical and ethical considerations of massively parallel sequencing in transplantation science

    PubMed Central

    Scherer, Andreas

    2013-01-01

    Massively parallel sequencing (MPS), alias next-generation sequencing, is making its way from research laboratories into applied sciences and clinics. MPS is a framework of experimental procedures which offer possibilities for genome research and genetics which could only be dreamed of until around 2005 when these technologies became available. Sequencing of a transcriptome, exome, even entire genomes is now possible within a time frame and precision that we could only hope for 10 years ago. Linking other experimental procedures with MPS enables researchers to study secondary DNA modifications across the entire genome, and protein binding sites, to name a few applications. How the advancements of sequencing technologies can contribute to transplantation science is subject of this discussion: immediate applications are in graft matching via human leukocyte antigen sequencing, as part of systems biology approaches which shed light on gene expression processes during immune response, as biomarkers of graft rejection, and to explore changes of microbiomes as a result of transplantation. Of considerable importance is the socio-ethical aspect of data ownership, privacy, informed consent, and result report to the study participant. While the technology is advancing rapidly, legislation is lagging behind due to the globalisation of data requisition, banking and sharing. PMID:24392310

  13. Adaptive Flow Simulation of Turbulence in Subject-Specific Abdominal Aortic Aneurysm on Massively Parallel Computers

    NASA Astrophysics Data System (ADS)

    Sahni, Onkar; Jansen, Kenneth; Shephard, Mark; Taylor, Charles

    2007-11-01

    Flow within the healthy human vascular system is typically laminar but diseased conditions can alter the geometry sufficiently to produce transitional/turbulent flows in regions focal (and immediately downstream) of the diseased section. The mean unsteadiness (pulsatile or respiratory cycle) further complicates the situation making traditional turbulence simulation techniques (e.g., Reynolds-averaged Navier-Stokes simulations (RANSS)) suspect. At the other extreme, direct numerical simulation (DNS) while fully appropriate can lead to large computational expense, particularly when the simulations must be done quickly since they are intended to affect the outcome of a medical treatment (e.g., virtual surgical planning). To produce simulations in a clinically relevant time frame requires; 1) adaptive meshing technique that closely matches the desired local mesh resolution in all three directions to the highly anisotropic physical length scales in the flow, 2) efficient solution algorithms, and 3) excellent scaling on massively parallel computers. In this presentation we will demonstrate results for a subject-specific simulation of an abdominal aortic aneurysm using stabilized finite element method on anisotropically adapted meshes consisting of O(10^8) elements over O(10^4) processors.

  14. Comparison of Two Massively Parallel Sequencing Platforms using 83 Single Nucleotide Polymorphisms for Human Identification.

    PubMed

    Apaga, Dame Loveliness T; Dennis, Sheila E; Salvador, Jazelyn M; Calacal, Gayvelline C; De Ungria, Maria Corazon A

    2017-03-24

    The potential of Massively Parallel Sequencing (MPS) technology to vastly expand the capabilities of human identification led to the emergence of different MPS platforms that use forensically relevant genetic markers. Two of the MPS platforms that are currently available are the MiSeq(®) FGx™ Forensic Genomics System (Illumina) and the HID-Ion Personal Genome Machine (PGM)™ (Thermo Fisher Scientific). These are coupled with the ForenSeq™ DNA Signature Prep kit (Illumina) and the HID-Ion AmpliSeq™ Identity Panel (Thermo Fisher Scientific), respectively. In this study, we compared the genotyping performance of the two MPS systems based on 83 SNP markers that are present in both MPS marker panels. Results show that MiSeq(®) FGx™ has greater sample-to-sample variation than the HID-Ion PGM™ in terms of read counts for all the 83 SNP markers. Allele coverage ratio (ACR) values show generally balanced heterozygous reads for both platforms. Two and four SNP markers from the MiSeq(®) FGx™ and HID-Ion PGM™, respectively, have average ACR values lower than the recommended value of 0.67. Comparison of genotype calls showed 99.7% concordance between the two platforms.

  15. Massively parallel energy space exploration for uncluttered visualization of vascular structures.

    PubMed

    Jeon, Yongkweon; Won, Joong-Ho; Yoon, Sungroh

    2013-01-01

    Images captured using computed tomography and magnetic resonance angiography are used in the examination of the abdominal aorta and its branches. The examination of all clinically relevant branches simultaneously in a single 2-D image without any misleading overlaps facilitates the diagnosis of vascular abnormalities. This problem is called uncluttered single-image visualization (USIV). We can solve the USIV problem by assigning energy-based scores to visualization candidates and then finding the candidate that optimizes the score; this approach is similar to the manner in which the protein side-chain placement problem has been solved. To obtain near-optimum images, we need to explore the energy space extensively, which is often time consuming. This paper describes a method for exploring the energy space in a massively parallel fashion using graphics processing units. According to our experiments, in which we used 30 images obtained from five patients, the proposed method can reduce the total visualization time substantially. We believe that the proposed method can make a significant contribution to the effective visualization of abdominal vascular structures and precise diagnosis of related abnormalities.

  16. Development and validation of a clinical cancer genomic profiling test based on massively parallel DNA sequencing.

    PubMed

    Frampton, Garrett M; Fichtenholtz, Alex; Otto, Geoff A; Wang, Kai; Downing, Sean R; He, Jie; Schnall-Levin, Michael; White, Jared; Sanford, Eric M; An, Peter; Sun, James; Juhn, Frank; Brennan, Kristina; Iwanik, Kiel; Maillet, Ashley; Buell, Jamie; White, Emily; Zhao, Mandy; Balasubramanian, Sohail; Terzic, Selmira; Richards, Tina; Banning, Vera; Garcia, Lazaro; Mahoney, Kristen; Zwirko, Zac; Donahue, Amy; Beltran, Himisha; Mosquera, Juan Miguel; Rubin, Mark A; Dogan, Snjezana; Hedvat, Cyrus V; Berger, Michael F; Pusztai, Lajos; Lechner, Matthias; Boshoff, Chris; Jarosz, Mirna; Vietz, Christine; Parker, Alex; Miller, Vincent A; Ross, Jeffrey S; Curran, John; Cronin, Maureen T; Stephens, Philip J; Lipson, Doron; Yelensky, Roman

    2013-11-01

    As more clinically relevant cancer genes are identified, comprehensive diagnostic approaches are needed to match patients to therapies, raising the challenge of optimization and analytical validation of assays that interrogate millions of bases of cancer genomes altered by multiple mechanisms. Here we describe a test based on massively parallel DNA sequencing to characterize base substitutions, short insertions and deletions (indels), copy number alterations and selected fusions across 287 cancer-related genes from routine formalin-fixed and paraffin-embedded (FFPE) clinical specimens. We implemented a practical validation strategy with reference samples of pooled cell lines that model key determinants of accuracy, including mutant allele frequency, indel length and amplitude of copy change. Test sensitivity achieved was 95-99% across alteration types, with high specificity (positive predictive value >99%). We confirmed accuracy using 249 FFPE cancer specimens characterized by established assays. Application of the test to 2,221 clinical cases revealed clinically actionable alterations in 76% of tumors, three times the number of actionable alterations detected by current diagnostic tests.

  17. The use of targeted genomic capture and massively parallel sequencing in diagnosis of Chinese Leukoencephalopathies

    PubMed Central

    Wang, Xiaole; He, Fang; Yin, Fei; Chen, Chao; Wu, Liwen; Yang, Lifen; Peng, Jing

    2016-01-01

    Leukoencephalopathies are diseases with high clinical heterogeneity. In clinical work, it’s difficult for doctors to make a definite etiological diagnosis. Here, we designed a custom probe library which contains the known pathogenic genes reported to be associated with Leukoencephalopathies, and performed targeted gene capture and massively parallel sequencing (MPS) among 49 Chinese patients who has white matter damage as the main imaging changes, and made the validation by Sanger sequencing for the probands’ parents. As result, a total of 40.8% (20/49) of the patients identified pathogenic mutations, including four associated with metachromatic leukodystrophy, three associated with vanishing white matter leukoencephalopathy, three associated with mitochondrial complex I deficiency, one associated with Globoid cell leukodystrophy (or Krabbe diseases), three associated with megalencephalic leukoencephalopathy with subcortical cysts, two associated with Pelizaeus-Merzbacher disease, two associated with X-linked adrenoleukodystrophy, one associated with Zellweger syndrome and one associated with Alexander disease. Targeted capture and MPS enables to identify mutations of all classes causing leukoencephalopathy. Our study combines targeted capture and MPS technology with clinical and genetic diagnosis and highlights its usefulness for rapid and comprehensive genetic testing in the clinical setting. This method will also expand our knowledge of the genetic and clinical spectra of leukoencephalopathy. PMID:27779215

  18. Evaluation of a549 as a new vaccine cell substrate: digging deeper with massively parallel sequencing.

    PubMed

    Shabram, Paul; Kolman, John L

    2014-01-01

    In the past three decades, the use of tumorigenic cell substrates has been the topic of five Vaccine and Related Biological Products Advisory Committee (VRBPAC) meetings, including a review of the A549 cell line in September 2012. Over that period of time, major technological advances in biotechnology have improved our ability to assess the risk associated with using a tumorigenic cell line. As part of the September 2012 review, we assessed the history of A549 cells and evaluated the probable transforming event based on patterns of mutations to cancer genes. In addition, massively parallel sequencing was used to first screen then augment the characterization of A549 cells by searching for the presence of hidden viral threats using sequencing of the entire cellular transcriptome and comparing sequences to a curated viral sequence database. Based upon the combined results of next-generation sequencing technology along with standard cell characterization as outlined in published regulatory guidances, we believe that A549 cells pose no more risk than any other cell substrate for the manufacture of vaccines.

  19. Whole genome characterization of hepatitis B virus quasispecies with massively parallel pyrosequencing.

    PubMed

    Li, F; Zhang, D; Li, Y; Jiang, D; Luo, S; Du, N; Chen, W; Deng, L; Zeng, C

    2015-03-01

    Viral quasispecies analysis is important for basic and clinical research. This study was designed to detect hepatitis B virus (HBV) genome-wide mutation profiling with detailed variant composition in individual patients, especially quasispecies evolution correlating with liver disease progression. We characterized viral populations by massively parallel pyrosequencing at whole HBV genome level in 17 patients with advanced liver disease (ALD) and 30 chronic carriers (CC). An average sequencing coverage of 2047× and 687× in ALD and CC groups, respectively, were achieved. Deep sequencing data resolved the landscapes of HBV substitutions and a more complicated quasispecies composition than previously observed. The values of substitution frequencies in quasispecies were clustered as either more than 80% or less than 20%, forming a unique U-shaped distribution pattern in both clinical groups. Furthermore, quantitative comparison of mutation frequencies of each site between two groups resulted in a spectrum of substitutions associated with liver disease progression, and among which, C2288A/T, C2304A, and A/G2525C/T were novel candidates. Moreover, distinct deletion patterns in preS, X, and C regions were shown between the two groups. In conclusion, pyrosequencing of the whole HBV genome revealed a panorama of viral quasispecies composition, characteristics of substitution distribution, and mutations correlating to severe liver disease.

  20. Novel Y-chromosome Short Tandem Repeat Variants Detected Through the Use of Massively Parallel Sequencing

    PubMed Central

    Warshauer, David H.; Churchill, Jennifer D.; Novroski, Nicole; King, Jonathan L.; Budowle, Bruce

    2015-01-01

    Massively parallel sequencing (MPS) technology is capable of determining the sizes of short tandem repeat (STR) alleles as well as their individual nucleotide sequences. Thus, single nucleotide polymorphisms (SNPs) within the repeat regions of STRs and variations in the pattern of repeat units in a given repeat motif can be used to differentiate alleles of the same length. In this study, MPS was used to sequence 28 forensically-relevant Y-chromosome STRs in a set of 41 DNA samples from the 3 major U.S. population groups (African Americans, Caucasians, and Hispanics). The resulting sequence data, which were analyzed with STRait Razor v2.0, revealed 37 unique allele sequence variants that have not been previously reported. Of these, 19 sequences were variations of documented sequences resulting from the presence of intra-repeat SNPs or alternative repeat unit patterns. Despite a limited sampling, two of the most frequently-observed variants were found only in African American samples. The remaining 18 variants represented allele sequences for which there were no published data with which to compare. These findings illustrate the great potential of MPS with regard to increasing the resolving power of STR typing and emphasize the need for sample population characterization of STR alleles. PMID:26391384

  1. Massively parallel LES of azimuthal thermo-acoustic instabilities in annular gas turbines

    NASA Astrophysics Data System (ADS)

    Wolf, P.; Staffelbach, G.; Roux, A.; Gicquel, L.; Poinsot, T.; Moureau, V.

    2009-06-01

    Increasingly stringent regulations and the need to tackle rising fuel prices have placed great emphasis on the design of aeronautical gas turbines, which are unfortunately more and more prone to combustion instabilities. In the particular field of annular combustion chambers, these instabilities often take the form of azimuthal modes. To predict these modes, one must compute the full combustion chamber, which remained out of reach until very recently and the development of massively parallel computers. In this article, full annular Large Eddy Simulations (LES) of two helicopter combustors, which differ only on the swirlers' design, are performed. In both computations, LES captures self-established rotating azimuthal modes. However, the two cases exhibit different thermo-acoustic responses and the resulting limit-cycles are different. With the first design, a self-excited strong instability develops, leading to pulsating flames and local flashback. In the second case, the flames are much less affected by the azimuthal mode and remain stable, allowing an acceptable operation. Hence, this study highlights the potential of LES for discriminating injection system designs. To cite this article: P. Wolf et al., C. R. Mecanique 337 (2009).

  2. Ab initio construction of a eukaryotic transcriptome by massively parallel mRNA sequencing

    PubMed Central

    Yassour, Moran; Kaplan, Tommy; Fraser, Hunter B.; Levin, Joshua Z.; Pfiffner, Jenna; Adiconis, Xian; Schroth, Gary; Luo, Shujun; Khrebtukova, Irina; Gnirke, Andreas; Nusbaum, Chad; Thompson, Dawn-Anne; Friedman, Nir; Regev, Aviv

    2009-01-01

    Defining the transcriptome, the repertoire of transcribed regions encoded in the genome, is a challenging experimental task. Current approaches, relying on sequencing of ESTs or cDNA libraries, are expensive and labor-intensive. Here, we present a general approach for ab initio discovery of the complete transcriptome of the budding yeast, based only on the unannotated genome sequence and millions of short reads from a single massively parallel sequencing run. Using novel algorithms, we automatically construct a highly accurate transcript catalog. Our approach automatically and fully defines 86% of the genes expressed under the given conditions, and discovers 160 previously undescribed transcription units of 250 bp or longer. It correctly demarcates the 5′ and 3′ UTR boundaries of 86 and 77% of expressed genes, respectively. The method further identifies 83% of known splice junctions in expressed genes, and discovers 25 previously uncharacterized introns, including 2 cases of condition-dependent intron retention. Our framework is applicable to poorly understood organisms, and can lead to greater understanding of the transcribed elements in an explored genome. PMID:19208812

  3. Nanopantography: A new method for massively parallel nanopatterning over large areas

    NASA Astrophysics Data System (ADS)

    Xu, Lin

    Nanopantography, a radically new method for versatile fabrication of sub-20 nm features in a massively parallel fashion, represents a breakthrough in nanotechnology. The concept of this technique is to focus ion "beamlets" in parallel to write identical, arbitrary nano-patterns. Depending on the ion species, nanopatterns can be either etched, or deposited by nanopantography. An array of electrostatic lenses and a broad-area, directional, monoenergetic ion beam are required to implement nanopantography. This dissertation is dedicated to extracting an ion beam with desired properties from a plasma source and realizing nanopantography using this beam. A novel ion extraction strategy has been used to extract a nearly monoenergetic and energy-specified ion beam from a capacitively-coupled or an inductively-coupled, pulsed Ar plasma. The electron temperature decayed rapidly in the afterglow, resulting in uniform plasma potential, and minimal energy spread for ions extracted in the afterglow. Ion energy was controlled by a DC bias, or alternatively by a high-voltage pulse, on the ring electrode surrounding the plasma. Langmuir probe measurements indicated that this bias raised the plasma potential without heating the electrons in the afterglow. The energy spread was 3.4 eV (FWHM) For a peak ion beam energy of 102.0 eV. Similar results were obtained in an inductively-coupled pulsed plasma when the acceleration ring was pulsed exclusively during the afterglow. To achieve Ni deposition by nanopantography, higher Ni atom and ion densities are desired in the plasma source. An ionized physical vapor deposition (IPVD) system with a Ni internal RF coil and Ni target was used to introduce Ni atoms, and a fraction of the atoms becomes ionized in the high-density plasma. Optical emission spectroscopy (OAS) and optical absorption spectroscopy (OAS), in combination with global models, were used to determine the Ni atom and ion density. For a pressure of 8-20 mTorr and coil power of 40

  4. 3-D magnetotelluric inversion including topography using deformed hexahedral edge finite elements and direct solvers parallelized on SMP computers - Part I: forward problem and parameter Jacobians

    NASA Astrophysics Data System (ADS)

    Kordy, M.; Wannamaker, P.; Maris, V.; Cherkaev, E.; Hill, G.

    2016-01-01

    We have developed an algorithm, which we call HexMT, for 3-D simulation and inversion of magnetotelluric (MT) responses using deformable hexahedral finite elements that permit incorporation of topography. Direct solvers parallelized on symmetric multiprocessor (SMP), single-chassis workstations with large RAM are used throughout, including the forward solution, parameter Jacobians and model parameter update. In Part I, the forward simulator and Jacobian calculations are presented. We use first-order edge elements to represent the secondary electric field (E), yielding accuracy O(h) for E and its curl (magnetic field). For very low frequencies or small material admittivities, the E-field requires divergence correction. With the help of Hodge decomposition, the correction may be applied in one step after the forward solution is calculated. This allows accurate E-field solutions in dielectric air. The system matrix factorization and source vector solutions are computed using the MKL PARDISO library, which shows good scalability through 24 processor cores. The factorized matrix is used to calculate the forward response as well as the Jacobians of electromagnetic (EM) field and MT responses using the reciprocity theorem. Comparison with other codes demonstrates accuracy of our forward calculations. We consider a popular conductive/resistive double brick structure, several synthetic topographic models and the natural topography of Mount Erebus in Antarctica. In particular, the ability of finite elements to represent smooth topographic slopes permits accurate simulation of refraction of EM waves normal to the slopes at high frequencies. Run-time tests of the parallelized algorithm indicate that for meshes as large as 176 × 176 × 70 elements, MT forward responses and Jacobians can be calculated in ˜1.5 hr per frequency. Together with an efficient inversion parameter step described in Part II, MT inversion problems of 200-300 stations are computable with total run times

  5. The Double Hierarchy Method. A parallel 3D contact method for the interaction of spherical particles with rigid FE boundaries using the DEM

    NASA Astrophysics Data System (ADS)

    Santasusana, Miquel; Irazábal, Joaquín; Oñate, Eugenio; Carbonell, Josep Maria

    2016-07-01

    In this work, we present a new methodology for the treatment of the contact interaction between rigid boundaries and spherical discrete elements (DE). Rigid body parts are present in most of large-scale simulations. The surfaces of the rigid parts are commonly meshed with a finite element-like (FE) discretization. The contact detection and calculation between those DE and the discretized boundaries is not straightforward and has been addressed by different approaches. The algorithm presented in this paper considers the contact of the DEs with the geometric primitives of a FE mesh, i.e. facet, edge or vertex. To do so, the original hierarchical method presented by Horner et al. (J Eng Mech 127(10):1027-1032, 2001) is extended with a new insight leading to a robust, fast and accurate 3D contact algorithm which is fully parallelizable. The implementation of the method has been developed in order to deal ideally with triangles and quadrilaterals. If the boundaries are discretized with another type of geometries, the method can be easily extended to higher order planar convex polyhedra. A detailed description of the procedure followed to treat a wide range of cases is presented. The description of the developed algorithm and its validation is verified with several practical examples. The parallelization capabilities and the obtained performance are presented with the study of an industrial application example.

  6. Lattice gauge theory on a massively parallel computing facility. Final report

    SciTech Connect

    Sugar, R.

    1998-08-07

    This grant provided access to the massively parallel computing facilities at Oak Ridge National Laboratory for the study of lattice gauge theory. The major project was a calculation of the weak decay constants of pseudoscalar mesons with one light and one heavy quark. A number of these constants have not yet been measured, so the calculations constituted a set of predictions which will be tested by future experiments. More importantly, f{sub B} and f{sub B{sub s}}, the decay constants of the B and B{sub s} mesons, are crucial inputs for extracting information regarding the CKM matrix element V{sub td} from experimental measurements of B-{anti B} mixing, and future measurements of B{sub s}-{anti B}{sub s} mixing planned for the B-factory currently under construction at the Stanford Linear Accelerator Center. V{sub td} is one of the least well determined parameters of the Standard Model of High Energy Physics. It does not appear likely that F{sub B} and f{sub B{sub s}} will be measured experimentally in the near future, so lattice calculations such as this will play a crucial role in extracting information about the Standard Model from the B-factory experiments. The author has carried out the most accurate calculations of the heavy-light decay constants to date within the quenched approximation, that is ignoring the effects of sea quarks. Furthermore, his was the only group to have estimated the errors in the decay constants associated with the quenched approximation.

  7. Massively parallel sampling of lattice proteins reveals foundations of thermal adaptation

    NASA Astrophysics Data System (ADS)

    Venev, Sergey V.; Zeldovich, Konstantin B.

    2015-08-01

    Evolution of proteins in bacteria and archaea living in different conditions leads to significant correlations between amino acid usage and environmental temperature. The origins of these correlations are poorly understood, and an important question of protein theory, physics-based prediction of types of amino acids overrepresented in highly thermostable proteins, remains largely unsolved. Here, we extend the random energy model of protein folding by weighting the interaction energies of amino acids by their frequencies in protein sequences and predict the energy gap of proteins designed to fold well at elevated temperatures. To test the model, we present a novel scalable algorithm for simultaneous energy calculation for many sequences in many structures, targeting massively parallel computing architectures such as graphics processing unit. The energy calculation is performed by multiplying two matrices, one representing the complete set of sequences, and the other describing the contact maps of all structural templates. An implementation of the algorithm for the CUDA platform is available at http://www.github.com/kzeldovich/galeprot and calculates protein folding energies over 250 times faster than a single central processing unit. Analysis of amino acid usage in 64-mer cubic lattice proteins designed to fold well at different temperatures demonstrates an excellent agreement between theoretical and simulated values of energy gap. The theoretical predictions of temperature trends of amino acid frequencies are significantly correlated with bioinformatics data on 191 bacteria and archaea, and highlight protein folding constraints as a fundamental selection pressure during thermal adaptation in biological evolution.

  8. Implementation of a Message Passing Interface into a Cloud-Resolving Model for Massively Parallel Computing

    NASA Technical Reports Server (NTRS)

    Juang, Hann-Ming Henry; Tao, Wei-Kuo; Zeng, Xi-Ping; Shie, Chung-Lin; Simpson, Joanne; Lang, Steve

    2004-01-01

    The capability for massively parallel programming (MPP) using a message passing interface (MPI) has been implemented into a three-dimensional version of the Goddard Cumulus Ensemble (GCE) model. The design for the MPP with MPI uses the concept of maintaining similar code structure between the whole domain as well as the portions after decomposition. Hence the model follows the same integration for single and multiple tasks (CPUs). Also, it provides for minimal changes to the original code, so it is easily modified and/or managed by the model developers and users who have little knowledge of MPP. The entire model domain could be sliced into one- or two-dimensional decomposition with a halo regime, which is overlaid on partial domains. The halo regime requires that no data be fetched across tasks during the computational stage, but it must be updated before the next computational stage through data exchange via MPI. For reproducible purposes, transposing data among tasks is required for spectral transform (Fast Fourier Transform, FFT), which is used in the anelastic version of the model for solving the pressure equation. The performance of the MPI-implemented codes (i.e., the compressible and anelastic versions) was tested on three different computing platforms. The major results are: 1) both versions have speedups of about 99% up to 256 tasks but not for 512 tasks; 2) the anelastic version has better speedup and efficiency because it requires more computations than that of the compressible version; 3) equal or approximately-equal numbers of slices between the x- and y- directions provide the fastest integration due to fewer data exchanges; and 4) one-dimensional slices in the x-direction result in the slowest integration due to the need for more memory relocation for computation.

  9. Massively parallel computation of lattice associative memory classifiers on multicore processors

    NASA Astrophysics Data System (ADS)

    Ritter, Gerhard X.; Schmalz, Mark S.; Hayden, Eric T.

    2011-09-01

    Over the past quarter century, concepts and theory derived from neural networks (NNs) have featured prominently in the literature of pattern recognition. Implementationally, classical NNs based on the linear inner product can present performance challenges due to the use of multiplication operations. In contrast, NNs having nonlinear kernels based on Lattice Associative Memories (LAM) theory tend to concentrate primarily on addition and maximum/minimum operations. More generally, the emergence of LAM-based NNs, with their superior information storage capacity, fast convergence and training due to relatively lower computational cost, as well as noise-tolerant classification has extended the capabilities of neural networks far beyond the limited applications potential of classical NNs. This paper explores theory and algorithmic approaches for the efficient computation of LAM-based neural networks, in particular lattice neural nets and dendritic lattice associative memories. Of particular interest are massively parallel architectures such as multicore CPUs and graphics processing units (GPUs). Originally developed for video gaming applications, GPUs hold the promise of high computational throughput without compromising numerical accuracy. Unfortunately, currently-available GPU architectures tend to have idiosyncratic memory hierarchies that can produce unacceptably high data movement latencies for relatively simple operations, unless careful design of theory and algorithms is employed. Advantageously, some GPUs (e.g., the Nvidia Fermi GPU) are optimized for efficient streaming computation (e.g., concurrent multiply and add operations). As a result, the linear or nonlinear inner product structures of NNs are inherently suited to multicore GPU computational capabilities. In this paper, the authors' recent research in lattice associative memories and their implementation on multicores is overviewed, with results that show utility for a wide variety of pattern

  10. A SNP panel for identity and kinship testing using massive parallel sequencing.

    PubMed

    Grandell, Ida; Samara, Raed; Tillmar, Andreas O

    2016-07-01

    Within forensic genetics, there is still a need for supplementary DNA marker typing in order to increase the power to solve cases for both identity testing and complex kinship issues. One major disadvantage with current capillary electrophoresis (CE) methods is the limitation in DNA marker multiplex capability. By utilizing massive parallel sequencing (MPS) technology, this capability can, however, be increased. We have designed a customized GeneRead DNASeq SNP panel (Qiagen) of 140 previously published autosomal forensically relevant identity SNPs for analysis using MPS. One single amplification step was followed by library preparation using the GeneRead Library Prep workflow (Qiagen). The sequencing was performed on a MiSeq System (Illumina), and the bioinformatic analyses were done using the software Biomedical Genomics Workbench (CLC Bio, Qiagen). Forty-nine individuals from a Swedish population were genotyped in order to establish genotype frequencies and to evaluate the performance of the assay. The analyses showed to have a balanced coverage among the included loci, and the heterozygous balance showed to have less than 0.5 % outliers. Analyses of dilution series of the 2800M Control DNA gave reproducible results down to 0.2 ng DNA input. In addition, typing of FTA samples and bone samples was performed with promising results. Further studies and optimizations are, however, required for a more detailed evaluation of the performance of degraded and PCR-inhibited forensic samples. In summary, the assay offers a straightforward sample-to-genotype workflow and could be useful to gain information in forensic casework, for both identity testing and in order to solve complex kinship issues.

  11. Transcriptional analysis of the Arabidopsis ovule by massively parallel signature sequencing

    PubMed Central

    Sánchez-León, Nidia; Arteaga-Vázquez, Mario; Alvarez-Mejía, César; Mendiola-Soto, Javier; Durán-Figueroa, Noé; Rodríguez-Leal, Daniel; Rodríguez-Arévalo, Isaac; García-Campayo, Vicenta; García-Aguilar, Marcelina; Olmedo-Monfil, Vianey; Arteaga-Sánchez, Mario; Martínez de la Vega, Octavio; Nobuta, Kan; Vemaraju, Kalyan; Meyers, Blake C.; Vielle-Calzada, Jean-Philippe

    2012-01-01

    The life cycle of flowering plants alternates between a predominant sporophytic (diploid) and an ephemeral gametophytic (haploid) generation that only occurs in reproductive organs. In Arabidopsis thaliana, the female gametophyte is deeply embedded within the ovule, complicating the study of the genetic and molecular interactions involved in the sporophytic to gametophytic transition. Massively parallel signature sequencing (MPSS) was used to conduct a quantitative large-scale transcriptional analysis of the fully differentiated Arabidopsis ovule prior to fertilization. The expression of 9775 genes was quantified in wild-type ovules, additionally detecting >2200 new transcripts mapping to antisense or intergenic regions. A quantitative comparison of global expression in wild-type and sporocyteless (spl) individuals resulted in 1301 genes showing 25-fold reduced or null activity in ovules lacking a female gametophyte, including those encoding 92 signalling proteins, 75 transcription factors, and 72 RNA-binding proteins not reported in previous studies based on microarray profiling. A combination of independent genetic and molecular strategies confirmed the differential expression of 28 of them, showing that they are either preferentially active in the female gametophyte, or dependent on the presence of a female gametophyte to be expressed in sporophytic cells of the ovule. Among 18 genes encoding pentatricopeptide-repeat proteins (PPRs) that show transcriptional activity in wild-type but not spl ovules, CIHUATEOTL (At4g38150) is specifically expressed in the female gametophyte and necessary for female gametogenesis. These results expand the nature of the transcriptional universe present in the ovule of Arabidopsis, and offer a large-scale quantitative reference of global expression for future genomic and developmental studies. PMID:22442422

  12. A Massively Parallel Sequencing Approach Uncovers Ancient Origins and High Genetic Variability of Endangered Przewalski's Horses

    PubMed Central

    Goto, Hiroki; Ryder, Oliver A.; Fisher, Allison R.; Schultz, Bryant; Nekrutenko, Anton; Makova, Kateryna D.

    2011-01-01

    The endangered Przewalski's horse is the closest relative of the domestic horse and is the only true wild horse species surviving today. The question of whether Przewalski's horse is the direct progenitor of domestic horse has been hotly debated. Studies of DNA diversity within Przewalski's horses have been sparse but are urgently needed to ensure their successful reintroduction to the wild. In an attempt to resolve the controversy surrounding the phylogenetic position and genetic diversity of Przewalski's horses, we used massively parallel sequencing technology to decipher the complete mitochondrial and partial nuclear genomes for all four surviving maternal lineages of Przewalski's horses. Unlike single-nucleotide polymorphism (SNP) typing usually affected by ascertainment bias, the present method is expected to be largely unbiased. Three mitochondrial haplotypes were discovered—two similar ones, haplotypes I/II, and one substantially divergent from the other two, haplotype III. Haplotypes I/II versus III did not cluster together on a phylogenetic tree, rejecting the monophyly of Przewalski's horse maternal lineages, and were estimated to split 0.117–0.186 Ma, significantly preceding horse domestication. In the phylogeny based on autosomal sequences, Przewalski's horses formed a monophyletic clade, separate from the Thoroughbred domestic horse lineage. Our results suggest that Przewalski's horses have ancient origins and are not the direct progenitors of domestic horses. The analysis of the vast amount of sequence data presented here suggests that Przewalski's and domestic horse lineages diverged at least 0.117 Ma but since then have retained ancestral genetic polymorphism and/or experienced gene flow. PMID:21803766

  13. Entire Mitochondrial DNA Sequencing on Massively Parallel Sequencing for the Korean Population

    PubMed Central

    2017-01-01

    Mitochondrial DNA (mtDNA) genome analysis has been a potent tool in forensic practice as well as in the understanding of human phylogeny in the maternal lineage. The traditional mtDNA analysis is focused on the control region, but the introduction of massive parallel sequencing (MPS) has made the typing of the entire mtDNA genome (mtGenome) more accessible for routine analysis. The complete mtDNA information can provide large amounts of novel genetic data for diverse populations as well as improved discrimination power for identification. The genetic diversity of the mtDNA sequence in different ethnic populations has been revealed through MPS analysis, but the Korean population not only has limited MPS data for the entire mtGenome, the existing data is mainly focused on the control region. In this study, the complete mtGenome data for 186 Koreans, obtained using Ion Torrent Personal Genome Machine (PGM) technology and retrieved from rather common mtDNA haplogroups based on the control region sequence, are described. The results showed that 24 haplogroups, determined with hypervariable regions only, branched into 47 subhaplogroups, and point heteroplasmy was more frequent in the coding regions. In addition, sequence variations in the coding regions observed in this study were compared with those presented in other reports on different populations, and there were similar features observed in the sequence variants for the predominant haplogroups among East Asian populations, such as Haplogroup D and macrohaplogroups M9, G, and D. This study is expected to be the trigger for the development of Korean specific mtGenome data followed by numerous future studies. PMID:28244283

  14. Massively parallel neural circuits for stereoscopic color vision: encoding, decoding and identification.

    PubMed

    Lazar, Aurel A; Slutskiy, Yevgeniy B; Zhou, Yiyin

    2015-03-01

    Past work demonstrated how monochromatic visual stimuli could be faithfully encoded and decoded under Nyquist-type rate conditions. Color visual stimuli were then traditionally encoded and decoded in multiple separate monochromatic channels. The brain, however, appears to mix information about color channels at the earliest stages of the visual system, including the retina itself. If information about color is mixed and encoded by a common pool of neurons, how can colors be demixed and perceived? We present Color Video Time Encoding Machines (Color Video TEMs) for encoding color visual stimuli that take into account a variety of color representations within a single neural circuit. We then derive a Color Video Time Decoding Machine (Color Video TDM) algorithm for color demixing and reconstruction of color visual scenes from spikes produced by a population of visual neurons. In addition, we formulate Color Video Channel Identification Machines (Color Video CIMs) for functionally identifying color visual processing performed by a spiking neural circuit. Furthermore, we derive a duality between TDMs and CIMs that unifies the two and leads to a general theory of neural information representation for stereoscopic color vision. We provide examples demonstrating that a massively parallel color visual neural circuit can be first identified with arbitrary precision and its spike trains can be subsequently used to reconstruct the encoded stimuli. We argue that evaluation of the functional identification methodology can be effectively and intuitively performed in the stimulus space. In this space, a signal reconstructed from spike trains generated by the identified neural circuit can be compared to the original stimulus.

  15. Investigations on the usefulness of the Massively Parallel Processor for study of electronic properties of atomic and condensed matter systems

    NASA Technical Reports Server (NTRS)

    Das, T. P.

    1988-01-01

    The usefulness of the Massively Parallel Processor (MPP) for investigation of electronic structures and hyperfine properties of atomic and condensed matter systems was explored. The major effort was directed towards the preparation of algorithms for parallelization of the computational procedure being used on serial computers for electronic structure calculations in condensed matter systems. Detailed descriptions of investigations and results are reported, including MPP adaptation of self-consistent charge extended Hueckel (SCCEH) procedure, MPP adaptation of the first-principles Hartree-Fock cluster procedure for electronic structures of large molecules and solid state systems, and MPP adaptation of the many-body procedure for atomic systems.

  16. Targeted-capture massively-parallel sequencing enables robust detection of clinically informative mutations from formalin-fixed tumours

    PubMed Central

    Wong, Stephen Q.; Li, Jason; Salemi, Renato; Sheppard, Karen E.; Hongdo Do; Tothill, Richard W.; McArthur, Grant A.; Dobrovic, Alexander

    2013-01-01

    Massively parallel sequencing offers the ability to interrogate a tumour biopsy for multiple mutational changes. For clinical samples, methodologies must enable maximal extraction of available sequence information from formalin-fixed and paraffin-embedded (FFPE) material. We assessed the use of targeted capture for mutation detection in FFPE DNA. The capture probes targeted the coding region of all known kinase genes and selected oncogenes and tumour suppressor genes. Seven melanoma cell lines and matching FFPE xenograft DNAs were sequenced. An informatics pipeline was developed to identify variants and contaminating mouse reads. Concordance of 100% was observed between unfixed and formalin-fixed for reported COSMIC variants including BRAF V600E. mutations in genes not conventionally screened including ERBB4, ATM, STK11 and CDKN2A were readily detected. All regions were adequately covered with independent reads regardless of GC content. This study indicates that hybridisation capture is a robust approach for massively parallel sequencing of FFPE samples. PMID:24336498

  17. 3D geological to geophysical modelling and seismic wave propagation simulation: a case study from the Lalor Lake VMS (Volcanogenic Massive Sulphides) mining camp

    NASA Astrophysics Data System (ADS)

    Miah, Khalid; Bellefleur, Gilles

    2014-05-01

    The global demand for base metals, uranium and precious metals has been pushing mineral explorations at greater depth. Seismic techniques and surveys have become essential in finding and extracting mineral rich ore bodies, especially for deep VMS mining camps. Geophysical parameters collected from borehole logs and laboratory measurements of core samples provide preliminary information about the nature and type of subsurface lithologic units. Alteration halos formed during the hydrothermal alteration process contain ore bodies, which are of primary interests among geologists and mining industries. It is known that the alteration halos are easier to detect than the ore bodies itself. Many 3D geological models are merely projection of 2D surface geology based on outcrop inspections and geochemical analysis of a small number of core samples collected from the area. Since a large scale 3D multicomponent seismic survey can be prohibitively expensive, performance analysis of such geological models can be helpful in reducing exploration costs. In this abstract, we discussed challenges and constraints encountered in geophysical modelling of ore bodies and surrounding geologic structures from the available coarse 3D geological models of the Lalor Lake mining camp, located in northern Manitoba, Canada. Ore bodies in the Lalor lake VMS camp are rich in gold, zinc, lead and copper, and have an approximate weight of 27 Mt. For better understanding of physical parameters of these known ore bodies and potentially unknown ones at greater depth, we constructed a fine resolution 3D seismic model with dimensions: 2000 m (width), 2000 m (height), and 1500 m (vertical depth). Seismic properties (P-wave, S-wave velocities, and density) were assigned based on a previous rock properties study of the same mining camp. 3D finite-difference elastic wave propagation simulation was performed in the model using appropriate parameters. The generated synthetic 3D seismic data was then compared to

  18. RH 1.5D: a massively parallel code for multi-level radiative transfer with partial frequency redistribution and Zeeman polarisation

    NASA Astrophysics Data System (ADS)

    Pereira, Tiago M. D.; Uitenbroek, Han

    2015-02-01

    The emergence of three-dimensional magneto-hydrodynamic simulations of stellar atmospheres has sparked a need for efficient radiative transfer codes to calculate detailed synthetic spectra. We present RH 1.5D, a massively parallel code based on the RH code and capable of performing Zeeman polarised multi-level non-local thermodynamical equilibrium calculations with partial frequency redistribution for an arbitrary amount of chemical species. The code calculates spectra from 3D, 2D or 1D atmospheric models on a column-by-column basis (or 1.5D). While the 1.5D approximation breaks down in the cores of very strong lines in an inhomogeneous environment, it is nevertheless suitable for a large range of scenarios and allows for faster convergence with finer control over the iteration of each simulation column. The code scales well to at least tens of thousands of CPU cores, and is publicly available. In the present work we briefly describe its inner workings, strategies for convergence optimisation, its parallelism, and some possible applications.

  19. Massively parallel hypercube FFTs: CM-2 implementation and error analysis of a parallel trigonometric factor generation method

    SciTech Connect

    Tong, C.H. ); Swarztrauber, P.N. )

    1991-08-01

    On parallel computers, the way the data elements are mapped to the processors may have a large effect on the timing performance of a given algorithm. In our previous paper, we have examined a few mapping strategies for the ordered radix-2 DIF (decimation-in-frequency) Fast Fourier Transform. In particular, we have shown how reduction of communication can be achieved by combining the order and computational phases through the use of i-cycles. A parallel methods was also presented for computing the trigonometric factors which requires neither trigonometric function evaluation nor interprocessor communication. This paper first reviews some of the experimental results on the Connection Machine to demonstrate the importance of reducing communication in a parallel algorithm. The emphasis of this paper, however, is on analyzing the numerical stability of the proposed method for generating the trigonometric factors and showing how the error can be improved. 16 refs., 12 tabs.

  20. A Serious Game for Massive Training and Assessment of French Soldiers Involved in Forward Combat Casualty Care (3D-SC1): Development and Deployment

    PubMed Central

    Mérat, Stéphane; Malgras, Brice; Petit, Ludovic; Queran, Xavier; Bay, Christian; Boutonnet, Mathieu; Jault, Patrick; Ausset, Sylvain; Auroy, Yves; Perez, Jean Paul; Tesnière, Antoine; Pons, François; Mignon, Alexandre

    2016-01-01

    Background The French Military Health Service has standardized its military prehospital care policy in a ‘‘Sauvetage au Combat’’ (SC) program (Forward Combat Casualty Care). A major part of the SC training program relies on simulations, which are challenging and costly when dealing with more than 80,000 soldiers. In 2014, the French Military Health Service decided to develop and deploy 3D-SC1, a serious game (SG) intended to train and assess soldiers managing the early steps of SC. Objectives The purpose of this paper is to describe the creation and production of 3D-SC1 and to present its deployment. Methods A group of 10 experts and the Paris Descartes University Medical Simulation Department spin-off, Medusims, coproduced 3D-SC1. Medusims are virtual medical experiences using 3D real-time videogame technology (creation of an environment and avatars in different scenarios) designed for educational purposes (training and assessment) to simulate medical situations. These virtual situations have been created based on real cases and tested on mannequins by experts. Trainees are asked to manage specific situations according to best practices recommended by SC, and receive a score and a personalized feedback regarding their performance. Results The scenario simulated in the SG is an attack on a patrol of 3 soldiers with an improvised explosive device explosion as a result of which one soldier dies, one soldier is slightly stunned, and the third soldier experiences a leg amputation and other injuries. This scenario was first tested with mannequins in military simulation centers, before being transformed into a virtual 3D real-time scenario using a multi-support, multi–operating system platform, Unity. Processes of gamification and scoring were applied, with 2 levels of difficulty. A personalized debriefing was integrated at the end of the simulations. The design and production of the SG took 9 months. The deployment, performed in 3 months, has reached 84 of 96

  1. Optimization of the deflated Conjugate Gradient algorithm for the solving of elliptic equations on massively parallel machines

    NASA Astrophysics Data System (ADS)

    Malandain, Mathias; Maheu, Nicolas; Moureau, Vincent

    2013-04-01

    The discretization of Partial Differential Equations often leads to the need of solving large symmetric linear systems. In the case of the Navier-Stokes equations for incompressible flows, solving the elliptic pressure Poisson equation can represent the most important part of the computational time required for the massively parallel simulation of the flow. The need for efficiency that this issue induces is completed with a need for stability, in particular when dealing with unstructured meshes. Here, a stable and efficient variant of the Deflated Preconditioned Conjugate Gradient (DPCG) solver is first presented. This two-level method uses an arbitrary coarse grid to reduce the computational cost of the solving. However, in the massively parallel implementation of this technique for very large linear systems, the coarse grids generated can count up to millions of cells, which makes direct solvings on the coarse level impossible. The solving on the coarse grid, performed with a Preconditioned Conjugate Gradient (PCG) solver for this reason, may involve a large number of communications, which reduces dramatically the performances on massively parallel machines. To this effect, two methods developed in order to reduce the number of iterations on the coarse level are introduced, that is the creation of improved initial guesses and the adaptation of the convergence criterion. The design of these methods make them easy to implement in any already existing DPCG solver. The structural requirements for an efficient massively parallel unstructured solver and the implementation of this solver are described. The novel DPCG method is assessed for applications involving turbulence, heat transfers and two-phase flows, with grids up to 17.8 billion elements. Numerical results show a two- to 12-fold reduction of the number of iterations on the coarse level, which implies a reduction of the computational time of the Poisson solver up to 71% and a global reduction of the proportion

  2. A Massive Parallel Variational Multiscale FEM Scheme Applied to Nonhydrostatic Atmospheric Dynamics

    NASA Astrophysics Data System (ADS)

    Vazquez, Mariano; Marras, Simone; Moragues, Margarida; Jorba, Oriol; Houzeaux, Guillaume; Aubry, Romain

    2010-05-01

    The solution of the fully compressible Euler equations of stratified flows is approached from the point of view of Computational Fluid Dynamics techniques. Specifically, the main aim of this contribution is the introduction of a Variational Multiscale Finite Element (CVMS-FE) approach to solve dry atmospheric dynamics effectively on massive parallel architectures with more than 1000 processors. The conservation form of the equations of motion is discretized in all directions with a Galerkin scheme with stabilization given by the compressible counterpart of the variational multiscale technique of Hughes [1] and Houzeaux et al. [2]. The justification of this effort is twofold: the search of optimal parallelization characteristics and linear scalability trends on petascale machines is one. The development of a numerical algorithm whose local nature helps maintaining minimal the communication among the processors implies, in fact, a large leap towards efficient parallel computing. Second, the rising trend to global models and models of higher spatial resolution naturally suggests the use of adaptive grids to only resolve zones of larger gradients while keeping the computational mesh properly coarse elsewhere (thus keeping the computational cost low). With these two hypotheses in mind, the finite element scheme presented here is an open option to the development of the next generation Numerical Weather Prediction (NWP) codes. This methodology is as new in Computational Fluid Dynamics for compressible flows at low Mach number as it is in Numerical Weather Prediction (NWP). We however mean to show its ability to maintain stability in the solution of thermal, gravity-driven flows in a stratified environment in the specific context of dry atmospheric dynamics. Standard two dimensional benchmarks are implemented and compared against the reference literature. In the context of thermal and gravity-driven flows in a neutral atmosphere, we present: (1) the density current

  3. Investigation into the sequence structure of 23 Y chromosomal STR loci using massively parallel sequencing.

    PubMed

    Kwon, So Yeun; Lee, Hwan Young; Kim, Eun Hye; Lee, Eun Young; Shin, Kyoung-Jin

    2016-11-01

    Next-generation sequencing (NGS) can produce massively parallel sequencing (MPS) data for many targeted regions with a high depth of coverage, suggesting its successful application to the amplicons of forensic genetic markers. In the present study, we evaluated the practical utility of MPS in Y-chromosome short tandem repeat (Y-STR) analysis using a multiplex polymerase chain reaction (PCR) system. The multiplex PCR system simultaneously amplified 24 Y-chromosomal markers, including the PowerPlex(®) Y23 loci (DYS19, DYS385ab, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, DYS439, DYS448, DYS456, DYS458, DYS481, DYS533, DYS549, DYS570, DYS576, DYS635, DYS643, and YGATAH4) and the M175 marker with the small-sized amplicons ranging from 85 to 253bp. The barcoded libraries for the amplicons of the 24 Y-chromosomal markers were produced using a simplified PCR-based library preparation method and successfully sequenced using MPS on a MiSeq(®) System with samples from 250 unrelated Korean males. The genotyping concordance between MPS and the capillary electrophoresis (CE) method, as well as the sequence structure of the 23 Y-STRs, were investigated. Three samples exhibited discordance between the MPS and CE results at DYS385, DYS439, and DYS576. There were 12 Y-STR loci that showed sequence variations in the alleles by a fragment size determination, and the most varied alleles occurred in DYS389II with a different sequence structure in the repeat region. The largest increase in gene diversity between the CE and MPS results was in DYS437 at +34.41%. Single nucleotide polymorphisms (SNPs), insertions, and deletions (indels) were observed in the flanking regions of DYS481, DYS576, and DYS385, respectively. Stutter and noise ratios of the 23 Y-STRs using the developed MPS system were also investigated. Based on these results, the MPS analysis system used in this study could facilitate the investigation into the sequences of the 23 Y-STRs in forensic

  4. Underlying Data for Sequencing the Mitochondrial Genome with the Massively Parallel Sequencing Platform Ion Torrent™ PGM™

    PubMed Central

    2015-01-01

    Background Massively parallel sequencing (MPS) technologies have the capacity to sequence targeted regions or whole genomes of multiple nucleic acid samples with high coverage by sequencing millions of DNA fragments simultaneously. Compared with Sanger sequencing, MPS also can reduce labor and cost on a per nucleotide basis and indeed on a per sample basis. In this study, whole genomes of human mitochondria (mtGenome) were sequenced on the Personal Genome Machine (PGMTM) (Life Technologies, San Francisco, CA), the out data were assessed, and the results were compared with data previously generated on the MiSeqTM (Illumina, San Diego, CA). The objectives of this paper were to determine the feasibility, accuracy, and reliability of sequence data obtained from the PGM. Results 24 samples were multiplexed (in groups of six) and sequenced on the at least 10 megabase throughput 314 chip. The depth of coverage pattern was similar among all 24 samples; however the coverage across the genome varied. For strand bias, the average ratio of coverage between the forward and reverse strands at each nucleotide position indicated that two-thirds of the positions of the genome had ratios that were greater than 0.5. A few sites had more extreme strand bias. Another observation was that 156 positions had a false deletion rate greater than 0.15 in one or more individuals. There were 31-98 (SNP) mtGenome variants observed per sample for the 24 samples analyzed. The total 1237 (SNP) variants were concordant between the results from the PGM and MiSeq. The quality scores for haplogroup assignment for all 24 samples ranged between 88.8%-100%. Conclusions In this study, mtDNA sequence data generated from the PGM were analyzed and the output evaluated. Depth of coverage variation and strand bias were identified but generally were infrequent and did not impact reliability of variant calls. Multiplexing of samples was demonstrated which can improve throughput and reduce cost per sample analyzed

  5. Performance analysis of three dimensional integral equation computations on a massively parallel computer. M.S. Thesis

    NASA Technical Reports Server (NTRS)

    Logan, Terry G.

    1994-01-01

    The purpose of this study is to investigate the performance of the integral equation computations using numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of computational performance of the MPP CM-5 computer and conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on CM-5 with 32, 62, 128 nodes along with those on Cray-YMP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code near or out-performs the equivalent serial FORTRAN code for some cases.

  6. Massively parallel dual control volume grand canonical molecular dynamics with LADERA II. Gradient driven diffusion through polymers

    NASA Astrophysics Data System (ADS)

    Ford Grant, David M.; Heffelfinger, S.

    This paper, the second part of a series, extends the capabilities of the LADERA FORTRAN code for massively parallel dual control volume grand canonical molecular dynamics (DCVGCMD). DCV-GCMD is a hybrid of two more common molecular simulation techniques (grand canonical Monte Carlo and molecular dynamics) which allows the direct molecularlevel modelling of diffusion under a chemical potential gradient. The present version of the code, LADERA-B has the capability of modelling systems with explicit intramolecular interactions such as bonds, angles, and dihedral rotations. The utility of the new code for studying gradient-driven diffusion of small molecules through polymers is demonstrated by applying it to two model systems. LADERA-B includes another new feature, which is the use of neighbour lists in force calculations. This feature increases the speed of the code but presents several challenges in the parallel hybrid algorithm. There is discussion on how these problems were addressed and how our implementation results in a significant increase in speed over the original LADERA. Scaling results are presented for LADERA-B on two massively parallel message-passing machines.

  7. Massively parallel dual control volume grand canonical molecular dynamics with LADERA I. Gradient driven diffusion in Lennard-Jones fluids

    NASA Astrophysics Data System (ADS)

    Heffelfinger David, Grant S.; Ford, M.

    A new algorithm to enable the implementation of dual control volume grand canonical molecular dynamics (DCV-GCMD) on massively parallel (MP) architectures is presented. DCVGCMD can be thought of as hybridization of molecular dynamics (MD) and grand canonical Monte Carlo (GCMC) and was developed recently to make possible the simulation of gradient-driven diffusion. The method has broad application to such problems as membrane separations, drug delivery systems, diffusion in polymers and zeolites, etc. The massively parallel algorithm for the DCV-GCMD method has been implemented in a code named LADERA which employs the short range Lennard-Jones potential for pure fluids and multicomponent mixtures including bulk and confined (single pore as well as amorphous solid materials) systems. Like DCV-GCMD, LADERA's MP algorithm can be thought of as a hybridization of two different algorithms, spatial MD and spatial GCMC. The DCV-GCMD method is described fully followed by the DCV-GCMD parallel algorithm employed in LADERA. The scaling characteristics of the new MP algorithm are presented together with the results of the application of LADERA to ternary and quaternary Lennard-Jones mixtures.

  8. PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

    SciTech Connect

    Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan; Karra, Satish; Bisht, Gautam; Andre, Benjamin; Mills, Richard; Kumar, Jitendra

    2015-01-20

    PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Written in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 232 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.

  9. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, Michael S.; Strip, David R.

    1996-01-01

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.

  10. System and method for representing and manipulating three-dimensional objects on massively parallel architectures

    DOEpatents

    Karasick, M.S.; Strip, D.R.

    1996-01-30

    A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.

  11. A Precision Dose Control Circuit for Maskless E-Beam Lithography With Massively Parallel Vertically Aligned Carbon Nanofibers

    SciTech Connect

    Eliza, Sazia A.; Islam, Syed K; Rahman, Touhidur; Bull, Nora D; Blalock, Benjamin; Baylor, Larry R; Ericson, Milton Nance; Gardner, Walter L

    2011-01-01

    This paper describes a highly accurate dose control circuit (DCC) for the emission of a desired number of electrons from vertically aligned carbon nanofibers (VACNFs) in a massively parallel maskless e-beam lithography system. The parasitic components within the VACNF device cause a premature termination of the electron emission, resulting in underexposure of the photoresist. In this paper, we compensate for the effects of the parasitic components and noise while reducing the area of the chip and achieving a precise count of emitted electrons from the VACNFs to obtain the optimum dose for the e-beam lithography.

  12. Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

    DOEpatents

    Gooding, Thomas Michael; McCarthy, Patrick Joseph

    2010-03-02

    A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.

  13. Integer-encoded massively parallel processing of fast-learning fuzzy ARTMAP neural networks

    NASA Astrophysics Data System (ADS)

    Bahr, Hubert A.; DeMara, Ronald F.; Georgiopoulos, Michael

    1997-04-01

    In this paper we develop techniques that are suitable for the parallel implementation of Fuzzy ARTMAP networks. Speedup and learning performance results are provided for execution on a DECmpp/Sx-1208 parallel processor consisting of a DEC RISC Workstation Front-End and MasPar MP-1 Back-End with 8,192 processors. Experiments of the parallel implementation were conducted on the Letters benchmark database developed by Frey and Slate. The results indicate a speedup on the order of 1000-fold which allows combined training and testing time of under four minutes.

  14. Performance of the UCAN2 Gyrokinetic Particle In Cell (PIC) Code on Two Massively Parallel Mainframes with Intel ``Sandy Bridge'' Processors

    NASA Astrophysics Data System (ADS)

    Leboeuf, Jean-Noel; Decyk, Viktor; Newman, David; Sanchez, Raul

    2013-10-01

    The massively parallel, 2D domain-decomposed, nonlinear, 3D, toroidal, electrostatic, gyrokinetic, Particle in Cell (PIC), Cartesian geometry UCAN2 code, with particle ions and adiabatic electrons, has been ported to two emerging mainframes. These two computers, one at NERSC in the US built by Cray named Edison and the other at the Barcelona Supercomputer Center (BSC) in Spain built by IBM named MareNostrum III (MNIII) just happen to share the same Intel ``Sandy Bridge'' processors. The successful port of UCAN2 to MNIII which came online first has enabled us to be up and running efficiently in record time on Edison. Overall, the performance of UCAN2 on Edison is superior to that on MNIII, particularly at large numbers of processors (>1024) for the same Intel IFORT compiler. This appears to be due to different MPI modules (OpenMPI on MNIII and MPICH2 on Edison) and different interconnection networks (Infiniband on MNIII and Cray's Aries on Edison) on the two mainframes. Details of these ports and comparative benchmarks are presented. Work supported by OFES, USDOE, under contract no. DE-FG02-04ER54741 with the University of Alaska at Fairbanks.

  15. Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

    SciTech Connect

    Detrixhe, Miles; Gibou, Frédéric

    2016-10-01

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.

  16. Efficient Extraction of Regional Subsets from Massive Climate Datasets using Parallel IO

    NASA Astrophysics Data System (ADS)

    Daily, J.; Schuchardt, K.; Palmer, B. J.

    2010-12-01

    The size of datasets produced by current climate models is increasing rapidly to the scale of petabytes. To handle data at this scale parallel analysis tools are required, however the majority of climate analysis software is serial and remains at the scale of workstations. Further, many climate analysis tools are designed to process regularly gridded data but lack sufficient features to handle unstructured grids. This paper presents a data-parallel subsetter capable of correctly handling unstructured grids while scaling to over 2000 cores. The approach is based on the partitioned global address space (PGAS) parallel programming model and one-sided communication. The paper demonstrates that parallel analysis of climate data succeeds in practice, although IO remains the single greatest bottleneck.

  17. Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations

    NASA Astrophysics Data System (ADS)

    Detrixhe, Miles; Gibou, Frédéric

    2016-10-01

    The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.

  18. Massively parallel implementation of the multi-reference Brillouin-Wigner CCSD method

    SciTech Connect

    Brabec, Jiri; Krishnamoorthy, Sriram; van Dam, Hubertus JJ; Kowalski, Karol; Pittner, Jiri

    2011-10-06

    This paper reports the parallel implementation of the Brillouin Wigner MultiReference Coupled Cluster method with Single and Double excitations (BW-MRCCSD). Preliminary tests for systems composed of 304 and 440 correlated obritals demonstrate the performance of our implementation across 1000 cores and clearly indicate the advantages of using improved task scheduling. Possible ways for further improvements of the parallel performance are also delineated.

  19. A parallel computing tool for large-scale simulation of massive fluid injection in thermo-poro-mechanical systems

    NASA Astrophysics Data System (ADS)

    Karrech, Ali; Schrank, Christoph; Regenauer-Lieb, Klaus

    2015-10-01

    Massive fluid injections into the earth's upper crust are commonly used to stimulate permeability in geothermal reservoirs, enhance recovery in oil reservoirs, store carbon dioxide and so forth. Currently used models for reservoir simulation are limited to small perturbations and/or hydraulic aspects that are insufficient to describe the complex thermal-hydraulic-mechanical behaviour of natural geomaterials. Comprehensive approaches, which take into account the non-linear mechanical deformations of rock masses, fluid flow in percolating pore spaces, and changes of temperature due to heat transfer, are necessary to predict the behaviour of deep geo-materials subjected to high pressure and temperature changes. In this paper, we introduce a thermodynamically consistent poromechanics formulation which includes coupled thermal, hydraulic and mechanical processes. Moreover, we propose a numerical integration strategy based on massively parallel computing. The proposed formulations and numerical integration are validated using analytical solutions of simple multi-physics problems. As a representative application, we investigate the massive injection of fluids within deep formation to mimic the conditions of reservoir stimulation. The model showed, for instance, the effects of initial pre-existing stress fields on the orientations of stimulation-induced failures.

  20. Optical, mechanical, and electro-optical design of an interferometric test station for massive parallel inspection of MEMS and MOEMS

    NASA Astrophysics Data System (ADS)

    Gastinger, Kay; Haugholt, Karl Henrik; Kujawinska, Malgorzata; Jozwik, Michal; Schaeffel, Christoph; Beer, Stephan

    2009-06-01

    The paper presents the optical, mechanical, and electro-optical design of an interferometric inspection system for massive parallel inspection of MicroElectroMechanicalSystems (MEMS) and MicroOptoElectroMechanicalSystems (MOEMS). The basic idea is to adapt a micro-optical probing wafer to the M(O)EMS wafer under test. The probing wafer is exchangeable and contains a micro-optical interferometer array. A low coherent and a laser interferometer array are developed. Two preliminary interferometer designs are presented; a low coherent interferometer array based on a Mirau configuration and a laser interferometer array based on a Twyman-Green configuration. The optical design focuses on the illumination and imaging concept for the interferometer array. The mechanical design concentrates on the scanning system and the integration in a standard test station for micro-fabrication. Models of single channel low coherence and laser interferometers and preliminary measurement results are presented. The smart-pixel approach for massive parallel electro-optical detection and data reduction is discussed.

  1. Smart pixel camera based signal processing in an interferometric test station for massive parallel inspection of MEMS and MOEMS

    NASA Astrophysics Data System (ADS)

    Styk, Adam; Lambelet, Patrick; Røyset, Arne; Kujawińska, Małgorzata; Gastinger, Kay

    2010-09-01

    The paper presents the electro-optical design of an interferometric inspection system for massive parallel inspection of Micro(Opto)ElectroMechanicalSystems (M(O)EMS). The basic idea is to adapt a micro-optical probing wafer to the M(O)EMS wafer under test. The probing wafer is exchangeable and contains a micro-optical interferometer array: a low coherent interferometer (LCI) array based on a Mirau configuration and a laser interferometer (LI) array based on a Twyman-Green configuration. The interference signals are generated in the micro-optical interferometers and are applied for M(O)EMS shape and deformation measurements by means of LCI and for M(O)EMS vibration analysis (the resonance frequency and spatial mode distribution) by means of LI. Distributed array of 5×5 smart pixel imagers detects the interferometric signals. The signal processing is based on the "on pixel" processing capacity of the smart pixel camera array, which can be utilised for phase shifting, signal demodulation or envelope maximum determination. Each micro-interferometer image is detected by the 140 × 146 pixels sub-array distributed in the imaging plane. In the paper the architecture of cameras with smart-pixel approach are described and their application for massive parallel electrooptical detection and data reduction is discussed. The full data processing paths for laser interferometer and low coherent interferometer are presented.

  2. Cyclops Tensor Framework: Reducing Communication and Eliminating Load Imbalance in Massively Parallel Contractions

    DTIC Science & Technology

    2013-02-13

    order cores running at 2.1 GHz with private 64 KB L1 and 512 KB L2 caches. Nodes are connected through Cray’s ‘ Gemini ’ network, which has a 3D torus...topology. Each Gemini chip, which is shared by two Hopper nodes, is capable of 9.8 GB/s bandwidth. However, the NERSC Cray scheduler does not allocate

  3. A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

    NASA Astrophysics Data System (ADS)

    Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

    2016-12-01

    Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.

  4. A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification

    PubMed Central

    Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

    2016-01-01

    Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520

  5. A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification.

    PubMed

    Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun

    2016-12-01

    Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.

  6. Massively parallel computing simulation of fluid flow in the unsaturated zone of Yucca Mountain, Nevada

    SciTech Connect

    Zhang, Keni; Wu, Yu-Shu; Bodvarsson, G.S.

    2001-08-31

    This paper presents the application of parallel computing techniques to large-scale modeling of fluid flow in the unsaturated zone (UZ) at Yucca Mountain, Nevada. In this study, parallel computing techniques, as implemented into the TOUGH2 code, are applied in large-scale numerical simulations on a distributed-memory parallel computer. The modeling study has been conducted using an over-one-million-cell three-dimensional numerical model, which incorporates a wide variety of field data for the highly heterogeneous fractured formation at Yucca Mountain. The objective of this study is to analyze the impact of various surface infiltration scenarios (under current and possible future climates) on flow through the UZ system, using various hydrogeological conceptual models with refined grids. The results indicate that the one-million-cell models produce better resolution results and reveal some flow patterns that cannot be obtained using coarse-grid modeling models.

  7. Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

    NASA Technical Reports Server (NTRS)

    Sohn, Andrew; Biswas, Rupak

    1996-01-01

    Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.

  8. A Novel Algorithm for Solving the Multidimensional Neutron Transport Equation on Massively Parallel Architectures

    SciTech Connect

    Azmy, Yousry

    2014-06-10

    We employ the Integral Transport Matrix Method (ITMM) as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells' fluxes and between the cells' and boundary surfaces' fluxes. The main goals of this work are to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and parallel performance of the developed methods with increasing number of processes, P. The fastest observed parallel solution method, Parallel Gauss-Seidel (PGS), was used in a weak scaling comparison with the PARTISN transport code, which uses the source iteration (SI) scheme parallelized with the Koch-baker-Alcouffe (KBA) method. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method- even without acceleration/preconditioning-is completitive for optically thick problems as P is increased to the tens of thousands range. For the most optically thick cells tested, PGS reduced execution time by an approximate factor of three for problems with more than 130 million computational cells on P = 32,768. Moreover, the SI-DSA execution times's trend rises generally more steeply with increasing P than the PGS trend. Furthermore, the PGS method outperforms SI for the periodic heterogeneous layers (PHL) configuration problems. The PGS method outperforms SI and SI-DSA on as few as P = 16 for PHL problems and reduces execution time by a factor of ten or more for all problems considered with more than 2 million computational cells on P = 4.096.

  9. Massively Parallel, Three-Dimensional Transport Solutions for the k-Eigenvalue Problem

    SciTech Connect

    Davidson, Gregory G; Evans, Thomas M; Jarrell, Joshua J; Pandya, Tara M; Slaybaugh, R

    2014-01-01

    We have implemented a new multilevel parallel decomposition in the Denovo dis- crete ordinates radiation transport code. In concert with Krylov subspace iterative solvers, the multilevel decomposition allows concurrency over energy in addition to space-angle, enabling scalability beyond the limits imposed by the traditional KBA space-angle partitioning. Furthermore, a new Arnoldi-based k-eigenvalue solver has been implemented. The added phase-space concurrency combined with the high- performance Krylov and Arnoldi solvers has enabled weak scaling to O(100K) cores on the Jaguar XK6 supercomputer. The multilevel decomposition provides sucient parallelism to scale to exascale computing and beyond.

  10. A Initio Study of the SILICON(111)-(7 by 7) Surface Reconstruction: a Challenge for Massively Parallel Computation

    NASA Astrophysics Data System (ADS)

    Brommer, Karl Daniel

    This thesis presents the first ab initio calculation of the Si(111)-(7 x 7) surface reconstruction, perhaps the most complex and widely studied surface of a solid. The large number of atoms in the unit cell has up to now defied any complete and realistic treatment of its properties. In this thesis, we exploit the power of massively parallel computation to investigate the surface reconstruction with a supercell geometry containing 700 effective atoms. These calculations predict the fully relaxed atomic geometry of this system; allow construction of theoretical STM images as a function of bias voltages; and predict the energy difference between the (7 x 7) and (2 x 1) reconstructions. The diversity of dangling bond sites on the (7 x 7) surface provides an optimal system for investigating chemical reactivity. A detailed study of the electronic surface states is presented, showing that the interpretation of the surface chemical reactivity in terms of newly developed theories of local softness is consistent with chemisorption experiments. We conclude with predictions of results for surface reactions involving a large variety of atoms and molecules. The method of computing electronic structure on a massively parallel computer is fully described, including a discussion of how the calculations would be improved through implementation on a more modern parallel computer. The results demonstrate that the state of the art in ab initio quantum-mechanical computation of electronic structure has been raised to a new echelon as the study of systems involving thousands of atoms is now possible. (Copies available exclusively from MIT Libraries, Rm. 14-0551, Cambridge, MA 02139-4307. Ph. 617 -253-5668; Fax 617-253-1690.).

  11. Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density

    NASA Astrophysics Data System (ADS)

    Hohl, A.; Delmelle, E. M.; Tang, W.

    2015-07-01

    Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octtree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data-points that are within the spatial and temporal kernel bandwidths. Then, we quantify computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable of other space-time analytical tests.

  12. Optical binary de Bruijn networks for massively parallel computing: design methodology and feasibility study

    NASA Astrophysics Data System (ADS)

    Louri, Ahmed; Sung, Hongki

    1995-10-01

    The interconnection network structure can be the deciding and limiting factor in the cost and the performance of parallel computers. One of the most popular point-to-point interconnection networks for parallel computers today is the hypercube. The regularity, logarithmic diameter, symmetry, high connectivity, fault tolerance, simple routing, and reconfigurability (easy embedding of other network topologies) of the hypercube make it a very attractive choice for parallel computers. Unfortunately the hypercube possesses a major drawback, which is the links per node increases as the network grows in size. As an alternative to the hypercube, the binary de Bruijn (BdB) network has recently received much attention. The BdB not only provides a logarithmic diameter, fault tolerance, and simple routing but also requires fewer links than the hypercube for the same network size. Additionally, a major advantage of the BdB edges per node is independent of the network size. This makes it very desirable for large-scale parallel systems. However, because of its asymmetrical nature and global connectivity, it poses a major challenge for VLSI technology. Optics, owing to its three-dimensional and global-connectivity nature, seems to be very suitable for implementing BdB networks. We present an implementation methodology for optical BdB networks. The distinctive feature of the proposed implementation methodology is partitionability of the network into a few primitive operations that can be implemented efficiently. We further show feasibility of the

  13. Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals

    DTIC Science & Technology

    1988-10-12

    sometimes the tongue ), others are not. The visible articulators’ contribution to the acoustic signal result in speech sounds that are much more...parallel formant synthesizer," JASA, vol. 67, pp. 971-995 , March 1980. Kohonen, T., Self-Organization and Associative Memory, Springer-Verlag, Berlin, 1984

  14. An open source massively parallel solver for Richards equation: Mechanistic modelling of water fluxes at the watershed scale

    NASA Astrophysics Data System (ADS)

    Orgogozo, L.; Renon, N.; Soulaine, C.; Hénon, F.; Tomer, S. K.; Labat, D.; Pokrovsky, O. S.; Sekhar, M.; Ababou, R.; Quintard, M.

    2014-12-01

    In this paper we present a massively parallel open source solver for Richards equation, named the RichardsFOAM solver. This solver has been developed in the framework of the open source generalist computational fluid dynamics tool box OpenFOAM® and is capable to deal with large scale problems in both space and time. The source code for RichardsFOAM may be downloaded from the CPC program library website. It exhibits good parallel performances (up to ˜90% parallel efficiency with 1024 processors both in strong and weak scaling), and the conditions required for obtaining such performances are analysed and discussed. These performances enable the mechanistic modelling of water fluxes at the scale of experimental watersheds (up to few square kilometres of surface area), and on time scales of decades to a century. Such a solver can be useful in various applications, such as environmental engineering for long term transport of pollutants in soils, water engineering for assessing the impact of land settlement on water resources, or in the study of weathering processes on the watersheds.

  15. Massively parallel classification of single-trial EEG signals using a min-max modular neural network.

    PubMed

    Lu, Bao-Liang; Shin, Jonghan; Ichikawa, Michinori

    2004-03-01

    This paper presents a method for classifying single-trial electroencephalogram (EEG) signals using min-max modular neural networks implemented in a massively parallel way. The method has three main steps. First, a large-scale, complex EEG classification problem is simply divided into a reasonable number of two-class subproblems, as small as needed. Second, the two-class subproblems are simply learned by individual smaller network modules in parallel. Finally, all the individual trained network modules are integrated into a hierarchical, parallel, and modular classifier according to two module combination laws. To demonstrate the effectiveness of the method, we perform simulations on fifteen different four-class EEG classification tasks, each of which consists of 1491 training and 636 test data. These EEG classification tasks were created using a set of non-averaged, single-trial hippocampal EEG signals recorded from rats; the features of the EEG signals are extracted using wavelet transform techniques. The experimental results indicate that the proposed method has several attractive features. 1) The method is appreciably faster than the existing approach that is based on conventional multilayer perceptrons. 2) Complete learning of complex EEG classification problems can be easily realized, and better generalization performance can be achieved. 3) The method scales up to large-scale, complex EEG classification problems.

  16. Solution of the within-group multidimensional discrete ordinates transport equations on massively parallel architectures

    NASA Astrophysics Data System (ADS)

    Zerr, Robert Joseph

    2011-12-01

    The integral transport matrix method (ITMM) has been used as the kernel of new parallel solution methods for the discrete ordinates approximation of the within-group neutron transport equation. The ITMM abandons the repetitive mesh sweeps of the traditional source iterations (SI) scheme in favor of constructing stored operators that account for the direct coupling factors among all the cells and between the cells and boundary surfaces. The main goals of this work were to develop the algorithms that construct these operators and employ them in the solution process, determine the most suitable way to parallelize the entire procedure, and evaluate the behavior and performance of the developed methods for increasing number of processes. This project compares the effectiveness of the ITMM with the SI scheme parallelized with the Koch-Baker-Alcouffe (KBA) method. The primary parallel solution method involves a decomposition of the domain into smaller spatial sub-domains, each with their own transport matrices, and coupled together via interface boundary angular fluxes. Each sub-domain has its own set of ITMM operators and represents an independent transport problem. Multiple iterative parallel solution methods have investigated, including parallel block Jacobi (PBJ), parallel red/black Gauss-Seidel (PGS), and parallel GMRES (PGMRES). The fastest observed parallel solution method, PGS, was used in a weak scaling comparison with the PARTISN code. Compared to the state-of-the-art SI-KBA with diffusion synthetic acceleration (DSA), this new method without acceleration/preconditioning is not competitive for any problem parameters considered. The best comparisons occur for problems that are difficult for SI DSA, namely highly scattering and optically thick. SI DSA execution time curves are generally steeper than the PGS ones. However, until further testing is performed it cannot be concluded that SI DSA does not outperform the ITMM with PGS even on several thousand or tens of

  17. Nonlinear structural response using adaptive dynamic relaxation on a massively-parallel-processing system

    NASA Technical Reports Server (NTRS)

    Oakley, David R.; Knight, Norman F., Jr.

    1994-01-01

    A parallel adaptive dynamic relaxation (ADR) algorithm has been developed for nonlinear structural analysis. This algorithm has minimal memory requirements, is easily parallelizable and scalable to many processors, and is generally very reliable and efficient for highly nonlinear problems. Performance evaluations on single-processor computers have shown that the ADR algorithm is reliable and highly vectorizable, and that it is competitive with direct solution methods for the highly nonlinear problems considered. The present algorithm is implemented on the 512-processor Intel Touchstone DELTA system at Caltech, and it is designed to minimize the extent and frequency of interprocessor communication. The algorithm has been used to solve for the nonlinear static response of two and three dimensional hyperelastic systems involving contact. Impressive relative speedups have been achieved and demonstrate the high scalability of the ADR algorithm. For the class of problems addressed, the ADR algorithm represents a very promising approach for parallel-vector processing.

  18. Analysis and selection of optimal function implementations in massively parallel computer

    DOEpatents

    Archer, Charles Jens; Peters, Amanda; Ratterman, Joseph D.

    2011-05-31

    An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.

  19. Parallel HOP: A Scalable Halo Finder for Massive Cosmological Data Sets

    NASA Astrophysics Data System (ADS)

    Skory, Stephen; Turk, Matthew J.; Norman, Michael L.; Coil, Alison L.

    2010-11-01

    Modern N-body cosmological simulations contain billions (109) of dark matter particles. These simulations require hundreds to thousands of gigabytes of memory and employ hundreds to tens of thousands of processing cores on many compute nodes. In order to study the distribution of dark matter in a cosmological simulation, the dark matter halos must be identified using a halo finder, which establishes the halo membership of every particle in the simulation. The resources required for halo finding are similar to the requirements for the simulation itself. In particular, simulations have become too extensive to use commonly employed halo finders, such that the computational requirements to identify halos must now be spread across multiple nodes and cores. Here, we present a scalable-parallel halo finding method called Parallel HOP for large-scale cosmological simulation data. Based on the halo finder HOP, it utilizes message passing interface and domain decomposition to distribute the halo finding workload across multiple compute nodes, enabling analysis of much larger data sets than is possible with the strictly serial or previous parallel implementations of HOP. We provide a reference implementation of this method as a part of the toolkit "yt", an analysis toolkit for adaptive mesh refinement data that include complementary analysis modules. Additionally, we discuss a suite of benchmarks that demonstrate that this method scales well up to several hundred tasks and data sets in excess of 20003 particles. The Parallel HOP method and our implementation can be readily applied to any kind of N-body simulation data and is therefore widely applicable.

  20. Time-Resolved 3D Quantitative Flow MRI of the Major Intracranial Vessels: Initial Experience and Comparative Evaluation at 1.5T and 3.0T in Combination With Parallel Imaging

    PubMed Central

    Bammer, Roland; Hope, Thomas A.; Aksoy, Murat; Alley, Marcus T.

    2012-01-01

    Exact knowledge of blood flow characteristics in the major cerebral vessels is of great relevance for diagnosing cerebrovascular abnormalities. This involves the assessment of hemodynamically critical areas as well as the derivation of biomechanical parameters such as wall shear stress and pressure gradients. A time-resolved, 3D phase-contrast (PC) MRI method using parallel imaging was implemented to measure blood flow in three dimensions at multiple instances over the cardiac cycle. The 4D velocity data obtained from 14 healthy volunteers were used to investigate dynamic blood flow with the use of multiplanar reformatting, 3D streamlines, and 4D particle tracing. In addition, the effects of magnetic field strength, parallel imaging, and temporal resolution on the data were investigated in a comparative evaluation at 1.5T and 3T using three different parallel imaging reduction factors and three different temporal resolutions in eight of the 14 subjects. Studies were consistently performed faster at 3T than at 1.5T because of better parallel imaging performance. A high temporal resolution (65 ms) was required to follow dynamic processes in the intracranial vessels. The 4D flow measurements provided a high degree of vascular conspicuity. Time-resolved streamline analysis provided features that have not been reported previously for the intracranial vasculature. PMID:17195166

  1. Massive and parallel expression profiling using microarrayed single-cell sequencing

    PubMed Central

    Vickovic, Sanja; Ståhl, Patrik L.; Salmén, Fredrik; Giatrellis, Sarantis; Westholm, Jakub Orzechowski; Mollbrink, Annelie; Navarro, José Fernández; Custodio, Joaquin; Bienko, Magda; Sutton, Lesley-Ann; Rosenquist, Richard; Frisén, Jonas; Lundeberg, Joakim

    2016-01-01

    Single-cell transcriptome analysis overcomes problems inherently associated with averaging gene expression measurements in bulk analysis. However, single-cell analysis is currently challenging in terms of cost, throughput and robustness. Here, we present a method enabling massive microarray-based barcoding of expression patterns in single cells, termed MASC-seq. This technology enables both imaging and high-throughput single-cell analysis, characterizing thousands of single-cell transcriptomes per day at a low cost (0.13 USD/cell), which is two orders of magnitude less than commercially available systems. Our novel approach provides data in a rapid and simple way. Therefore, MASC-seq has the potential to accelerate the study of subtle clonal dynamics and help provide critical insights into disease development and other biological processes. PMID:27739429

  2. Massive and parallel expression profiling using microarrayed single-cell sequencing.

    PubMed

    Vickovic, Sanja; Ståhl, Patrik L; Salmén, Fredrik; Giatrellis, Sarantis; Westholm, Jakub Orzechowski; Mollbrink, Annelie; Navarro, José Fernández; Custodio, Joaquin; Bienko, Magda; Sutton, Lesley-Ann; Rosenquist, Richard; Frisén, Jonas; Lundeberg, Joakim

    2016-10-14

    Single-cell transcriptome analysis overcomes problems inherently associated with averaging gene expression measurements in bulk analysis. However, single-cell analysis is currently challenging in terms of cost, throughput and robustness. Here, we present a method enabling massive microarray-based barcoding of expression patterns in single cells, termed MASC-seq. This technology enables both imaging and high-throughput single-cell analysis, characterizing thousands of single-cell transcriptomes per day at a low cost (0.13 USD/cell), which is two orders of magnitude less than commercially available systems. Our novel approach provides data in a rapid and simple way. Therefore, MASC-seq has the potential to accelerate the study of subtle clonal dynamics and help provide critical insights into disease development and other biological processes.

  3. Advances in time-domain electromagnetic simulation capabilities through the use of overset grids and massively parallel computing

    NASA Astrophysics Data System (ADS)

    Blake, Douglas Clifton

    A new methodology is presented for conducting numerical simulations of electromagnetic scattering and wave-propagation phenomena on massively parallel computing platforms. A process is constructed which is rooted in the Finite-Volume Time-Domain (FVTD) technique to create a simulation capability that is both versatile and practical. In terms of versatility, the method is platform independent, is easily modifiable, and is capable of solving a large number of problems with no alterations. In terms of practicality, the method is sophisticated enough to solve problems of engineering significance and is not limited to mere academic exercises. In order to achieve this capability, techniques are integrated from several scientific disciplines including computational fluid dynamics, computational electromagnetics, and parallel computing. The end result is the first FVTD solver capable of utilizing the highly flexible overset-gridding process in a distributed-memory computing environment. In the process of creating this capability, work is accomplished to conduct the first study designed to quantify the effects of domain-decomposition dimensionality on the parallel performance of hyperbolic partial differential equations solvers; to develop a new method of partitioning a computational domain comprised of overset grids; and to provide the first detailed assessment of the applicability of overset grids to the field of computational electromagnetics. Using these new methods and capabilities, results from a large number of wave propagation and scattering simulations are presented. The overset-grid FVTD algorithm is demonstrated to produce results of comparable accuracy to single-grid simulations while simultaneously shortening the grid-generation process and increasing the flexibility and utility of the FVTD technique. Furthermore, the new domain-decomposition approaches developed for overset grids are shown to be capable of producing partitions that are better load balanced and

  4. Identification of the Bovine Arachnomelia Mutation by Massively Parallel Sequencing Implicates Sulfite Oxidase (SUOX) in Bone Development

    PubMed Central

    Drögemüller, Cord; Tetens, Jens; Sigurdsson, Snaevar; Gentile, Arcangelo; Testoni, Stefania; Lindblad-Toh, Kerstin; Leeb, Tosso

    2010-01-01

    Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development. PMID:20865119

  5. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types.

    PubMed

    Jaitin, Diego Adhemar; Kenigsberg, Ephraim; Keren-Shaul, Hadas; Elefant, Naama; Paul, Franziska; Zaretsky, Irina; Mildner, Alexander; Cohen, Nadav; Jung, Steffen; Tanay, Amos; Amit, Ido

    2014-02-14

    In multicellular organisms, biological function emerges when heterogeneous cell types form complex organs. Nevertheless, dissection of tissues into mixtures of cellular subpopulations is currently challenging. We introduce an automated massively parallel single-cell RNA sequencing (RNA-seq) approach for analyzing in vivo transcriptional states in thousands of single cells. Combined with unsupervised classification algorithms, this facilitates ab initio cell-type characterization of splenic tissues. Modeling single-cell transcriptional states in dendritic cells and additional hematopoietic cell types uncovers rich cell-type heterogeneity and gene-modules activity in steady state and after pathogen activation. Cellular diversity is thereby approached through inference of variable and dynamic pathway activity rather than a fixed preprogrammed cell-type hierarchy. These data demonstrate single-cell RNA-seq as an effective tool for comprehensive cellular decomposition of complex tissues.

  6. USH2 caused by GPR98 mutation diagnosed by massively parallel sequencing in advance of the occurrence of visual symptoms

    PubMed Central

    Moteki, Hideaki; Yoshimura, Hidekane; Azaiez, Hela; Booth, Kevin T.; Shearer, A Eliot; Sloan, Christina M.; Kolbe, Diana L.; Murata, Toshinori; Smith, Richard J. H.; Usami, Shin-ichi

    2015-01-01

    Objective We present two patients who were identified with mutations in the GPR98 gene that causes Usher syndrome type 2 (USH2). Methods One hundred ninety-four (194) Japanese subjects from unrelated and families were enrolled in the study. Targeted genomic enrichment and massively parallel sequencing of all known non-syndromic hearing loss genes were used to identify the genetic causes of hearing loss. Results We identified causative mutations in the GPR98 gene in one family (two siblings). The patients had moderate sloping hearing loss, and no progression was observed over a period of 10 years. Fundus examinations were normal. However, electroretinogram revealed impaired responses in both patients. Conclusion Early diagnosis of Usher syndrome has many advantages for patients and their families. This study supports the use of comprehensive genetic diagnosis for Usher syndrome, especially prior to the onset of visual symptoms, to provide the highest chance of diagnostic success in early life stages. PMID:25743181

  7. 3D photoacoustic imaging

    NASA Astrophysics Data System (ADS)

    Carson, Jeffrey J. L.; Roumeliotis, Michael; Chaudhary, Govind; Stodilka, Robert Z.; Anastasio, Mark A.

    2010-06-01

    Our group has concentrated on development of a 3D photoacoustic imaging system for biomedical imaging research. The technology employs a sparse parallel detection scheme and specialized reconstruction software to obtain 3D optical images using a single laser pulse. With the technology we have been able to capture 3D movies of translating point targets and rotating line targets. The current limitation of our 3D photoacoustic imaging approach is its inability ability to reconstruct complex objects in the field of view. This is primarily due to the relatively small number of projections used to reconstruct objects. However, in many photoacoustic imaging situations, only a few objects may be present in the field of view and these objects may have very high contrast compared to background. That is, the objects have sparse properties. Therefore, our work had two objectives: (i) to utilize mathematical tools to evaluate 3D photoacoustic imaging performance, and (ii) to test image reconstruction algorithms that prefer sparseness in the reconstructed images. Our approach was to utilize singular value decomposition techniques to study the imaging operator of the system and evaluate the complexity of objects that could potentially be reconstructed. We also compared the performance of two image reconstruction algorithms (algebraic reconstruction and l1-norm techniques) at reconstructing objects of increasing sparseness. We observed that for a 15-element detection scheme, the number of measureable singular vectors representative of the imaging operator was consistent with the demonstrated ability to reconstruct point and line targets in the field of view. We also observed that the l1-norm reconstruction technique, which is known to prefer sparseness in reconstructed images, was superior to the algebraic reconstruction technique. Based on these findings, we concluded (i) that singular value decomposition of the imaging operator provides valuable insight into the capabilities of

  8. Genome-Wide Footprints of Pig Domestication and Selection Revealed through Massive Parallel Sequencing of Pooled DNA

    PubMed Central

    Amaral, Andreia J.; Ferretti, Luca; Megens, Hendrik-Jan; Crooijmans, Richard P. M. A.; Nie, Haisheng; Ramos-Onsins, Sebastian E.; Perez-Enciso, Miguel; Schook, Lawrence B.; Groenen, Martien A. M.

    2011-01-01

    Background Artificial selection has caused rapid evolution in domesticated species. The identification of selection footprints across domesticated genomes can contribute to uncover the genetic basis of phenotypic diversity. Methodology/Main Findings Genome wide footprints of pig domestication and selection were identified using massive parallel sequencing of pooled reduced representation libraries (RRL) representing ∼2% of the genome from wild boar and four domestic pig breeds (Large White, Landrace, Duroc and Pietrain) which have been under strong selection for muscle development, growth, behavior and coat color. Using specifically developed statistical methods that account for DNA pooling, low mean sequencing depth, and sequencing errors, we provide genome-wide estimates of nucleotide diversity and genetic differentiation in pig. Widespread signals suggestive of positive and balancing selection were found and the strongest signals were observed in Pietrain, one of the breeds most intensively selected for muscle development. Most signals were population-specific but affected genomic regions which harbored genes for common biological categories including coat color, brain development, muscle development, growth, metabolism, olfaction and immunity. Genetic differentiation in regions harboring genes related to muscle development and growth was higher between breeds than between a given breed and the wild boar. Conclusions/Significance These results, suggest that although domesticated breeds have experienced similar selective pressures, selection has acted upon different genes. This might reflect the multiple domestication events of European breeds or could be the result of subsequent introgression of Asian alleles. Overall, it was estimated that approximately 7% of the porcine genome has been affected by selection events. This study illustrates that the massive parallel sequencing of genomic pools is a cost-effective approach to identify footprints of selection

  9. Inter-laboratory evaluation of SNP-based forensic identification by massively parallel sequencing using the Ion PGM™.

    PubMed

    Eduardoff, M; Santos, C; de la Puente, M; Gross, T E; Fondevila, M; Strobl, C; Sobrino, B; Ballard, D; Schneider, P M; Carracedo, Á; Lareu, M V; Parson, W; Phillips, C

    2015-07-01

    Next generation sequencing (NGS) offers the opportunity to analyse forensic DNA samples and obtain massively parallel coverage of targeted short sequences with the variants they carry. We evaluated the levels of sequence coverage, genotyping precision, sensitivity and mixed DNA patterns of a prototype version of the first commercial forensic NGS kit: the HID-Ion AmpliSeq™ Identity Panel with 169-markers designed for the Ion PGM™ system. Evaluations were made between three laboratories following closely matched Ion PGM™ protocols and a simple validation framework of shared DNA controls. The sequence coverage obtained was extensive for the bulk of SNPs targeted by the HID-Ion AmpliSeq™ Identity Panel. Sensitivity studies showed 90-95% of SNP genotypes could be obtained from 25 to 100pg of input DNA. Genotyping concordance tests included Coriell cell-line control DNA analyses checked against whole-genome sequencing data from 1000 Genomes and Complete Genomics, indicating a very high concordance rate of 99.8%. Discordant genotypes detected in rs1979255, rs1004357, rs938283, rs2032597 and rs2399332 indicate these loci should be excluded from the panel. Therefore, the HID-Ion AmpliSeq™ Identity Panel and Ion PGM™ system provide a sensitive and accurate forensic SNP genotyping assay. However, low-level DNA produced much more varied sequence coverage and in forensic use the Ion PGM™ system will require careful calibration of the total samples loaded per chip to preserve the genotyping reliability seen in routine forensic DNA. Furthermore, assessments of mixed DNA indicate the user's control of sequence analysis parameter settings is necessary to ensure mixtures are detected robustly. Given the sensitivity of Ion PGM™, this aspect of forensic genotyping requires further optimisation before massively parallel sequencing is applied to routine casework.

  10. Massively parallel DNA sequencing successfully identifies new causative mutations in deafness genes in patients with cochlear implantation and EAS.

    PubMed

    Miyagawa, Maiko; Nishio, Shin-ya; Ikeda, Takuo; Fukushima, Kunihiro; Usami, Shin-ichi

    2013-01-01

    Genetic factors, the most common etiology in severe to profound hearing loss, are one of the key determinants of Cochlear Implantation (CI) and Electric Acoustic Stimulation (EAS) outcomes. Satisfactory auditory performance after receiving a CI/EAS in patients with certain deafness gene mutations indicates that genetic testing would be helpful in predicting CI/EAS outcomes and deciding treatment choices. However, because of the extreme genetic heterogeneity of deafness, clinical application of genetic information still entails difficulties. Target exon sequencing using massively parallel DNA sequencing is a new powerful strategy to discover rare causative genes in Mendelian disorders such as deafness. We used massive sequencing of the exons of 58 target candidate genes to analyze 8 (4 early-onset, 4 late-onset) Japanese CI/EAS patients, who did not have mutations in commonly found genes including GJB2, SLC26A4, or mitochondrial 1555A>G or 3243A>G mutations. We successfully identified four rare causative mutations in the MYO15A, TECTA, TMPRSS3, and ACTG1 genes in four patients who showed relatively good auditory performance with CI including EAS, suggesting that genetic testing may be able to predict the performance after implantation.

  11. Extended computational kernels in a massively parallel implementation of the Trotter-Suzuki approximation

    NASA Astrophysics Data System (ADS)

    Wittek, Peter; Calderaro, Luca

    2015-12-01

    We extended a parallel and distributed implementation of the Trotter-Suzuki algorithm for simulating quantum systems to study a wider range of physical problems and to make the library easier to use. The new release allows periodic boundary conditions, many-body simulations of non-interacting particles, arbitrary stationary potential functions, and imaginary time evolution to approximate the ground state energy. The new release is more resilient to the computational environment: a wider range of compiler chains and more platforms are supported. To ease development, we provide a more extensive command-line interface, an application programming interface, and wrappers from high-level languages.

  12. Parallel group independent component analysis for massive fMRI data sets.

    PubMed

    Chen, Shaojie; Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H; Pekar, James J; Lindquist, Martin A; Eloyan, Ani; Caffo, Brian S

    2017-01-01

    Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively.

  13. A massively parallel method of characteristic neutral particle transport code for GPUs

    SciTech Connect

    Boyd, W. R.; Smith, K.; Forget, B.

    2013-07-01

    Over the past 20 years, parallel computing has enabled computers to grow ever larger and more powerful while scientific applications have advanced in sophistication and resolution. This trend is being challenged, however, as the power consumption for conventional parallel computing architectures has risen to unsustainable levels and memory limitations have come to dominate compute performance. Heterogeneous computing platforms, such as Graphics Processing Units (GPUs), are an increasingly popular paradigm for solving these issues. This paper explores the applicability of GPUs for deterministic neutron transport. A 2D method of characteristics (MOC) code - OpenMOC - has been developed with solvers for both shared memory multi-core platforms as well as GPUs. The multi-threading and memory locality methodologies for the GPU solver are presented. Performance results for the 2D C5G7 benchmark demonstrate 25-35 x speedup for MOC on the GPU. The lessons learned from this case study will provide the basis for further exploration of MOC on GPUs as well as design decisions for hardware vendors exploring technologies for the next generation of machines for scientific computing. (authors)

  14. Parallel group independent component analysis for massive fMRI data sets

    PubMed Central

    Huang, Lei; Qiu, Huitong; Nebel, Mary Beth; Mostofsky, Stewart H.; Pekar, James J.; Lindquist, Martin A.; Eloyan, Ani; Caffo, Brian S.

    2017-01-01

    Independent component analysis (ICA) is widely used in the field of functional neuroimaging to decompose data into spatio-temporal patterns of co-activation. In particular, ICA has found wide usage in the analysis of resting state fMRI (rs-fMRI) data. Recently, a number of large-scale data sets have become publicly available that consist of rs-fMRI scans from thousands of subjects. As a result, efficient ICA algorithms that scale well to the increased number of subjects are required. To address this problem, we propose a two-stage likelihood-based algorithm for performing group ICA, which we denote Parallel Group Independent Component Analysis (PGICA). By utilizing the sequential nature of the algorithm and parallel computing techniques, we are able to efficiently analyze data sets from large numbers of subjects. We illustrate the efficacy of PGICA, which has been implemented in R and is freely available through the Comprehensive R Archive Network, through simulation studies and application to rs-fMRI data from two large multi-subject data sets, consisting of 301 and 779 subjects respectively. PMID:28278208

  15. Harnessing the killer micros: Applications from LLNL's massively parallel computing initiative

    SciTech Connect

    Belak, J.F.

    1991-07-01

    Recent developments in microprocessor technology have led to performance on scalar applications exceeding traditional supercomputers. This suggests that coupling hundreds or even thousands of these killer-micros'' (all working on a single physical problem) may lead to performance on vector applications in excess of vector supercomputers. Also, future generation killer-micros are expected to have vector floating point units as well. The purpose of this paper is to present an overview of the parallel computing environment at Lawrence Livermore National Laboratory. However, the perspective is necessarily quite narrow and most of the examples are taken from the author's implementation of a large scale molecular dynamics code on the BBN-TC2000 at LLNL. Parallelism is achieved through a geometric domain decomposition -- each processor is assigned a distinct region of space and all atoms contained therein. As the atomic positions evolve, the processors must exchange ownership of specific atoms. This geometric domain decomposition proves to be quite general and we highlight its application to image processing and hydrodynamics simulations as well. 10 refs., 6 figs.

  16. Library preparation and multiplex capture for massive parallel sequencing applications made efficient and easy.

    PubMed

    Neiman, Mårten; Sundling, Simon; Grönberg, Henrik; Hall, Per; Czene, Kamila; Lindberg, Johan; Klevebring, Daniel

    2012-01-01

    During the recent years, rapid development of sequencing technologies and a competitive market has enabled researchers to perform massive sequencing projects at a reasonable cost. As the price for the actual sequencing reactions drops, enabling more samples to be sequenced, the relative price for preparing libraries gets larger and the practical laboratory work becomes complex and tedious. We present a cost-effective strategy for simplified library preparation compatible with both whole genome- and targeted sequencing experiments. An optimized enzyme composition and reaction buffer reduces the number of required clean-up steps and allows for usage of bulk enzymes which makes the whole process cheap, efficient and simple. We also present a two-tagging strategy, which allows for multiplex sequencing of targeted regions. To prove our concept, we have prepared libraries for low-pass sequencing from 100 ng DNA, performed 2-, 4- and 8-plex exome capture and a 96-plex capture of a 500 kb region. In all samples we see a high concordance (>99.4%) of SNP calls when comparing to commercially available SNP-chip platforms.

  17. Deep mutational scanning of an antibody against epidermal growth factor receptor using mammalian cell display and massively parallel pyrosequencing.

    PubMed

    Forsyth, Charles M; Juan, Veronica; Akamatsu, Yoshiko; DuBridge, Robert B; Doan, Minhtam; Ivanov, Alexander V; Ma, Zhiyuan; Polakoff, Dixie; Razo, Jennifer; Wilson, Keith; Powers, David B

    2013-01-01

    We developed a method for deep mutational scanning of antibody complementarity-determining regions (CDRs) that can determine in parallel the effect of every possible single amino acid CDR substitution on antigen binding. The method uses libraries of full length IgGs containing more than 1000 CDR point mutations displayed on mammalian cells, sorted by flow cytometry into subpopulations based on antigen affinity and analyzed by massively parallel pyrosequencing. Higher, lower and neutral affinity mutations are identified by their enrichment or depletion in the FACS subpopulations. We applied this method to a humanized version of the anti-epidermal growth factor receptor antibody cetuximab, generated a near comprehensive data set for 1060 point mutations that recapitulates previously determined structural and mutational data for these CDRs and identified 67 point mutations that increase affinity. The large-scale, comprehensive sequence-function data sets generated by this method should have broad utility for engineering properties such as antibody affinity and specificity and may advance theoretical understanding of antibody-antigen recognition.

  18. Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

    NASA Astrophysics Data System (ADS)

    Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

    2015-12-01

    AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.

  19. Massively parallel kinetic Monte Carlo simulations of charge carrier transport in organic semiconductors

    NASA Astrophysics Data System (ADS)

    van der Kaap, N. J.; Koster, L. J. A.

    2016-02-01

    A parallel, lattice based Kinetic Monte Carlo simulation is developed that runs on a GPGPU board and includes Coulomb like particle-particle interactions. The performance of this computationally expensive problem is improved by modifying the interaction potential due to nearby particle moves, instead of fully recalculating it. This modification is achieved by adding dipole correction terms that represent the particle move. Exact evaluation of these terms is guaranteed by representing all interactions as 32-bit floating numbers, where only the integers between -222 and 222 are used. We validate our method by modelling the charge transport in disordered organic semiconductors, including Coulomb interactions between charges. Performance is mainly governed by the particle density in the simulation volume, and improves for increasing densities. Our method allows calculations on large volumes including particle-particle interactions, which is important in the field of organic semiconductors.

  20. Massively parallel recording of unit and local field potentials with silicon-based electrodes.

    PubMed

    Csicsvari, Jozsef; Henze, Darrell A; Jamieson, Brian; Harris, Kenneth D; Sirota, Anton; Barthó, Péter; Wise, Kensall D; Buzsáki, György

    2003-08-01

    Parallel recording of neuronal activity in the behaving animal is a prerequisite for our understanding of neuronal representation and storage of information. Here we describe the development of micro-machined silicon microelectrode arrays for unit and local field recordings. The two-dimensional probes with 96 or 64 recording sites provided high-density recording of unit and field activity with minimal tissue displacement or damage. The on-chip active circuit eliminated movement and other artifacts and greatly reduced the weight of the headgear. The precise geometry of the recording tips allowed for the estimation of the spatial location of the recorded neurons and for high-resolution estimation of extracellular current source density. Action potentials could be simultaneously recorded from the soma and dendrites of the same neurons. Silicon technology is a promising approach for high-density, high-resolution sampling of neuronal activity in both basic research and prosthetic devices.

  1. The transition to massively parallel computing within a production environment at a DOE access center

    SciTech Connect

    McCoy, M.G.

    1993-04-01

    In contemplating the transition from sequential to MP computing, the National Energy Research Supercomputer Center (NERSC) is faced with the frictions inherent in the duality of its mission. There have been two goals, the first has been to provide a stable, serviceable, production environment to the user base, the second to bring the most capable early serial supercomputers to the Center to make possible the leading edge simulations. This seeming conundrum has in reality been a source of strength. The task of meeting both goals was faced before with the CRAY 1 which, as delivered, was all iron; so the problems associated with the advent of parallel computers are not entirely new, but they are serious. Current vector supercomputers, such as the C90, offer mature production environments, including software tools, a large applications base, and generality; these machines can be used to attack the spectrum of scientific applications by a large user base knowledgeable in programming techniques for this architecture. Parallel computers to date have offered less developed, even rudimentary, working environments, a sparse applications base, and forced specialization. They have been specialized in terms of programming models, and specialized in terms of the kinds of applications which would do well on the machines. Given this context, why do many service computer centers feel that now is the time to cease or slow the procurement of traditional vector supercomputers in favor of MP systems? What are some of the issues that NERSC must face to engineer a smooth transition? The answers to these questions are multifaceted and by no means completely clear. However, a route exists as a result of early efforts at the Laboratories combined with research within the HPCC Program. One can begin with an analysis of why the hardware and software appearing shortly should be made available to the mainstream, and then address what would be required in an initial production environment.

  2. Quaternary Morphodynamics of Fluvial Dispersal Systems Revealed: The Fly River, PNG, and the Sunda Shelf, SE Asia, simulated with the Massively Parallel GPU-based Model 'GULLEM'

    NASA Astrophysics Data System (ADS)

    Aalto, R. E.; Lauer, J. W.; Darby, S. E.; Best, J.; Dietrich, W. E.

    2015-12-01

    During glacial-marine transgressions vast volumes of sediment are deposited due to the infilling of lowland fluvial systems and shallow shelves, material that is removed during ensuing regressions. Modelling these processes would illuminate system morphodynamics, fluxes, and 'complexity' in response to base level change, yet such problems are computationally formidable. Environmental systems are characterized by strong interconnectivity, yet traditional supercomputers have slow inter-node communication -- whereas rapidly advancing Graphics Processing Unit (GPU) technology offers vastly higher (>100x) bandwidths. GULLEM (GpU-accelerated Lowland Landscape Evolution Model) employs massively parallel code to simulate coupled fluvial-landscape evolution for complex lowland river systems over large temporal and spatial scales. GULLEM models the accommodation space carved/infilled by representing a range of geomorphic processes, including: river & tributary incision within a multi-directional flow regime, non-linear diffusion, glacial-isostatic flexure, hydraulic geometry, tectonic deformation, sediment production, transport & deposition, and full 3D tracking of all resulting stratigraphy. Model results concur with the Holocene dynamics of the Fly River, PNG -- as documented with dated cores, sonar imaging of floodbasin stratigraphy, and the observations of topographic remnants from LGM conditions. Other supporting research was conducted along the Mekong River, the largest fluvial system of the Sunda Shelf. These and other field data provide tantalizing empirical glimpses into the lowland landscapes of large rivers during glacial-interglacial transitions, observations that can be explored with this powerful numerical model. GULLEM affords estimates for the timing and flux budgets within the Fly and Sunda Systems, illustrating complex internal system responses to the external forcing of sea level and climate. Furthermore, GULLEM can be applied to most ANY fluvial system to

  3. Massively Parallel Simulation of Uranium Migration at the Hanford 300 Area

    NASA Astrophysics Data System (ADS)

    Hammond, G. E.; Lichtner, P. C.

    2009-12-01

    Effectively utilized, high-performance computing can have a significant impact on subsurface science by enabling researchers to employ models with ever increasing sophistication and complexity that provide a more accurate and mechanistic representation of subsurface processes. As part of the U.S. Department of Energy’s SciDAC-2 program, the petascale subsurface reactive multiphase flow and transport code PFLOTRAN has been developed and is currently being employed to simulate uranium migration at the Hanford 300 Area. PFLOTRAN has been run on subsurface problems composed of up to two billion degrees of freedom and utilizing up to 131,072 processor cores on the world’s largest open science supercomputer Jaguar. This presentation focuses on the application of PFLOTRAN to simulate geochemical transport of uranium at Hanford using the Jaguar supercomputer. The Hanford 300 Area presents many challenges with regard to simulating radionuclide transport. Aside from the many conceptual uncertainties in the problem such as the choice of initial conditions, rapid fluctuations in the Columbia River stage, which occur on an hourly basis with several meter variations, can have a dramatic impact on the size of the uranium plume, its migration direction, and the rate at which it migrates to the river. Due to the immense size of the physical domain needed to include the transient river boundary condition, the grid resolution required to preserve accuracy, and the number of chemical components simulated, 3D simulation of the Hanford 300 Area would be unsustainable on a single workstation, and thus high-performance computing is essential.

  4. De novo assembly and validation of planaria transcriptome by massive parallel sequencing and shotgun proteomics.

    PubMed

    Adamidi, Catherine; Wang, Yongbo; Gruen, Dominic; Mastrobuoni, Guido; You, Xintian; Tolle, Dominic; Dodt, Matthias; Mackowiak, Sebastian D; Gogol-Doering, Andreas; Oenal, Pinar; Rybak, Agnieszka; Ross, Eric; Sánchez Alvarado, Alejandro; Kempa, Stefan; Dieterich, Christoph; Rajewsky, Nikolaus; Chen, Wei

    2011-07-01

    Freshwater planaria are a very attractive model system for stem cell biology, tissue homeostasis, and regeneration. The genome of the planarian Schmidtea mediterranea has recently been sequenced and is estimated to contain >20,000 protein-encoding genes. However, the characterization of its transcriptome is far from complete. Furthermore, not a single proteome of the entire phylum has been assayed on a genome-wide level. We devised an efficient sequencing strategy that allowed us to de novo assemble a major fraction of the S. mediterranea transcriptome. We then used independent assays and massive shotgun proteomics to validate the authenticity of transcripts. In total, our de novo assembly yielded 18,619 candidate transcripts with a mean length of 1118 nt after filtering. A total of 17,564 candidate transcripts could be mapped to 15,284 distinct loci on the current genome reference sequence. RACE confirmed complete or almost complete 5' and 3' ends for 22/24 transcripts. The frequencies of frame shifts, fusion, and fission events in the assembled transcripts were computationally estimated to be 4.2%-13%, 0%-3.7%, and 2.6%, respectively. Our shotgun proteomics produced 16,135 distinct peptides that validated 4200 transcripts (FDR ≤1%). The catalog of transcripts assembled in this study, together with the identified peptides, dramatically expands and refines planarian gene annotation, demonstrated by validation of several previously unknown transcripts with stem cell-dependent expression patterns. In addition, our robust transcriptome characterization pipeline could be applied to other organisms without genome assembly. All of our data, including homology annotation, are freely available at SmedGD, the S. mediterranea genome database.

  5. Measuring the sequence-affinity landscape of antibodies with massively parallel titration curves

    PubMed Central

    Adams, Rhys M; Mora, Thierry; Walczak, Aleksandra M; Kinney, Justin B

    2016-01-01

    Despite the central role that antibodies play in the adaptive immune system and in biotechnology, much remains unknown about the quantitative relationship between an antibody’s amino acid sequence and its antigen binding affinity. Here we describe a new experimental approach, called Tite-Seq, that is capable of measuring binding titration curves and corresponding affinities for thousands of variant antibodies in parallel. The measurement of titration curves eliminates the confounding effects of antibody expression and stability that arise in standard deep mutational scanning assays. We demonstrate Tite-Seq on the CDR1H and CDR3H regions of a well-studied scFv antibody. Our data shed light on the structural basis for antigen binding affinity and suggests a role for secondary CDR loops in establishing antibody stability. Tite-Seq fills a large gap in the ability to measure critical aspects of the adaptive immune system, and can be readily used for studying sequence-affinity landscapes in other protein systems. DOI: http://dx.doi.org/10.7554/eLife.23156.001 PMID:28035901

  6. Massively parallel first-principles simulation of electron dynamics in materials

    DOE PAGES

    Draeger, Erik W.; Andrade, Xavier; Gunnels, John A.; ...

    2017-03-04

    Here we present a highly scalable, parallel implementation of first-principles electron dynamics coupled with molecular dynamics (MD). By using optimized kernels, network topology aware communication, and by fully distributing all terms in the time-dependent Kohn–Sham equation, we demonstrate unprecedented time to solution for disordered aluminum systems of 2000 atoms (22,000 electrons) and 5400 atoms (59,400 electrons), with wall clock time as low as 7.5 s per MD time step. Despite a significant amount of non-local communication required in every iteration, we achieved excellent strong scaling and sustained performance on the Sequoia Blue Gene/Q supercomputer at LLNL. We obtained up tomore » 59% of the theoretical sustained peak performance on 16,384 nodes and performance of 8.75 Petaflop/s (43% of theoretical peak) on the full 98,304 node machine (1,572,864 cores). Lastly, scalable explicit electron dynamics allows for the study of phenomena beyond the reach of standard first-principles MD, in particular, materials subject to strong or rapid perturbations, such as pulsed electromagnetic radiation, particle irradiation, or strong electric currents.« less

  7. Massively parallel manipulation of single cells and microparticles using optical images.

    PubMed

    Chiou, Pei Yu; Ohta, Aaron T; Wu, Ming C

    2005-07-21

    The ability to manipulate biological cells and micrometre-scale particles plays an important role in many biological and colloidal science applications. However, conventional manipulation techniques--including optical tweezers, electrokinetic forces (electrophoresis, dielectrophoresis, travelling-wave dielectrophoresis), magnetic tweezers, acoustic traps and hydrodynamic flows--cannot achieve high resolution and high throughput at the same time. Optical tweezers offer high resolution for trapping single particles, but have a limited manipulation area owing to tight focusing requirements; on the other hand, electrokinetic forces and other mechanisms provide high throughput, but lack the flexibility or the spatial resolution necessary for controlling individual cells. Here we present an optical image-driven dielectrophoresis technique that permits high-resolution patterning of electric fields on a photoconductive surface for manipulating single particles. It requires 100,000 times less optical intensity than optical tweezers. Using an incoherent light source (a light-emitting diode or a halogen lamp) and a digital micromirror spatial light modulator, we have demonstrated parallel manipulation of 15,000 particle traps on a 1.3 x 1.0 mm2 area. With direct optical imaging control, multiple manipulation functions are combined to achieve complex, multi-step manipulation protocols.

  8. Estimating genome-wide gene networks using nonparametric Bayesian network models on massively parallel computers.

    PubMed

    Tamada, Yoshinori; Imoto, Seiya; Araki, Hiromitsu; Nagasaki, Masao; Print, Cristin; Charnock-Jones, D Stephen; Miyano, Satoru

    2011-01-01

    We present a novel algorithm to estimate genome-wide gene networks consisting of more than 20,000 genes from gene expression data using nonparametric Bayesian networks. Due to the difficulty of learning Bayesian network structures, existing algorithms cannot be applied to more than a few thousand genes. Our algorithm overcomes this limitation by repeatedly estimating subnetworks in parallel for genes selected by neighbor node sampling. Through numerical simulation, we confirmed that our algorithm outperformed a heuristic algorithm in a shorter time. We applied our algorithm to microarray data from human umbilical vein endothelial cells (HUVECs) treated with siRNAs, to construct a human genome-wide gene network, which we compared to a small gene network estimated for the genes extracted using a traditional bioinformatics method. The results showed that our genome-wide gene network contains many features of the small network, as well as others that could not be captured during the small network estimation. The results also revealed master-regulator genes that are not in the small network but that control many of the genes in the small network. These analyses were impossible to realize without our proposed algorithm.

  9. Massively parallel manipulation of single cells and microparticles using optical images

    NASA Astrophysics Data System (ADS)

    Chiou, Pei Yu; Ohta, Aaron T.; Wu, Ming C.

    2005-07-01

    The ability to manipulate biological cells and micrometre-scale particles plays an important role in many biological and colloidal science applications. However, conventional manipulation techniques-including optical tweezers, electrokinetic forces (electrophoresis, dielectrophoresis, travelling-wave dielectrophoresis), magnetic tweezers, acoustic traps and hydrodynamic flows-cannot achieve high resolution and high throughput at the same time. Optical tweezers offer high resolution for trapping single particles, but have a limited manipulation area owing to tight focusing requirements; on the other hand, electrokinetic forces and other mechanisms provide high throughput, but lack the flexibility or the spatial resolution necessary for controlling individual cells. Here we present an optical image-driven dielectrophoresis technique that permits high-resolution patterning of electric fields on a photoconductive surface for manipulating single particles. It requires 100,000 times less optical intensity than optical tweezers. Using an incoherent light source (a light-emitting diode or a halogen lamp) and a digital micromirror spatial light modulator, we have demonstrated parallel manipulation of 15,000 particle traps on a 1.3 × 1.0mm2 area. With direct optical imaging control, multiple manipulation functions are combined to achieve complex, multi-step manipulation protocols.

  10. DFT-Based Electronic Structure Calculations on Hybrid and Massively Parallel Computer Architectures

    NASA Astrophysics Data System (ADS)

    Briggs, Emil; Hodak, Miroslav; Lu, Wenchang; Bernholc, Jerry

    2014-03-01

    The latest generation of supercomputers is capable of multi-petaflop peak performance, achieved by using thousands of multi-core CPU's and often coupled with thousands of GPU's. However, efficient utilization of this computing power for electronic structure calculations presents significant challenges. We describe adaptations of the Real-Space Multigrid (RMG) code that enable it to scale well to thousands of nodes. A hybrid technique that uses one MPI process per node, rather than on per core was adopted with OpenMP and POSIX threads used for intra-node parallelization. This reduces the number of MPI process's by an order of magnitude or more and improves individual node memory utilization. GPU accelerators are also becoming common and are capable of extremely high performance for vector workloads. However, they typically have much lower scalar performance than CPU's, so achieving good performance requires that the workload is carefully partitioned and data transfer between CPU and GPU is optimized. We have used a hybrid approach utilizing MPI/OpenMP/POSIX threads and GPU accelerators to reach excellent scaling to over 100,000 cores on a Cray XE6 platform as well as a factor of three performance improvement when using a Cray XK7 system with CPU-GPU nodes.

  11. Efficient massively parallel simulation of dynamic channel assignment schemes for wireless cellular communications

    NASA Technical Reports Server (NTRS)

    Greenberg, Albert G.; Lubachevsky, Boris D.; Nicol, David M.; Wright, Paul E.

    1994-01-01

    Fast, efficient parallel algorithms are presented for discrete event simulations of dynamic channel assignment schemes for wireless cellular communication networks. The driving events are call arrivals and departures, in continuous time, to cells geographically distributed across the service area. A dynamic channel assignment scheme decides which call arrivals to accept, and which channels to allocate to the accepted calls, attempting to minimize call blocking while ensuring co-channel interference is tolerably low. Specifically, the scheme ensures that the same channel is used concurrently at different cells only if the pairwise distances between those cells are sufficiently large. Much of the complexity of the system comes from ensuring this separation. The network is modeled as a system of interacting continuous time automata, each corresponding to a cell. To simulate the model, conservative methods are used; i.e., methods in which no errors occur in the course of the simulation and so no rollback or relaxation is needed. Implemented on a 16K processor MasPar MP-1, an elegant and simple technique provides speedups of about 15 times over an optimized serial simulation running on a high speed workstation. A drawback of this technique, typical of conservative methods, is that processor utilization is rather low. To overcome this, new methods were developed that exploit slackness in event dependencies over short intervals of time, thereby raising the utilization to above 50 percent and the speedup over the optimized serial code to about 120 times.

  12. Sassena — X-ray and neutron scattering calculated from molecular dynamics trajectories using massively parallel computers

    NASA Astrophysics Data System (ADS)

    Lindner, Benjamin; Smith, Jeremy C.

    2012-07-01

    Massively parallel computers now permit the molecular dynamics (MD) simulation of multi-million atom systems on time scales up to the microsecond. However, the subsequent analysis of the resulting simulation trajectories has now become a high performance computing problem in itself. Here, we present software for calculating X-ray and neutron scattering intensities from MD simulation data that scales well on massively parallel supercomputers. The calculation and data staging schemes used maximize the degree of parallelism and minimize the IO bandwidth requirements. The strong scaling tested on the Jaguar Petaflop Cray XT5 at Oak Ridge National Laboratory exhibits virtually linear scaling up to 7000 cores for most benchmark systems. Since both MPI and thread parallelism is supported, the software is flexible enough to cover scaling demands for different types of scattering calculations. The result is a high performance tool capable of unifying large-scale supercomputing and a wide variety of neutron/synchrotron technology. Catalogue identifier: AELW_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AELW_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU General Public License, version 3 No. of lines in distributed program, including test data, etc.: 1 003 742 No. of bytes in distributed program, including test data, etc.: 798 Distribution format: tar.gz Programming language: C++, OpenMPI Computer: Distributed Memory, Cluster of Computers with high performance network, Supercomputer Operating system: UNIX, LINUX, OSX Has the code been vectorized or parallelized?: Yes, the code has been parallelized using MPI directives. Tested with up to 7000 processors RAM: Up to 1 Gbytes/core Classification: 6.5, 8 External routines: Boost Library, FFTW3, CMAKE, GNU C++ Compiler, OpenMPI, LibXML, LAPACK Nature of problem: Recent developments in supercomputing allow molecular dynamics simulations to

  13. Modeling cardiovascular hemodynamics using the lattice Boltzmann method on massively parallel supercomputers

    NASA Astrophysics Data System (ADS)

    Randles, Amanda Elizabeth

    the modeling of fluids in vessels with smaller diameters and a method for introducing the deformational forces exerted on the arterial flows from the movement of the heart by borrowing concepts from cosmodynamics are presented. These additional forces have a great impact on the endothelial shear stress. Third, the fluid model is extended to not only recover Navier-Stokes hydrodynamics, but also a wider range of Knudsen numbers, which is especially important in micro- and nano-scale flows. The tradeoffs of many optimizations methods such as the use of deep halo level ghost cells that, alongside hybrid programming models, reduce the impact of such higher-order models and enable efficient modeling of extreme regimes of computational fluid dynamics are discussed. Fourth, the extension of these models to other research questions like clogging in microfluidic devices and determining the severity of co-arctation of the aorta is presented. Through this work, a validation of these methods by taking real patient data and the measured pressure value before the narrowing of the aorta and predicting the pressure drop across the co-arctation is shown. Comparison with the measured pressure drop in vivo highlights the accuracy and potential impact of such patient specific simulations. Finally, a method to enable the simulation of longer trajectories in time by discretizing both spatially and temporally is presented. In this method, a serial coarse iterator is used to initialize data at discrete time steps for a fine model that runs in parallel. This coarse solver is based on a larger time step and typically a coarser discretization in space. Iterative refinement enables the compute-intensive fine iterator to be modeled with temporal parallelization. The algorithm consists of a series of prediction-corrector iterations completing when the results have converged within a certain tolerance. Combined, these developments allow large fluid models to be simulated for longer time durations

  14. Massively Parallelized Pollen Tube Guidance and Mechanical Measurements on a Lab-on-a-Chip Platform.

    PubMed

    Shamsudhin, Naveen; Laeubli, Nino; Atakan, Huseyin Baris; Vogler, Hannes; Hu, Chengzhi; Haeberle, Walter; Sebastian, Abu; Grossniklaus, Ueli; Nelson, Bradley J

    2016-01-01

    Pollen tubes are used as a model in the study of plant morphogenesis, cellular differentiation, cell wall biochemistry, biomechanics, and intra- and intercellular signaling. For a "systems-understanding" of the bio-chemo-mechanics of tip-polarized growth in pollen tubes, the need for a versatile, experimental assay platform for quantitative data collection and analysis is critical. We introduce a Lab-on-a-Chip (LoC) concept for high-throughput pollen germination and pollen tube guidance for parallelized optical and mechanical measurements. The LoC localizes a large number of growing pollen tubes on a single plane of focus with unidirectional tip-growth, enabling high-resolution quantitative microscopy. This species-independent LoC platform can be integrated with micro-/nano-indentation systems, such as the cellular force microscope (CFM) or the atomic force microscope (AFM), allowing for rapid measurements of cell wall stiffness of growing tubes. As a demonstrative example, we show the growth and directional guidance of hundreds of lily (Lilium longiflorum) and Arabidopsis (Arabidopsis thaliana) pollen tubes on a single LoC microscopy slide. Combining the LoC with the CFM, we characterized the cell wall stiffness of lily pollen tubes. Using the stiffness statistics and finite-element-method (FEM)-based approaches, we computed an effective range of the linear elastic moduli of the cell wall spanning the variability space of physiological parameters including internal turgor, cell wall thickness, and tube diameter. We propose the LoC device as a versatile and high-throughput phenomics platform for plant reproductive and development biology using the pollen tube as a model.

  15. Massively Parallelized Pollen Tube Guidance and Mechanical Measurements on a Lab-on-a-Chip Platform

    PubMed Central

    Laeubli, Nino; Atakan, Huseyin Baris; Vogler, Hannes; Hu, Chengzhi; Haeberle, Walter; Sebastian, Abu; Grossniklaus, Ueli; Nelson, Bradley J.

    2016-01-01

    Pollen tubes are used as a model in the study of plant morphogenesis, cellular differentiation, cell wall biochemistry, biomechanics, and intra- and intercellular signaling. For a “systems-understanding” of the bio-chemo-mechanics of tip-polarized growth in pollen tubes, the need for a versatile, experimental assay platform for quantitative data collection and analysis is critical. We introduce a Lab-on-a-Chip (LoC) concept for high-throughput pollen germination and pollen tube guidance for parallelized optical and mechanical measurements. The LoC localizes a large number of growing pollen tubes on a single plane of focus with unidirectional tip-growth, enabling high-resolution quantitative microscopy. This species-independent LoC platform can be integrated with micro-/nano-indentation systems, such as the cellular force microscope (CFM) or the atomic force microscope (AFM), allowing for rapid measurements of cell wall stiffness of growing tubes. As a demonstrative example, we show the growth and directional guidance of hundreds of lily (Lilium longiflorum) and Arabidopsis (Arabidopsis thaliana) pollen tubes on a single LoC microscopy slide. Combining the LoC with the CFM, we characterized the cell wall stiffness of lily pollen tubes. Using the stiffness statistics and finite-element-method (FEM)-based approaches, we computed an effective range of the linear elastic moduli of the cell wall spanning the variability space of physiological parameters including internal turgor, cell wall thickness, and tube diameter. We propose the LoC device as a versatile and high-throughput phenomics platform for plant reproductive and development biology using the pollen tube as a model. PMID:27977748

  16. Massively Parallel Geostatistical Inversion of Coupled Processes in Heterogeneous Porous Media

    NASA Astrophysics Data System (ADS)

    Ngo, A.; Schwede, R. L.; Li, W.; Bastian, P.; Ippisch, O.; Cirpka, O. A.

    2012-04-01

    another level of parallelization has been added.

  17. A parallel domain decomposition-based implicit method for the Cahn-Hilliard-Cook phase-field equation in 3D

    NASA Astrophysics Data System (ADS)

    Zheng, Xiang; Yang, Chao; Cai, Xiao-Chuan; Keyes, David

    2015-03-01

    We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn-Hilliard-Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton-Krylov-Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors.

  18. A parallel domain decomposition-based implicit method for the Cahn–Hilliard–Cook phase-field equation in 3D

    SciTech Connect

    Zheng, Xiang; Yang, Chao; Cai, Xiao-Chuan; Keyes, David

    2015-03-15

    We present a numerical algorithm for simulating the spinodal decomposition described by the three dimensional Cahn–Hilliard–Cook (CHC) equation, which is a fourth-order stochastic partial differential equation with a noise term. The equation is discretized in space and time based on a fully implicit, cell-centered finite difference scheme, with an adaptive time-stepping strategy designed to accelerate the progress to equilibrium. At each time step, a parallel Newton–Krylov–Schwarz algorithm is used to solve the nonlinear system. We discuss various numerical and computational challenges associated with the method. The numerical scheme is validated by a comparison with an explicit scheme of high accuracy (and unreasonably high cost). We present steady state solutions of the CHC equation in two and three dimensions. The effect of the thermal fluctuation on the spinodal decomposition process is studied. We show that the existence of the thermal fluctuation accelerates the spinodal decomposition process and that the final steady morphology is sensitive to the stochastic noise. We also show the evolution of the energies and statistical moments. In terms of the parallel performance, it is found that the implicit domain decomposition approach scales well on supercomputers with a large number of processors.

  19. Massively parallel tag sequencing reveals the complexity of anaerobic marine protistan communities

    PubMed Central

    Stoeck, Thorsten; Behnke, Anke; Christen, Richard; Amaral-Zettler, Linda; Rodriguez-Mora, Maria J; Chistoserdov, Andrei; Orsi, William; Edgcomb, Virginia P

    2009-01-01

    Background Recent advances in sequencing strategies make possible unprecedented depth and scale of sampling for molecular detection of microbial diversity. Two major paradigm-shifting discoveries include the detection of bacterial diversity that is one to two orders of magnitude greater than previous estimates, and the discovery of an exciting 'rare biosphere' of molecular signatures ('species') of poorly understood ecological significance. We applied a high-throughput parallel tag sequencing (454 sequencing) protocol adopted for eukaryotes to investigate protistan community complexity in two contrasting anoxic marine ecosystems (Framvaren Fjord, Norway; Cariaco deep-sea basin, Venezuela). Both sampling sites have previously been scrutinized for protistan diversity by traditional clone library construction and Sanger sequencing. By comparing these clone library data with 454 amplicon library data, we assess the efficiency of high-throughput tag sequencing strategies. We here present a novel, highly conservative bioinformatic analysis pipeline for the processing of large tag sequence data sets. Results The analyses of ca. 250,000 sequence reads revealed that the number of detected Operational Taxonomic Units (OTUs) far exceeded previous richness estimates from the same sites based on clone libraries and Sanger sequencing. More than 90% of this diversity was represented by OTUs with less than 10 sequence tags. We detected a substantial number of taxonomic groups like Apusozoa, Chrysomerophytes, Centroheliozoa, Eustigmatophytes, hyphochytriomycetes, Ichthyosporea, Oikomonads, Phaeothamniophytes, and rhodophytes which remained undetected by previous clone library-based diversity surveys of the sampling sites. The most important innovations in our newly developed bioinformatics pipeline employ (i) BLASTN with query parameters adjusted for highly variable domains and a complete database of public ribosomal RNA (rRNA) gene sequences for taxonomic assignments of tags; (ii

  20. Ring polymer chains confined in a slit geometry of two parallel walls: the massive field theory approach

    NASA Astrophysics Data System (ADS)

    Usatenko, Z.; Halun, J.

    2017-01-01

    The investigation of a dilute solution of phantom ideal ring polymer chains confined in a slit geometry of two parallel repulsive walls, two inert walls, and for the mixed case of one inert and the other one repulsive wall, was performed. Taking into account the well known correspondence between the field theoretical {φ4} O(n)-vector model in the limit n\\to 0 and the behaviour of long-flexible polymer chains in a good solvent, the investigation of a dilute solution of long-flexible ring polymer chains with the excluded volume interaction (EVI) confined in a slit geometry of two parallel repulsive walls was performed in the framework of the massive field theory approach at fixed space dimensions d  =  3 up to one-loop order. For all the above mentioned cases, the correspondent depletion interaction potentials, the depletion forces and the forces which exert the phantom ideal ring polymers and the ring polymers with the EVI on the walls were calculated, respectively. The obtained results indicate that the phantom ideal ring polymer chains and the ring polymer chains with the EVI due to the complexity of chain topology and because of the entropical reason demonstrate completely different behaviour in confined geometries than linear polymer chains. For example, the phantom ideal ring polymers prefer to escape from the space not only between two repulsive walls but also in the case of two inert walls, which leads to the attractive depletion forces. The ring polymer chains with less complex knot types (with the bigger radius of gyration) in a ring topology in the wide slit region exert higher forces on the confining repulsive walls. The depletion force in the case of mixed boundary conditions becomes repulsive in contrast to the case of linear polymer chains.

  1. Massively Parallel Implementation of Explicitly Correlated Coupled-Cluster Singles and Doubles Using TiledArray Framework.

    PubMed

    Peng, Chong; Calvin, Justus A; Pavošević, Fabijan; Zhang, Jinmei; Valeev, Edward F

    2016-12-29

    A new distributed-memory massively parallel implementation of standard and explicitly correlated (F12) coupled-cluster singles and doubles (CCSD) with canonical O(N(6)) computational complexity is described. The implementation is based on the TiledArray tensor framework. Novel features of the implementation include (a) all data greater than O(N) is distributed in memory and (b) the mixed use of density fitting and integral-driven formulations that optionally allows to avoid storage of tensors with three and four unoccupied indices. Excellent strong scaling is demonstrated on a multicore shared-memory computer, a commodity distributed-memory computer, and a national-scale supercomputer. The performance on a shared-memory computer is competitive with the popular CCSD implementations in ORCA and Psi4. Moreover, the CCSD performance on a commodity-size cluster significantly improves on the state-of-the-art package NWChem. The large-scale parallel explicitly correlated coupled-cluster implementation makes routine accurate estimation of the coupled-cluster basis set limit for molecules with 20 or more atoms. Thus, it can provide valuable benchmarks for the merging reduced-scaling coupled-cluster approaches. The new implementation allowed us to revisit the basis set limit for the CCSD contribution to the binding energy of π-stacked uracil dimer, a challenging paradigm of π-stacking interactions from the S66 benchmark database. The revised value for the CCSD correlation binding energy obtained with the help of quadruple-ζ CCSD computations, -8.30 ± 0.02 kcal/mol, is significantly different from the S66 reference value, -8.50 kcal/mol, as well as other CBS limit estimates in the recent literature.

  2. A Parallel 3d Model for The Multi-Species Low Energy BeamTransport System of the RIA Prototype ECR Ion Source Venus

    SciTech Connect

    Qiang, J.; Leitner, D.; Todd, D.

    2005-05-16

    The driver linac of the proposed Rare Isotope Accelerator (RIA) requires a great variety of high intensity, high charge state ion beams. In order to design and to optimize the low energy beamline optics of the RIA front end,we have developed a new parallel three-dimensional model to simulate the low energy, multi-species ion beam formation and transport from the ECR ion source extraction region to the focal plane of the analyzing magnet. A multisection overlapped computational domain has been used to break the original transport system into a number of each subsystem, macro-particle tracking is used to obtain the charge density distribution in this subdomain. The three-dimensional Poisson equation is solved within the subdomain and particle tracking is repeated until the solution converges. Two new Poisson solvers based on a combination of the spectral method and the multigrid method have been developed to solve the Poisson equation in cylindrical coordinates for the beam extraction region and in the Frenet-Serret coordinates for the bending magnet region. Some test examples and initial applications will also be presented.

  3. Massively parallel E-beam inspection: enabling next-generation patterned defect inspection for wafer and mask manufacturing

    NASA Astrophysics Data System (ADS)

    Malloy, Matt; Thiel, Brad; Bunday, Benjamin D.; Wurm, Stefan; Mukhtar, Maseeh; Quoi, Kathy; Kemen, Thomas; Zeidler, Dirk; Eberle, Anna Lena; Garbowski, Tomasz; Dellemann, Gregor; Peters, Jan Hendrik

    2015-03-01

    SEMATECH aims to identify and enable disruptive technologies to meet the ever-increasing demands of semiconductor high volume manufacturing (HVM). As such, a program was initiated in 2012 focused on high-speed e-beam defect inspection as a complement, and eventual successor, to bright field optical patterned defect inspection [1]. The primary goal is to enable a new technology to overcome the key gaps that are limiting modern day inspection in the fab; primarily, throughput and sensitivity to detect ultra-small critical defects. The program specifically targets revolutionary solutions based on massively parallel e-beam technologies, as opposed to incremental improvements to existing e-beam and optical inspection platforms. Wafer inspection is the primary target, but attention is also being paid to next generation mask inspection. During the first phase of the multi-year program multiple technologies were reviewed, a down-selection was made to the top candidates, and evaluations began on proof of concept systems. A champion technology has been selected and as of late 2014 the program has begun to move into the core technology maturation phase in order to enable eventual commercialization of an HVM system. Performance data from early proof of concept systems will be shown along with roadmaps to achieving HVM performance. SEMATECH's vision for moving from early-stage development to commercialization will be shown, including plans for development with industry leading technology providers.

  4. A bumpy ride on the diagnostic bench of massive parallel sequencing, the case of the mitochondrial genome.

    PubMed

    Vancampenhout, Kim; Caljon, Ben; Spits, Claudia; Stouffs, Katrien; Jonckheere, An; De Meirleir, Linda; Lissens, Willy; Vanlander, Arnaud; Smet, Joél; De Paepe, Boel; Van Coster, Rudy; Seneca, Sara

    2014-01-01

    The advent of massive parallel sequencing (MPS) has revolutionized the field of human molecular genetics, including the diagnostic study of mitochondrial (mt) DNA dysfunction. The analysis of the complete mitochondrial genome using MPS platforms is now common and will soon outrun conventional sequencing. However, the development of a robust and reliable protocol is rather challenging. A previous pilot study for the re-sequencing of human mtDNA revealed an uneven coverage, affecting predominantly part of the plus strand. In an attempt to address this problem, we undertook a comparative study of standard and modified protocols for the Ion Torrent PGM system. We could not improve strand representation by altering the recommended shearing methodology of the standard workflow or omitting the DNA polymerase amplification step from the library construction process. However, we were able to associate coverage bias of the plus strand with a specific sequence motif. Additionally, we compared coverage and variant calling across technologies. The same samples were also sequenced on a MiSeq device which showed that coverage and heteroplasmic variant calling were much improved.

  5. External Quality Assessment for Detection of Fetal Trisomy 21, 18, and 13 by Massively Parallel Sequencing in Clinical Laboratories.

    PubMed

    Zhang, Rui; Zhang, Hongyun; Li, Yulong; Han, Yanxi; Xie, Jiehong; Li, Jinming

    2016-03-01

    An external quality assessment for detection of trisomy 21, 18, and 13 by massively parallel sequencing was implemented by the National Center for Clinical Laboratories of People's Republic of China in 2014. Simulated samples were prepared by mixing fragmented abnormal DNA with plasma from non-pregnant women. The external quality assessment panel, comprising 5 samples from pregnant healthy women, 2 samples with sex chromosome aneuploidies, and 13 samples with different concentrations of fetal fractions positive for trisomy 21, 18, and 13, was then distributed to participating laboratories. In total, 55.6% (47 of 84) of respondents correctly identified each of the samples in the panel. Seventeen false-negative and 87 gray zone results were reported, most [102 of 104 (98.1%)] of which were derived from for trisomy samples with effective fetal fractions <4%. No laboratories generated false-positive results. In addition, we observed varied diagnostic capabilities of different assays, with the assay on the basis of NextSeq CN500 performing better than others, whereas Z values generated by BGISEQ-100 fluctuated greatly. There were no significant correlations between the numbers of unique sequence reads and Z values from any trisomy sample generated by BGISEQ-100. Overall, most clinical laboratories detected samples containing effective fetal fractions >4%. Our study shows need for further laboratory training in the management of samples with low fetal fractions. For some assays, precision of Z values needs to be improved.

  6. Feasibility of using the Massively Parallel Processor for large eddy simulations and other Computational Fluid Dynamics applications

    NASA Technical Reports Server (NTRS)

    Bruno, John

    1984-01-01

    The results of an investigation into the feasibility of using the MPP for direct and large eddy simulations of the Navier-Stokes equations is presented. A major part of this study was devoted to the implementation of two of the standard numerical algorithms for CFD. These implementations were not run on the Massively Parallel Processor (MPP) since the machine delivered to NASA Goddard does not have sufficient capacity. Instead, a detailed implementation plan was designed and from these were derived estimates of the time and space requirements of the algorithms on a suitably configured MPP. In addition, other issues related to the practical implementation of these algorithms on an MPP-like architecture were considered; namely, adaptive grid generation, zonal boundary conditions, the table lookup problem, and the software interface. Performance estimates show that the architectural components of the MPP, the Staging Memory and the Array Unit, appear to be well suited to the numerical algorithms of CFD. This combined with the prospect of building a faster and larger MMP-like machine holds the promise of achieving sustained gigaflop rates that are required for the numerical simulations in CFD.

  7. The minimal amount of starting DNA for Agilent’s hybrid capture-based targeted massively parallel sequencing

    PubMed Central

    Chung, Jongsuk; Son, Dae-Soon; Jeon, Hyo-Jeong; Kim, Kyoung-Mee; Park, Gahee; Ryu, Gyu Ha; Park, Woong-Yang; Park, Donghyun

    2016-01-01

    Targeted capture massively parallel sequencing is increasingly being used in clinical settings, and as costs continue to decline, use of this technology may become routine in health care. However, a limited amount of tissue has often been a challenge in meeting quality requirements. To offer a practical guideline for the minimum amount of input DNA for targeted sequencing, we optimized and evaluated the performance of targeted sequencing depending on the input DNA amount. First, using various amounts of input DNA, we compared commercially available library construction kits and selected Agilent’s SureSelect-XT and KAPA Biosystems’ Hyper Prep kits as the kits most compatible with targeted deep sequencing using Agilent’s SureSelect custom capture. Then, we optimized the adapter ligation conditions of the Hyper Prep kit to improve library construction efficiency and adapted multiplexed hybrid selection to reduce the cost of sequencing. In this study, we systematically evaluated the performance of the optimized protocol depending on the amount of input DNA, ranging from 6.25 to 200 ng, suggesting the minimal input DNA amounts based on coverage depths required for specific applications. PMID:27220682

  8. Large-scale modeling of the Antarctic ice sheet using a massively-parallelized finite element model (CIELO).

    NASA Astrophysics Data System (ADS)

    Larour, E.; Rignot, E.; Morlighem, M.; Seroussi, H.

    2008-12-01

    We implemented a fully-three-dimensional, thermo-mechanical, finite element model of the Antarctic Ice Sheet with a spatial resolution varying from 10 km inland to 2 km along the coast on a massively-parallelized architecture named CIELO and developed at JPL. The model is based on a "Pattyn" type formulation for ice sheets, and a "MacAyeal shelf-stream" formulation for ice shelves. Both types of formulations are coupled using penalty methods, which enables a considerable reduction of the computational load. Using a simple law of basal friction (based on locally computed balanced velocities), the model is able to replicate the location and order-magnitude speed of major ice streams and ice shelves. We then coupled the model with observations of ice motion from SAR interferometry to refine the pattern of basal friction using an inverse control method (MacAyeal 1993). The result provides an excellent agreement with observations and a first complete mapping of the pattern of basal friction along the coast of Antarctica at a resolution compatible with the size of its glaciers and ice streams.

  9. Combined fragment molecular orbital cluster in molecule approach to massively parallel electron correlation calculations for large systems.

    PubMed

    Findlater, Alexander D; Zahariev, Federico; Gordon, Mark S

    2015-04-16

    The local correlation "cluster-in-molecule" (CIM) method is combined with the fragment molecular orbital (FMO) method, providing a flexible, massively parallel, and near-linear scaling approach to the calculation of electron correlation energies for large molecular systems. Although the computational scaling of the CIM algorithm is already formally linear, previous knowledge of the Hartree-Fock (HF) reference wave function and subsequent localized orbitals is required; therefore, extending the CIM method to arbitrarily large systems requires the aid of low-scaling/linear-scaling approaches to HF and orbital localization. Through fragmentation, the combined FMO-CIM method linearizes the scaling, with respect to system size, of the HF reference and orbital localization calculations, achieving near-linear scaling at both the reference and electron correlation levels. For the 20-residue alanine α helix, the preliminary implementation of the FMO-CIM method captures 99.6% of the MP2 correlation energy, requiring 21% of the MP2 wall time. The new method is also applied to solvated adamantine to illustrate the multilevel capability of the FMO-CIM method.

  10. Massively parallel rRNA gene sequencing exacerbates the potential for biased community diversity comparisons due to variable library sizes

    SciTech Connect

    Gihring, Thomas; Green, Stefan; Schadt, Christopher Warren

    2011-01-01

    Technologies for massively parallel sequencing are revolutionizing microbial ecology and are vastly increasing the scale of ribosomal RNA (rRNA) gene studies. Although pyrosequencing has increased the breadth and depth of possible rRNA gene sampling, one drawback is that the number of reads obtained per sample is difficult to control. Pyrosequencing libraries typically vary widely in the number of sequences per sample, even within individual studies, and there is a need to revisit the behaviour of richness estimators and diversity indices with variable gene sequence library sizes. Multiple reports and review papers have demonstrated the bias in non-parametric richness estimators (e.g. Chao1 and ACE) and diversity indices when using clone libraries. However, we found that biased community comparisons are accumulating in the literature. Here we demonstrate the effects of sample size on Chao1, ACE, CatchAll, Shannon, Chao-Shen and Simpson's estimations specifically using pyrosequencing libraries. The need to equalize the number of reads being compared across libraries is reiterated, and investigators are directed towards available tools for making unbiased diversity comparisons.

  11. Massively parallel sequencing of Chikso (Korean brindle cattle) to discover genome-wide SNPs and InDels.

    PubMed

    Choi, Jung-Woo; Liao, Xiaoping; Park, Sairom; Jeon, Heoyn-Jeong; Chung, Won-Hyong; Stothard, Paul; Park, Yeon-Soo; Lee, Jeong-Koo; Lee, Kyung-Tai; Kim, Sang-Hwan; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin

    2013-09-01

    Since the completion of the bovine sequencing projects, a substantial number of genetic variations such as single nucleotide polymorphisms have become available across the cattle genome. Recently, cataloguing such genetic variations has been accelerated using massively parallel sequencing technology. However, most of the recent studies have been concentrated on European Bos taurus cattle breeds, resulting in a severe lack of knowledge for valuable native cattle genetic resources worldwide. Here, we present the first whole-genome sequencing results for an endangered Korean native cattle breed, Chikso, using the Illumina HiSeq 2,000 sequencing platform. The genome of a Chikso bull was sequenced to approximately 25.3-fold coverage with 98.8% of the bovine reference genome sequence (UMD 3.1) covered. In total, 5,874,026 single nucleotide polymorphisms and 551,363 insertion/deletions were identified across all 29 autosomes and the X-chromosome, of which 45% and 75% were previously unknown, respectively. Most of the variations (92.7% of single nucleotide polymorphisms and 92.9% of insertion/deletions) were located in intergenic and intron regions. A total of 16,273 single nucleotide polymorphisms causing missense mutations were detected in 7,111 genes throughout the genome, which could potentially contribute to variation in economically important traits in Chikso. This study provides a valuable resource for further investigations of the genetic mechanisms underlying traits of interest in cattle, and for the development of improved genomics-based breeding tools.

  12. Massively Parallel Sequencing of Chikso (Korean Brindle Cattle) to Discover Genome-Wide SNPs and InDels

    PubMed Central

    Choi, Jung-Woo; Liao, Xiaoping; Park, Sairom; Jeon, Heoyn-Jeong; Chung, Won-Hyong; Stothard, Paul; Park, Yeon-Soo; Lee, Jeong-Koo; Lee, Kyung-Tai; Kim, Sang-Hwan; Oh, Jae-Don; Kim, Namshin; Kim, Tae-Hun; Lee, Hak-Kyo; Lee, Sung-Jin

    2013-01-01

    Since the completion of the bovine sequencing projects, a substantial number of genetic variations such as single nucleotide polymorphisms have become available across the cattle genome. Recently, cataloguing such genetic variations has been accelerated using massively parallel sequencing technology. However, most of the recent studies have been concentrated on European Bos taurus cattle breeds, resulting in a severe lack of knowledge for valuable native cattle genetic resources worldwide. Here, we present the first whole-genome sequencing results for an endangered Korean native cattle breed, Chikso, using the Illumina HiSeq 2,000 sequencing platform. The genome of a Chikso bull was sequenced to approximately 25.3-fold coverage with 98.8% of the bovine reference genome sequence (UMD 3.1) covered. In total, 5,874,026 single nucleotide polymorphisms and 551,363 insertion/deletions were identified across all 29 autosomes and the X-chromosome, of which 45% and 75% were previously unknown, respectively. Most of the variations (92.7% of single nucleotide polymorphisms and 92.9% of insertion/deletions) were located in intergenic and intron regions. A total of 16,273 single nucleotide polymorphisms causing missense mutations were detected in 7,111 genes throughout the genome, which could potentially contribute to variation in economically important traits in Chikso. This study provides a valuable resource for further investigations of the genetic mechanisms underlying traits of interest in cattle, and for the development of improved genomics-based breeding tools. PMID:23912596

  13. Non-CAR resists and advanced materials for Massively Parallel E-Beam Direct Write process integration

    NASA Astrophysics Data System (ADS)

    Pourteau, Marie-Line; Servin, Isabelle; Lepinay, Kévin; Essomba, Cyrille; Dal'Zotto, Bernard; Pradelles, Jonathan; Lattard, Ludovic; Brandt, Pieter; Wieland, Marco

    2016-03-01

    The emerging Massively Parallel-Electron Beam Direct Write (MP-EBDW) is an attractive high resolution high throughput lithography technology. As previously shown, Chemically Amplified Resists (CARs) meet process/integration specifications in terms of dose-to-size, resolution, contrast, and energy latitude. However, they are still limited by their line width roughness. To overcome this issue, we tested an alternative advanced non-CAR and showed it brings a substantial gain in sensitivity compared to CAR. We also implemented and assessed in-line post-lithographic treatments for roughness mitigation. For outgassing-reduction purpose, a top-coat layer is added to the total process stack. A new generation top-coat was tested and showed improved printing performances compared to the previous product, especially avoiding dark erosion: SEM cross-section showed a straight pattern profile. A spin-coatable charge dissipation layer based on conductive polyaniline has also been tested for conductivity and lithographic performances, and compatibility experiments revealed that the underlying resist type has to be carefully chosen when using this product. Finally, the Process Of Reference (POR) trilayer stack defined for 5 kV multi-e-beam lithography was successfully etched with well opened and straight patterns, and no lithography-etch bias.

  14. HLA-F coding and regulatory segments variability determined by massively parallel sequencing procedures in a Brazilian population sample.

    PubMed

    Lima, Thálitta Hetamaro Ayala; Buttura, Renato Vidal; Donadi, Eduardo Antônio; Veiga-Castelli, Luciana Caricati; Mendes-Junior, Celso Teixeira; Castelli, Erick C

    2016-10-01

    Human Leucocyte Antigen F (HLA-F) is a non-classical HLA class I gene distinguished from its classical counterparts by low allelic polymorphism and distinctive expression patterns. Its exact function remains unknown. It is believed that HLA-F has tolerogenic and immune modulatory properties. Currently, there is little information regarding the HLA-F allelic variation among human populations and the available studies have evaluated only a fraction of the HLA-F gene segment and/or have searched for known alleles only. Here we present a strategy to evaluate the complete HLA-F variability including its 5' upstream, coding and 3' downstream segments by using massively parallel sequencing procedures. HLA-F variability was surveyed on 196 individuals from the Brazilian Southeast. The results indicate that the HLA-F gene is indeed conserved at the protein level, where thirty coding haplotypes or coding alleles were detected, encoding only four different HLA-F full-length protein molecules. Moreover, a same protein molecule is encoded by 82.45% of all coding alleles detected in this Brazilian population sample. However, the HLA-F nucleotide and haplotype variability is much higher than our current knowledge both in Brazilians and considering the 1000 Genomes Project data. This protein conservation is probably a consequence of the key role of HLA-F in the immune system physiology.

  15. Probing the kinetic landscape of Hox transcription factor-DNA binding in live cells by massively parallel Fluorescence Correlation Spectroscopy.

    PubMed

    Papadopoulos, Dimitrios K; Krmpot, Aleksandar J; Nikolić, Stanko N; Krautz, Robert; Terenius, Lars; Tomancak, Pavel; Rigler, Rudolf; Gehring, Walter J; Vukojević, Vladana

    2015-11-01

    Hox genes encode transcription factors that control the formation of body structures, segment-specifically along the anterior-posterior axis of metazoans. Hox transcription factors bind nuclear DNA pervasively and regulate a plethora of target genes, deploying various molecular mechanisms that depend on the developmental and cellular context. To analyze quantitatively the dynamics of their DNA-binding behavior we have used confocal laser scanning microscopy (CLSM), single-point fluorescence correlation spectroscopy (FCS), fluorescence cross-correlation spectroscopy (FCCS) and bimolecular fluorescence complementation (BiFC). We show that the Hox transcription factor Sex combs reduced (Scr) forms dimers that strongly associate with its specific fork head binding site (fkh250) in live salivary gland cell nuclei. In contrast, dimers of a constitutively inactive, phospho-mimicking variant of Scr show weak, non-specific DNA-binding. Our studies reveal that nuclear dynamics of Scr is complex, exhibiting a changing landscape of interactions that is difficult to characterize by probing one point at a time. Therefore, we also provide mechanistic evidence using massively parallel FCS (mpFCS). We found that Scr dimers are predominantly formed on the DNA and are equally abundant at the chromosomes and an introduced multimeric fkh250 binding-site, indicating different mobilities, presumably reflecting transient binding with different affinities on the DNA. Our proof-of-principle results emphasize the advantages of mpFCS for quantitative characterization of fast dynamic processes in live cells.

  16. A massively parallel track-finding system for the LEVEL 2 trigger in the CLAS detector at CEBAF

    SciTech Connect

    Doughty, D.C. Jr.; Collins, P.; Lemon, S. ); Bonneau, P. )

    1994-02-01

    The track segment finding subsystem of the LEVEL 2 trigger in the CLAS detector has been designed and prototyped. Track segments will be found in the 35,076 wires of the drift chambers using a massively parallel array of 768 Xilinx XC-4005 FPGA's. These FPGA's are located on daughter cards attached to the front-end boards distributed around the detector. Each chip is responsible for finding tracks passing through a 4 x 6 slice of an axial superlayer, and reports two segment found bits, one for each pair of cells. The algorithm used finds segments even when one or two layers or cells along the track is missing (this number is programmable), while being highly resistant to false segments arising from noise hits. Adjacent chips share data to find tracks crossing cell and board boundaries. For maximum speed, fully combinatorial logic is used inside each chip, with the result that all segments in the detector are found within 150 ns. Segment collection boards gather track segments from each axial superlayer and pass them via a high speed link to the segment linking subsystem in an additional 400 ns for typical events. The Xilinx chips are ram-based and therefore reprogrammable, allowing for future upgrades and algorithm enhancements.

  17. LiNbO3: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects

    NASA Astrophysics Data System (ADS)

    Carrascosa, M.; García-Cabañes, A.; Jubera, M.; Ramiro, J. B.; Agulló-López, F.

    2015-12-01

    The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO3 substrates, for parallel massive trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has been often referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the involved electrophoretic and/or dielectrophoretic forces do not require any electrodes and large scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main reported experimental results obtained with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated to domain poling of ferroelectric materials.

  18. Massively parallel molecular-dynamics simulation of ice crystallisation and melting: the roles of system size, ensemble, and electrostatics.

    PubMed

    English, Niall J

    2014-12-21

    Ice crystallisation and melting was studied via massively parallel molecular dynamics under periodic boundary conditions, using approximately spherical ice nano-particles (both "isolated" and as a series of heterogeneous "seeds") of varying size, surrounded by liquid water and at a variety of temperatures. These studies were performed for a series of systems ranging in size from ∼1 × 10(6) to 8.6 × 10(6) molecules, in order to establish system-size effects upon the nano-clusters" crystallisation and dissociation kinetics. Both "traditional" four-site and "single-site" and water models were used, with and without formal point charges, dipoles, and electrostatics, respectively. Simulations were carried out in the microcanonical and isothermal-isobaric ensembles, to assess the influence of "artificial" thermo- and baro-statting, and important disparities were observed, which declined upon using larger systems. It was found that there was a dependence upon system size for both ice growth and dissociation, in that larger systems favoured slower growth and more rapid melting, given the lower extent of "communication" of ice nano-crystallites with their periodic replicae in neighbouring boxes. Although the single-site model exhibited less variation with system size vis-à-vis the multiple-site representation with explicit electrostatics, its crystallisation-dissociation kinetics was artificially fast.

  19. Massively parallel molecular-dynamics simulation of ice crystallisation and melting: The roles of system size, ensemble, and electrostatics

    NASA Astrophysics Data System (ADS)

    English, Niall J.

    2014-12-01

    Ice crystallisation and melting was studied via massively parallel molecular dynamics under periodic boundary conditions, using approximately spherical ice nano-particles (both "isolated" and as a series of heterogeneous "seeds") of varying size, surrounded by liquid water and at a variety of temperatures. These studies were performed for a series of systems ranging in size from ˜1 × 106 to 8.6 × 106 molecules, in order to establish system-size effects upon the nano-clusters" crystallisation and dissociation kinetics. Both "traditional" four-site and "single-site" and water models were used, with and without formal point charges, dipoles, and electrostatics, respectively. Simulations were carried out in the microcanonical and isothermal-isobaric ensembles, to assess the influence of "artificial" thermo- and baro-statting, and important disparities were observed, which declined upon using larger systems. It was found that there was a dependence upon system size for both ice growth and dissociation, in that larger systems favoured slower growth and more rapid melting, given the lower extent of "communication" of ice nano-crystallites with their periodic replicae in neighbouring boxes. Although the single-site model exhibited less variation with system size vis-à-vis the multiple-site representation with explicit electrostatics, its crystallisation-dissociation kinetics was artificially fast.

  20. LiNbO{sub 3}: A photovoltaic substrate for massive parallel manipulation and patterning of nano-objects

    SciTech Connect

    Carrascosa, M.; García-Cabañes, A.; Jubera, M.; Ramiro, J. B.; Agulló-López, F.

    2015-12-15

    The application of evanescent photovoltaic (PV) fields, generated by visible illumination of Fe:LiNbO{sub 3} substrates, for parallel massive trapping and manipulation of micro- and nano-objects is critically reviewed. The technique has been often referred to as photovoltaic or photorefractive tweezers. The main advantage of the new method is that the involved electrophoretic and/or dielectrophoretic forces do not require any electrodes and large scale manipulation of nano-objects can be easily achieved using the patterning capabilities of light. The paper describes the experimental techniques for particle trapping and the main reported experimental results obtained with a variety of micro- and nano-particles (dielectric and conductive) and different illumination configurations (single beam, holographic geometry, and spatial light modulator projection). The report also pays attention to the physical basis of the method, namely, the coupling of the evanescent photorefractive fields to the dielectric response of the nano-particles. The role of a number of physical parameters such as the contrast and spatial periodicities of the illumination pattern or the particle deposition method is discussed. Moreover, the main properties of the obtained particle patterns in relation to potential applications are summarized, and first demonstrations reviewed. Finally, the PV method is discussed in comparison to other patterning strategies, such as those based on the pyroelectric response and the electric fields associated to domain poling of ferroelectric materials.

  1. Effects of H+, He+ ion reflection at the lunar surface and pickup ion dynamics in case of oblique/quasi-parallel magnetic field: 3-D hybrid kinetic modeling

    NASA Astrophysics Data System (ADS)

    Lipatov, A. S.; Cooper, J. F.; Sittler, E. C.; Hartle, R. E.; Sarantos, M.

    2013-12-01

    The hybrid kinetic model used here supports comprehensive simulation of the interaction between different spatial and energetic elements of the moon-solar wind-magnetosphere of the Earth system. This involves variable upstream magnetic field and solar wind plasma, including energetic ions, electrons, and neutral atoms. This capability is critical to improved interpretation of existing measurements for surface and atmospheric composition from previous missions and planning future missions. Recently, MAP-PAGE-IMA (Plasma energy Angle and Composition Experiment, and Ion Mass Analyzer) onboard Japanese lunar orbiter SELENE (KAGUYA) detected Moon originating ions at 100 km altitude. Ion species of H+, He++, He+, C+, O+, Na+, K+, and Ar+ were definitively identified. The first portion of our modeling devotes to a study of the H+, H2+, He+, Na+ pickup ion dynamics in cases of flow with a oblique and quasi-parallel magnetic field. In the second series of modeling we also take into account collisions between ions and the surface of the moon and further sputtering of fragments from the surface of the moon. The ion reflection at the lunar surface is also responsible for wave activity in the upstream flow. The solar wind parameters are chosen from ARTEMIS observations. The hybrid kinetic model allows us to take into account the finite gyroradius effects of pickup ions and to estimate correctly the ions velocity distribution and the fluxes along the magnetic field. Modeling shows the asymmetric Mach cone, pickup and reflected ion tails, and presents another type of lunar-solar wind interaction. Our simulation may be also important for the study of the interaction between the solar wind and very weak comets, Mercury and Pluto.

  2. Parallel inversion of a massive ERT data set to characterize deep vadose zone contamination beneath former nuclear waste infiltration galleries at the Hanford Site B-Complex (Invited)

    NASA Astrophysics Data System (ADS)

    Johnson, T.; Rucker, D. F.; Wellman, D.

    2013-12-01

    revealed the general footprint of vadose zone contamination beneath infiltration galleries. In 2011, the USDOE commissioned an effort to re-invert the B-Complex ERT data as a whole using a recently developed massively parallel 3D ERT inversion code. The computational mesh included approximately 1.085 million elements and closely honored the 37m of topographic relief as determined by LiDAR imaging. The water table and tank boundaries were also incorporated into the mesh to facilitate regularization disconnects, enabling sharp conductivity contrasts where they occur naturally without penalty. The data were inverted using 1024 processors, requiring 910 Gb of memory and 11.5 hours of computation time. The imaging results revealed previously unrealized detail concerning the distribution and behavior of contaminants migrating through the vadose zone, and are currently being used by site cleanup operators and regulators to understand the origin of a groundwater nitrate plume emerging from one of the infiltration galleries. The results overall demonstrate the utility of high performance computing, unstructured meshing, and custom regularization constraints for optimal processing of massive ERT data sets enabled by modern ERT survey hardware.

  3. Multigrid preconditioned conjugate gradients for the numerical simulation of groundwater flow on the Cray T3D

    SciTech Connect

    Ashby, S.F.; Falgout, R.D.; Smith, S.G.; Fogwell, T.W.

    1994-09-01

    This paper discusses the numerical simulation of groundwater flow through heterogeneous porous media. The focus is on the performance of a parallel multigrid preconditioner for accelerating convergence of conjugate gradients, which is used to compute the hydraulic pressure head. The numerical investigation considers the effects of enlarging the domain, increasing the grid resolution, and varying the geostatistical parameters used to define the subsurface realization. The results were obtained using the PARFLOW groundwater flow simulator on the Cray T3D massively parallel computer.

  4. 3D architectures are not just for microbatteries anymore

    NASA Astrophysics Data System (ADS)

    Lytle, Justin C.; Long, Jeffrey W.; Chervin, Christopher N.; Sassin, Megan B.; Rolison, Debra R.

    2011-06-01

    Building battery architectures with functional interfaces that are interpenetrated in three dimensions opens the door to major gains in performance as compared to conventional 2-D battery designs, particularly with respect to the battery footprint. We are developing 3-D solid-state Li-ion batteries that are sequentially assembled from interpenetrating and tricontinuous networks of anode, cathode, and electrolyte/separator materials. We use fiberpaper- supported carbon nanofoams as a massively parallel, conductive, ultraporous base platform within which to create the 3-D cell. The components required for battery operation are incorporated into the x,y,z-scalable papers and include nanoscale coatings of metal oxides that serve as Li-ion-insertion electrodes and ultrathin, electroninsulating/ Li-ion conducting polymer coatings that serve as the electrolyte/separator.

  5. Enabling inspection solutions for future mask technologies through the development of massively parallel E-Beam inspection

    NASA Astrophysics Data System (ADS)

    Malloy, Matt; Thiel, Brad; Bunday, Benjamin D.; Wurm, Stefan; Jindal, Vibhu; Mukhtar, Maseeh; Quoi, Kathy; Kemen, Thomas; Zeidler, Dirk; Eberle, Anna Lena; Garbowski, Tomasz; Dellemann, Gregor; Peters, Jan Hendrik

    2015-09-01

    The new device architectures and materials being introduced for sub-10nm manufacturing, combined with the complexity of multiple patterning and the need for improved hotspot detection strategies, have pushed current wafer inspection technologies to their limits. In parallel, gaps in mask inspection capability are growing as new generations of mask technologies are developed to support these sub-10nm wafer manufacturing requirements. In particular, the challenges associated with nanoimprint and extreme ultraviolet (EUV) mask inspection require new strategies that enable fast inspection at high sensitivity. The tradeoffs between sensitivity and throughput for optical and e-beam inspection are well understood. Optical inspection offers the highest throughput and is the current workhorse of the industry for both wafer and mask inspection. E-beam inspection offers the highest sensitivity but has historically lacked the throughput required for widespread adoption in the manufacturing environment. It is unlikely that continued incremental improvements to either technology will meet tomorrow's requirements, and therefore a new inspection technology approach is required; one that combines the high-throughput performance of optical with the high-sensitivity capabilities of e-beam inspection. To support the industry in meeting these challenges SUNY Poly SEMATECH has evaluated disruptive technologies that can meet the requirements for high volume manufacturing (HVM), for both the wafer fab [1] and the mask shop. Highspeed massively parallel e-beam defect inspection has been identified as the leading candidate for addressing the key gaps limiting today's patterned defect inspection techniques. As of late 2014 SUNY Poly SEMATECH completed a review, system analysis, and proof of concept evaluation of multiple e-beam technologies for defect inspection. A champion approach has been identified based on a multibeam technology from Carl Zeiss. This paper includes a discussion on the

  6. Detection of CYP2C19 Genetic Variants in Malaysian Orang Asli from Massively Parallel Sequencing Data

    PubMed Central

    Ang, Geik Yong; Yu, Choo Yee; Subramaniam, Vinothini; Abdul Khalid, Mohd Ikhmal Hanif; Tuan Abdu Aziz, Tuan Azlin; Johari James, Richard; Ahmad, Aminuddin; Abdul Rahman, Thuhairah; Mohd Nor, Fadzilah; Ismail, Adzrool Idzwan; Md. Isa, Kamarudzaman; Salleh, Hood; Teh, Lay Kek; Salleh, Mohd Zaki

    2016-01-01

    The human cytochrome P450 (CYP) is a superfamily of enzymes that have been a focus in research for decades due to their prominent role in drug metabolism. CYP2C is one of the major subfamilies which metabolize more than 10% of all clinically used drugs. In the context of CYP2C19, several key genetic variations that alter the enzyme’s activity have been identified and catalogued in the CYP allele nomenclature database. In this study, we investigated the presence of well-established variants as well as novel polymorphisms in the CYP2C19 gene of 62 Orang Asli from the Peninsular Malaysia. A total of 449 genetic variants were detected including 70 novel polymorphisms; 417 SNPs were located in introns, 23 in upstream, 7 in exons, and 2 in downstream regions. Five alleles and seven genotypes were inferred based on the polymorphisms that were found. Null alleles that were observed include CYP2C19*3 (6.5%), *2 (5.7%) and *35 (2.4%) whereas allele with increased function *17 was detected at a frequency of 4.8%. The normal metabolizer genotype was the most predominant (66.1%), followed by intermediate metabolizer (19.4%), rapid metabolizer (9.7%) and poor metabolizer (4.8%) genotypes. Findings from this study provide further insights into the CYP2C19 genetic profile of the Orang Asli as previously unreported variant alleles were detected through the use of massively parallel sequencing technology platform. The systematic and comprehensive analysis of CYP2C19 will allow uncharacterized variants that are present in the Orang Asli to be included in the genotyping panel in the future. PMID:27798644

  7. My-Forensic-Loci-queries (MyFLq) framework for analysis of forensic STR data generated by massive parallel sequencing.

    PubMed

    Van Neste, Christophe; Vandewoestyne, Mado; Van Criekinge, Wim; Deforce, Dieter; Van Nieuwerburgh, Filip

    2014-03-01

    Forensic scientists are currently investigating how to transition from capillary electrophoresis (CE) to massive parallel sequencing (MPS) for analysis of forensic DNA profiles. MPS offers several advantages over CE such as virtually unlimited multiplexy of loci, combining both short tandem repeat (STR) and single nucleotide polymorphism (SNP) loci, small amplicons without constraints of size separation, more discrimination power, deep mixture resolution and sample multiplexing. We present our bioinformatic framework My-Forensic-Loci-queries (MyFLq) for analysis of MPS forensic data. For allele calling, the framework uses a MySQL reference allele database with automatically determined regions of interest (ROIs) by a generic maximal flanking algorithm which makes it possible to use any STR or SNP forensic locus. Python scripts were designed to automatically make allele calls starting from raw MPS data. We also present a method to assess the usefulness and overall performance of a forensic locus with respect to MPS, as well as methods to estimate whether an unknown allele, which sequence is not present in the MySQL database, is in fact a new allele or a sequencing error. The MyFLq framework was applied to an Illumina MiSeq dataset of a forensic Illumina amplicon library, generated from multilocus STR polymerase chain reaction (PCR) on both single contributor samples and multiple person DNA mixtures. Although the multilocus PCR was not yet optimized for MPS in terms of amplicon length or locus selection, the results show excellent results for most loci. The results show a high signal-to-noise ratio, correct allele calls, and a low limit of detection for minor DNA contributors in mixed DNA samples. Technically, forensic MPS affords great promise for routine implementation in forensic genomics. The method is also applicable to adjacent disciplines such as molecular autopsy in legal medicine and in mitochondrial DNA research.

  8. Assessing the clinical value of targeted massively parallel sequencing in a longitudinal, prospective population-based study of cancer patients

    PubMed Central

    Wong, S Q; Fellowes, A; Doig, K; Ellul, J; Bosma, T J; Irwin, D; Vedururu, R; Tan, A Y-C; Weiss, J; Chan, K S; Lucas, M; Thomas, D M; Dobrovic, A; Parisot, J P; Fox, S B

    2015-01-01

    Introduction: Recent discoveries in cancer research have revealed a plethora of clinically actionable mutations that provide therapeutic, prognostic and predictive benefit to patients. The feasibility of screening mutations as part of the routine clinical care of patients remains relatively unexplored as the demonstration of massively parallel sequencing (MPS) of tumours in the general population is required to assess its value towards the health-care system. Methods: Cancer 2015 study is a large-scale, prospective, multisite cohort of newly diagnosed cancer patients from Victoria, Australia with 1094 patients recruited. MPS was performed using the Illumina TruSeq Amplicon Cancer Panel. Results: Overall, 854 patients were successfully sequenced for 48 common cancer genes. Accurate determination of clinically relevant mutations was possible including in less characterised cancer types; however, technical limitations including formalin-induced sequencing artefacts were uncovered. Applying strict filtering criteria, clinically relevant mutations were identified in 63% of patients, with 26% of patients displaying a mutation with therapeutic implications. A subset of patients was validated for canonical mutations using the Agena Bioscience MassARRAY system with 100% concordance. Whereas the prevalence of mutations was consistent with other institutionally based series for some tumour streams (breast carcinoma and colorectal adenocarcinoma), others were different (lung adenocarcinoma and head and neck squamous cell carcinoma), which has significant implications for health economic modelling of particular targeted agents. Actionable mutations in tumours not usually thought to harbour such genetic changes were also identified. Conclusions: Reliable delivery of a diagnostic assay able to screen for a range of actionable mutations in this cohort was achieved, opening unexpected avenues for investigation and treatment of cancer patients. PMID:25742471

  9. Evaluation of cells and biological reagents for adventitious agents using degenerate primer PCR and massively parallel sequencing.

    PubMed

    McClenahan, Shasta D; Uhlenhaut, Christine; Krause, Philip R

    2014-12-12

    We employed a massively parallel sequencing (MPS)-based approach to test reagents and model cell substrates including Chinese hamster ovary (CHO), Madin-Darby canine kidney (MDCK), African green monkey kidney (Vero), and High Five insect cell lines for adventitious agents. RNA and DNA were extracted either directly from the samples or from viral