massively parallel engineering: Topics by Science.gov

Sample records for massively parallel engineering

Design of a massively parallel computer using bit serial processing elements

NASA Technical Reports Server (NTRS)

Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing

1995-01-01

A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.

PubMed

Tam, Wing-Kin; Yang, Zhi

2018-05-01

Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
Scalable Visual Analytics of Massive Textual Datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.

2007-04-01

This paper describes the first scalable implementation of text processing engine used in Visual Analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive dataset. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as Pubmed. This approach enables interactive analysis of large datasets beyond capabilities of existing state-of-the art visual analytics tools.
Using CLIPS in the domain of knowledge-based massively parallel programming

NASA Technical Reports Server (NTRS)

Dvorak, Jiri J.

1994-01-01

The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Dynamic Imbalance Would Counter Offcenter Thrust

NASA Technical Reports Server (NTRS)

Mccanna, Jason

1994-01-01

Dynamic imbalance generated by offcenter thrust on rotating body eliminated by shifting some of mass of body to generate opposing dynamic imbalance. Technique proposed originally for spacecraft including massive crew module connected via long, lightweight intermediate structure to massive engine module, such that artificial gravitation in crew module generated by rotating spacecraft around axis parallel to thrust generated by engine. Also applicable to dynamic balancing of rotating terrestrial equipment to which offcenter forces applied.
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Multidisciplinary Design Optimization (MDO) Methods: Their Synergy with Computer Technology in Design Process

NASA Technical Reports Server (NTRS)

Sobieszczanski-Sobieski, Jaroslaw

1998-01-01

The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate a radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimization (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behavior by interaction of a large number of very simple models may be an inspiration for the above algorithms, the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should be now, even though the widespread availability of massively parallel processing is still a few years away.
Multidisciplinary Design Optimisation (MDO) Methods: Their Synergy with Computer Technology in the Design Process

NASA Technical Reports Server (NTRS)

Sobieszczanski-Sobieski, Jaroslaw

1999-01-01

The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Low-Speed Investigation of Upper-Surface Leading-Edge Blowing on a High-Speed Civil Transport Configuration

NASA Technical Reports Server (NTRS)

Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.

1999-01-01

The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Compute as Fast as the Engineers Can Think! ULTRAFAST COMPUTING TEAM FINAL REPORT

NASA Technical Reports Server (NTRS)

Biedron, R. T.; Mehrotra, P.; Nelson, M. L.; Preston, M. L.; Rehder, J. J.; Rogersm J. L.; Rudy, D. H.; Sobieski, J.; Storaasli, O. O.

1999-01-01

This report documents findings and recommendations by the Ultrafast Computing Team (UCT). In the period 10-12/98, UCT reviewed design case scenarios for a supersonic transport and a reusable launch vehicle to derive computing requirements necessary for support of a design process with efficiency so radically improved that human thought rather than the computer paces the process. Assessment of the present computing capability against the above requirements indicated a need for further improvement in computing speed by several orders of magnitude to reduce time to solution from tens of hours to seconds in major applications. Evaluation of the trends in computer technology revealed a potential to attain the postulated improvement by further increases of single processor performance combined with massively parallel processing in a heterogeneous environment. However, utilization of massively parallel processing to its full capability will require redevelopment of the engineering analysis and optimization methods, including invention of new paradigms. To that end UCT recommends initiation of a new activity at LaRC called Computational Engineering for development of new methods and tools geared to the new computer architectures in disciplines, their coordination, and validation and benefit demonstration through applications.
Neuromorphic Hardware Architecture Using the Neural Engineering Framework for Pattern Recognition.

PubMed

Wang, Runchun; Thakur, Chetan Singh; Cohen, Gregory; Hamilton, Tara Julia; Tapson, Jonathan; van Schaik, Andre

2017-06-01

We present a hardware architecture that uses the neural engineering framework (NEF) to implement large-scale neural networks on field programmable gate arrays (FPGAs) for performing massively parallel real-time pattern recognition. NEF is a framework that is capable of synthesising large-scale cognitive systems from subnetworks and we have previously presented an FPGA implementation of the NEF that successfully performs nonlinear mathematical computations. That work was developed based on a compact digital neural core, which consists of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. We have now scaled this approach up to build a pattern recognition system by combining identical neural cores together. As a proof of concept, we have developed a handwritten digit recognition system using the MNIST database and achieved a recognition rate of 96.55%. The system is implemented on a state-of-the-art FPGA and can process 5.12 million digits per second. The architecture and hardware optimisations presented offer high-speed and resource-efficient means for performing high-speed, neuromorphic, and massively parallel pattern recognition and classification tasks.
Graphics Processing Unit Assisted Thermographic Compositing

NASA Technical Reports Server (NTRS)

Ragasa, Scott; McDougal, Matthew; Russell, Sam

2012-01-01

Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often great, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques. Technical Methodology/Approach: Apply massively parallel algorithms and data structures to the specific analysis requirements presented when working with thermographic data sets.
Implementation of Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Software

DTIC Science & Technology

2015-08-01

Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) Software by N Scott Weingarten and James P Larentzos Approved for...Massively Parallel Simulator ( LAMMPS ) Software by N Scott Weingarten Weapons and Materials Research Directorate, ARL James P Larentzos Engility...Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) Software 5a. CONTRACT NUMBER 5b
Software Engineering for Scientific Computer Simulations

NASA Astrophysics Data System (ADS)

Post, Douglass E.; Henderson, Dale B.; Kendall, Richard P.; Whitney, Earl M.

2004-11-01

Computer simulation is becoming a very powerful tool for analyzing and predicting the performance of fusion experiments. Simulation efforts are evolving from including only a few effects to many effects, from small teams with a few people to large teams, and from workstations and small processor count parallel computers to massively parallel platforms. Successfully making this transition requires attention to software engineering issues. We report on the conclusions drawn from a number of case studies of large scale scientific computing projects within DOE, academia and the DoD. The major lessons learned include attention to sound project management including setting reasonable and achievable requirements, building a good code team, enforcing customer focus, carrying out verification and validation and selecting the optimum computational mathematics approaches.
The 2nd Symposium on the Frontiers of Massively Parallel Computations

NASA Technical Reports Server (NTRS)

Mills, Ronnie (Editor)

1988-01-01

Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Massively parallel information processing systems for space applications

NASA Technical Reports Server (NTRS)

Schaefer, D. H.

1979-01-01

NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1997-01-01

Applications are described of high-performance computing methods to the numerical simulation of complete jet engines. The methodology focuses on the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field elements. New partitioned analysis procedures to treat this coupled three-component problem were developed. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. The NASA-sponsored ENG10 program was used for the global steady state analysis of the whole engine. This program uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 was developed as well as the capability for the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames.
Massively Parallel Processing for Fast and Accurate Stamping Simulations

NASA Astrophysics Data System (ADS)

Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu

2005-08-01

The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.
Scalable High Performance Computing: Direct and Large-Eddy Turbulent Flow Simulations Using Massively Parallel Computers

NASA Technical Reports Server (NTRS)

Morgan, Philip E.

2004-01-01

This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
SediFoam: A general-purpose, open-source CFD-DEM solver for particle-laden flow with emphasis on sediment transport

NASA Astrophysics Data System (ADS)

Sun, Rui; Xiao, Heng

2016-04-01

With the growth of available computational resource, CFD-DEM (computational fluid dynamics-discrete element method) becomes an increasingly promising and feasible approach for the study of sediment transport. Several existing CFD-DEM solvers are applied in chemical engineering and mining industry. However, a robust CFD-DEM solver for the simulation of sediment transport is still desirable. In this work, the development of a three-dimensional, massively parallel, and open-source CFD-DEM solver SediFoam is detailed. This solver is built based on open-source solvers OpenFOAM and LAMMPS. OpenFOAM is a CFD toolbox that can perform three-dimensional fluid flow simulations on unstructured meshes; LAMMPS is a massively parallel DEM solver for molecular dynamics. Several validation tests of SediFoam are performed using cases of a wide range of complexities. The results obtained in the present simulations are consistent with those in the literature, which demonstrates the capability of SediFoam for sediment transport applications. In addition to the validation test, the parallel efficiency of SediFoam is studied to test the performance of the code for large-scale and complex simulations. The parallel efficiency tests show that the scalability of SediFoam is satisfactory in the simulations using up to O(107) particles.

Deep sequencing methods for protein engineering and design.

PubMed

Wrenbeck, Emily E; Faber, Matthew S; Whitehead, Timothy A

2017-08-01

The advent of next-generation sequencing (NGS) has revolutionized protein science, and the development of complementary methods enabling NGS-driven protein engineering have followed. In general, these experiments address the functional consequences of thousands of protein variants in a massively parallel manner using genotype-phenotype linked high-throughput functional screens followed by DNA counting via deep sequencing. We highlight the use of information rich datasets to engineer protein molecular recognition. Examples include the creation of multiple dual-affinity Fabs targeting structurally dissimilar epitopes and engineering of a broad germline-targeted anti-HIV-1 immunogen. Additionally, we highlight the generation of enzyme fitness landscapes for conducting fundamental studies of protein behavior and evolution. We conclude with discussion of technological advances. Copyright © 2016 Elsevier Ltd. All rights reserved.
Combustor Simulation

NASA Technical Reports Server (NTRS)

Norris, Andrew

2003-01-01

The goal was to perform 3D simulation of GE90 combustor, as part of full turbofan engine simulation. Requirements of high fidelity as well as fast turn-around time require massively parallel code. National Combustion Code (NCC) was chosen for this task as supports up to 999 processors and includes state-of-the-art combustion models. Also required is ability to take inlet conditions from compressor code and give exit conditions to turbine code.
Genome-wide mapping of mutations at single-nucleotide resolution for protein, metabolic and genome engineering.

PubMed

Garst, Andrew D; Bassalo, Marcelo C; Pines, Gur; Lynch, Sean A; Halweg-Edwards, Andrea L; Liu, Rongming; Liang, Liya; Wang, Zhiwen; Zeitoun, Ramsey; Alexander, William G; Gill, Ryan T

2017-01-01

Improvements in DNA synthesis and sequencing have underpinned comprehensive assessment of gene function in bacteria and eukaryotes. Genome-wide analyses require high-throughput methods to generate mutations and analyze their phenotypes, but approaches to date have been unable to efficiently link the effects of mutations in coding regions or promoter elements in a highly parallel fashion. We report that CRISPR-Cas9 gene editing in combination with massively parallel oligomer synthesis can enable trackable editing on a genome-wide scale. Our method, CRISPR-enabled trackable genome engineering (CREATE), links each guide RNA to homologous repair cassettes that both edit loci and function as barcodes to track genotype-phenotype relationships. We apply CREATE to site saturation mutagenesis for protein engineering, reconstruction of adaptive laboratory evolution experiments, and identification of stress tolerance and antibiotic resistance genes in bacteria. We provide preliminary evidence that CREATE will work in yeast. We also provide a webtool to design multiplex CREATE libraries.
Quantitative analysis of RNA-protein interactions on a massively parallel array for mapping biophysical and evolutionary landscapes

PubMed Central

Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.

2015-01-01

RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >107 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714
High-performance parallel analysis of coupled problems for aircraft propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Chen, P.-S.; Gumaste, U.; Leoinne, M.; Stern, P.

1995-01-01

This research program deals with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a by-pass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by an ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled 3-component problem were developed in 1994. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor for parallel versions of ENG10 has been developed. It is planned to use the steady-state global solution provided by ENG10 as input to a localized three-dimensional FSI analysis for engine regions where aeroelastic effects may be important.
Genomes by design

PubMed Central

Haimovich, Adrian D.; Muir, Paul; Isaacs, Farren J.

2016-01-01

Next-generation DNA sequencing has revealed the complete genome sequences of numerous organisms, establishing a fundamental and growing understanding of genetic variation and phenotypic diversity. Engineering at the gene, network and whole-genome scale aims to introduce targeted genetic changes both to explore emergent phenotypes and to introduce new functionalities. Expansion of these approaches into massively parallel platforms establishes the ability to generate targeted genome modifications, elucidating causal links between genotype and phenotype, as well as the ability to design and reprogramme organisms. In this Review, we explore techniques and applications in genome engineering, outlining key advances and defining challenges. PMID:26260262
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fijany, A.; Milman, M.; Redding, D.

1994-12-31

In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optic system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although, this represents an optimal computation, due the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm,more » designated as Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.« less
RAMA: A file system for massively parallel computers

NASA Technical Reports Server (NTRS)

Miller, Ethan L.; Katz, Randy H.

1993-01-01

This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated in lo the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
Ultrascalable petaflop parallel supercomputer

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY

2010-07-20

A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
Using algebra for massively parallel processor design and utilization

NASA Technical Reports Server (NTRS)

Campbell, Lowell; Fellows, Michael R.

1990-01-01

This paper summarizes the author's advances in the design of dense processor networks. Within is reported a collection of recent constructions of dense symmetric networks that provide the largest know values for the number of nodes that can be placed in a network of a given degree and diameter. The constructions are in the range of current potential engineering significance and are based on groups of automorphisms of finite-dimensional vector spaces.
Cell-NPE (Numerical Performance Evaluation): Programming the IBM Cell Broadband Engine -- A General Parallelization Strategy

DTIC Science & Technology

2008-04-01

Space GmbH as follows: B. TECHNICAL PRPOPOSA/DESCRIPTION OF WORK Cell: A Revolutionary High Performance Computing Platform On 29 June 2005 [1...IBM has announced that is has partnered with Mercury Computer Systems, a maker of specialized computers . The Cell chip provides massive floating-point...the computing industry away from the traditional processor technology dominated by Intel. While in the past, the development of computing power has
High-Performance Parallel Analysis of Coupled Problems for Aircraft Propulsion

NASA Technical Reports Server (NTRS)

Felippa, C. A.; Farhat, C.; Park, K. C.; Gumaste, U.; Chen, P.-S.; Lesoinne, M.; Stern, P.

1996-01-01

This research program dealt with the application of high-performance computing methods to the numerical simulation of complete jet engines. The program was initiated in January 1993 by applying two-dimensional parallel aeroelastic codes to the interior gas flow problem of a bypass jet engine. The fluid mesh generation, domain decomposition and solution capabilities were successfully tested. Attention was then focused on methodology for the partitioned analysis of the interaction of the gas flow with a flexible structure and with the fluid mesh motion driven by these structural displacements. The latter is treated by a ALE technique that models the fluid mesh motion as that of a fictitious mechanical network laid along the edges of near-field fluid elements. New partitioned analysis procedures to treat this coupled three-component problem were developed during 1994 and 1995. These procedures involved delayed corrections and subcycling, and have been successfully tested on several massively parallel computers, including the iPSC-860, Paragon XP/S and the IBM SP2. For the global steady-state axisymmetric analysis of a complete engine we have decided to use the NASA-sponsored ENG10 program, which uses a regular FV-multiblock-grid discretization in conjunction with circumferential averaging to include effects of blade forces, loss, combustor heat addition, blockage, bleeds and convective mixing. A load-balancing preprocessor tor parallel versions of ENG10 was developed. During 1995 and 1996 we developed the capability tor the first full 3D aeroelastic simulation of a multirow engine stage. This capability was tested on the IBM SP2 parallel supercomputer at NASA Ames. Benchmark results were presented at the 1196 Computational Aeroscience meeting.
Parallel and Divergent Evolutionary Solutions for the Optimization of an Engineered Central Metabolism in Methylobacterium extorquens AM1

PubMed Central

Carroll, Sean Michael; Chubiz, Lon M.; Agashe, Deepa; Marx, Christopher J.

2015-01-01

Bioengineering holds great promise to provide fast and efficient biocatalysts for methanol-based biotechnology, but necessitates proven methods to optimize physiology in engineered strains. Here, we highlight experimental evolution as an effective means for optimizing an engineered Methylobacterium extorquens AM1. Replacement of the native formaldehyde oxidation pathway with a functional analog substantially decreased growth in an engineered Methylobacterium, but growth rapidly recovered after six hundred generations of evolution on methanol. We used whole-genome sequencing to identify the basis of adaptation in eight replicate evolved strains, and examined genomic changes in light of other growth and physiological data. We observed great variety in the numbers and types of mutations that occurred, including instances of parallel mutations at targets that may have been “rationalized” by the bioengineer, plus other “illogical” mutations that demonstrate the ability of evolution to expose unforeseen optimization solutions. Notably, we investigated mutations to RNA polymerase, which provided a massive growth benefit but are linked to highly aberrant transcriptional profiles. Overall, we highlight the power of experimental evolution to present genetic and physiological solutions for strain optimization, particularly in systems where the challenges of engineering are too many or too difficult to overcome via traditional engineering methods. PMID:27682084
Numerical Prediction of Non-Reacting and Reacting Flow in a Model Gas Turbine Combustor

NASA Technical Reports Server (NTRS)

Davoudzadeh, Farhad; Liu, Nan-Suey

2005-01-01

The three-dimensional, viscous, turbulent, reacting and non-reacting flow characteristics of a model gas turbine combustor operating on air/methane are simulated via an unstructured and massively parallel Reynolds-Averaged Navier-Stokes (RANS) code. This serves to demonstrate the capabilities of the code for design and analysis of real combustor engines. The effects of some design features of combustors are examined. In addition, the computed results are validated against experimental data.
Parallel Logic Programming and Parallel Systems Software and Hardware

DTIC Science & Technology

1989-07-29

Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Uilisation", Group d’ Intelligence Artificielle , Universite d...completed. Tools were provided for software development using artificial intelligence techniques. Al software for massively parallel architectures was...using artificial intelligence tech- niques. Al software for massively parallel architectures was started. 1. Introduction We describe research conducted
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project

NASA Technical Reports Server (NTRS)

Woo, Alex C.; Hill, Kueichien C.

1996-01-01

The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Program Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Topical perspective on massive threading and parallelism.

PubMed

Farber, Robert M

2011-09-01

Unquestionably computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three-orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive-threading and massive-parallelism such as the 1.3 million threads Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world is how to capitalize on these new massively threaded computational architectures--especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelize with some insight into the future. Published by Elsevier Inc.
LSPRAY-V: A Lagrangian Spray Module

NASA Technical Reports Server (NTRS)

Raju, M. S.

2015-01-01

LSPRAY-V is a Lagrangian spray solver developed for application with unstructured grids and massively parallel computers. It is mainly designed to predict the flow, thermal and transport properties of a rapidly vaporizing spray encountered over a wide range of operating conditions in modern aircraft engine development. It could easily be coupled with any existing gas-phase flow and/or Monte Carlo Probability Density Function (PDF) solvers. The manual provides the user with an understanding of various models involved in the spray formulation, its code structure and solution algorithm, and various other issues related to parallelization and its coupling with other solvers. With the development of LSPRAY-V, we have advanced the state-of-the-art in spray computations in several important ways.
Box schemes and their implementation on the iPSC/860

NASA Technical Reports Server (NTRS)

Chattot, J. J.; Merriam, M. L.

1991-01-01

Research on algoriths for efficiently solving fluid flow problems on massively parallel computers is continued in the present paper. Attention is given to the implementation of a box scheme on the iPSC/860, a massively parallel computer with a peak speed of 10 Gflops and a memory of 128 Mwords. A domain decomposition approach to parallelism is used.
Scalable parallel distance field construction for large-scale applications

DOE PAGES

Yu, Hongfeng; Xie, Jinrong; Ma, Kwan -Liu; ...

2015-10-01

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. Anew distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking overtime, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate itsmore » efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. In conclusion, our work greatly extends the usability of distance fields for demanding applications.« less

Scalable Parallel Distance Field Construction for Large-Scale Applications.

PubMed

Yu, Hongfeng; Xie, Jinrong; Ma, Kwan-Liu; Kolla, Hemanth; Chen, Jacqueline H

2015-10-01

Computing distance fields is fundamental to many scientific and engineering applications. Distance fields can be used to direct analysis and reduce data. In this paper, we present a highly scalable method for computing 3D distance fields on massively parallel distributed-memory machines. A new distributed spatial data structure, named parallel distance tree, is introduced to manage the level sets of data and facilitate surface tracking over time, resulting in significantly reduced computation and communication costs for calculating the distance to the surface of interest from any spatial locations. Our method supports several data types and distance metrics from real-world applications. We demonstrate its efficiency and scalability on state-of-the-art supercomputers using both large-scale volume datasets and surface models. We also demonstrate in-situ distance field computation on dynamic turbulent flame surfaces for a petascale combustion simulation. Our work greatly extends the usability of distance fields for demanding applications.
Creating a Parallel Version of VisIt for Microsoft Windows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Whitlock, B J; Biagas, K S; Rawson, P L

2011-12-07

VisIt is a popular, free interactive parallel visualization and analysis tool for scientific data. Users can quickly generate visualizations from their data, animate them through time, manipulate them, and save the resulting images or movies for presentations. VisIt was designed from the ground up to work on many scales of computers from modest desktops up to massively parallel clusters. VisIt is comprised of a set of cooperating programs. All programs can be run locally or in client/server mode in which some run locally and some run remotely on compute clusters. The VisIt program most able to harness today's computing powermore » is the VisIt compute engine. The compute engine is responsible for reading simulation data from disk, processing it, and sending results or images back to the VisIt viewer program. In a parallel environment, the compute engine runs several processes, coordinating using the Message Passing Interface (MPI) library. Each MPI process reads some subset of the scientific data and filters the data in various ways to create useful visualizations. By using MPI, VisIt has been able to scale well into the thousands of processors on large computers such as dawn and graph at LLNL. The advent of multicore CPU's has made parallelism the 'new' way to achieve increasing performance. With today's computers having at least 2 cores and in many cases up to 8 and beyond, it is more important than ever to deploy parallel software that can use that computing power not only on clusters but also on the desktop. We have created a parallel version of VisIt for Windows that uses Microsoft's MPI implementation (MSMPI) to process data in parallel on the Windows desktop as well as on a Windows HPC cluster running Microsoft Windows Server 2008. Initial desktop parallel support for Windows was deployed in VisIt 2.4.0. Windows HPC cluster support has been completed and will appear in the VisIt 2.5.0 release. We plan to continue supporting parallel VisIt on Windows so our users will be able to take full advantage of their multicore resources.« less
Scan line graphics generation on the massively parallel processor

NASA Technical Reports Server (NTRS)

Dorband, John E.

1988-01-01

Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
A massively parallel computational approach to coupled thermoelastic/porous gas flow problems

NASA Technical Reports Server (NTRS)

Shia, David; Mcmanus, Hugh L.

1995-01-01

A new computational scheme for coupled thermoelastic/porous gas flow problems is presented. Heat transfer, gas flow, and dynamic thermoelastic governing equations are expressed in fully explicit form, and solved on a massively parallel computer. The transpiration cooling problem is used as an example problem. The numerical solutions have been verified by comparison to available analytical solutions. Transient temperature, pressure, and stress distributions have been obtained. Small spatial oscillations in pressure and stress have been observed, which would be impractical to predict with previously available schemes. Comparisons between serial and massively parallel versions of the scheme have also been made. The results indicate that for small scale problems the serial and parallel versions use practically the same amount of CPU time. However, as the problem size increases the parallel version becomes more efficient than the serial version.
Bit-parallel arithmetic in a massively-parallel associative processor

NASA Technical Reports Server (NTRS)

Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.

1992-01-01

A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Holographic heat engine within the framework of massive gravity

NASA Astrophysics Data System (ADS)

Mo, Jie-Xiong; Li, Gu-Qiang

2018-05-01

Heat engine models are constructed within the framework of massive gravity in this paper. For the four-dimensional charged black holes in massive gravity, it is shown that the existence of graviton mass improves the heat engine efficiency significantly. The situation is more complicated for the five-dimensional neutral black holes since the constant which corresponds to the third massive potential also contributes to the efficiency. It is also shown that the existence of graviton mass can improve the heat engine efficiency. Moreover, we probe how the massive gravity influences the behavior of the heat engine efficiency approaching the Carnot efficiency.
Black holes in massive gravity as heat engines

NASA Astrophysics Data System (ADS)

Hendi, S. H.; Eslam Panah, B.; Panahiyan, S.; Liu, H.; Meng, X.-H.

2018-06-01

The paper at hand studies the heat engine provided by black holes in the presence of massive gravity. The main motivation is to investigate the effects of massive gravity on different properties of the heat engine. It will be shown that massive gravity parameters modify the efficiency of engine on a significant level. Furthermore, it will be pointed out that it is possible to have a heat engine for non-spherical black holes in massive gravity, and therefore, we will study the effects of horizon topology on the properties of heat engine. Surprisingly, it will be shown that the highest efficiency for the heat engine belongs to black holes with the hyperbolic horizon, while the lowest one belongs to the spherical black holes.
Increasing the reach of forensic genetics with massively parallel sequencing.

PubMed

Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R

2017-09-01

The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.

PubMed

Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio

2014-07-05

A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048(3) grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective to analyze the hydration properties of the large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
Parallel computing of a climate model on the dawn 1000 by domain decomposition method

NASA Astrophysics Data System (ADS)

Bi, Xunqiang

1997-12-01

In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.
A sweep algorithm for massively parallel simulation of circuit-switched networks

NASA Technical Reports Server (NTRS)

Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.

1992-01-01

A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
Multi-threading: A new dimension to massively parallel scientific computation

NASA Astrophysics Data System (ADS)

Nielsen, Ida M. B.; Janssen, Curtis L.

2000-06-01

Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
Massively parallel support for a case-based planning system

NASA Technical Reports Server (NTRS)

Kettler, Brian P.; Hendler, James A.; Anderson, William A.

1993-01-01

Case-based planning (CBP), a kind of case-based reasoning, is a technique in which previously generated plans (cases) are stored in memory and can be reused to solve similar planning problems in the future. CBP can save considerable time over generative planning, in which a new plan is produced from scratch. CBP thus offers a potential (heuristic) mechanism for handling intractable problems. One drawback of CBP systems has been the need for a highly structured memory to reduce retrieval times. This approach requires significant domain engineering and complex memory indexing schemes to make these planners efficient. In contrast, our CBP system, CaPER, uses a massively parallel frame-based AI language (PARKA) and can do extremely fast retrieval of complex cases from a large, unindexed memory. The ability to do fast, frequent retrievals has many advantages: indexing is unnecessary; very large case bases can be used; memory can be probed in numerous alternate ways; and queries can be made at several levels, allowing more specific retrieval of stored plans that better fit the target problem with less adaptation. In this paper we describe CaPER's case retrieval techniques and some experimental results showing its good performance, even on large case bases.
The language parallel Pascal and other aspects of the massively parallel processor

NASA Technical Reports Server (NTRS)

Reeves, A. P.; Bruner, J. D.

1982-01-01

A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code

DOE Office of Scientific and Technical Information (OSTI.GOV)

Earth Sciences Division; Zhang, Keni; Zhang, Keni

TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator ismore » to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.« less
Genetically Engineered Microelectronic Infrared Filters

NASA Technical Reports Server (NTRS)

Cwik, Tom; Klimeck, Gerhard

1998-01-01

A genetic algorithm is used for design of infrared filters and in the understanding of the material structure of a resonant tunneling diode. These two components are examples of microdevices and nanodevices that can be numerically simulated using fundamental mathematical and physical models. Because the number of parameters that can be used in the design of one of these devices is large, and because experimental exploration of the design space is unfeasible, reliable software models integrated with global optimization methods are examined The genetic algorithm and engineering design codes have been implemented on massively parallel computers to exploit their high performance. Design results are presented for the infrared filter showing new and optimized device design. Results for nanodevices are presented in a companion paper at this workshop.
The factorization of large composite numbers on the MPP

NASA Technical Reports Server (NTRS)

Mckurdy, Kathy J.; Wunderlich, Marvin C.

1987-01-01

The continued fraction method for factoring large integers (CFRAC) was an ideal algorithm to be implemented on a massively parallel computer such as the Massively Parallel Processor (MPP). After much effort, the first 60 digit number was factored on the MPP using about 6 1/2 hours of array time. Although this result added about 10 digits to the size number that could be factored using CFRAC on a serial machine, it was already badly beaten by the implementation of Davis and Holdridge on the CRAY-1 using the quadratic sieve, an algorithm which is clearly superior to CFRAC for large numbers. An algorithm is illustrated which is ideally suited to the single instruction multiple data (SIMD) massively parallel architecture and some of the modifications which were needed in order to make the parallel implementation effective and efficient are described.
Increasing phylogenetic resolution at low taxonomic levels using massively parallel sequencing of chloroplast genomes

Treesearch

Matthew Parks; Richard Cronn; Aaron Liston

2009-01-01

We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. We found that 30/33 ingroup nodes resolved wlth > 95-percent bootstrap support; this is a substantial improvement relative...
Fast, Massively Parallel Data Processors

NASA Technical Reports Server (NTRS)

Heaton, Robert A.; Blevins, Donald W.; Davis, ED

1994-01-01

Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures

NASA Technical Reports Server (NTRS)

Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.

1998-01-01

In this paper a new algorithm, designated as Fast Invariant Imbedding algorithm, for solution of Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other Fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate size problems.

Self-Scheduling Parallel Methods for Multiple Serial Codes with Application to WOPWOP

NASA Technical Reports Server (NTRS)

Long, Lyle N.; Brentner, Kenneth S.

2000-01-01

This paper presents a scheme for efficiently running a large number of serial jobs on parallel computers. Two examples are given of computer programs that run relatively quickly, but often they must be run numerous times to obtain all the results needed. It is very common in science and engineering to have codes that are not massive computing challenges in themselves, but due to the number of instances that must be run, they do become large-scale computing problems. The two examples given here represent common problems in aerospace engineering: aerodynamic panel methods and aeroacoustic integral methods. The first example simply solves many systems of linear equations. This is representative of an aerodynamic panel code where someone would like to solve for numerous angles of attack. The complete code for this first example is included in the appendix so that it can be readily used by others as a template. The second example is an aeroacoustics code (WOPWOP) that solves the Ffowcs Williams Hawkings equation to predict the far-field sound due to rotating blades. In this example, one quite often needs to compute the sound at numerous observer locations, hence parallelization is utilized to automate the noise computation for a large number of observers.
Computational electronics and electromagnetics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shang, C C

The Computational Electronics and Electromagnetics thrust area serves as the focal point for Engineering R and D activities for developing computer-based design and analysis tools. Representative applications include design of particle accelerator cells and beamline components; design of transmission line components; engineering analysis and design of high-power (optical and microwave) components; photonics and optoelectronics circuit design; electromagnetic susceptibility analysis; and antenna synthesis. The FY-97 effort focuses on development and validation of (1) accelerator design codes; (2) 3-D massively parallel, time-dependent EM codes; (3) material models; (4) coupling and application of engineering tools for analysis and design of high-power components; andmore » (5) development of beam control algorithms coupled to beam transport physics codes. These efforts are in association with technology development in the power conversion, nondestructive evaluation, and microtechnology areas. The efforts complement technology development in Lawrence Livermore National programs.« less
Computational electronics and electromagnetics

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shang, C. C.

The Computational Electronics and Electromagnetics thrust area at Lawrence Livermore National Laboratory serves as the focal point for engineering R&D activities for developing computer-based design, analysis, and tools for theory. Key representative applications include design of particle accelerator cells and beamline components; engineering analysis and design of high-power components, photonics, and optoelectronics circuit design; EMI susceptibility analysis; and antenna synthesis. The FY-96 technology-base effort focused code development on (1) accelerator design codes; (2) 3-D massively parallel, object-oriented time-domain EM codes; (3) material models; (4) coupling and application of engineering tools for analysis and design of high-power components; (5) 3-D spectral-domainmore » CEM tools; and (6) enhancement of laser drilling codes. Joint efforts with the Power Conversion Technologies thrust area include development of antenna systems for compact, high-performance radar, in addition to novel, compact Marx generators. 18 refs., 25 figs., 1 tab.« less
SIAM Conference on Parallel Processing for Scientific Computing, 4th, Chicago, IL, Dec. 11-13, 1989, Proceedings

NASA Technical Reports Server (NTRS)

Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)

1990-01-01

Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
Improving the Flow

NASA Technical Reports Server (NTRS)

2004-01-01

In early 1995, NASA s Glenn Research Center (then Lewis Research Center) formed an industry-government team with several jet engine companies to develop the National Combustion Code (NCC), which would help aerospace engineers solve complex aerodynamics and combustion problems in gas turbine, rocket, and hypersonic engines. The original development team consisted of Allison Engine Company (now Rolls-Royce Allison), CFD Research Corporation, GE Aircraft Engines, Pratt and Whitney, and NASA. After the baseline beta version was established in July 1998, the team focused its efforts on consolidation, streamlining, and integration, as well as enhancement, evaluation, validation, and application. These activities, mainly conducted at NASA Glenn, led to the completion of NCC version 1.0 in October 2000. NCC version 1.0 features high-fidelity representation of complex geometry, advanced models for two-phase turbulent combustion, and massively parallel computing. Researchers and engineers at Glenn have been using NCC to provide analysis and design support for various aerospace propulsion technology development projects. NASA transfers NCC technology to external customers using non- exclusive Space Act Agreements. Glenn researchers also communicate research and development results derived from NCC's further development through publications and special sessions at technical conferences.
Proxy-equation paradigm: A strategy for massively parallel asynchronous computations

NASA Astrophysics Data System (ADS)

Mittal, Ankita; Girimaji, Sharath

2017-09-01

Massively parallel simulations of transport equation systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing synchronization requirement introduces error and slows down convergence. In this paper, we propose and develop a novel "proxy equation" concept for a general transport equation that (i) tolerates asynchrony with minimal added error, (ii) preserves convergence order and thus, (iii) expected to scale efficiently on massively parallel machines. The central idea is to modify a priori the transport equation at the PE boundaries to offset asynchrony errors. Proof-of-concept computations are performed using a one-dimensional advection (convection) diffusion equation. The results demonstrate the promise and advantages of the present strategy.
Here and now: the intersection of computational science, quantum-mechanical simulations, and materials science

NASA Astrophysics Data System (ADS)

Marzari, Nicola

The last 30 years have seen the steady and exhilarating development of powerful quantum-simulation engines for extended systems, dedicated to the solution of the Kohn-Sham equations of density-functional theory, often augmented by density-functional perturbation theory, many-body perturbation theory, time-dependent density-functional theory, dynamical mean-field theory, and quantum Monte Carlo. Their implementation on massively parallel architectures, now leveraging also GPUs and accelerators, has started a massive effort in the prediction from first principles of many or of complex materials properties, leading the way to the exascale through the combination of HPC (high-performance computing) and HTC (high-throughput computing). Challenges and opportunities abound: complementing hardware and software investments and design; developing the materials' informatics infrastructure needed to encode knowledge into complex protocols and workflows of calculations; managing and curating data; resisting the complacency that we have already reached the predictive accuracy needed for materials design, or a robust level of verification of the different quantum engines. In this talk I will provide an overview of these challenges, with the ultimate prize being the computational understanding, prediction, and design of properties and performance for novel or complex materials and devices.
Using collective variables to drive molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Fiorin, Giacomo; Klein, Michael L.; Hénin, Jérôme

2013-12-01

A software framework is introduced that facilitates the application of biasing algorithms to collective variables of the type commonly employed to drive massively parallel molecular dynamics (MD) simulations. The modular framework that is presented enables one to combine existing collective variables into new ones, and combine any chosen collective variable with available biasing methods. The latter include the classic time-dependent biases referred to as steered MD and targeted MD, the temperature-accelerated MD algorithm, as well as the adaptive free-energy biases called metadynamics and adaptive biasing force. The present modular software is extensible, and portable between commonly used MD simulation engines.
Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) Simulations of the Molecular Crystal alphaRDX

DTIC Science & Technology

2013-08-01

potential for HMX / RDX (3, 9). ...................................................................................8 1 1. Purpose This work...6 dispersion and electrostatic interactions. Constants for the SB potential are given in table 1. 8 Table 1. SB potential for HMX / RDX (3, 9...modeling dislocations in the energetic molecular crystal RDX using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) molecular
Algorithms and programming tools for image processing on the MPP

NASA Technical Reports Server (NTRS)

Reeves, A. P.

1985-01-01

Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Parallelized direct execution simulation of message-passing parallel programs

NASA Technical Reports Server (NTRS)

Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.

1994-01-01

As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
The architecture of tomorrow's massively parallel computer

NASA Technical Reports Server (NTRS)

Batcher, Ken

1987-01-01

Goodyear Aerospace delivered the Massively Parallel Processor (MPP) to NASA/Goddard in May 1983, over three years ago. Ever since then, Goodyear has tried to look in a forward direction. There is always some debate as to which way is forward when it comes to supercomputer architecture. Improvements to the MPP's massively parallel architecture are discussed in the areas of data I/O, memory capacity, connectivity, and indirect (or local) addressing. In I/O, transfer rates up to 640 megabytes per second can be achieved. There are devices that can supply the data and accept it at this rate. The memory capacity can be increased up to 128 megabytes in the ARU and over a gigabyte in the staging memory. For connectivity, there are several different kinds of multistage networks that should be considered.
Parallel computational fluid dynamics '91; Conference Proceedings, Stuttgart, Germany, Jun. 10-12, 1991

NASA Technical Reports Server (NTRS)

Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)

1992-01-01

A conference was held on parallel computational fluid dynamics and produced related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation om massively parallel systems.
Overcoming rule-based rigidity and connectionist limitations through massively-parallel case-based reasoning

NASA Technical Reports Server (NTRS)

Barnden, John; Srinivas, Kankanahalli

1990-01-01

Symbol manipulation as used in traditional Artificial Intelligence has been criticized by neural net researchers for being excessively inflexible and sequential. On the other hand, the application of neural net techniques to the types of high-level cognitive processing studied in traditional artificial intelligence presents major problems as well. A promising way out of this impasse is to build neural net models that accomplish massively parallel case-based reasoning. Case-based reasoning, which has received much attention recently, is essentially the same as analogy-based reasoning, and avoids many of the problems leveled at traditional artificial intelligence. Further problems are avoided by doing many strands of case-based reasoning in parallel, and by implementing the whole system as a neural net. In addition, such a system provides an approach to some aspects of the problems of noise, uncertainty and novelty in reasoning systems. The current neural net system (Conposit), which performs standard rule-based reasoning, is being modified into a massively parallel case-based reasoning version.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC

NASA Astrophysics Data System (ADS)

Alruwaili, Manal

With the growing technology, the number of the processors is becoming massive. Current supercomputer processing will be available on desktops in the next decade. For mass scale application software development on massive parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massive parallel computing and distributed memory models while retaining the user-friendliness. Currently, available object oriented languages for massive parallel computing such as Chapel, X10 and UPC++ exploit distributed computing, data parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not incorporate: 1) any extension at for object distribution to exploit PGAS model; 2) the programs lack the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) lack the programming paradigms that will result from the integration of data and thread-level parallelism and object distribution. In the proposed thesis, I compare different languages in PGAS model; propose new constructs that extend C++ with object distribution and object migration; and integrate PGAS based process constructs with these extensions on distributed objects. Object cloning and object migration. Also a new paradigm MIDD (Multiple Invocation Distributed Data) is presented when different copies of the same class can be invoked, and work on different elements of a distributed data concurrently using remote method invocations. I present new constructs, their grammar and their behavior. The new constructs have been explained using simple programs utilizing these constructs.
Graphics Processing Unit Assisted Thermographic Compositing

NASA Technical Reports Server (NTRS)

Ragasa, Scott; Russell, Samuel S.

2012-01-01

Objective Develop a software application utilizing high performance computing techniques, including general purpose graphics processing units (GPGPUs), for the analysis and visualization of large thermographic data sets. Over the past several years, an increasing effort among scientists and engineers to utilize graphics processing units (GPUs) in a more general purpose fashion is allowing for previously unobtainable levels of computation by individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU which yield significant increases in performance. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Image processing is one area were GPUs are being used to greatly increase the performance of certain analysis and visualization techniques.
Graphics Processing Unit Assisted Thermographic Compositing

NASA Technical Reports Server (NTRS)

Ragasa, Scott; McDougal, Matthew; Russell, Sam

2013-01-01

Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often greater, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques.
Transitioning NWChem to the Next Generation of Manycore Machines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bylaska, Eric J.; Apra, Edoardo; Kowalski, Karol

The NorthWest Chemistry (NWChem) modeling software is a popular molecular chemistry simulation software that was designed from the start to work on massively parallel processing supercomputers[6, 28, 49]. It contains an umbrella of modules that today includes Self Consistent Field (SCF), second order Mller-Plesset perturbation theory (MP2), Coupled Cluster, multi-conguration selfconsistent eld (MCSCF), selected conguration interaction (CI), tensor contraction engine (TCE) many body methods, density functional theory (DFT), time-dependent density functional theory (TDDFT), real time time-dependent density functional theory, pseudopotential plane-wave density functional theory (PSPW), band structure (BAND), ab initio molecular dynamics, Car-Parrinello molecular dynamics, classical molecular dynamics (MD), QM/MM,more » AIMD/MM, GIAO NMR, COSMO, COSMO-SMD, and RISM solvation models, free energy simulations, reaction path optimization, parallel in time, among other capabilities[ 22]. Moreover new capabilities continue to be added with each new release.« less

LAMMPS strong scaling performance optimization on Blue Gene/Q

DOE Office of Scientific and Technical Information (OSTI.GOV)

Coffman, Paul; Jiang, Wei; Romero, Nichols A.

2014-11-12

LAMMPS "Large-scale Atomic/Molecular Massively Parallel Simulator" is an open-source molecular dynamics package from Sandia National Laboratories. Significant performance improvements in strong-scaling and time-to-solution for this application on IBM's Blue Gene/Q have been achieved through computational optimizations of the OpenMP versions of the short-range Lennard-Jones term of the CHARMM force field and the long-range Coulombic interaction implemented with the PPPM (particle-particle-particle mesh) algorithm, enhanced by runtime parameter settings controlling thread utilization. Additionally, MPI communication performance improvements were made to the PPPM calculation by re-engineering the parallel 3D FFT to use MPICH collectives instead of point-to-point. Performance testing was done using anmore » 8.4-million atom simulation scaling up to 16 racks on the Mira system at Argonne Leadership Computing Facility (ALCF). Speedups resulting from this effort were in some cases over 2x.« less
Application of CHAD hydrodynamics to shock-wave problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Trease, H.E.; O`Rourke, P.J.; Sahota, M.S.

1997-12-31

CHAD is the latest in a sequence of continually evolving computer codes written to effectively utilize massively parallel computer architectures and the latest grid generators for unstructured meshes. Its applications range from automotive design issues such as in-cylinder and manifold flows of internal combustion engines, vehicle aerodynamics, underhood cooling and passenger compartment heating, ventilation, and air conditioning to shock hydrodynamics and materials modeling. CHAD solves the full unsteady Navier-Stoke equations with the k-epsilon turbulence model in three space dimensions. The code has four major features that distinguish it from the earlier KIVA code, also developed at Los Alamos. First, itmore » is based on a node-centered, finite-volume method in which, like finite element methods, all fluid variables are located at computational nodes. The computational mesh efficiently and accurately handles all element shapes ranging from tetrahedra to hexahedra. Second, it is written in standard Fortran 90 and relies on automatic domain decomposition and a universal communication library written in standard C and MPI for unstructured grids to effectively exploit distributed-memory parallel architectures. Thus the code is fully portable to a variety of computing platforms such as uniprocessor workstations, symmetric multiprocessors, clusters of workstations, and massively parallel platforms. Third, CHAD utilizes a variable explicit/implicit upwind method for convection that improves computational efficiency in flows that have large velocity Courant number variations due to velocity of mesh size variations. Fourth, CHAD is designed to also simulate shock hydrodynamics involving multimaterial anisotropic behavior under high shear. The authors will discuss CHAD capabilities and show several sample calculations showing the strengths and weaknesses of CHAD.« less
Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease

DTIC Science & Technology

2016-10-01

deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3
High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and Massively Parallel Sequencing

DTIC Science & Technology

2010-10-14

High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and Massively Parallel Sequencing...Venezuelan equine encephalitis virus (VEEV) genome. We initially used a capillary electrophoresis method to gain insight into the role of the VEEV...Smith JM, Schmaljohn CS (2010) High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and
Efficient, massively parallel eigenvalue computation

NASA Technical Reports Server (NTRS)

Huo, Yan; Schreiber, Robert

1993-01-01

In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
Exploring the Ability of a Coarse-grained Potential to Describe the Stress-strain Response of Glassy Polystyrene

DTIC Science & Technology

2012-10-01

using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS
Massively parallel GPU-accelerated minimization of classical density functional theory

NASA Astrophysics Data System (ADS)

Stopper, Daniel; Roth, Roland

2017-08-01

In this paper, we discuss the ability to numerically minimize the grand potential of hard disks in two-dimensional and of hard spheres in three-dimensional space within the framework of classical density functional and fundamental measure theory on modern graphics cards. Our main finding is that a massively parallel minimization leads to an enormous performance gain in comparison to standard sequential minimization schemes. Furthermore, the results indicate that in complex multi-dimensional situations, a heavy parallel minimization of the grand potential seems to be mandatory in order to reach a reasonable balance between accuracy and computational cost.
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Template based parallel checkpointing in a massively parallel computer system

DOEpatents

Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN

2009-01-13

A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
Parallel computing works

DOE Office of Scientific and Technical Information (OSTI.GOV)

Not Available

An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less
Reconstructing evolutionary trees in parallel for massive sequences.

PubMed

Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam

2017-12-14

Building the evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time/space consuming. Hadoop and Spark are developed recently, which bring spring light for the classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on the >1GB files, and gets better performance than other evolutionary reconstruction tools. Users could use HPTree for reonstructing evolutioanry trees on the computer clusters or cloud platform (eg. Amazon Cloud). HPTree could help on population evolution research and metagenomics analysis. In this paper, we employ the Hadoop and Spark platform and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel. Neighbour-joining model was employed for the evolutionary tree building. We opened our software together with source codes via http://lab.malab.cn/soft/HPtree/ .
Distributed Parallel Processing and Dynamic Load Balancing Techniques for Multidisciplinary High Speed Aircraft Design

NASA Technical Reports Server (NTRS)

Krasteva, Denitza T.

1998-01-01

Multidisciplinary design optimization (MDO) for large-scale engineering problems poses many challenges (e.g., the design of an efficient concurrent paradigm for global optimization based on disciplinary analyses, expensive computations over vast data sets, etc.) This work focuses on the application of distributed schemes for massively parallel architectures to MDO problems, as a tool for reducing computation time and solving larger problems. The specific problem considered here is configuration optimization of a high speed civil transport (HSCT), and the efficient parallelization of the embedded paradigm for reasonable design space identification. Two distributed dynamic load balancing techniques (random polling and global round robin with message combining) and two necessary termination detection schemes (global task count and token passing) were implemented and evaluated in terms of effectiveness and scalability to large problem sizes and a thousand processors. The effect of certain parameters on execution time was also inspected. Empirical results demonstrated stable performance and effectiveness for all schemes, and the parametric study showed that the selected algorithmic parameters have a negligible effect on performance.
(Extreme) Core-collapse Supernova Simulations

NASA Astrophysics Data System (ADS)

Mösta, Philipp

2017-01-01

In this talk I will present recent progress on modeling core-collapse supernovae with massively parallel simulations on the largest supercomputers available. I will discuss the unique challenges in both input physics and computational modeling that come with a problem involving all four fundamental forces and relativistic effects and will highlight recent breakthroughs overcoming these challenges in full 3D simulations. I will pay particular attention to how these simulations can be used to reveal the engines driving some of the most extreme explosions and conclude by discussing what remains to be done in simulation work to maximize what we can learn from current and future time-domain astronomy transient surveys.
Two-mode thermal-noise squeezing in an electromechanical resonator.

PubMed

Mahboob, I; Okamoto, H; Onomitsu, K; Yamaguchi, H

2014-10-17

An electromechanical resonator is developed in which mechanical nonlinearities can be dynamically engineered to emulate the nondegenerate parametric down-conversion interaction. In this configuration, phonons are simultaneously generated in pairs in two macroscopic vibration modes, resulting in the amplification of their motion. In parallel, two-mode thermal squeezed states are also created, which exhibit fluctuations below the thermal motion of their constituent modes as well as harboring correlations between the modes that become almost perfect as their amplification is increased. The existence of correlations between two massive phonon ensembles paves the way towards an entangled macroscopic mechanical system at the single phonon level.
Development of a Stiffness-Based Chemistry Load Balancing Scheme, and Optimization of Input/Output and Communication, to Enable Massively Parallel High-Fidelity Internal Combustion Engine Simulations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kodavasal, Janardhan; Harms, Kevin; Srivastava, Priyesh

A closed-cycle gasoline compression ignition engine simulation near top dead center (TDC) was used to profile the performance of a parallel commercial engine computational fluid dynamics code, as it was scaled on up to 4096 cores of an IBM Blue Gene/Q supercomputer. The test case has 9 million cells near TDC, with a fixed mesh size of 0.15 mm, and was run on configurations ranging from 128 to 4096 cores. Profiling was done for a small duration of 0.11 crank angle degrees near TDC during ignition. Optimization of input/output performance resulted in a significant speedup in reading restart files, andmore » in an over 100-times speedup in writing restart files and files for post-processing. Improvements to communication resulted in a 1400-times speedup in the mesh load balancing operation during initialization, on 4096 cores. An improved, “stiffness-based” algorithm for load balancing chemical kinetics calculations was developed, which results in an over 3-times faster run-time near ignition on 4096 cores relative to the original load balancing scheme. With this improvement to load balancing, the code achieves over 78% scaling efficiency on 2048 cores, and over 65% scaling efficiency on 4096 cores, relative to 256 cores.« less
Load balancing for massively-parallel soft-real-time systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hailperin, M.

1988-09-01

Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-problems. The challenge is to replace quick global communications, which is impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to equal that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics.more » It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.« less
Evaluation of massively parallel sequencing for forensic DNA methylation profiling.

PubMed

Richards, Rebecca; Patel, Jayshree; Stevenson, Kate; Harbison, SallyAnn

2018-05-11

Epigenetics is an emerging area of interest in forensic science. DNA methylation, a type of epigenetic modification, can be applied to chronological age estimation, identical twin differentiation and body fluid identification. However, there is not yet an agreed, established methodology for targeted detection and analysis of DNA methylation markers in forensic research. Recently a massively parallel sequencing-based approach has been suggested. The use of massively parallel sequencing is well established in clinical epigenetics and is emerging as a new technology in the forensic field. This review investigates the potential benefits, limitations and considerations of this technique for the analysis of DNA methylation in a forensic context. The importance of a robust protocol, regardless of the methodology used, that minimises potential sources of bias is highlighted. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
Performance analysis of three dimensional integral equation computations on a massively parallel computer. M.S. Thesis

NASA Technical Reports Server (NTRS)

Logan, Terry G.

1994-01-01

The purpose of this study is to investigate the performance of the integral equation computations using numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of computational performance of the MPP CM-5 computer and conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on CM-5 with 32, 62, 128 nodes along with those on Cray-YMP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code near or out-performs the equivalent serial FORTRAN code for some cases.
Supercomputing on massively parallel bit-serial architectures

NASA Technical Reports Server (NTRS)

Iobst, Ken

1985-01-01

Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

DOE PAGES

Zhang, Hong; Zapol, Peter; Dixon, David A.; ...

2015-11-17

The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less

Shift-and-invert parallel spectral transformation eigensolver: Massively parallel performance for density-functional based tight-binding

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Hong; Zapol, Peter; Dixon, David A.

The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less
A Novel Implementation of Massively Parallel Three Dimensional Monte Carlo Radiation Transport

NASA Astrophysics Data System (ADS)

Robinson, P. B.; Peterson, J. D. L.

2005-12-01

The goal of our summer project was to implement the difference formulation for radiation transport into Cosmos++, a multidimensional, massively parallel, magneto hydrodynamics code for astrophysical applications (Peter Anninos - AX). The difference formulation is a new method for Symbolic Implicit Monte Carlo thermal transport (Brooks and Szöke - PAT). Formerly, simultaneous implementation of fully implicit Monte Carlo radiation transport in multiple dimensions on multiple processors had not been convincingly demonstrated. We found that a combination of the difference formulation and the inherent structure of Cosmos++ makes such an implementation both accurate and straightforward. We developed a "nearly nearest neighbor physics" technique to allow each processor to work independently, even with a fully implicit code. This technique coupled with the increased accuracy of an implicit Monte Carlo solution and the efficiency of parallel computing systems allows us to demonstrate the possibility of massively parallel thermal transport. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48
Towards anatomic scale agent-based modeling with a massively parallel spatially explicit general-purpose model of enteric tissue (SEGMEnT_HPC).

PubMed

Cockrell, Robert Chase; Christley, Scott; Chang, Eugene; An, Gary

2015-01-01

Perhaps the greatest challenge currently facing the biomedical research community is the ability to integrate highly detailed cellular and molecular mechanisms to represent clinical disease states as a pathway to engineer effective therapeutics. This is particularly evident in the representation of organ-level pathophysiology in terms of abnormal tissue structure, which, through histology, remains a mainstay in disease diagnosis and staging. As such, being able to generate anatomic scale simulations is a highly desirable goal. While computational limitations have previously constrained the size and scope of multi-scale computational models, advances in the capacity and availability of high-performance computing (HPC) resources have greatly expanded the ability of computational models of biological systems to achieve anatomic, clinically relevant scale. Diseases of the intestinal tract are exemplary examples of pathophysiological processes that manifest at multiple scales of spatial resolution, with structural abnormalities present at the microscopic, macroscopic and organ-levels. In this paper, we describe a novel, massively parallel computational model of the gut, the Spatially Explicitly General-purpose Model of Enteric Tissue_HPC (SEGMEnT_HPC), which extends an existing model of the gut epithelium, SEGMEnT, in order to create cell-for-cell anatomic scale simulations. We present an example implementation of SEGMEnT_HPC that simulates the pathogenesis of ileal pouchitis, and important clinical entity that affects patients following remedial surgery for ulcerative colitis.
Scalable and massively parallel Monte Carlo photon transport simulations for heterogeneous computing platforms

NASA Astrophysics Data System (ADS)

Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian

2018-01-01

We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
A fully coupled method for massively parallel simulation of hydraulically driven fractures in 3-dimensions: FULLY COUPLED PARALLEL SIMULATION OF HYDRAULIC FRACTURES IN 3-D

DOE PAGES

Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...

2016-09-18

This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
A fully coupled method for massively parallel simulation of hydraulically driven fractures in 3-dimensions: FULLY COUPLED PARALLEL SIMULATION OF HYDRAULIC FRACTURES IN 3-D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.

This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
LAMMPS integrated materials engine (LIME) for efficient automation of particle-based simulations: application to equation of state generation

NASA Astrophysics Data System (ADS)

Barnes, Brian C.; Leiter, Kenneth W.; Becker, Richard; Knap, Jaroslaw; Brennan, John K.

2017-07-01

We describe the development, accuracy, and efficiency of an automation package for molecular simulation, the large-scale atomic/molecular massively parallel simulator (LAMMPS) integrated materials engine (LIME). Heuristics and algorithms employed for equation of state (EOS) calculation using a particle-based model of a molecular crystal, hexahydro-1,3,5-trinitro-s-triazine (RDX), are described in detail. The simulation method for the particle-based model is energy-conserving dissipative particle dynamics, but the techniques used in LIME are generally applicable to molecular dynamics simulations with a variety of particle-based models. The newly created tool set is tested through use of its EOS data in plate impact and Taylor anvil impact continuum simulations of solid RDX. The coarse-grain model results from LIME provide an approach to bridge the scales from atomistic simulations to continuum simulations.
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
Massively Parallel Assimilation of TOGA/TAO and Topex/Poseidon Measurements into a Quasi Isopycnal Ocean General Circulation Model Using an Ensemble Kalman Filter

NASA Technical Reports Server (NTRS)

Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max

1999-01-01

A massively parallel ensemble Kalman filter (EnKF)is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)ls quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This inplementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the ENKF are also examined.
Modular and efficient ozone systems based on massively parallel chemical processing in microchannel plasma arrays: performance and commercialization

NASA Astrophysics Data System (ADS)

Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.

2017-08-01

Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.
75 FR 54462 - Airworthiness Directives; Bombardier-Rotax GmbH 912 F Series and 912 S Series Reciprocating Engines

Federal Register 2010, 2011, 2012, 2013, 2014

2010-09-08

... Airworthiness Directives; Bombardier-Rotax GmbH 912 F Series and 912 S Series Reciprocating Engines AGENCY... result in exceeding of the fuel pressure and might cause engine malfunction and/or massive fuel leakage... engine malfunction or a massive fuel leak. These conditions could cause loss of control of the airplane...
A fast ultrasonic simulation tool based on massively parallel implementations

NASA Astrophysics Data System (ADS)

Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain

2014-02-01

This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphical processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are : multi and mono-element probes, planar specimens made of simple isotropic materials, planar rectangular defects or side drilled holes of small diameter. Validations on the model accuracy and performances measurements are presented.
Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Tong, Charles; Swarztrauber, Paul N.

1991-01-01

The present evaluation of alternative, massively parallel hypercube processor-applicable designs for ordered radix-2 decimation-in-frequency FFT algorithms gives attention to the reduction of computation time-dominating communication. A combination of the order and computational phases of the FFT is accordingly employed, in conjunction with sequence-to-processor maps which reduce communication. Two orderings, 'standard' and 'cyclic', in which the order of the transform is the same as that of the input sequence, can be implemented with ease on the Connection Machine (where orderings are determined by geometries and priorities. A parallel method for trigonometric coefficient computation is presented which does not employ trigonometric functions or interprocessor communication.
The Fortran-P Translator: Towards Automatic Translation of Fortran 77 Programs for Massively Parallel Processors

DOE PAGES

O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...

1995-01-01

Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. Wemore » have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.« less
On the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods

PubMed Central

Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.

2011-01-01

We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Analysis of multigrid methods on massively parallel computers: Architectural implications

NASA Technical Reports Server (NTRS)

Matheson, Lesley R.; Tarjan, Robert E.

1993-01-01

We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
FBIS report. Science and technology: Europe/International, March 29, 1996

DOE Office of Scientific and Technical Information (OSTI.GOV)

NONE

1996-03-29

;Partial Contents: Advanced Materials (EU Project to Improve Production in Metal Matrix Compounds Noted, Germany: Extremely Hard Carbon Coating Development, Italy: Director of CNR Metallic Materials Institute Interviewed); Aerospace (ESA Considers Delays, Reductions as Result of Budget Cuts, Italy: Space Agency`s Director on Restructuring, Future Plans); Automotive, Transportation (EU: Clean Diesel Engine Technology Research Reviewed); Biotechnology (Germany`s Problems, Successes in Biotechnology Discussed); Computers (EU Europort Parallel Computing Project Concluded, Italy: PQE 2000 Project on Massively Parallel Systems Viewed); Defense R&D (France: Future Tasks of `Brevel` Military Intelligence Drone Noted); Energy, Environment (German Scientist Tests Elimination of Phosphates); Advanced Manufacturing (France:more » Advanced Rapid Prototyping System Presented); Lasers, Sensors, Optics (France: Strategy of Cilas Laser Company Detailed); Microelectronics (France: Simulation Company to Develop Microelectronic Manufacturing Application); Nuclear R&D (France: Megajoule Laser Plan, Cooperation with Livermore Lab Noted); S&T Policy (EU Efforts to Aid Small Companies` Research Viewed); Telecommunications (France Telecom`s Way to Internet).« less
Datacube Services in Action, Using Open Source and Open Standards

NASA Astrophysics Data System (ADS)

Baumann, P.; Misev, D.

2016-12-01

Array Databases comprise novel, promising technology for massive spatio-temporal datacubes, extending the SQL paradigm of "any query, anytime" to n-D arrays. On server side, such queries can be optimized, parallelized, and distributed based on partitioned array storage. The rasdaman ("raster data manager") system, which has pioneered Array Databases, is available in open source on www.rasdaman.org. Its declarative query language extends SQL with array operators which are optimized and parallelized on server side. The rasdaman engine, which is part of OSGeo Live, is mature and in operational use databases individually holding dozens of Terabytes. Further, the rasdaman concepts have strongly impacted international Big Data standards in the field, including the forthcoming MDA ("Multi-Dimensional Array") extension to ISO SQL, the OGC Web Coverage Service (WCS) and Web Coverage Processing Service (WCPS) standards, and the forthcoming INSPIRE WCS/WCPS; in both OGC and INSPIRE, OGC is WCS Core Reference Implementation. In our talk we present concepts, architecture, operational services, and standardization impact of open-source rasdaman, as well as experiences made.
Coupling molecular dynamics with lattice Boltzmann method based on the immersed boundary method

NASA Astrophysics Data System (ADS)

Tan, Jifu; Sinno, Talid; Diamond, Scott

2017-11-01

The study of viscous fluid flow coupled with rigid or deformable solids has many applications in biological and engineering problems, e.g., blood cell transport, drug delivery, and particulate flow. We developed a partitioned approach to solve this coupled Multiphysics problem. The fluid motion was solved by Palabos (Parallel Lattice Boltzmann Solver), while the solid displacement and deformation was simulated by LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator). The coupling was achieved through the immersed boundary method (IBM). The code modeled both rigid and deformable solids exposed to flow. The code was validated with the classic problem of rigid ellipsoid particle orbit in shear flow, blood cell stretching test and effective blood viscosity, and demonstrated essentially linear scaling over 16 cores. An example of the fluid-solid coupling was given for flexible filaments (drug carriers) transport in a flowing blood cell suspensions, highlighting the advantages and capabilities of the developed code. NIH 1U01HL131053-01A1.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.

Fast I/O for Massively Parallel Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew T.

1996-01-01

The two primary goals for this report were the design, contruction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the IO requirements of highly parallel applications. In addition, further work in parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.
Performance Analysis and Optimization on the UCLA Parallel Atmospheric General Circulation Model Code

NASA Technical Reports Server (NTRS)

Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos

1996-01-01

An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modificaitons to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
Optimizing ion channel models using a parallel genetic algorithm on graphical processors.

PubMed

Ben-Shalom, Roy; Aviv, Amit; Razon, Benjamin; Korngreen, Alon

2012-01-01

We have recently shown that we can semi-automatically constrain models of voltage-gated ion channels by combining a stochastic search algorithm with ionic currents measured using multiple voltage-clamp protocols. Although numerically successful, this approach is highly demanding computationally, with optimization on a high performance Linux cluster typically lasting several days. To solve this computational bottleneck we converted our optimization algorithm for work on a graphical processing unit (GPU) using NVIDIA's CUDA. Parallelizing the process on a Fermi graphic computing engine from NVIDIA increased the speed ∼180 times over an application running on an 80 node Linux cluster, considerably reducing simulation times. This application allows users to optimize models for ion channel kinetics on a single, inexpensive, desktop "super computer," greatly reducing the time and cost of building models relevant to neuronal physiology. We also demonstrate that the point of algorithm parallelization is crucial to its performance. We substantially reduced computing time by solving the ODEs (Ordinary Differential Equations) so as to massively reduce memory transfers to and from the GPU. This approach may be applied to speed up other data intensive applications requiring iterative solutions of ODEs. Copyright © 2012 Elsevier B.V. All rights reserved.
Graphics processing unit (GPU)-based computation of heat conduction in thermally anisotropic solids

NASA Astrophysics Data System (ADS)

Nahas, C. A.; Balasubramaniam, Krishnan; Rajagopal, Prabhu

2013-01-01

Numerical modeling of anisotropic media is a computationally intensive task since it brings additional complexity to the field problem in such a way that the physical properties are different in different directions. Largely used in the aerospace industry because of their lightweight nature, composite materials are a very good example of thermally anisotropic media. With advancements in video gaming technology, parallel processors are much cheaper today and accessibility to higher-end graphical processing devices has increased dramatically over the past couple of years. Since these massively parallel GPUs are very good in handling floating point arithmetic, they provide a new platform for engineers and scientists to accelerate their numerical models using commodity hardware. In this paper we implement a parallel finite difference model of thermal diffusion through anisotropic media using the NVIDIA CUDA (Compute Unified device Architecture). We use the NVIDIA GeForce GTX 560 Ti as our primary computing device which consists of 384 CUDA cores clocked at 1645 MHz with a standard desktop pc as the host platform. We compare the results from standard CPU implementation for its accuracy and speed and draw implications for simulation using the GPU paradigm.
Block iterative restoration of astronomical images with the massively parallel processor

NASA Technical Reports Server (NTRS)

Heap, Sara R.; Lindler, Don J.

1987-01-01

A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
A Massively Parallel Code for Polarization Calculations

NASA Astrophysics Data System (ADS)

Akiyama, Shizuka; Höflich, Peter

2001-03-01

We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Routing performance analysis and optimization within a massively parallel computer

DOEpatents

Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen

2013-04-16

An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.
Progressive Vector Quantization on a massively parallel SIMD machine with application to multispectral image data

NASA Technical Reports Server (NTRS)

Manohar, Mareboyana; Tilton, James C.

1994-01-01

A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.
Designing Liquid Rocket Engine Injectors for Performance, Stability, and Cost

NASA Technical Reports Server (NTRS)

Westra, Douglas G.; West, Jeffrey S.

2014-01-01

NASA is developing the Space Launch System (SLS) for crewed exploration missions beyond low Earth orbit. Marshall Space Flight Center (MSFC) is designing rocket engines for the SLS Advanced Booster (AB) concepts being developed to replace the Shuttle-derived solid rocket boosters. One AB concept uses large, Rocket-Propellant (RP)-fueled engines that pose significant design challenges. The injectors for these engines require high performance and stable operation while still meeting aggressive cost reduction goals for access to space. Historically, combustion stability problems have been a critical issue for such injector designs. Traditional, empirical injector design tools and methodologies, however, lack the ability to reliably predict complex injector dynamics that often lead to combustion stability. Reliance on these tools alone would likely result in an unaffordable test-fail-fix cycle for injector development. Recently at MSFC, a massively parallel computational fluid dynamics (CFD) program was successfully applied in the SLS AB injector design process. High-fidelity reacting flow simulations were conducted for both single-element and seven-element representations of the full-scale injector. Data from the CFD simulations was then used to significantly augment and improve the empirical design tools, resulting in a high-performance, stable injector design.
Regional-scale calculation of the LS factor using parallel processing

NASA Astrophysics Data System (ADS)

Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong

2015-05-01

With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Tough2{_}MP: A parallel version of TOUGH2

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu; Ding, Chris

2003-04-09

TOUGH2{_}MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. Inmore » addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2{_}MP, and discuss the basic features, modules, and their applications.« less
Massively parallel processor computer

NASA Technical Reports Server (NTRS)

Fung, L. W. (Inventor)

1983-01-01

An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Compact holographic optical neural network system for real-time pattern recognition

NASA Astrophysics Data System (ADS)

Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.

1996-08-01

One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited because of the limited parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high- resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attache case has been developed. Rotation- shift-scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.
Isolation and characterization of circulating tumor cells using a novel workflow combining the CellSearch® system and the CellCelector™.

PubMed

Neumann, Martin Horst Dieter; Schneck, Helen; Decker, Yvonne; Schömer, Susanne; Franken, André; Endris, Volker; Pfarr, Nicole; Weichert, Wilko; Niederacher, Dieter; Fehm, Tanja; Neubauer, Hans

2017-01-01

Circulating tumor cells (CTC) are rare cells which have left the primary tumor to enter the blood stream. Although only a small CTC subgroup is capable of extravasating, the presence of CTCs is associated with an increased risk of metastasis and a shorter overall survival. Understanding the heterogeneous CTC biology will optimize treatment decisions and will thereby improve patient outcome. For this, robust workflows for detection and isolation of CTCs are urgently required. Here, we present a workflow to characterize CTCs by combining the advantages of both the CellSearch ® and the CellCelector™ micromanipulation system. CTCs were isolated from CellSearch ® cartridges using the CellCelector™ system and were deposited into PCR tubes for subsequent molecular analysis (whole genome amplification (WGA) and massive parallel multigene sequencing). By a CellCelector™ screen we reidentified 97% of CellSearch ® SKBR-3 cells. Furthermore, we isolated 97% of CellSearch ® -proven patient CTCs using the CellCelector™ system. Therein, we found an almost perfect correlation of R 2 = 0.98 (Spearman's rho correlation, n = 20, p < 0.00001) between the CellSearch ® CTC count (n = 271) and the CellCelector™ detected CTCs (n = 252). Isolated CTCs were analyzed by WGA and massive parallel multigene sequencing. In total, single nucleotide polymorphisms (SNPs) could be detected in 50 genes in seven CTCs, 12 MCF-7, and 3 T47D cells, respectively. Taken together, CTC quantification via the CellCelector™ system ensures a comprehensive detection of CTCs preidentified by the CellSearch ® system. Moreover, the isolation of CTCs after CellSearch ® using the CellCelector™ system guarantees for CTC enrichment without any contaminants enabling subsequent high throughput genomic analyses on single cell level. © 2016 American Institute of Chemical Engineers Biotechnol. Prog., 33:125-132, 2017. © 2016 American Institute of Chemical Engineers.
Supercomputing with TOUGH2 family codes for coupled multi-physics simulations of geologic carbon sequestration

NASA Astrophysics Data System (ADS)

Yamamoto, H.; Nakajima, K.; Zhang, K.; Nanai, S.

2015-12-01

Powerful numerical codes that are capable of modeling complex coupled processes of physics and chemistry have been developed for predicting the fate of CO2 in reservoirs as well as its potential impacts on groundwater and subsurface environments. However, they are often computationally demanding for solving highly non-linear models in sufficient spatial and temporal resolutions. Geological heterogeneity and uncertainties further increase the challenges in modeling works. Two-phase flow simulations in heterogeneous media usually require much longer computational time than that in homogeneous media. Uncertainties in reservoir properties may necessitate stochastic simulations with multiple realizations. Recently, massively parallel supercomputers with more than thousands of processors become available in scientific and engineering communities. Such supercomputers may attract attentions from geoscientist and reservoir engineers for solving the large and non-linear models in higher resolutions within a reasonable time. However, for making it a useful tool, it is essential to tackle several practical obstacles to utilize large number of processors effectively for general-purpose reservoir simulators. We have implemented massively-parallel versions of two TOUGH2 family codes (a multi-phase flow simulator TOUGH2 and a chemically reactive transport simulator TOUGHREACT) on two different types (vector- and scalar-type) of supercomputers with a thousand to tens of thousands of processors. After completing implementation and extensive tune-up on the supercomputers, the computational performance was measured for three simulations with multi-million grid models, including a simulation of the dissolution-diffusion-convection process that requires high spatial and temporal resolutions to simulate the growth of small convective fingers of CO2-dissolved water to larger ones in a reservoir scale. The performance measurement confirmed that the both simulators exhibit excellent scalabilities showing almost linear speedup against number of processors up to over ten thousand cores. Generally this allows us to perform coupled multi-physics (THC) simulations on high resolution geologic models with multi-million grid in a practical time (e.g., less than a second per time step).
Towards Anatomic Scale Agent-Based Modeling with a Massively Parallel Spatially Explicit General-Purpose Model of Enteric Tissue (SEGMEnT_HPC)

PubMed Central

Cockrell, Robert Chase; Christley, Scott; Chang, Eugene; An, Gary

2015-01-01

Perhaps the greatest challenge currently facing the biomedical research community is the ability to integrate highly detailed cellular and molecular mechanisms to represent clinical disease states as a pathway to engineer effective therapeutics. This is particularly evident in the representation of organ-level pathophysiology in terms of abnormal tissue structure, which, through histology, remains a mainstay in disease diagnosis and staging. As such, being able to generate anatomic scale simulations is a highly desirable goal. While computational limitations have previously constrained the size and scope of multi-scale computational models, advances in the capacity and availability of high-performance computing (HPC) resources have greatly expanded the ability of computational models of biological systems to achieve anatomic, clinically relevant scale. Diseases of the intestinal tract are exemplary examples of pathophysiological processes that manifest at multiple scales of spatial resolution, with structural abnormalities present at the microscopic, macroscopic and organ-levels. In this paper, we describe a novel, massively parallel computational model of the gut, the Spatially Explicitly General-purpose Model of Enteric Tissue_HPC (SEGMEnT_HPC), which extends an existing model of the gut epithelium, SEGMEnT, in order to create cell-for-cell anatomic scale simulations. We present an example implementation of SEGMEnT_HPC that simulates the pathogenesis of ileal pouchitis, and important clinical entity that affects patients following remedial surgery for ulcerative colitis. PMID:25806784
DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan

This report presents a specification for the Portals 4 networ k programming interface. Portals 4 is intended to allow scalable, high-performance network communication betwee n nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded syste ms. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platfor ms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is tarmore » geted to the next generation of machines employing advanced network interface architectures that support enh anced offload capabilities.« less
The Portals 4.0 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2012-11-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities.« less
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters

PubMed Central

Bajaj, Chandrajit

2009-01-01

Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction, and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progress extraction, transmission and storage of isosurfaces. PMID:19756231

3-D parallel program for numerical calculation of gas dynamics problems with heat conductivity on distributed memory computational systems (CS)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.

1997-12-31

The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle.more » The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.« less
Massively parallel multicanonical simulations

NASA Astrophysics Data System (ADS)

Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard

2018-03-01

Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
Large-scale large eddy simulation of nuclear reactor flows: Issues and perspectives

DOE Office of Scientific and Technical Information (OSTI.GOV)

Merzari, Elia; Obabko, Aleks; Fischer, Paul

Numerical simulation has been an intrinsic part of nuclear engineering research since its inception. In recent years a transition is occurring toward predictive, first-principle-based tools such as computational fluid dynamics. Even with the advent of petascale computing, however, such tools still have significant limitations. In the present work some of these issues, and in particular the presence of massive multiscale separation, are discussed, as well as some of the research conducted to mitigate them. Petascale simulations at high fidelity (large eddy simulation/direct numerical simulation) were conducted with the massively parallel spectral element code Nek5000 on a series of representative problems.more » These simulations shed light on the requirements of several types of simulation: (1) axial flow around fuel rods, with particular attention to wall effects; (2) natural convection in the primary vessel; and (3) flow in a rod bundle in the presence of spacing devices. Finally, the focus of the work presented here is on the lessons learned and the requirements to perform these simulations at exascale. Additional physical insight gained from these simulations is also emphasized.« less
Toward Exascale Earthquake Ground Motion Simulations for Near-Fault Engineering Analysis

DOE PAGES

Johansen, Hans; Rodgers, Arthur; Petersson, N. Anders; ...

2017-09-01

Modernizing SW4 for massively parallel time-domain simulations of earthquake ground motions in 3D earth models increases resolution and provides ground motion estimates for critical infrastructure risk evaluations. Simulations of ground motions from large (M ≥ 7.0) earthquakes require domains on the order of 100 to500 km and spatial granularity on the order of 1 to5 m resulting in hundreds of billions of grid points. Surface-focused structured mesh refinement (SMR) allows for more constant grid point per wavelength scaling in typical Earth models, where wavespeeds increase with depth. In fact, MR allows for simulations to double the frequency content relative tomore » a fixed grid calculation on a given resource. The authors report improvements to the SW4 algorithm developed while porting the code to the Cori Phase 2 (Intel Xeon Phi) systems at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. As a result, investigations of the performance of the innermost loop of the calculations found that reorganizing the order of operations can improve performance for massive problems.« less
Toward Exascale Earthquake Ground Motion Simulations for Near-Fault Engineering Analysis

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johansen, Hans; Rodgers, Arthur; Petersson, N. Anders

Modernizing SW4 for massively parallel time-domain simulations of earthquake ground motions in 3D earth models increases resolution and provides ground motion estimates for critical infrastructure risk evaluations. Simulations of ground motions from large (M ≥ 7.0) earthquakes require domains on the order of 100 to500 km and spatial granularity on the order of 1 to5 m resulting in hundreds of billions of grid points. Surface-focused structured mesh refinement (SMR) allows for more constant grid point per wavelength scaling in typical Earth models, where wavespeeds increase with depth. In fact, MR allows for simulations to double the frequency content relative tomore » a fixed grid calculation on a given resource. The authors report improvements to the SW4 algorithm developed while porting the code to the Cori Phase 2 (Intel Xeon Phi) systems at the National Energy Research Scientific Computing Center (NERSC) at Lawrence Berkeley National Laboratory. As a result, investigations of the performance of the innermost loop of the calculations found that reorganizing the order of operations can improve performance for massive problems.« less
Large-scale large eddy simulation of nuclear reactor flows: Issues and perspectives

DOE PAGES

Merzari, Elia; Obabko, Aleks; Fischer, Paul; ...

2016-11-03

Numerical simulation has been an intrinsic part of nuclear engineering research since its inception. In recent years a transition is occurring toward predictive, first-principle-based tools such as computational fluid dynamics. Even with the advent of petascale computing, however, such tools still have significant limitations. In the present work some of these issues, and in particular the presence of massive multiscale separation, are discussed, as well as some of the research conducted to mitigate them. Petascale simulations at high fidelity (large eddy simulation/direct numerical simulation) were conducted with the massively parallel spectral element code Nek5000 on a series of representative problems.more » These simulations shed light on the requirements of several types of simulation: (1) axial flow around fuel rods, with particular attention to wall effects; (2) natural convection in the primary vessel; and (3) flow in a rod bundle in the presence of spacing devices. Finally, the focus of the work presented here is on the lessons learned and the requirements to perform these simulations at exascale. Additional physical insight gained from these simulations is also emphasized.« less
Thought Leaders during Crises in Massive Social Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Corley, Courtney D.; Farber, Robert M.; Reynolds, William

The vast amount of social media data that can be gathered from the internet coupled with workflows that utilize both commodity systems and massively parallel supercomputers, such as the Cray XMT, open new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges along with metrics and massively parallel algorithms that exhibit near-linear scalability according to number of processors. The challenge lies in making this massive data and analysis comprehensible to an analyst and end-users that require actionable knowledge tomore » carry out their duties. Simply stated, we have developed language and content agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.« less
FWT2D: A massively parallel program for frequency-domain full-waveform tomography of wide-aperture seismic data—Part 1: Algorithm

NASA Astrophysics Data System (ADS)

Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves

2009-03-01

This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
Massively Parallel, Molecular Analysis Platform Developed Using a CMOS Integrated Circuit With Biological Nanopores

PubMed Central

Roever, Stefan

2012-01-01

A massively parallel, low cost molecular analysis platform will dramatically change the nature of protein, molecular and genomics research, DNA sequencing, and ultimately, molecular diagnostics. An integrated circuit (IC) with 264 sensors was fabricated using standard CMOS semiconductor processing technology. Each of these sensors is individually controlled with precision analog circuitry and is capable of single molecule measurements. Under electronic and software control, the IC was used to demonstrate the feasibility of creating and detecting lipid bilayers and biological nanopores using wild type α-hemolysin. The ability to dynamically create bilayers over each of the sensors will greatly accelerate pore development and pore mutation analysis. In addition, the noise performance of the IC was measured to be 30fA(rms). With this noise performance, single base detection of DNA was demonstrated using α-hemolysin. The data shows that a single molecule, electrical detection platform using biological nanopores can be operationalized and can ultimately scale to millions of sensors. Such a massively parallel platform will revolutionize molecular analysis and will completely change the field of molecular diagnostics in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Wylie, Brian Neil; Moreland, Kenneth D.

Graphs are a vital way of organizing data with complex correlations. A good visualization of a graph can fundamentally change human understanding of the data. Consequently, there is a rich body of work on graph visualization. Although there are many techniques that are effective on small to medium sized graphs (tens of thousands of nodes), there is a void in the research for visualizing massive graphs containing millions of nodes. Sandia is one of the few entities in the world that has the means and motivation to handle data on such a massive scale. For example, homeland security generates graphsmore » from prolific media sources such as television, telephone, and the Internet. The purpose of this project is to provide the groundwork for visualizing such massive graphs. The research provides for two major feature gaps: a parallel, interactive visualization framework and scalable algorithms to make the framework usable to a practical application. Both the frameworks and algorithms are designed to run on distributed parallel computers, which are already available at Sandia. Some features are integrated into the ThreatView{trademark} application and future work will integrate further parallel algorithms.« less
Algorithmic trends in computational fluid dynamics; The Institute for Computer Applications in Science and Engineering (ICASE)/LaRC Workshop, NASA Langley Research Center, Hampton, VA, US, Sep. 15-17, 1991

NASA Technical Reports Server (NTRS)

Hussaini, M. Y. (Editor); Kumar, A. (Editor); Salas, M. D. (Editor)

1993-01-01

The purpose here is to assess the state of the art in the areas of numerical analysis that are particularly relevant to computational fluid dynamics (CFD), to identify promising new developments in various areas of numerical analysis that will impact CFD, and to establish a long-term perspective focusing on opportunities and needs. Overviews are given of discretization schemes, computational fluid dynamics, algorithmic trends in CFD for aerospace flow field calculations, simulation of compressible viscous flow, and massively parallel computation. Also discussed are accerelation methods, spectral and high-order methods, multi-resolution and subcell resolution schemes, and inherently multidimensional schemes.
The portals 4.0.1 network programming interface.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin

2013-04-01

This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities. 3« less
Merlin - Massively parallel heterogeneous computing

NASA Technical Reports Server (NTRS)

Wittie, Larry; Maples, Creve

1989-01-01

Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Massively parallel quantum computer simulator

NASA Astrophysics Data System (ADS)

De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.

2007-01-01

We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, a SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.
Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamic global mapping of contended links

DOEpatents

Archer, Charles Jens [Rochester, MN; Musselman, Roy Glenn [Rochester, MN; Peters, Amanda [Rochester, MN; Pinnow, Kurt Walter [Rochester, MN; Swartz, Brent Allen [Chippewa Falls, WI; Wallenfelt, Brian Paul [Eden Prairie, MN

2011-10-04

A massively parallel nodal computer system periodically collects and broadcasts usage data for an internal communications network. A node sending data over the network makes a global routing determination using the network usage data. Preferably, network usage data comprises an N-bit usage value for each output buffer associated with a network link. An optimum routing is determined by summing the N-bit values associated with each link through which a data packet must pass, and comparing the sums associated with different possible routes.
Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by dynamically adjusting local routing strategies

DOEpatents

Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

2010-03-16

A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units.

PubMed

Maurer, S A; Kussmann, J; Ochsenfeld, C

2014-08-07

We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N⁵) to O(N³) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows to replace the rate-determining contraction step with a modified J-engine algorithm, that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.
Parallel processing architecture for H.264 deblocking filter on multi-core platforms

NASA Astrophysics Data System (ADS)

Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao

2012-03-01

Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

NASA Astrophysics Data System (ADS)

Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

2017-08-01

We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
Time-dependent density-functional theory in massively parallel computer architectures: the octopus project

NASA Astrophysics Data System (ADS)

Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.

2012-06-01

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.

Time-dependent density-functional theory in massively parallel computer architectures: the OCTOPUS project.

PubMed

Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L

2012-06-13

Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Parallel Tensor Compression for Large-Scale Scientific Data.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan

As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memorymore » parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.« less
Optical Symbolic Computing

NASA Astrophysics Data System (ADS)

Neff, John A.

1989-12-01

Experiments originating from Gestalt psychology have shown that representing information in a symbolic form provides a more effective means to understanding. Computer scientists have been struggling for the last two decades to determine how best to create, manipulate, and store collections of symbolic structures. In the past, much of this struggling led to software innovations because that was the path of least resistance. For example, the development of heuristics for organizing the searching through knowledge bases was much less expensive than building massively parallel machines that could search in parallel. That is now beginning to change with the emergence of parallel architectures which are showing the potential for handling symbolic structures. This paper will review the relationships between symbolic computing and parallel computing architectures, and will identify opportunities for optics to significantly impact the performance of such computing machines. Although neural networks are an exciting subset of massively parallel computing structures, this paper will not touch on this area since it is receiving a great deal of attention in the literature. That is, the concepts presented herein do not consider the distributed representation of knowledge.
Parallel auto-correlative statistics with VTK.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pebay, Philippe Pierre; Bennett, Janine Camille

2013-08-01

This report summarizes existing statistical engines in VTK and presents both the serial and parallel auto-correlative statistics engines. It is a sequel to [PT08, BPRT09b, PT09, BPT09, PT10] which studied the parallel descriptive, correlative, multi-correlative, principal component analysis, contingency, k-means, and order statistics engines. The ease of use of the new parallel auto-correlative statistics engine is illustrated by the means of C++ code snippets and algorithm verification is provided. This report justifies the design of the statistics engines with parallel scalability in mind, and provides scalability and speed-up analysis results for the autocorrelative statistics engine.
Massively Parallel DNA Sequencing Facilitates Diagnosis of Patients with Usher Syndrome Type 1

PubMed Central

Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-ichi

2014-01-01

Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance. PMID:24618850
Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.

PubMed

Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-Ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-Ichi

2014-01-01

Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.
A hierarchical, automated target recognition algorithm for a parallel analog processor

NASA Technical Reports Server (NTRS)

Woodward, Gail; Padgett, Curtis

1997-01-01

A hierarchical approach is described for an automated target recognition (ATR) system, VIGILANTE, that uses a massively parallel, analog processor (3DANN). The 3DANN processor is capable of performing 64 concurrent inner products of size 1x4096 every 250 nanoseconds.
Homemade Buckeye-Pi: A Learning Many-Node Platform for High-Performance Parallel Computing

NASA Astrophysics Data System (ADS)

Amooie, M. A.; Moortgat, J.

2017-12-01

We report on the "Buckeye-Pi" cluster, the supercomputer developed in The Ohio State University School of Earth Sciences from 128 inexpensive Raspberry Pi (RPi) 3 Model B single-board computers. Each RPi is equipped with fast Quad Core 1.2GHz ARMv8 64bit processor, 1GB of RAM, and 32GB microSD card for local storage. Therefore, the cluster has a total RAM of 128GB that is distributed on the individual nodes and a flash capacity of 4TB with 512 processors, while it benefits from low power consumption, easy portability, and low total cost. The cluster uses the Message Passing Interface protocol to manage the communications between each node. These features render our platform the most powerful RPi supercomputer to date and suitable for educational applications in high-performance-computing (HPC) and handling of large datasets. In particular, we use the Buckeye-Pi to implement optimized parallel codes in our in-house simulator for subsurface media flows with the goal of achieving a massively-parallelized scalable code. We present benchmarking results for the computational performance across various number of RPi nodes. We believe our project could inspire scientists and students to consider the proposed unconventional cluster architecture as a mainstream and a feasible learning platform for challenging engineering and scientific problems.
Functional Profiling Using the Saccharomyces Genome Deletion Project Collections.

PubMed

Nislow, Corey; Wong, Lai Hong; Lee, Amy Huei-Yi; Giaever, Guri

2016-09-01

The ability to measure and quantify the fitness of an entire organism requires considerably more complex approaches than simply using traditional "omic" methods that examine, for example, the abundance of RNA transcripts, proteins, or metabolites. The yeast deletion collections represent the only systematic, comprehensive set of null alleles for any organism in which such fitness measurements can be assayed. Generated by the Saccharomyces Genome Deletion Project, these collections allow the systematic and parallel analysis of gene functions using any measurable phenotype. The unique 20-bp molecular barcodes engineered into the genome of each deletion strain facilitate the massively parallel analysis of individual fitness. Here, we present functional genomic protocols for use with the yeast deletion collections. We describe how to maintain, propagate, and store the deletion collections and how to perform growth fitness assays on single and parallel screening platforms. Phenotypic fitness analyses of the yeast mutants, described in brief here, provide important insights into biological functions, mechanisms of drug action, and response to environmental stresses. It is important to bear in mind that the specific assays described in this protocol represent some of the many ways in which these collections can be assayed, and in this description particular attention is paid to maximizing throughput using growth as the phenotypic measure. © 2016 Cold Spring Harbor Laboratory Press.
Real-time electron dynamics for massively parallel excited-state simulations

NASA Astrophysics Data System (ADS)

Andrade, Xavier

The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE PAGES

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...

2017-11-14

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

NASA Astrophysics Data System (ADS)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.

2017-11-01

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce.

PubMed

Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

2013-08-01

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS - a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive.
Hadoop-GIS: A High Performance Spatial Data Warehousing System over MapReduce

PubMed Central

Aji, Ablimit; Wang, Fusheng; Vo, Hoang; Lee, Rubao; Liu, Qiaoling; Zhang, Xiaodong; Saltz, Joel

2013-01-01

Support of high performance queries on large volumes of spatial data becomes increasingly important in many application domains, including geospatial problems in numerous fields, location based services, and emerging scientific applications that are increasingly data- and compute-intensive. The emergence of massive scale spatial data is due to the proliferation of cost effective and ubiquitous positioning technologies, development of high resolution imaging technologies, and contribution from a large number of community users. There are two major challenges for managing and querying massive spatial data to support spatial queries: the explosion of spatial data, and the high computational complexity of spatial queries. In this paper, we present Hadoop-GIS – a scalable and high performance spatial data warehousing system for running large scale spatial queries on Hadoop. Hadoop-GIS supports multiple types of spatial queries on MapReduce through spatial partitioning, customizable spatial query engine RESQUE, implicit parallel spatial query execution on MapReduce, and effective methods for amending query results through handling boundary objects. Hadoop-GIS utilizes global partition indexing and customizable on demand local spatial indexing to achieve efficient query processing. Hadoop-GIS is integrated into Hive to support declarative spatial queries with an integrated architecture. Our experiments have demonstrated the high efficiency of Hadoop-GIS on query response and high scalability to run on commodity clusters. Our comparative experiments have showed that performance of Hadoop-GIS is on par with parallel SDBMS and outperforms SDBMS for compute-intensive queries. Hadoop-GIS is available as a set of library for processing spatial queries, and as an integrated software package in Hive. PMID:24187650
Animated computer graphics models of space and earth sciences data generated via the massively parallel processor

NASA Technical Reports Server (NTRS)

Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David

1987-01-01

The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.
A cost-effective methodology for the design of massively-parallel VLSI functional units

NASA Technical Reports Server (NTRS)

Venkateswaran, N.; Sriram, G.; Desouza, J.

1993-01-01

In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.
Parallel computing on Unix workstation arrays

NASA Astrophysics Data System (ADS)

Reale, F.; Bocchino, F.; Sciortino, S.

1994-12-01

We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by exucuting it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.
Transitioning NWChem to the Next Generation of Manycore Machines

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bylaska, Eric J.; Apra, E; Kowalski, Karol

The NorthWest chemistry (NWChem) modeling software is a popular molecular chemistry simulation software that was designed from the start to work on massively parallel processing supercomputers [1-3]. It contains an umbrella of modules that today includes self-consistent eld (SCF), second order Møller-Plesset perturbation theory (MP2), coupled cluster (CC), multiconguration self-consistent eld (MCSCF), selected conguration interaction (CI), tensor contraction engine (TCE) many body methods, density functional theory (DFT), time-dependent density functional theory (TDDFT), real-time time-dependent density functional theory, pseudopotential plane-wave density functional theory (PSPW), band structure (BAND), ab initio molecular dynamics (AIMD), Car-Parrinello molecular dynamics (MD), classical MD, hybrid quantum mechanicsmore » molecular mechanics (QM/MM), hybrid ab initio molecular dynamics molecular mechanics (AIMD/MM), gauge independent atomic orbital nuclear magnetic resonance (GIAO NMR), conductor like screening solvation model (COSMO), conductor-like screening solvation model based on density (COSMO-SMD), and reference interaction site model (RISM) solvation models, free energy simulations, reaction path optimization, parallel in time, among other capabilities [4]. Moreover, new capabilities continue to be added with each new release.« less
Embedded Implementation of VHR Satellite Image Segmentation

PubMed Central

Li, Chao; Balla-Arabé, Souleymane; Ginhac, Dominique; Yang, Fan

2016-01-01

Processing and analysis of Very High Resolution (VHR) satellite images provide a mass of crucial information, which can be used for urban planning, security issues or environmental monitoring. However, they are computationally expensive and, thus, time consuming, while some of the applications, such as natural disaster monitoring and prevention, require high efficiency performance. Fortunately, parallel computing techniques and embedded systems have made great progress in recent years, and a series of massively parallel image processing devices, such as digital signal processors or Field Programmable Gate Arrays (FPGAs), have been made available to engineers at a very convenient price and demonstrate significant advantages in terms of running-cost, embeddability, power consumption flexibility, etc. In this work, we designed a texture region segmentation method for very high resolution satellite images by using the level set algorithm and the multi-kernel theory in a high-abstraction C environment and realize its register-transfer level implementation with the help of a new proposed high-level synthesis-based design flow. The evaluation experiments demonstrate that the proposed design can produce high quality image segmentation with a significant running-cost advantage. PMID:27240370

Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by employing bandwidth shells at areas of overutilization

DOEpatents

Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

2010-04-27

A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a final destination. The default routing strategy is altered responsive to detection of overutilization of a particular path of one or more links, and at least some traffic is re-routed by distributing the traffic among multiple paths (which may include the default path). An alternative path may require a greater number of link traversals to reach the destination node.
A Massively Parallel Bayesian Approach to Planetary Protection Trajectory Analysis and Design

NASA Technical Reports Server (NTRS)

Wallace, Mark S.

2015-01-01

The NASA Planetary Protection Office has levied a requirement that the upper stage of future planetary launches have a less than 10(exp -4) chance of impacting Mars within 50 years after launch. A brute-force approach requires a decade of computer time to demonstrate compliance. By using a Bayesian approach and taking advantage of the demonstrated reliability of the upper stage, the required number of fifty-year propagations can be massively reduced. By spreading the remaining embarrassingly parallel Monte Carlo simulations across multiple computers, compliance can be demonstrated in a reasonable time frame. The method used is described here.
Systems and methods for rapid processing and storage of data

DOEpatents

Stalzer, Mark A.

2017-01-24

Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
Massively Parallel Sequencing Detected a Mutation in the MFN2 Gene Missed by Sanger Sequencing Due to a Primer Mismatch on an SNP Site.

PubMed

Neupauerová, Jana; Grečmalová, Dagmar; Seeman, Pavel; Laššuthová, Petra

2016-05-01

We describe a patient with early onset severe axonal Charcot-Marie-Tooth disease (CMT2) with dominant inheritance, in whom Sanger sequencing failed to detect a mutation in the mitofusin 2 (MFN2) gene because of a single nucleotide polymorphism (rs2236057) under the PCR primer sequence. The severe early onset phenotype and the family history with severely affected mother (died after delivery) was very suggestive of CMT2A and this suspicion was finally confirmed by a MFN2 mutation. The mutation p.His361Tyr was later detected in the patient by massively parallel sequencing with a gene panel for hereditary neuropathies. According to this information, new primers for amplification and sequencing were designed which bind away from the polymorphic sites of the patient's DNA. Sanger sequencing with these new primers then confirmed the heterozygous mutation in the MFN2 gene in this patient. This case report shows that massively parallel sequencing may in some rare cases be more sensitive than Sanger sequencing and highlights the importance of accurate primer design which requires special attention. © 2016 John Wiley & Sons Ltd/University College London.
Massively parallel whole genome amplification for single-cell sequencing using droplet microfluidics.

PubMed

Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko

2017-07-12

Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Modeling of High Speed Reacting Flows: Established Practices and Future Challenges

NASA Technical Reports Server (NTRS)

Baurle, R. A.

2004-01-01

Computational fluid dynamics (CFD) has proven to be an invaluable tool for the design and analysis of high- speed propulsion devices. Massively parallel computing, together with the maturation of robust CFD codes, has made it possible to perform simulations of complete engine flowpaths. Steady-state Reynolds-Averaged Navier-Stokes simulations are now routinely used in the scramjet engine development cycle to determine optimal fuel injector arrangements, investigate trends noted during testing, and extract various measures of engine efficiency. Unfortunately, the turbulence and combustion models used in these codes have not changed significantly over the past decade. Hence, the CFD practitioner must often rely heavily on existing measurements (at similar flow conditions) to calibrate model coefficients on a case- by-case basis. This paper provides an overview of the modeled equations typically employed by commercial- quality CFD codes for high-speed combustion applications. Careful attention is given to the approximations employed for each of the unclosed terms in the averaged equation set. The salient features (and shortcomings) of common models used to close these terms are covered in detail, and several academic efforts aimed at addressing these shortcomings are discussed.
AREVA Developments for an Efficient and Reliable use of Monte Carlo codes for Radiation Transport Applications

NASA Astrophysics Data System (ADS)

Chapoutier, Nicolas; Mollier, François; Nolin, Guillaume; Culioli, Matthieu; Mace, Jean-Reynald

2017-09-01

In the context of the rising of Monte Carlo transport calculations for any kind of application, AREVA recently improved its suite of engineering tools in order to produce efficient Monte Carlo workflow. Monte Carlo codes, such as MCNP or TRIPOLI, are recognized as reference codes to deal with a large range of radiation transport problems. However the inherent drawbacks of theses codes - laboring input file creation and long computation time - contrast with the maturity of the treatment of the physical phenomena. The goals of the recent AREVA developments were to reach similar efficiency as other mature engineering sciences such as finite elements analyses (e.g. structural or fluid dynamics). Among the main objectives, the creation of a graphical user interface offering CAD tools for geometry creation and other graphical features dedicated to the radiation field (source definition, tally definition) has been reached. The computations times are drastically reduced compared to few years ago thanks to the use of massive parallel runs, and above all, the implementation of hybrid variance reduction technics. From now engineering teams are capable to deliver much more prompt support to any nuclear projects dealing with reactors or fuel cycle facilities from conceptual phase to decommissioning.
Aerodynamic simulation on massively parallel systems

NASA Technical Reports Server (NTRS)

Haeuser, Jochem; Simon, Horst D.

1992-01-01

This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but message passing MIMD systems seem to be best suited for large miltiblock applications.
A Survey of Parallel Computing

DTIC Science & Technology

1988-07-01

Evaluating Two Massively Parallel Machines. Communications of the ACM .9, , , 176 BIBLIOGRAPHY 29, 8 (August), pp. 752-758. Gajski , D.D., Padua, D.A., Kuck...Computer Architecture, edited by Gajski , D. D., Milutinovic, V. M. Siegel, H. J. and Furht, B. P. IEEE Computer Society Press, Washington, D.C., pp. 387-407
CESDIS

NASA Technical Reports Server (NTRS)

1994-01-01

CESDIS, the Center of Excellence in Space Data and Information Sciences was developed jointly by NASA, Universities Space Research Association (USRA), and the University of Maryland in 1988 to focus on the design of advanced computing techniques and data systems to support NASA Earth and space science research programs. CESDIS is operated by USRA under contract to NASA. The Director, Associate Director, Staff Scientists, and administrative staff are located on-site at NASA's Goddard Space Flight Center in Greenbelt, Maryland. The primary CESDIS mission is to increase the connection between computer science and engineering research programs at colleges and universities and NASA groups working with computer applications in Earth and space science. Research areas of primary interest at CESDIS include: 1) High performance computing, especially software design and performance evaluation for massively parallel machines; 2) Parallel input/output and data storage systems for high performance parallel computers; 3) Data base and intelligent data management systems for parallel computers; 4) Image processing; 5) Digital libraries; and 6) Data compression. CESDIS funds multiyear projects at U. S. universities and colleges. Proposals are accepted in response to calls for proposals and are selected on the basis of peer reviews. Funds are provided to support faculty and graduate students working at their home institutions. Project personnel visit Goddard during academic recess periods to attend workshops, present seminars, and collaborate with NASA scientists on research projects. Additionally, CESDIS takes on specific research tasks of shorter duration for computer science research requested by NASA Goddard scientists.
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
A biconjugate gradient type algorithm on massively parallel architectures

NASA Technical Reports Server (NTRS)

Freund, Roland W.; Hochbruck, Marlis

1991-01-01

The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
Massive parallelization of serial inference algorithms for a complex generalized linear model

PubMed Central

Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David

2014-01-01

Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
Implementation, capabilities, and benchmarking of Shift, a massively parallel Monte Carlo radiation transport code

DOE PAGES

Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...

2015-12-21

This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Somemore » specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000 ® problems. These benchmark and scaling studies show promising results.« less
Crystal MD: The massively parallel molecular dynamics software for metal with BCC structure

NASA Astrophysics Data System (ADS)

Hu, Changjun; Bai, He; He, Xinfu; Zhang, Boyao; Nie, Ningming; Wang, Xianmeng; Ren, Yingwen

2017-02-01

Material irradiation effect is one of the most important keys to use nuclear power. However, the lack of high-throughput irradiation facility and knowledge of evolution process, lead to little understanding of the addressed issues. With the help of high-performance computing, we could make a further understanding of micro-level-material. In this paper, a new data structure is proposed for the massively parallel simulation of the evolution of metal materials under irradiation environment. Based on the proposed data structure, we developed the new molecular dynamics software named Crystal MD. The simulation with Crystal MD achieved over 90% parallel efficiency in test cases, and it takes more than 25% less memory on multi-core clusters than LAMMPS and IMD, which are two popular molecular dynamics simulation software. Using Crystal MD, a two trillion particles simulation has been performed on Tianhe-2 cluster.
Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald.

PubMed

Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

2014-02-28

In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.
Scalable Evaluation of Polarization Energy and Associated Forces in Polarizable Molecular Dynamics: II.Towards Massively Parallel Computations using Smooth Particle Mesh Ewald

PubMed Central

Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip

2015-01-01

In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230
Large-scale molecular dynamics simulation of DNA: implementation and validation of the AMBER98 force field in LAMMPS.

PubMed

Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles

2004-07-15

Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)(2), which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.
Relationship Between Faults Oriented Parallel and Oblique to Bedding in Neogene Massive Siliceous Mudstones at The Horonobe Underground Research Laboratory, Japan

NASA Astrophysics Data System (ADS)

Hayano, Akira; Ishii, Eiichi

2016-10-01

This study investigates the mechanical relationship between bedding-parallel and bedding-oblique faults in a Neogene massive siliceous mudstone at the site of the Horonobe Underground Research Laboratory (URL) in Hokkaido, Japan, on the basis of observations of drill-core recovered from pilot boreholes and fracture mapping on shaft and gallery walls. Four bedding-parallel faults with visible fault gouge, named respectively the MM Fault, the Last MM Fault, the S1 Fault, and the S2 Fault (stratigraphically, from the highest to the lowest), were observed in two pilot boreholes (PB-V01 and SAB-1). The distribution of the bedding-parallel faults at 350 m depth in the Horonobe URL indicates that these faults are spread over at least several tens of meters in parallel along a bedding plane. The observation that the bedding-oblique fault displaces the Last MM fault is consistent with the previous interpretation that the bedding- oblique faults formed after the bedding-parallel faults. In addition, the bedding-parallel faults terminate near the MM and S1 faults, indicating that the bedding-parallel faults with visible fault gouge act to terminate the propagation of younger bedding-oblique faults. In particular, the MM and S1 faults, which have a relatively thick fault gouge, appear to have had a stronger control on the propagation of bedding-oblique faults than did the Last MM fault, which has a relatively thin fault gouge.

3-D readout-electronics packaging for high-bandwidth massively paralleled imager

DOEpatents

Kwiatkowski, Kris; Lyke, James

2007-12-18

Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells are disposed on a stacked CMOS chip's surface at a 45.degree. angle from light reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections for distributing power and signals to components associated with each stacked CSMO chip layer.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

O'Brien, M. J.; Brantley, P. S.; Joy, K. I.

2013-07-01

In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrencemore » Livermore National Laboratory. (authors)« less
Method and apparatus for obtaining stack traceback data for multiple computing nodes of a massively parallel computer system

DOEpatents

Gooding, Thomas Michael; McCarthy, Patrick Joseph

2010-03-02

A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
Method and apparatus for analyzing error conditions in a massively parallel computer system by identifying anomalous nodes within a communicator set

DOEpatents

Gooding, Thomas Michael [Rochester, MN

2011-04-19

An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.
Estimating water flow through a hillslope using the massively parallel processor

NASA Technical Reports Server (NTRS)

Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.

1988-01-01

A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.
Recovery Act - CAREER: Sustainable Silicon -- Energy-Efficient VLSI Interconnect for Extreme-Scale Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chiang, Patrick

2014-01-31

The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thou sands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.
De novo assembly of human genomes with massively parallel short read sequencing.

PubMed

Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun

2010-02-01

Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
Parallel computations and control of adaptive structures

NASA Technical Reports Server (NTRS)

Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)

1991-01-01

The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.
A Generic Mesh Data Structure with Parallel Applications

ERIC Educational Resources Information Center

Cochran, William Kenneth, Jr.

2009-01-01

High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…
Parallel computer vision

DOE Office of Scientific and Technical Information (OSTI.GOV)

Uhr, L.

1987-01-01

This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
DICE/ColDICE: 6D collisionless phase space hydrodynamics using a lagrangian tesselation

NASA Astrophysics Data System (ADS)

Sousbie, Thierry

2018-01-01

DICE is a C++ template library designed to solve collisionless fluid dynamics in 6D phase space using massively parallel supercomputers via an hybrid OpenMP/MPI parallelization. ColDICE, based on DICE, implements a cosmological and physical VLASOV-POISSON solver for cold systems such as dark matter (CDM) dynamics.
A Review of Lightweight Thread Approaches for High Performance Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Castello, Adrian; Pena, Antonio J.; Seo, Sangmin

High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonlyfound patterns in current parallel codes. Moreover, wemore » study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.« less
Parallel image compression

NASA Technical Reports Server (NTRS)

Reif, John H.

1987-01-01

A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
High Performance Distributed Computing in a Supercomputer Environment: Computational Services and Applications Issues

NASA Technical Reports Server (NTRS)

Kramer, Williams T. C.; Simon, Horst D.

1994-01-01

This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis to distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors, as well as high massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience made at the NAS facility at NASA Ames Research Center. Over the last five years at NAS massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside with traditional vector supercomputers such as the Cray Y-MP and C90.
Development of massive multilevel molecular dynamics simulation program, Platypus (PLATform for dYnamic Protein Unified Simulation), for the elucidation of protein functions.

PubMed

Takano, Yu; Nakata, Kazuto; Yonezawa, Yasushige; Nakamura, Haruki

2016-05-05

A massively parallel program for quantum mechanical-molecular mechanical (QM/MM) molecular dynamics simulation, called Platypus (PLATform for dYnamic Protein Unified Simulation), was developed to elucidate protein functions. The speedup and the parallelization ratio of Platypus in the QM and QM/MM calculations were assessed for a bacteriochlorophyll dimer in the photosynthetic reaction center (DIMER) on the K computer, a massively parallel computer achieving 10 PetaFLOPs with 705,024 cores. Platypus exhibited the increase in speedup up to 20,000 core processors at the HF/cc-pVDZ and B3LYP/cc-pVDZ, and up to 10,000 core processors by the CASCI(16,16)/6-31G** calculations. We also performed excited QM/MM-MD simulations on the chromophore of Sirius (SIRIUS) in water. Sirius is a pH-insensitive and photo-stable ultramarine fluorescent protein. Platypus accelerated on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water, using over 4000 core processors. In addition, it also succeeded in 50-ps (200,000-step) on-the-fly excited-state QM/MM-MD simulations for the SIRIUS in water. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
Communication: A reduced scaling J-engine based reformulation of SOS-MP2 using graphics processing units

DOE Office of Scientific and Technical Information (OSTI.GOV)

Maurer, S. A.; Kussmann, J.; Ochsenfeld, C., E-mail: Christian.Ochsenfeld@cup.uni-muenchen.de

2014-08-07

We present a low-prefactor, cubically scaling scaled-opposite-spin second-order Møller-Plesset perturbation theory (SOS-MP2) method which is highly suitable for massively parallel architectures like graphics processing units (GPU). The scaling is reduced from O(N{sup 5}) to O(N{sup 3}) by a reformulation of the MP2-expression in the atomic orbital basis via Laplace transformation and the resolution-of-the-identity (RI) approximation of the integrals in combination with efficient sparse algebra for the 3-center integral transformation. In contrast to previous works that employ GPUs for post Hartree-Fock calculations, we do not simply employ GPU-based linear algebra libraries to accelerate the conventional algorithm. Instead, our reformulation allows tomore » replace the rate-determining contraction step with a modified J-engine algorithm, that has been proven to be highly efficient on GPUs. Thus, our SOS-MP2 scheme enables us to treat large molecular systems in an accurate and efficient manner on a single GPU-server.« less
A spiking neural network model of 3D perception for event-based neuromorphic stereo vision systems

PubMed Central

Osswald, Marc; Ieng, Sio-Hoi; Benosman, Ryad; Indiveri, Giacomo

2017-01-01

Stereo vision is an important feature that enables machine vision systems to perceive their environment in 3D. While machine vision has spawned a variety of software algorithms to solve the stereo-correspondence problem, their implementation and integration in small, fast, and efficient hardware vision systems remains a difficult challenge. Recent advances made in neuromorphic engineering offer a possible solution to this problem, with the use of a new class of event-based vision sensors and neural processing devices inspired by the organizing principles of the brain. Here we propose a radically novel model that solves the stereo-correspondence problem with a spiking neural network that can be directly implemented with massively parallel, compact, low-latency and low-power neuromorphic engineering devices. We validate the model with experimental results, highlighting features that are in agreement with both computational neuroscience stereo vision theories and experimental findings. We demonstrate its features with a prototype neuromorphic hardware system and provide testable predictions on the role of spike-based representations and temporal dynamics in biological stereo vision processing systems. PMID:28079187
A spiking neural network model of 3D perception for event-based neuromorphic stereo vision systems.

PubMed

Osswald, Marc; Ieng, Sio-Hoi; Benosman, Ryad; Indiveri, Giacomo

2017-01-12

Stereo vision is an important feature that enables machine vision systems to perceive their environment in 3D. While machine vision has spawned a variety of software algorithms to solve the stereo-correspondence problem, their implementation and integration in small, fast, and efficient hardware vision systems remains a difficult challenge. Recent advances made in neuromorphic engineering offer a possible solution to this problem, with the use of a new class of event-based vision sensors and neural processing devices inspired by the organizing principles of the brain. Here we propose a radically novel model that solves the stereo-correspondence problem with a spiking neural network that can be directly implemented with massively parallel, compact, low-latency and low-power neuromorphic engineering devices. We validate the model with experimental results, highlighting features that are in agreement with both computational neuroscience stereo vision theories and experimental findings. We demonstrate its features with a prototype neuromorphic hardware system and provide testable predictions on the role of spike-based representations and temporal dynamics in biological stereo vision processing systems.
A spiking neural network model of 3D perception for event-based neuromorphic stereo vision systems

NASA Astrophysics Data System (ADS)

Osswald, Marc; Ieng, Sio-Hoi; Benosman, Ryad; Indiveri, Giacomo

2017-01-01

Stereo vision is an important feature that enables machine vision systems to perceive their environment in 3D. While machine vision has spawned a variety of software algorithms to solve the stereo-correspondence problem, their implementation and integration in small, fast, and efficient hardware vision systems remains a difficult challenge. Recent advances made in neuromorphic engineering offer a possible solution to this problem, with the use of a new class of event-based vision sensors and neural processing devices inspired by the organizing principles of the brain. Here we propose a radically novel model that solves the stereo-correspondence problem with a spiking neural network that can be directly implemented with massively parallel, compact, low-latency and low-power neuromorphic engineering devices. We validate the model with experimental results, highlighting features that are in agreement with both computational neuroscience stereo vision theories and experimental findings. We demonstrate its features with a prototype neuromorphic hardware system and provide testable predictions on the role of spike-based representations and temporal dynamics in biological stereo vision processing systems.

Long-time atomistic simulations with the Parallel Replica Dynamics method

NASA Astrophysics Data System (ADS)

Perez, Danny

Molecular Dynamics (MD) -- the numerical integration of atomistic equations of motion -- is a workhorse of computational materials science. Indeed, MD can in principle be used to obtain any thermodynamic or kinetic quantity, without introducing any approximation or assumptions beyond the adequacy of the interaction potential. It is therefore an extremely powerful and flexible tool to study materials with atomistic spatio-temporal resolution. These enviable qualities however come at a steep computational price, hence limiting the system sizes and simulation times that can be achieved in practice. While the size limitation can be efficiently addressed with massively parallel implementations of MD based on spatial decomposition strategies, allowing for the simulation of trillions of atoms, the same approach usually cannot extend the timescales much beyond microseconds. In this article, we discuss an alternative parallel-in-time approach, the Parallel Replica Dynamics (ParRep) method, that aims at addressing the timescale limitation of MD for systems that evolve through rare state-to-state transitions. We review the formal underpinnings of the method and demonstrate that it can provide arbitrarily accurate results for any definition of the states. When an adequate definition of the states is available, ParRep can simulate trajectories with a parallel speedup approaching the number of replicas used. We demonstrate the usefulness of ParRep by presenting different examples of materials simulations where access to long timescales was essential to access the physical regime of interest and discuss practical considerations that must be addressed to carry out these simulations. Work supported by the United States Department of Energy (U.S. DOE), Office of Science, Office of Basic Energy Sciences, Materials Sciences and Engineering Division.
Processor-In-Memory (PIM) Based Architectures for PetaFlops Potential Massively Parallel Processing

NASA Technical Reports Server (NTRS)

Kogge, Peter M.

1996-01-01

The report summarizes the work performed at the University of Notre Dame under a NASA grant from July 15, 1995 through July 14, 1996. Researchers involved in the work included the PI, Dr. Peter M. Kogge, and three graduate students under his direction in the Computer Science and Engineering Department: Stephen Dartt, Costin Iancu, and Lakshmi Narayanaswany. The organization of this report is as follows. Section 2 is a summary of the problem addressed by this work. Section 3 is a summary of the project's objectives and approach. Section 4 summarizes PIM technology briefly. Section 5 overviews the main results of the work. Section 6 then discusses the importance of the results and future directions. Also attached to this report are copies of several technical reports and publications whose contents directly reflect results developed during this study.
Adjusting process count on demand for petascale global optimization

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sosonkina, Masha; Watson, Layne T.; Radcliffe, Nicholas R.

2012-11-23

There are many challenges that need to be met before efficient and reliable computation at the petascale is possible. Many scientific and engineering codes running at the petascale are likely to be memory intensive, which makes thrashing a serious problem for many petascale applications. One way to overcome this challenge is to use a dynamic number of processes, so that the total amount of memory available for the computation can be increased on demand. This paper describes modifications made to the massively parallel global optimization code pVTdirect in order to allow for a dynamic number of processes. In particular, themore » modified version of the code monitors memory use and spawns new processes if the amount of available memory is determined to be insufficient. The primary design challenges are discussed, and performance results are presented and analyzed.« less
Coupling LAMMPS with Lattice Boltzmann fluid solver: theory, implementation, and applications

NASA Astrophysics Data System (ADS)

Tan, Jifu; Sinno, Talid; Diamond, Scott

2016-11-01

Studying of fluid flow coupled with solid has many applications in biological and engineering problems, e.g., blood cell transport, particulate flow, drug delivery. We present a partitioned approach to solve the coupled Multiphysics problem. The fluid motion is solved by the Lattice Boltzmann method, while the solid displacement and deformation is simulated by Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS). The coupling is achieved through the immersed boundary method so that the expensive remeshing step is eliminated. The code can model both rigid and deformable solids. The code also shows very good scaling results. It was validated with classic problems such as migration of rigid particles, ellipsoid particle's orbit in shear flow. Examples of the applications in blood flow, drug delivery, platelet adhesion and rupture are also given in the paper. NIH.
Scalable descriptive and correlative statistics with Titan.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Thompson, David C.; Pebay, Philippe Pierre

This report summarizes the existing statistical engines in VTK/Titan and presents the parallel versions thereof which have already been implemented. The ease of use of these parallel engines is illustrated by the means of C++ code snippets. Furthermore, this report justifies the design of these engines with parallel scalability in mind; then, this theoretical property is verified with test runs that demonstrate optimal parallel speed-up with up to 200 processors.
Efficient Parallel Levenberg-Marquardt Model Fitting towards Real-Time Automated Parametric Imaging Microscopy

PubMed Central

Zhu, Xiang; Zhang, Dianwen

2013-01-01

We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
Commodity cluster and hardware-based massively parallel implementations of hyperspectral imaging algorithms

NASA Astrophysics Data System (ADS)

Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David

2006-05-01

The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
CMOS VLSI Layout and Verification of a SIMD Computer

NASA Technical Reports Server (NTRS)

Zheng, Jianqing

1996-01-01

A CMOS VLSI layout and verification of a 3 x 3 processor parallel computer has been completed. The layout was done using the MAGIC tool and the verification using HSPICE. Suggestions for expanding the computer into a million processor network are presented. Many problems that might be encountered when implementing a massively parallel computer are discussed.
Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by routing through transporter nodes

DOEpatents

Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

2010-11-16

A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a destination. Some packets are constrained to be routed through respective designated transporter nodes, the automated routing strategy determining a path from a respective source node to a respective transporter node, and from a respective transporter node to a respective destination node. Preferably, the source node chooses a routing policy from among multiple possible choices, and that policy is followed by all intermediate nodes. The use of transporter nodes allows greater flexibility in routing.
Parallel design patterns for a low-power, software-defined compressed video encoder

NASA Astrophysics Data System (ADS)

Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar

2011-06-01

Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Tong, Charles; Swarztrauber, Paul N.

1989-01-01

Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
Implementation of molecular dynamics and its extensions with the coarse-grained UNRES force field on massively parallel systems; towards millisecond-scale simulations of protein structure, dynamics, and thermodynamics

PubMed Central

Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.

2010-01-01

We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
Optimisation of a parallel ocean general circulation model

NASA Astrophysics Data System (ADS)

Beare, M. I.; Stevens, D. P.

1997-10-01

This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce.

PubMed

Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

2013-11-01

The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS - a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing.
Gravitational spreading of steep-sided ridges ("sackung") in Western United States

USGS Publications Warehouse

Radbruch-Hall, D. H.; Varnes, D.J.; Savage, W.Z.

1976-01-01

Large-scale gravitational spreading and movement along fractures of steep-sided ridges in the mountainous areas of the western United States are characterized by linear fissures, trenches, and uphill-facing scarps on tops and sides of ridges. Spreading appears to take place by movement along disconnected planes and/or by slow plastic deformation of a rock mass. In some places, valleyward squeezing out of weak shales overlain by rigid rocks causes extensional fracturing and outward movement of the rigid layers, as illustrated by extension of two laccoliths overlying Mancos Shale, one at Dolores Peak and another at Crested Butte in western Colorado. Gravitational forces acting on a ridge of more homogeneous material causes tensional spreading of the ridge parallel to its long axis, for example in fractured granitic rock north of Mt. Massive in central Colorado, where a survey course has been established to monitor the movement. Recognition and understanding of these large-scale gravitational features and the mechanism that causes them are pertinent to site selection and design of engineering structures in high mountains. If fractures extend to considerable depth and if movement is continuing, engineering structures in valleys or tunnels through the spreading ridges could be damaged. ?? 1976 International Association of Engineering Geology.
Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

PubMed

Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

2016-08-05

The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Particle simulation of plasmas on the massively parallel processor

NASA Technical Reports Server (NTRS)

Gledhill, I. M. A.; Storey, L. R. O.

1987-01-01

Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.

PubMed

Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong

2015-12-01

SOAPsnv is the software used for identifying the single nucleotide variation in cancer genes. However, its performance is yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv software is the pileup algorithm. The original pileup algorithm's I/O process is time-consuming and inefficient to read input files. Moreover, the scalability of the pileup algorithm is also poor. Therefore, we designed a new algorithm, named BamPileup, aiming to improve the performance of sequential read, and the new pileup algorithm implemented a parallel read mode based on index. Using this method, each thread can directly read the data start from a specific position. The results of experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of algorithm is reduced to 3.9 s and the application program can achieve a speedup up to 100×. Moreover, the scalability of the new algorithm is also satisfying.
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.

PubMed

Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J

2015-07-05

We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
Applications of Parallel Process HiMAP for Large Scale Multidisciplinary Problems

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.; Potsdam, Mark; Rodriguez, David; Kwak, Dochay (Technical Monitor)

2000-01-01

HiMAP is a three level parallel middleware that can be interfaced to a large scale global design environment for code independent, multidisciplinary analysis using high fidelity equations. Aerospace technology needs are rapidly changing. Computational tools compatible with the requirements of national programs such as space transportation are needed. Conventional computation tools are inadequate for modern aerospace design needs. Advanced, modular computational tools are needed, such as those that incorporate the technology of massively parallel processors (MPP).

Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)

2002-01-01

A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems

NASA Technical Reports Server (NTRS)

Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)

2001-01-01

A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Adaptive parallel logic networks

NASA Technical Reports Server (NTRS)

Martinez, Tony R.; Vidal, Jacques J.

1988-01-01

Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

DOE Office of Scientific and Technical Information (OSTI.GOV)

Candel, A.; Kabel, A.; Lee, L.

In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Experience in highly parallel processing using DAP

NASA Technical Reports Server (NTRS)

Parkinson, D.

1987-01-01

Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.
Dynamic load balancing of applications

DOEpatents

Wheat, Stephen R.

1997-01-01

An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
Practical aspects of prestack depth migration with finite differences

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ober, C.C.; Oldfield, R.A.; Womble, D.E.

1997-07-01

Finite-difference, prestack, depth migrations offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented a finite-difference prestack depth migration algorithm for use on massively parallel computers which is discussed. The image quality of the finite-difference scheme has been investigated and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatialmore » parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple Cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.« less
Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling

NASA Technical Reports Server (NTRS)

Rios, Joseph Lucio; Ross, Kevin

2009-01-01

Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Kamesh; Ediger, David; Jiang, Karl

2009-02-15

We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 millionmore » vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
A Faster Parallel Algorithm and Efficient Multithreaded Implementations for Evaluating Betweenness Centrality on Massive Datasets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Madduri, Kamesh; Ediger, David; Jiang, Karl

2009-05-29

We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments

NASA Astrophysics Data System (ADS)

Atwal, Gurinder S.; Kinney, Justin B.

2016-03-01

A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
Multifaceted free-space image distributor for optical interconnects in massively parrallel processing

NASA Astrophysics Data System (ADS)

Zhao, Feng; Frietman, Edward E. E.; Han, Zhong; Chen, Ray T.

1999-04-01

A characteristic feature of a conventional von Neumann computer is that computing power is delivered by a single processing unit. Although increasing the clock frequency improves the performance of the computer, the switching speed of the semiconductor devices and the finite speed at which electrical signals propagate along the bus set the boundaries. Architectures containing large numbers of nodes can solve this performance dilemma, with the comment that main obstacles in designing such systems are caused by difficulties to come up with solutions that guarantee efficient communications among the nodes. Exchanging data becomes really a bottleneck should al nodes be connected by a shared resource. Only optics, due to its inherent parallelism, could solve that bottleneck. Here, we explore a multi-faceted free space image distributor to be used in optical interconnects in massively parallel processing. In this paper, physical and optical models of the image distributor are focused on from diffraction theory of light wave to optical simulations. the general features and the performance of the image distributor are also described. The new structure of an image distributor and the simulations for it are discussed. From the digital simulation and experiment, it is found that the multi-faceted free space image distributing technique is quite suitable for free space optical interconnection in massively parallel processing and new structure of the multifaceted free space image distributor would perform better.
Computer Sciences and Data Systems, volume 2

NASA Technical Reports Server (NTRS)

1987-01-01

Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
Design considerations for parallel graphics libraries

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1994-01-01

Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++

NASA Technical Reports Server (NTRS)

Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis

1994-01-01

Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Method and apparatus for routing data in an inter-nodal communications lattice of a massively parallel computer system by semi-randomly varying routing policies for different packets

DOEpatents

Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul

2010-11-23

A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
Phase space simulation of collisionless stellar systems on the massively parallel processor

NASA Technical Reports Server (NTRS)

White, Richard L.

1987-01-01

A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.
Applications of massively parallel computers in telemetry processing

NASA Technical Reports Server (NTRS)

El-Ghazawi, Tarek A.; Pritchard, Jim; Knoble, Gordon

1994-01-01

Telemetry processing refers to the reconstruction of full resolution raw instrumentation data with artifacts, of space and ground recording and transmission, removed. Being the first processing phase of satellite data, this process is also referred to as level-zero processing. This study is aimed at investigating the use of massively parallel computing technology in providing level-zero processing to spaceflights that adhere to the recommendations of the Consultative Committee on Space Data Systems (CCSDS). The workload characteristics, of level-zero processing, are used to identify processing requirements in high-performance computing systems. An example of level-zero functions on a SIMD MPP, such as the MasPar, is discussed. The requirements in this paper are based in part on the Earth Observing System (EOS) Data and Operation System (EDOS).
Genetic heterogeneity of RPMI-8402, a T-acute lymphoblastic leukemia cell line

PubMed Central

STOCZYNSKA-FIDELUS, EWELINA; PIASKOWSKI, SYLWESTER; PAWLOWSKA, ROZA; SZYBKA, MALGORZATA; PECIAK, JOANNA; HULAS-BIGOSZEWSKA, KRYSTYNA; WINIECKA-KLIMEK, MARTA; RIESKE, PIOTR

2016-01-01

Thorough examination of genetic heterogeneity of cell lines is uncommon. In order to address this issue, the present study analyzed the genetic heterogeneity of RPMI-8402, a T-acute lymphoblastic leukemia (T-ALL) cell line. For this purpose, traditional techniques such as fluorescence in situ hybridization and immunocytochemistry were used, in addition to more advanced techniques, including cell sorting, Sanger sequencing and massive parallel sequencing. The results indicated that the RPMI-8402 cell line consists of several genetically different cell subpopulations. Furthermore, massive parallel sequencing of RPMI-8402 provided insight into the evolution of T-ALL carcinogenesis, since this cell line exhibited the genetic heterogeneity typical of T-ALL. Therefore, the use of cell lines for drug testing in future studies may aid the progress of anticancer drug research. PMID:26870252
Big data mining analysis method based on cloud computing

NASA Astrophysics Data System (ADS)

Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao

2017-08-01

Information explosion era, large data super-large, discrete and non-(semi) structured features have gone far beyond the traditional data management can carry the scope of the way. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze the massive data mining, which can effectively solve the problem that the traditional data mining method cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs the mining algorithm of association rules based on MapReduce parallel processing architecture, and carries out the experimental verification. The algorithm of parallel association rule mining based on cloud computing platform can greatly improve the execution speed of data mining.

Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations

NASA Astrophysics Data System (ADS)

Detrixhe, Miles; Gibou, Frédéric

2016-10-01

The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
PFLOTRAN User Manual: A Massively Parallel Reactive Flow and Transport Model for Describing Surface and Subsurface Processes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan

PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less
SISYPHUS: A high performance seismic inversion factory

NASA Astrophysics Data System (ADS)

Gokhberg, Alexey; Simutė, Saulė; Boehm, Christian; Fichtner, Andreas

2016-04-01

In the recent years the massively parallel high performance computers became the standard instruments for solving the forward and inverse problems in seismology. The respective software packages dedicated to forward and inverse waveform modelling specially designed for such computers (SPECFEM3D, SES3D) became mature and widely available. These packages achieve significant computational performance and provide researchers with an opportunity to solve problems of bigger size at higher resolution within a shorter time. However, a typical seismic inversion process contains various activities that are beyond the common solver functionality. They include management of information on seismic events and stations, 3D models, observed and synthetic seismograms, pre-processing of the observed signals, computation of misfits and adjoint sources, minimization of misfits, and process workflow management. These activities are time consuming, seldom sufficiently automated, and therefore represent a bottleneck that can substantially offset performance benefits provided by even the most powerful modern supercomputers. Furthermore, a typical system architecture of modern supercomputing platforms is oriented towards the maximum computational performance and provides limited standard facilities for automation of the supporting activities. We present a prototype solution that automates all aspects of the seismic inversion process and is tuned for the modern massively parallel high performance computing systems. We address several major aspects of the solution architecture, which include (1) design of an inversion state database for tracing all relevant aspects of the entire solution process, (2) design of an extensible workflow management framework, (3) integration with wave propagation solvers, (4) integration with optimization packages, (5) computation of misfits and adjoint sources, and (6) process monitoring. The inversion state database represents a hierarchical structure with branches for the static process setup, inversion iterations, and solver runs, each branch specifying information at the event, station and channel levels. The workflow management framework is based on an embedded scripting engine that allows definition of various workflow scenarios using a high-level scripting language and provides access to all available inversion components represented as standard library functions. At present the SES3D wave propagation solver is integrated in the solution; the work is in progress for interfacing with SPECFEM3D. A separate framework is designed for interoperability with an optimization module; the workflow manager and optimization process run in parallel and cooperate by exchanging messages according to a specially designed protocol. A library of high-performance modules implementing signal pre-processing, misfit and adjoint computations according to established good practices is included. Monitoring is based on information stored in the inversion state database and at present implements a command line interface; design of a graphical user interface is in progress. The software design fits well into the common massively parallel system architecture featuring a large number of computational nodes running distributed applications under control of batch-oriented resource managers. The solution prototype has been implemented on the "Piz Daint" supercomputer provided by the Swiss Supercomputing Centre (CSCS).
Method and apparatus of parallel computing with simultaneously operating stream prefetching and list prefetching engines

DOEpatents

Boyle, Peter A.; Christ, Norman H.; Gara, Alan; Mawhinney, Robert D.; Ohmacht, Martin; Sugavanam, Krishnan

2012-12-11

A prefetch system improves a performance of a parallel computing system. The parallel computing system includes a plurality of computing nodes. A computing node includes at least one processor and at least one memory device. The prefetch system includes at least one stream prefetch engine and at least one list prefetch engine. The prefetch system operates those engines simultaneously. After the at least one processor issues a command, the prefetch system passes the command to a stream prefetch engine and a list prefetch engine. The prefetch system operates the stream prefetch engine and the list prefetch engine to prefetch data to be needed in subsequent clock cycles in the processor in response to the passed command.
Angular trapping of anisometric nano-objects in a fluid.

PubMed

Celebrano, Michele; Rosman, Christina; Sönnichsen, Carsten; Krishnan, Madhavi

2012-11-14

We demonstrate the ability to trap, levitate, and orient single anisometric nanoscale objects with high angular precision in a fluid. An electrostatic fluidic trap confines a spherical object at a spatial location defined by the minimum of the electrostatic system free energy. For an anisometric object and a potential well lacking angular symmetry, the system free energy can further strongly depend on the object's orientation in the trap. Engineering the morphology of the trap thus enables precise spatial and angular confinement of a single levitating nano-object, and the process can be massively parallelized. Since the physics of the trap depends strongly on the surface charge of the object, the method is insensitive to the object's dielectric function. Furthermore, levitation of the assembled objects renders them amenable to individual manipulation using externally applied optical, electrical, or hydrodynamic fields, raising prospects for reconfigurable chip-based nano-object assemblies.
Protein Science by DNA Sequencing: How Advances in Molecular Biology Are Accelerating Biochemistry.

PubMed

Higgins, Sean A; Savage, David F

2018-01-09

A fundamental goal of protein biochemistry is to determine the sequence-function relationship, but the vastness of sequence space makes comprehensive evaluation of this landscape difficult. However, advances in DNA synthesis and sequencing now allow researchers to assess the functional impact of every single mutation in many proteins, but challenges remain in library construction and the development of general assays applicable to a diverse range of protein functions. This Perspective briefly outlines the technical innovations in DNA manipulation that allow massively parallel protein biochemistry and then summarizes the methods currently available for library construction and the functional assays of protein variants. Areas in need of future innovation are highlighted with a particular focus on assay development and the use of computational analysis with machine learning to effectively traverse the sequence-function landscape. Finally, applications in the fundamentals of protein biochemistry, disease prediction, and protein engineering are presented.
Biomorphic architectures for autonomous Nanosat designs

NASA Technical Reports Server (NTRS)

Hasslacher, Brosl; Tilden, Mark W.

1995-01-01

Modern space tool design is the science of making a machine both massively complex while at the same time extremely robust and dependable. We propose a novel nonlinear control technique that produces capable, self-organizing, micron-scale space machines at low cost and in large numbers by parallel silicon assembly. Experiments using biomorphic architectures (with ideal space attributes) have produced a wide spectrum of survival-oriented machines that are reliably domesticated for work applications in specific environments. In particular, several one-chip satellite prototypes show interesting control properties that can be turned into numerous application-specific machines for autonomous, disposable space tasks. We believe that the real power of these architectures lies in their potential to self-assemble into larger, robust, loosely coupled structures. Assembly takes place at hierarchical space scales, with different attendant properties, allowing for inexpensive solutions to many daunting work tasks. The nature of biomorphic control, design, engineering options, and applications are discussed.
3D simulation of floral oil storage in the scopa of South American insects

NASA Astrophysics Data System (ADS)

Ruettgers, Alexander; Griebel, Michael; Pastrik, Lars; Schmied, Heiko; Wittmann, Dieter; Scherrieble, Andreas; Dinkelmann, Albrecht; Stegmaier, Thomas; InstituteNumerical Simulation Team; Institute of Crop Science; Resource Conservation Team; Institute of Textile Technology; Process Engineering Team

2014-11-01

Several species of bees in South America possess structures to store and transport floral oils. By using closely spaced hairs at their back legs, the so called scopa, these bees can absorb and release oil droplets without loss. The high efficiency of this process is a matter of ongoing research. Basing on recent x-ray microtomography scans from the scopa of these bees at the Institute of Textile Technology and Process Engineering Denkendorf, we build a three-dimensional computer model. Using NaSt3DGPF, a two-phase flow solver developed at the Institute for Numerical Simulation of the University of Bonn, we perform massively parallel flow simulations with the complex micro-CT data. In this talk, we discuss the results of our simulations and the transfer of the x-ray measurement into a computer model. This research was funded under GR 1144/18-1 by the Deutsche Forschungsgemeinschaft (DFG).
Membrane-less microfiltration using inertial microfluidics

PubMed Central

Warkiani, Majid Ebrahimi; Tay, Andy Kah Ping; Guan, Guofeng; Han, Jongyoon

2015-01-01

Microfiltration is a ubiquitous and often crucial part of many industrial processes, including biopharmaceutical manufacturing. Yet, all existing filtration systems suffer from the issue of membrane clogging, which fundamentally limits the efficiency and reliability of the filtration process. Herein, we report the development of a membrane-less microfiltration system by massively parallelizing inertial microfluidics to achieve a macroscopic volume processing rates (~ 500 mL/min). We demonstrated the systems engineered for CHO (10–20 μm) and yeast (3–5 μm) cells filtration, which are two main cell types used for large-scale bioreactors. Our proposed system can replace existing filtration membrane and provide passive (no external force fields), continuous filtration, thus eliminating the need for membrane replacement. This platform has the desirable combinations of high throughput, low-cost, and scalability, making it compatible for a myriad of microfiltration applications and industrial purposes. PMID:26154774
ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains

PubMed Central

Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz

2016-01-01

With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
Dynamic load balancing of applications

DOEpatents

Wheat, S.R.

1997-05-13

An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers is disclosed. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated. 13 figs.
Computational methods and software systems for dynamics and control of large space structures

NASA Technical Reports Server (NTRS)

Park, K. C.; Felippa, C. A.; Farhat, C.; Pramono, E.

1990-01-01

Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers.
DGDFT: A massively parallel method for large scale density functional theory calculations.

PubMed

Hu, Wei; Lin, Lin; Yang, Chao

2015-09-28

We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Ocean Modeling and Visualization on Massively Parallel Computer

NASA Technical Reports Server (NTRS)

Chao, Yi; Li, P. Peggy; Wang, Ping; Katz, Daniel S.; Cheng, Benny N.

1997-01-01

Climate modeling is one of the grand challenges of computational science, and ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change.
Theories of central engine for long gamma-ray bursts

NASA Astrophysics Data System (ADS)

Nagataki, Shigehiro

2018-02-01

Long GRBs are the most powerful explosions in the universe since the Big Bang. At least, some fraction of long GRBs are born from the death of massive stars. Likewise, only some fraction of massive stars that satisfy additional special conditions explode as long GRBs associated with supernovae/hypernovae. In this paper, we discuss the explosion mechanism of long GRBs associated with hypernovae: ‘the central engine of long GRBs’. The central engine of long GRBs is very different from that of core-collapse supernovae, although the mechanism of the engine is still not firmly established. In this paper, we review theoretical studies of the central engine of long GRBs. First, we discuss possible progenitor stars. Then several promising mechanisms of the central engine—such as black hole and magnetar formation—will be reviewed. We will also mention some more exotic models. Finally, we describe prospects for future studies of the central engine of long GRBs.
Sensitive and specific detection of EML4-ALK rearrangements in non-small cell lung cancer (NSCLC) specimens by multiplex amplicon RNA massive parallel sequencing.

PubMed

Moskalev, Evgeny A; Frohnauer, Judith; Merkelbach-Bruse, Sabine; Schildhaus, Hans-Ulrich; Dimmler, Arno; Schubert, Thomas; Boltze, Carsten; König, Helmut; Fuchs, Florian; Sirbu, Horia; Rieker, Ralf J; Agaimy, Abbas; Hartmann, Arndt; Haller, Florian

2014-06-01

Recurrent gene fusions of anaplastic lymphoma receptor tyrosine kinase (ALK) and echinoderm microtubule-associated protein-like 4 (EML4) have been recently identified in ∼5% of non-small cell lung cancers (NSCLCs) and are targets for selective tyrosine kinase inhibitors. While fluorescent in situ hybridization (FISH) is the current gold standard for detection of EML4-ALK rearrangements, several limitations exist including high costs, time-consuming evaluation and somewhat equivocal interpretation of results. In contrast, targeted massive parallel sequencing has been introduced as a powerful method for simultaneous and sensitive detection of multiple somatic mutations even in limited biopsies, and is currently evolving as the method of choice for molecular diagnostic work-up of NSCLCs. We developed a novel approach for indirect detection of EML4-ALK rearrangements based on 454 massive parallel sequencing after reverse transcription and subsequent multiplex amplification (multiplex ALK RNA-seq) which takes advantage of unbalanced expression of the 5' and 3' ALK mRNA regions. Two lung cancer cell lines and a selected series of 32 NSCLC samples including 11 cases with EML4-ALK rearrangement were analyzed with this novel approach in comparison to ALK FISH, ALK qRT-PCR and EML4-ALK RT-PCR. The H2228 cell line with known EML4-ALK rearrangement showed 171 and 729 reads for 5' and 3' ALK regions, respectively, demonstrating a clearly unbalanced expression pattern. In contrast, the H1299 cell line with ALK wildtype status displayed no reads for both ALK regions. Considering a threshold of 100 reads for 3' ALK region as indirect indicator of EML4-ALK rearrangement, there was 100% concordance between the novel multiplex ALK RNA-seq approach and ALK FISH among all 32 NSCLC samples. Multiplex ALK RNA-seq is a sensitive and specific method for indirect detection of EML4-ALK rearrangements, and can be easily implemented in panel based molecular diagnostic work-up of NSCLCs by massive parallel sequencing. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
WESTPA: An interoperable, highly scalable software package for weighted ensemble simulation and analysis

PubMed Central

Zwier, Matthew C.; Adelman, Joshua L.; Kaus, Joseph W.; Pratt, Adam J.; Wong, Kim F.; Rego, Nicholas B.; Suárez, Ernesto; Lettieri, Steven; Wang, David W.; Grabe, Michael; Zuckerman, Daniel M.; Chong, Lillian T.

2015-01-01

The weighted ensemble (WE) path sampling approach orchestrates an ensemble of parallel calculations with intermittent communication to enhance the sampling of rare events, such as molecular associations or conformational changes in proteins or peptides. Trajectories are replicated and pruned in a way that focuses computational effort on under-explored regions of configuration space while maintaining rigorous kinetics. To enable the simulation of rare events at any scale (e.g. atomistic, cellular), we have developed an open-source, interoperable, and highly scalable software package for the execution and analysis of WE simulations: WESTPA (The Weighted Ensemble Simulation Toolkit with Parallelization and Analysis). WESTPA scales to thousands of CPU cores and includes a suite of analysis tools that have been implemented in a massively parallel fashion. The software has been designed to interface conveniently with any dynamics engine and has already been used with a variety of molecular dynamics (e.g. GROMACS, NAMD, OpenMM, AMBER) and cell-modeling packages (e.g. BioNetGen, MCell). WESTPA has been in production use for over a year, and its utility has been demonstrated for a broad set of problems, ranging from atomically detailed host-guest associations to non-spatial chemical kinetics of cellular signaling networks. The following describes the design and features of WESTPA, including the facilities it provides for running WE simulations, storing and analyzing WE simulation data, as well as examples of input and output. PMID:26392815
Speeding up parallel processing

NASA Technical Reports Server (NTRS)

Denning, Peter J.

1988-01-01

In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
Porting LAMMPS to GPUs.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brown, William Michael; Plimpton, Steven James; Wang, Peng

2010-03-01

LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
Support for non-locking parallel reception of packets belonging to a single memory reception FIFO

DOEpatents

Chen, Dong [Yorktown Heights, NY; Heidelberger, Philip [Yorktown Heights, NY; Salapura, Valentina [Yorktown Heights, NY; Senger, Robert M [Yorktown Heights, NY; Steinmacher-Burow, Burkhard [Boeblingen, DE; Sugawara, Yutaka [Yorktown Heights, NY

2011-01-27

A method and apparatus for distributed parallel messaging in a parallel computing system. A plurality of DMA engine units are configured in a multiprocessor system to operate in parallel, one DMA engine unit for transferring a current packet received at a network reception queue to a memory location in a memory FIFO (rmFIFO) region of a memory. A control unit implements logic to determine whether any prior received packet destined for that rmFIFO is still in a process of being stored in the associated memory by another DMA engine unit of the plurality, and prevent the one DMA engine unit from indicating completion of storing the current received packet in the reception memory FIFO (rmFIFO) until all prior received packets destined for that rmFIFO are completely stored by the other DMA engine units. Thus, there is provided non-locking support so that multiple packets destined for a single rmFIFO are transferred and stored in parallel to predetermined locations in a memory.

NVIDIA OptiX ray-tracing engine as a new tool for modelling medical imaging systems

NASA Astrophysics Data System (ADS)

Pietrzak, Jakub; Kacperski, Krzysztof; Cieślar, Marek

2015-03-01

The most accurate technique to model the X- and gamma radiation path through a numerically defined object is the Monte Carlo simulation which follows single photons according to their interaction probabilities. A simplified and much faster approach, which just integrates total interaction probabilities along selected paths, is known as ray tracing. Both techniques are used in medical imaging for simulating real imaging systems and as projectors required in iterative tomographic reconstruction algorithms. These approaches are ready for massive parallel implementation e.g. on Graphics Processing Units (GPU), which can greatly accelerate the computation time at a relatively low cost. In this paper we describe the application of the NVIDIA OptiX ray-tracing engine, popular in professional graphics and rendering applications, as a new powerful tool for X- and gamma ray-tracing in medical imaging. It allows the implementation of a variety of physical interactions of rays with pixel-, mesh- or nurbs-based objects, and recording any required quantities, like path integrals, interaction sites, deposited energies, and others. Using the OptiX engine we have implemented a code for rapid Monte Carlo simulations of Single Photon Emission Computed Tomography (SPECT) imaging, as well as the ray-tracing projector, which can be used in reconstruction algorithms. The engine generates efficient, scalable and optimized GPU code, ready to run on multi GPU heterogeneous systems. We have compared the results our simulations with the GATE package. With the OptiX engine the computation time of a Monte Carlo simulation can be reduced from days to minutes.
AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System

NASA Astrophysics Data System (ADS)

Wang, R.; Harris, C.; Wicenec, A.

2016-07-01

In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. Having this in mind, we looked into various storage backend techniques that can possibly enable parallel I/O for CTDS by implementing new storage managers. After carrying on benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.
Dubinett - Targeted Sequencing 2012 — EDRN Public Portal

Cancer.gov

we propose to use targeted massively parallel DNA sequencing to identify somatic alterations within mutational hotspots in matched sets of primary lung tumors, premalignant lesions, and adjacent,histologically normal lung tissue.
Massively parallel electrical conductivity imaging of the subsurface: Applications to hydrocarbon exploration

NASA Astrophysics Data System (ADS)

Newman, Gregory A.; Commer, Michael

2009-07-01

Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
Running ATLAS workloads within massively parallel distributed applications using Athena Multi-Process framework (AthenaMP)

NASA Astrophysics Data System (ADS)

Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter

2015-12-01

AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array

NASA Astrophysics Data System (ADS)

Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul

2008-04-01

This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations

NASA Astrophysics Data System (ADS)

Nguyen, Trung Dac

2017-03-01

The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Progress report on PIXIE3D, a fully implicit 3D extended MHD solver

NASA Astrophysics Data System (ADS)

Chacon, Luis

2008-11-01

Recently, invited talk at DPP07 an optimal, massively parallel implicit algorithm for 3D resistive magnetohydrodynamics (PIXIE3D) was demonstrated. Excellent algorithmic and parallel results were obtained with up to 4096 processors and 138 million unknowns. While this is a remarkable result, further developments are still needed for PIXIE3D to become a 3D extended MHD production code in general geometries. In this poster, we present an update on the status of PIXIE3D on several fronts. On the physics side, we will describe our progress towards the full Braginskii model, including: electron Hall terms, anisotropic heat conduction, and gyroviscous corrections. Algorithmically, we will discuss progress towards a robust, optimal, nonlinear solver for arbitrary geometries, including preconditioning for the new physical effects described, the implementation of a coarse processor-grid solver (to maintain optimal algorithmic performance for an arbitrarily large number of processors in massively parallel computations), and of a multiblock capability to deal with complicated geometries. L. Chac'on, Phys. Plasmas 15, 056103 (2008);
Towards implementation of cellular automata in Microbial Fuel Cells.

PubMed

Tsompanas, Michail-Antisthenis I; Adamatzky, Andrew; Sirakoulis, Georgios Ch; Greenman, John; Ieropoulos, Ioannis

2017-01-01

The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway's Game of Life as the 'benchmark' CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions-compared to silicon circuitry-between the different states during computation.
Towards implementation of cellular automata in Microbial Fuel Cells

PubMed Central

Adamatzky, Andrew; Sirakoulis, Georgios Ch.; Greenman, John; Ieropoulos, Ioannis

2017-01-01

The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway’s Game of Life as the ‘benchmark’ CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions—compared to silicon circuitry—between the different states during computation. PMID:28498871
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu; University of California Santa Barbara, Santa Barbara, CA, 93106; Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu

The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling,more » and show state-of-the-art speedup values for the fast sweeping method.« less
Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing

PubMed Central

2015-01-01

A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in massive parallel using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982
Large-eddy simulations of compressible convection on massively parallel computers. [stellar physics

NASA Technical Reports Server (NTRS)

Xie, Xin; Toomre, Juri

1993-01-01

We report preliminary implementation of the large-eddy simulation (LES) technique in 2D simulations of compressible convection carried out on the CM-2 massively parallel computer. The convective flow fields in our simulations possess structures similar to those found in a number of direct simulations, with roll-like flows coherent across the entire depth of the layer that spans several density scale heights. Our detailed assessment of the effects of various subgrid scale (SGS) terms reveals that they may affect the gross character of convection. Yet, somewhat surprisingly, we find that our LES solutions, and another in which the SGS terms are turned off, only show modest differences. The resulting 2D flows realized here are rather laminar in character, and achieving substantial turbulence may require stronger forcing and less dissipation.
Contextual classification on the massively parallel processor

NASA Technical Reports Server (NTRS)

Tilton, James C.

1987-01-01

Classifiers are often used to produce land cover maps from multispectral Earth observation imagery. Conventionally, these classifiers have been designed to exploit the spectral information contained in the imagery. Very few classifiers exploit the spatial information content of the imagery, and the few that do rarely exploit spatial information content in conjunction with spectral and/or temporal information. A contextual classifier that exploits spatial and spectral information in combination through a general statistical approach was studied. Early test results obtained from an implementation of the classifier on a VAX-11/780 minicomputer were encouraging, but they are of limited meaning because they were produced from small data sets. An implementation of the contextual classifier is presented on the Massively Parallel Processor (MPP) at Goddard that for the first time makes feasible the testing of the classifier on large data sets.
Massively Parallel Sequencing of Patients with Intellectual Disability, Congenital Anomalies and/or Autism Spectrum Disorders with a Targeted Gene Panel

PubMed Central

Brett, Maggie; McPherson, John; Zang, Zhi Jiang; Lai, Angeline; Tan, Ee-Shien; Ng, Ivy; Ong, Lai-Choo; Cham, Breana; Tan, Patrick; Rozen, Steve; Tan, Ene-Choo

2014-01-01

Developmental delay and/or intellectual disability (DD/ID) affects 1–3% of all children. At least half of these are thought to have a genetic etiology. Recent studies have shown that massively parallel sequencing (MPS) using a targeted gene panel is particularly suited for diagnostic testing for genetically heterogeneous conditions. We report on our experiences with using massively parallel sequencing of a targeted gene panel of 355 genes for investigating the genetic etiology of eight patients with a wide range of phenotypes including DD/ID, congenital anomalies and/or autism spectrum disorder. Targeted sequence enrichment was performed using the Agilent SureSelect Target Enrichment Kit and sequenced on the Illumina HiSeq2000 using paired-end reads. For all eight patients, 81–84% of the targeted regions achieved read depths of at least 20×, with average read depths overlapping targets ranging from 322× to 798×. Causative variants were successfully identified in two of the eight patients: a nonsense mutation in the ATRX gene and a canonical splice site mutation in the L1CAM gene. In a third patient, a canonical splice site variant in the USP9X gene could likely explain all or some of her clinical phenotypes. These results confirm the value of targeted MPS for investigating DD/ID in children for diagnostic purposes. However, targeted gene MPS was less likely to provide a genetic diagnosis for children whose phenotype includes autism. PMID:24690944
Inter-laboratory evaluation of the EUROFORGEN Global ancestry-informative SNP panel by massively parallel sequencing using the Ion PGM™.

PubMed

Eduardoff, M; Gross, T E; Santos, C; de la Puente, M; Ballard, D; Strobl, C; Børsting, C; Morling, N; Fusco, L; Hussing, C; Egyed, B; Souto, L; Uacyisrael, J; Syndercombe Court, D; Carracedo, Á; Lareu, M V; Schneider, P M; Parson, W; Phillips, C; Parson, W; Phillips, C

2016-07-01

The EUROFORGEN Global ancestry-informative SNP (AIM-SNPs) panel is a forensic multiplex of 128 markers designed to differentiate an individual's ancestry from amongst the five continental population groups of Africa, Europe, East Asia, Native America, and Oceania. A custom multiplex of AmpliSeq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures, and the ancestry differentiation power of the final panel design, which required substitution of three original ancestry-informative SNPs with alternatives. Fourteen populations that had not been previously analyzed were genotyped using the custom multiplex and these studies allowed assessment of genotyping performance by comparison of data across five laboratories. Results indicate a low level of genotyping error can still occur from sequence misalignment caused by homopolymeric tracts close to the target SNP, despite careful scrutiny of candidate SNPs at the design stage. Such sequence misalignment required the exclusion of component SNP rs2080161 from the Global AIM-SNPs panel. However, the overall genotyping precision and sensitivity of this custom multiplex indicates the Ion PGM™ assay for the Global AIM-SNPs is highly suitable for forensic ancestry analysis with massively parallel sequencing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
A general method to eliminate laboratory induced recombinants during massive, parallel sequencing of cDNA library.

PubMed

Waugh, Caryll; Cromer, Deborah; Grimm, Andrew; Chopra, Abha; Mallal, Simon; Davenport, Miles; Mak, Johnson

2015-04-09

Massive, parallel sequencing is a potent tool for dissecting the regulation of biological processes by revealing the dynamics of the cellular RNA profile under different conditions. Similarly, massive, parallel sequencing can be used to reveal the complexity of viral quasispecies that are often found in the RNA virus infected host. However, the production of cDNA libraries for next-generation sequencing (NGS) necessitates the reverse transcription of RNA into cDNA and the amplification of the cDNA template using PCR, which may introduce artefact in the form of phantom nucleic acids species that can bias the composition and interpretation of original RNA profiles. Using HIV as a model we have characterised the major sources of error during the conversion of viral RNA to cDNA, namely excess RNA template and the RNaseH activity of the polymerase enzyme, reverse transcriptase. In addition we have analysed the effect of PCR cycle on detection of recombinants and assessed the contribution of transfection of highly similar plasmid DNA to the formation of recombinant species during the production of our control viruses. We have identified RNA template concentrations, RNaseH activity of reverse transcriptase, and PCR conditions as key parameters that must be carefully optimised to minimise chimeric artefacts. Using our optimised RT-PCR conditions, in combination with our modified PCR amplification procedure, we have developed a reliable technique for accurate determination of RNA species using NGS technology.
Sierra Structural Dynamics User's Notes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Reese, Garth M.

2015-10-19

Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
Reverse time migration: A seismic processing application on the connection machine

NASA Technical Reports Server (NTRS)

Fiebrich, Rolf-Dieter

1987-01-01

The implementation of a reverse time migration algorithm on the Connection Machine, a massively parallel computer is described. Essential architectural features of this machine as well as programming concepts are presented. The data structures and parallel operations for the implementation of the reverse time migration algorithm are described. The algorithm matches the Connection Machine architecture closely and executes almost at the peak performance of this machine.
Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals

DTIC Science & Technology

1988-10-12

Secusrity Clamifieation, Nlassively-Parallel Architectures for Automa ic Recognitio of Visua, Speech Signals 12. PERSONAL AUTHOR(S) Terrence J...characteristics of speech from tJhe, visual speech signals. Neural networks have been trained on a database of vowels. The rqw images of faces , aligned and...images of faces , aligned and preprocessed, were used as input to these network which were trained to estimate the corresponding envelope of the

Solving Navier-Stokes equations on a massively parallel processor; The 1 GFLOP performance

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saati, A.; Biringen, S.; Farhat, C.

This paper reports on experience in solving large-scale fluid dynamics problems on the Connection Machine model CM-2. The authors have implemented a parallel version of the MacCormack scheme for the solution of the Navier-Stokes equations. By using triad floating point operations and reducing the number of interprocessor communications, they have achieved a sustained performance rate of 1.42 GFLOPS.
Sierra/SD User's Notes.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Munday, Lynn Brendon; Day, David M.; Bunting, Gregory

Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
Representing and computing regular languages on massively parallel networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Miller, M.I.; O'Sullivan, J.A.; Boysam, B.

1991-01-01

This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first established the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibb's representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grow. These representations are coupled to stochasticmore » diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs' probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determines the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.« less
Distributed parallel messaging for multiprocessor systems

DOEpatents

Chen, Dong; Heidelberger, Philip; Salapura, Valentina; Senger, Robert M; Steinmacher-Burrow, Burhard; Sugawara, Yutaka

2013-06-04

A method and apparatus for distributed parallel messaging in a parallel computing system. The apparatus includes, at each node of a multiprocessor network, multiple injection messaging engine units and reception messaging engine units, each implementing a DMA engine and each supporting both multiple packet injection into and multiple reception from a network, in parallel. The reception side of the messaging unit (MU) includes a switch interface enabling writing of data of a packet received from the network to the memory system. The transmission side of the messaging unit, includes switch interface for reading from the memory system when injecting packets into the network.
Parallelization and checkpointing of GPU applications through program transformation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Solano-Quinde, Lizandro Damian

2012-01-01

GPUs have emerged as a powerful tool for accelerating general-purpose applications. The availability of programming languages that makes writing general-purpose applications for running on GPUs tractable have consolidated GPUs as an alternative for accelerating general purpose applications. Among the areas that have benefited from GPU acceleration are: signal and image processing, computational fluid dynamics, quantum chemistry, and, in general, the High Performance Computing (HPC) Industry. In order to continue to exploit higher levels of parallelism with GPUs, multi-GPU systems are gaining popularity. In this context, single-GPU applications are parallelized for running in multi-GPU systems. Furthermore, multi-GPU systems help to solvemore » the GPU memory limitation for applications with large application memory footprint. Parallelizing single-GPU applications has been approached by libraries that distribute the workload at runtime, however, they impose execution overhead and are not portable. On the other hand, on traditional CPU systems, parallelization has been approached through application transformation at pre-compile time, which enhances the application to distribute the workload at application level and does not have the issues of library-based approaches. Hence, a parallelization scheme for GPU systems based on application transformation is needed. Like any computing engine of today, reliability is also a concern in GPUs. GPUs are vulnerable to transient and permanent failures. Current checkpoint/restart techniques are not suitable for systems with GPUs. Checkpointing for GPU systems present new and interesting challenges, primarily due to the natural differences imposed by the hardware design, the memory subsystem architecture, the massive number of threads, and the limited amount of synchronization among threads. Therefore, a checkpoint/restart technique suitable for GPU systems is needed. The goal of this work is to exploit higher levels of parallelism and to develop support for application-level fault tolerance in applications using multiple GPUs. Our techniques reduce the burden of enhancing single-GPU applications to support these features. To achieve our goal, this work designs and implements a framework for enhancing a single-GPU OpenCL application through application transformation.« less
Cost-effective GPU-grid for genome-wide epistasis calculations.

PubMed

Pütz, B; Kam-Thong, T; Karbalai, N; Altmann, A; Müller-Myhsok, B

2013-01-01

Until recently, genotype studies were limited to the investigation of single SNP effects due to the computational burden incurred when studying pairwise interactions of SNPs. However, some genetic effects as simple as coloring (in plants and animals) cannot be ascribed to a single locus but only understood when epistasis is taken into account [1]. It is expected that such effects are also found in complex diseases where many genes contribute to the clinical outcome of affected individuals. Only recently have such problems become feasible computationally. The inherently parallel structure of the problem makes it a perfect candidate for massive parallelization on either grid or cloud architectures. Since we are also dealing with confidential patient data, we were not able to consider a cloud-based solution but had to find a way to process the data in-house and aimed to build a local GPU-based grid structure. Sequential epistatsis calculations were ported to GPU using CUDA at various levels. Parallelization on the CPU was compared to corresponding GPU counterparts with regards to performance and cost. A cost-effective solution was created by combining custom-built nodes equipped with relatively inexpensive consumer-level graphics cards with highly parallel GPUs in a local grid. The GPU method outperforms current cluster-based systems on a price/performance criterion, as a single GPU shows speed performance comparable up to 200 CPU cores. The outlined approach will work for problems that easily lend themselves to massive parallelization. Code for various tasks has been made available and ongoing development of tools will further ease the transition from sequential to parallel algorithms.
A Combined Water-Bromotrifluoromethane Crash-Fire Protection System for a T-56 Turbopropeller Engine

NASA Technical Reports Server (NTRS)

Campbell, John A.; Busch, Arthur M.

1959-01-01

A crash-fire protection system is described which will suppress the ignition of crash-spilled fuel that may be ingested by a T-56 turbo-propeller engine. This system includes means for rapidly extinguishing the combustor flame, means for cooling and inerting with water the hot engine parts likely to ignite engine ingested fuel, and means for blanketing with bromotrifluoromethane massive metal parts that may reheat after the engine stops rotating. Combustion-chamber flames were rapidly extinguished at the engine fuel nozzles by a fuel shutoff and drain valve. Hot engine parts were inerted and cooled by 42 pounds of water discharged at seven engine stations. Massive metal parts that could reheat were inerted with 10 pounds of bromotrifluoromethane discharged at two engine stations. Performance trials of the crash-fire protection system were conducted by bringing the engine up to takeoff temperature, actuating the crash-fire protection system, and then spraying fuel into the engine to simulate crash-ingested fuel. No fires occurred during these trials, although fuel was sprayed into the engine from 0.3 second to 15 minutes after actuating the crash-fire protection system.
Novel Scalable 3-D MT Inverse Solver

NASA Astrophysics Data System (ADS)

Kuvshinov, A. V.; Kruglyakov, M.; Geraskin, A.

2016-12-01

We present a new, robust and fast, three-dimensional (3-D) magnetotelluric (MT) inverse solver. As a forward modelling engine a highly-scalable solver extrEMe [1] is used. The (regularized) inversion is based on an iterative gradient-type optimization (quasi-Newton method) and exploits adjoint sources approach for fast calculation of the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT (single-site and/or inter-site) responses, and supports massive parallelization. Different parallelization strategies implemented in the code allow for optimal usage of available computational resources for a given problem set up. To parameterize an inverse domain a mask approach is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to high-performance clusters demonstrate practically linear scalability of the code up to thousands of nodes. 1. Kruglyakov, M., A. Geraskin, A. Kuvshinov, 2016. Novel accurate and scalable 3-D MT forward solver based on a contracting integral equation method, Computers and Geosciences, in press.
Demonstration of Hadoop-GIS: A Spatial Data Warehousing System Over MapReduce

PubMed Central

Aji, Ablimit; Sun, Xiling; Vo, Hoang; Liu, Qioaling; Lee, Rubao; Zhang, Xiaodong; Saltz, Joel; Wang, Fusheng

2016-01-01

The proliferation of GPS-enabled devices, and the rapid improvement of scientific instruments have resulted in massive amounts of spatial data in the last decade. Support of high performance spatial queries on large volumes data has become increasingly important in numerous fields, which requires a scalable and efficient spatial data warehousing solution as existing approaches exhibit scalability limitations and efficiency bottlenecks for large scale spatial applications. In this demonstration, we present Hadoop-GIS – a scalable and high performance spatial query system over MapReduce. Hadoop-GIS provides an efficient spatial query engine to process spatial queries, data and space based partitioning, and query pipelines that parallelize queries implicitly on MapReduce. Hadoop-GIS also provides an expressive, SQL-like spatial query language for workload specification. We will demonstrate how spatial queries are expressed in spatially extended SQL queries, and submitted through a command line/web interface for execution. Parallel to our system demonstration, we explain the system architecture and details on how queries are translated to MapReduce operators, optimized, and executed on Hadoop. In addition, we will showcase how the system can be used to support two representative real world use cases: large scale pathology analytical imaging, and geo-spatial data warehousing. PMID:27617325
Use Massive Parallel Sequencing and Exome Capture Technology to Sequence the Exome of Fanconi Anemia Children and Their Patents

ClinicalTrials.gov

2013-11-21

Fanconi Anemia; Autosomal or Sex Linked Recessive Genetic Disease; Bone Marrow Hematopoiesis Failure, Multiple Congenital Abnormalities, and Susceptibility to Neoplastic Diseases.; Hematopoiesis Maintainance.
High Throughput Optical Lithography by Scanning a Massive Array of Bowtie Aperture Antennas at Near-Field

DTIC Science & Technology

2015-11-03

scale optical projection system powered by spatial light modulators, such as digital micro-mirror device ( DMD ). Figure 4 shows the parallel lithography ...1Scientific RepoRts | 5:16192 | DOi: 10.1038/srep16192 www.nature.com/scientificreports High throughput optical lithography by scanning a massive...array of bowtie aperture antennas at near-field X. Wen1,2,3,*, A. Datta1,*, L. M. Traverso1, L. Pan1, X. Xu1 & E. E. Moon4 Optical lithography , the
Reactive Transport Modeling of Induced Calcite Precipitation Reaction Fronts in Porous Media Using A Parallel, Fully Coupled, Fully Implicit Approach

NASA Astrophysics Data System (ADS)

Guo, L.; Huang, H.; Gaston, D.; Redden, G. D.; Fox, D. T.; Fujita, Y.

2010-12-01

Inducing mineral precipitation in the subsurface is one potential strategy for immobilizing trace metal and radionuclide contaminants. Generating mineral precipitates in situ can be achieved by manipulating chemical conditions, typically through injection or in situ generation of reactants. How these reactants transport, mix and react within the medium controls the spatial distribution and composition of the resulting mineral phases. Multiple processes, including fluid flow, dispersive/diffusive transport of reactants, biogeochemical reactions and changes in porosity-permeability, are tightly coupled over a number of scales. Numerical modeling can be used to investigate the nonlinear coupling effects of these processes which are quite challenging to explore experimentally. Many subsurface reactive transport simulators employ a de-coupled or operator-splitting approach where transport equations and batch chemistry reactions are solved sequentially. However, such an approach has limited applicability for biogeochemical systems with fast kinetics and strong coupling between chemical reactions and medium properties. A massively parallel, fully coupled, fully implicit Reactive Transport simulator (referred to as “RAT”) based on a parallel multi-physics object-oriented simulation framework (MOOSE) has been developed at the Idaho National Laboratory. Within this simulator, systems of transport and reaction equations can be solved simultaneously in a fully coupled, fully implicit manner using the Jacobian Free Newton-Krylov (JFNK) method with additional advanced computing capabilities such as (1) physics-based preconditioning for solution convergence acceleration, (2) massively parallel computing and scalability, and (3) adaptive mesh refinements for 2D and 3D structured and unstructured mesh. The simulator was first tested against analytical solutions, then applied to simulating induced calcium carbonate mineral precipitation in 1D columns and 2D flow cells as analogs to homogeneous and heterogeneous porous media, respectively. In 1D columns, calcium carbonate mineral precipitation was driven by urea hydrolysis catalyzed by urease enzyme, and in 2D flow cells, calcium carbonate mineral forming reactants were injected sequentially, forming migrating reaction fronts that are typically highly nonuniform. The RAT simulation results for the spatial and temporal distributions of precipitates, reaction rates and major species in the system, and also for changes in porosity and permeability, were compared to both laboratory experimental data and computational results obtained using other reactive transport simulators. The comparisons demonstrate the ability of RAT to simulate complex nonlinear systems and the advantages of fully coupled approaches, over de-coupled methods, for accurate simulation of complex, dynamic processes such as engineered mineral precipitation in subsurface environments.
Super and parallel computers and their impact on civil engineering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kamat, M.P.

1986-01-01

This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
Role of APOE Isoforms in the Pathogenesis of TBI Induced Alzheimer’s Disease

DTIC Science & Technology

2015-10-01

global deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel ...ATP binding cassette transporter A1 (ABCA1) is a lipid transporter that controls the generation of HDL in plasma and ApoE-containing lipoproteins in... parallel sequencing, mRNA-seq, behavioral testing, mem- ory impairement, recovery. 3 Overall Project Summary During the reported period, we have been able
Applications and accuracy of the parallel diagonal dominant algorithm

NASA Technical Reports Server (NTRS)

Sun, Xian-He

1993-01-01

The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
Scalability and Portability of Two Parallel Implementations of ADI

NASA Technical Reports Server (NTRS)

Phung, Thanh; VanderWijngaart, Rob F.

1994-01-01

Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
Nuclide Depletion Capabilities in the Shift Monte Carlo Code

DOE PAGES

Davidson, Gregory G.; Pandya, Tara M.; Johnson, Seth R.; ...

2017-12-21

A new depletion capability has been developed in the Exnihilo radiation transport code suite. This capability enables massively parallel domain-decomposed coupling between the Shift continuous-energy Monte Carlo solver and the nuclide depletion solvers in ORIGEN to perform high-performance Monte Carlo depletion calculations. This paper describes this new depletion capability and discusses its various features, including a multi-level parallel decomposition, high-order transport-depletion coupling, and energy-integrated power renormalization. Several test problems are presented to validate the new capability against other Monte Carlo depletion codes, and the parallel performance of the new capability is analyzed.
Google Earth Engine: a new cloud-computing platform for global-scale earth observation data and analysis

NASA Astrophysics Data System (ADS)

Moore, R. T.; Hansen, M. C.

2011-12-01

Google Earth Engine is a new technology platform that enables monitoring and measurement of changes in the earth's environment, at planetary scale, on a large catalog of earth observation data. The platform offers intrinsically-parallel computational access to thousands of computers in Google's data centers. Initial efforts have focused primarily on global forest monitoring and measurement, in support of REDD+ activities in the developing world. The intent is to put this platform into the hands of scientists and developing world nations, in order to advance the broader operational deployment of existing scientific methods, and strengthen the ability for public institutions and civil society to better understand, manage and report on the state of their natural resources. Earth Engine currently hosts online nearly the complete historical Landsat archive of L5 and L7 data collected over more than twenty-five years. Newly-collected Landsat imagery is downloaded from USGS EROS Center into Earth Engine on a daily basis. Earth Engine also includes a set of historical and current MODIS data products. The platform supports generation, on-demand, of spatial and temporal mosaics, "best-pixel" composites (for example to remove clouds and gaps in satellite imagery), as well as a variety of spectral indices. Supervised learning methods are available over the Landsat data catalog. The platform also includes a new application programming framework, or "API", that allows scientists access to these computational and data resources, to scale their current algorithms or develop new ones. Under the covers of the Google Earth Engine API is an intrinsically-parallel image-processing system. Several forest monitoring applications powered by this API are currently in development and expected to be operational in 2011. Combining science with massive data and technology resources in a cloud-computing framework can offer advantages of computational speed, ease-of-use and collaboration, as well as transparency in data and methods. Methods developed for global processing of MODIS data to map land cover are being adopted for use with Landsat data. Specifically, the MODIS Vegetation Continuous Field product methodology has been applied for mapping forest extent and change at national scales using Landsat time-series data sets. Scaling this method to continental and global scales is enabled by Google Earth Engine computing capabilities. By combining the supervised learning VCF approach with the Landsat archive and cloud computing, unprecedented monitoring of land cover dynamics is enabled.
The Mercury System: Embedding Computation into Disk Drives

DTIC Science & Technology

2004-08-20

enabling technologies to build extremely fast data search engines . We do this by moving the search closer to the data, and performing it in hardware...engine searches in parallel across a disk or disk surface 2. System Parallelism: Searching is off-loaded to search engines and main processor can
Improving Performance and Predictability of Storage Arrays

ERIC Educational Resources Information Center

Altiparmak, Nihat

2013-01-01

Massive amount of data is generated everyday through sensors, Internet transactions, social networks, video, and all other digital sources available. Many organizations store this data to enable breakthrough discoveries and innovation in science, engineering, medicine, and commerce. Such massive scale of data poses new research problems called big…

Simulation of an array-based neural net model

NASA Technical Reports Server (NTRS)

Barnden, John A.

1987-01-01

Research in cognitive science suggests that much of cognition involves the rapid manipulation of complex data structures. However, it is very unclear how this could be realized in neural networks or connectionist systems. A core question is: how could the interconnectivity of items in an abstract-level data structure be neurally encoded? The answer appeals mainly to positional relationships between activity patterns within neural arrays, rather than directly to neural connections in the traditional way. The new method was initially devised to account for abstract symbolic data structures, but it also supports cognitively useful spatial analogue, image-like representations. As the neural model is based on massive, uniform, parallel computations over 2D arrays, the massively parallel processor is a convenient tool for simulation work, although there are complications in using the machine to the fullest advantage. An MPP Pascal simulation program for a small pilot version of the model is running.
Computations on the massively parallel processor at the Goddard Space Flight Center

NASA Technical Reports Server (NTRS)

Strong, James P.

1991-01-01

Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
The CAnadian NIRISS Unbiased Cluster Survey (CANUCS)

NASA Astrophysics Data System (ADS)

Ravindranath, Swara; NIRISS GTO Team

2017-06-01

CANUCS GTO program is a JWST spectroscopy and imaging survey of five massive galaxy clusters and ten parallel fields using the NIRISS low-resolution grisms, NIRCam imaging and NIRSpec multi-object spectroscopy. The primary goal is to understand the evolution of low mass galaxies across cosmic time. The resolved emission line maps and line ratios for many galaxies, with some at resolution of 100pc via the magnification by gravitational lensing will enable determining the spatial distribution of star formation, dust and metals. Other science goals include the detection and characterization of galaxies within the reionization epoch, using multiply-imaged lensed galaxies to constrain cluster mass distributions and dark matter substructure, and understanding star-formation suppression in the most massive galaxy clusters. In this talk I will describe the science goals of the CANUCS program. The proposed prime and parallel observations will be presented with details of the implementation of the observation strategy using JWST proposal planning tools.
The MasPar MP-1 As a Computer Arithmetic Laboratory

PubMed Central

Anuta, Michael A.; Lozier, Daniel W.; Turner, Peter R.

1996-01-01

This paper is a blueprint for the use of a massively parallel SIMD computer architecture for the simulation of various forms of computer arithmetic. The particular system used is a DEC/MasPar MP-1 with 4096 processors in a square array. This architecture has many advantages for such simulations due largely to the simplicity of the individual processors. Arithmetic operations can be spread across the processor array to simulate a hardware chip. Alternatively they may be performed on individual processors to allow simulation of a massively parallel implementation of the arithmetic. Compromises between these extremes permit speed-area tradeoffs to be examined. The paper includes a description of the architecture and its features. It then summarizes some of the arithmetic systems which have been, or are to be, implemented. The implementation of the level-index and symmetric level-index, LI and SLI, systems is described in some detail. An extensive bibliography is included. PMID:27805123
Integration of targeted sequencing and NIPT into clinical practice in a Chinese family with maple syrup urine disease.

PubMed

You, Yanqin; Sun, Yan; Li, Xuchao; Li, Yali; Wei, Xiaoming; Chen, Fang; Ge, Huijuan; Lan, Zhangzhang; Zhu, Qian; Tang, Ying; Wang, Shujuan; Gao, Ya; Jiang, Fuman; Song, Jiaping; Shi, Quan; Zhu, Xuan; Mu, Feng; Dong, Wei; Gao, Vince; Jiang, Hui; Yi, Xin; Wang, Wei; Gao, Zhiying

2014-08-01

This article demonstrates a prominent noninvasive prenatal approach to assist the clinical diagnosis of a single-gene disorder disease, maple syrup urine disease, using targeted sequencing knowledge from the affected family. The method reported here combines novel mutant discovery in known genes by targeted massively parallel sequencing with noninvasive prenatal testing. By applying this new strategy, we successfully revealed novel mutations in the gene BCKDHA (Ex2_4dup and c.392A>G) in this Chinese family and developed a prenatal haplotype-assisted approach to noninvasively detect the genotype of the fetus (transmitted from both parents). This is the first report of integration of targeted sequencing and noninvasive prenatal testing into clinical practice. Our study has demonstrated that this massively parallel sequencing-based strategy can potentially be used for single-gene disorder diagnosis in the future.
Simultaneous digital quantification and fluorescence-based size characterization of massively parallel sequencing libraries.

PubMed

Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H

2013-08-01

Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
Dissecting Cell-Type Composition and Activity-Dependent Transcriptional State in Mammalian Brains by Massively Parallel Single-Nucleus RNA-Seq.

PubMed

Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao

2017-12-07

Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.

PubMed

Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin

2014-10-14

The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
Massively parallel polymerase cloning and genome sequencing of single cells using nanoliter microwells

PubMed Central

Gole, Jeff; Gore, Athurva; Richards, Andrew; Chiu, Yu-Jui; Fung, Ho-Lim; Bushman, Diane; Chiang, Hsin-I; Chun, Jerold; Lo, Yu-Hwa; Zhang, Kun

2013-01-01

Genome sequencing of single cells has a variety of applications, including characterizing difficult-to-culture microorganisms and identifying somatic mutations in single cells from mammalian tissues. A major hurdle in this process is the bias in amplifying the genetic material from a single cell, a procedure known as polymerase cloning. Here we describe the microwell displacement amplification system (MIDAS), a massively parallel polymerase cloning method in which single cells are randomly distributed into hundreds to thousands of nanoliter wells and simultaneously amplified for shotgun sequencing. MIDAS reduces amplification bias because polymerase cloning occurs in physically separated nanoliter-scale reactors, facilitating the de novo assembly of near-complete microbial genomes from single E. coli cells. In addition, MIDAS allowed us to detect single-copy number changes in primary human adult neurons at 1–2 Mb resolution. MIDAS will further the characterization of genomic diversity in many heterogeneous cell populations. PMID:24213699
Development of a Grid-Independent Geos-Chem Chemical Transport Model (v9-02) as an Atmospheric Chemistry Module for Earth System Models

NASA Technical Reports Server (NTRS)

Long, M. S.; Yantosca, R.; Nielsen, J. E; Keller, C. A.; Da Silva, A.; Sulprizio, M. P.; Pawson, S.; Jacob, D. J.

2015-01-01

The GEOS-Chem global chemical transport model (CTM), used by a large atmospheric chemistry research community, has been re-engineered to also serve as an atmospheric chemistry module for Earth system models (ESMs). This was done using an Earth System Modeling Framework (ESMF) interface that operates independently of the GEOSChem scientific code, permitting the exact same GEOSChem code to be used as an ESM module or as a standalone CTM. In this manner, the continual stream of updates contributed by the CTM user community is automatically passed on to the ESM module, which remains state of science and referenced to the latest version of the standard GEOS-Chem CTM. A major step in this re-engineering was to make GEOS-Chem grid independent, i.e., capable of using any geophysical grid specified at run time. GEOS-Chem data sockets were also created for communication between modules and with external ESM code. The grid-independent, ESMF-compatible GEOS-Chem is now the standard version of the GEOS-Chem CTM. It has been implemented as an atmospheric chemistry module into the NASA GEOS- 5 ESM. The coupled GEOS-5-GEOS-Chem system was tested for scalability and performance with a tropospheric oxidant-aerosol simulation (120 coupled species, 66 transported tracers) using 48-240 cores and message-passing interface (MPI) distributed-memory parallelization. Numerical experiments demonstrate that the GEOS-Chem chemistry module scales efficiently for the number of cores tested, with no degradation as the number of cores increases. Although inclusion of atmospheric chemistry in ESMs is computationally expensive, the excellent scalability of the chemistry module means that the relative cost goes down with increasing number of cores in a massively parallel environment.
High Resolution Size Analysis of Fetal DNA in the Urine of Pregnant Women by Paired-End Massively Parallel Sequencing

PubMed Central

Tsui, Nancy B. Y.; Jiang, Peiyong; Chow, Katherine C. K.; Su, Xiaoxi; Leung, Tak Y.; Sun, Hao; Chan, K. C. Allen; Chiu, Rossa W. K.; Lo, Y. M. Dennis

2012-01-01

Background Fetal DNA in maternal urine, if present, would be a valuable source of fetal genetic material for noninvasive prenatal diagnosis. However, the existence of fetal DNA in maternal urine has remained controversial. The issue is due to the lack of appropriate technology to robustly detect the potentially highly degraded fetal DNA in maternal urine. Methodology We have used massively parallel paired-end sequencing to investigate cell-free DNA molecules in maternal urine. Catheterized urine samples were collected from seven pregnant women during the third trimester of pregnancies. We detected fetal DNA by identifying sequenced reads that contained fetal-specific alleles of the single nucleotide polymorphisms. The sizes of individual urinary DNA fragments were deduced from the alignment positions of the paired reads. We measured the fractional fetal DNA concentration as well as the size distributions of fetal and maternal DNA in maternal urine. Principal Findings Cell-free fetal DNA was detected in five of the seven maternal urine samples, with the fractional fetal DNA concentrations ranged from 1.92% to 4.73%. Fetal DNA became undetectable in maternal urine after delivery. The total urinary cell-free DNA molecules were less intact when compared with plasma DNA. Urinary fetal DNA fragments were very short, and the most dominant fetal sequences were between 29 bp and 45 bp in length. Conclusions With the use of massively parallel sequencing, we have confirmed the existence of transrenal fetal DNA in maternal urine, and have shown that urinary fetal DNA was heavily degraded. PMID:23118982
Energy-efficient STDP-based learning circuits with memristor synapses

NASA Astrophysics Data System (ADS)

Wu, Xinyu; Saxena, Vishal; Campbell, Kristy A.

2014-05-01

It is now accepted that the traditional von Neumann architecture, with processor and memory separation, is ill suited to process parallel data streams which a mammalian brain can efficiently handle. Moreover, researchers now envision computing architectures which enable cognitive processing of massive amounts of data by identifying spatio-temporal relationships in real-time and solving complex pattern recognition problems. Memristor cross-point arrays, integrated with standard CMOS technology, are expected to result in massively parallel and low-power Neuromorphic computing architectures. Recently, significant progress has been made in spiking neural networks (SNN) which emulate data processing in the cortical brain. These architectures comprise of a dense network of neurons and the synapses formed between the axons and dendrites. Further, unsupervised or supervised competitive learning schemes are being investigated for global training of the network. In contrast to a software implementation, hardware realization of these networks requires massive circuit overhead for addressing and individually updating network weights. Instead, we employ bio-inspired learning rules such as the spike-timing-dependent plasticity (STDP) to efficiently update the network weights locally. To realize SNNs on a chip, we propose to use densely integrating mixed-signal integrate-andfire neurons (IFNs) and cross-point arrays of memristors in back-end-of-the-line (BEOL) of CMOS chips. Novel IFN circuits have been designed to drive memristive synapses in parallel while maintaining overall power efficiency (<1 pJ/spike/synapse), even at spike rate greater than 10 MHz. We present circuit design details and simulation results of the IFN with memristor synapses, its response to incoming spike trains and STDP learning characterization.
Massively parallel cis-regulatory analysis in the mammalian central nervous system

PubMed Central

Shen, Susan Q.; Myers, Connie A.; Hughes, Andrew E.O.; Byrne, Leah C.; Flannery, John G.; Corbo, Joseph C.

2016-01-01

Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell–derived organoids. PMID:26576614
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Faulon, Jean-Loup, Wilcox, R.T.

1997-11-01

An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in themore » center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.« less
Implementation and Characterization of Three-Dimensional Particle-in-Cell Codes on Multiple-Instruction-Multiple-Data Massively Parallel Supercomputers

NASA Technical Reports Server (NTRS)

Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.

1995-01-01

A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three-dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech; a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64(sup 3) particles per processing node (approximately 134 million particles of 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.
Mapper: high throughput maskless lithography

NASA Astrophysics Data System (ADS)

Kuiper, V.; Kampherbeek, B. J.; Wieland, M. J.; de Boer, G.; ten Berge, G. F.; Boers, J.; Jager, R.; van de Peut, T.; Peijster, J. J. M.; Slot, E.; Steenbrink, S. W. H. K.; Teepen, T. F.; van Veen, A. H. V.

2009-01-01

Maskless electron beam lithography, or electron beam direct write, has been around for a long time in the semiconductor industry and was pioneered from the mid-1960s onwards. This technique has been used for mask writing applications as well as device engineering and in some cases chip manufacturing. However because of its relatively low throughput compared to optical lithography, electron beam lithography has never been the mainstream lithography technology. To extend optical lithography double patterning, as a bridging technology, and EUV lithography are currently explored. Irrespective of the technical viability of both approaches, one thing seems clear. They will be expensive [1]. MAPPER Lithography is developing a maskless lithography technology based on massively-parallel electron-beam writing with high speed optical data transport for switching the electron beams. In this way optical columns can be made with a throughput of 10-20 wafers per hour. By clustering several of these columns together high throughputs can be realized in a small footprint. This enables a highly cost-competitive alternative to double patterning and EUV alternatives. In 2007 MAPPER obtained its Proof of Lithography milestone by exposing in its Demonstrator 45 nm half pitch structures with 110 electron beams in parallel, where all the beams where individually switched on and off [2]. In 2008 MAPPER has taken a next step in its development by building several tools. A new platform has been designed and built which contains a 300 mm wafer stage, a wafer handler and an electron beam column with 110 parallel electron beams. This manuscript describes the first patterning results with this 300 mm platform.
Solving very large, sparse linear systems on mesh-connected parallel computers

NASA Technical Reports Server (NTRS)

Opsahl, Torstein; Reif, John

1987-01-01

The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
System and method for representing and manipulating three-dimensional objects on massively parallel architectures

DOEpatents

Karasick, Michael S.; Strip, David R.

1996-01-01

A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
An Overview of Mesoscale Modeling Software for Energetic Materials Research

DTIC Science & Technology

2010-03-01

12 2.9 Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ...13 Table 10. LAMMPS summary...extensive reviews, lectures and workshops are available on multiscale modeling of materials applications (76-78). • Multi-phase mixtures of
GPU computing in medical physics: a review.

PubMed

Pratx, Guillem; Xing, Lei

2011-05-01

The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.

Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

NASA Astrophysics Data System (ADS)

Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

2017-03-01

We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Mobility Data Analytics Center.

DOT National Transportation Integrated Search

2016-01-01

Mobility Data Analytics Center aims at building a centralized data engine to efficiently manipulate : large-scale data for smart decision making. Integrating and learning the massive data are the key to : the data engine. The ultimate goal of underst...
Massive parallel 3D PIC simulation of negative ion extraction

NASA Astrophysics Data System (ADS)

Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu

2017-09-01

The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
GPAW - massively parallel electronic structure calculations with Python-based software.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Enkovaara, J.; Romero, N.; Shende, S.

2011-01-01

Electronic structure calculations are a widely used tool in materials science and large consumer of supercomputing resources. Traditionally, the software packages for these kind of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages, such as Python, can increase the effciency of programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity enhancing features together with a good numerical performance. We have used thismore » approach in implementing an electronic structure simulation software GPAW using the combination of Python and C programming languages. While the chosen approach works well in standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python based software.« less
Massive Exploration of Perturbed Conditions of the Blood Coagulation Cascade through GPU Parallelization

PubMed Central

Cazzaniga, Paolo; Nobile, Marco S.; Besozzi, Daniela; Bellini, Matteo; Mauri, Giancarlo

2014-01-01

The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need of performing large numbers of in silico analysis to study the behavior of biological systems in different conditions, which necessitate a computing power that usually overtakes the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performances up to a 181× speedup compared to the corresponding sequential simulations. PMID:25025072
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2

PubMed Central

Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.

2014-01-01

SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
A massively asynchronous, parallel brain.

PubMed

Zeki, Semir

2015-05-19

Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously--with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain.
Suppressing correlations in massively parallel simulations of lattice models

NASA Astrophysics Data System (ADS)

Kelling, Jeffrey; Ódor, Géza; Gemming, Sibylle

2017-11-01

For lattice Monte Carlo simulations parallelization is crucial to make studies of large systems and long simulation time feasible, while sequential simulations remain the gold-standard for correlation-free dynamics. Here, various domain decomposition schemes are compared, concluding with one which delivers virtually correlation-free simulations on GPUs. Extensive simulations of the octahedron model for 2 + 1 dimensional Kardar-Parisi-Zhang surface growth, which is very sensitive to correlation in the site-selection dynamics, were performed to show self-consistency of the parallel runs and agreement with the sequential algorithm. We present a GPU implementation providing a speedup of about 30 × over a parallel CPU implementation on a single socket and at least 180 × with respect to the sequential reference.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce

NASA Astrophysics Data System (ADS)

Chen, Xi; Zhou, Liqing

2015-12-01

With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
Line-drawing algorithms for parallel machines

NASA Technical Reports Server (NTRS)

Pang, Alex T.

1990-01-01

The fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient codes is addressed. It is suggested that instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives; distance to a line (a point is on the line if sufficiently close to it) and intersection with a line (a point on the line if an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.
Cloud parallel processing of tandem mass spectrometry based proteomics data.

PubMed

Mohammed, Yassene; Mostovenko, Ekaterina; Henneman, Alex A; Marissen, Rob J; Deelder, André M; Palmblad, Magnus

2012-10-05

Data analysis in mass spectrometry based proteomics struggles to keep pace with the advances in instrumentation and the increasing rate of data acquisition. Analyzing this data involves multiple steps requiring diverse software, using different algorithms and data formats. Speed and performance of the mass spectral search engines are continuously improving, although not necessarily as needed to face the challenges of acquired big data. Improving and parallelizing the search algorithms is one possibility; data decomposition presents another, simpler strategy for introducing parallelism. We describe a general method for parallelizing identification of tandem mass spectra using data decomposition that keeps the search engine intact and wraps the parallelization around it. We introduce two algorithms for decomposing mzXML files and recomposing resulting pepXML files. This makes the approach applicable to different search engines, including those relying on sequence databases and those searching spectral libraries. We use cloud computing to deliver the computational power and scientific workflow engines to interface and automate the different processing steps. We show how to leverage these technologies to achieve faster data analysis in proteomics and present three scientific workflows for parallel database as well as spectral library search using our data decomposition programs, X!Tandem and SpectraST.
An efficient parallel algorithm for matrix-vector multiplication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Plimpton, S.

The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Beginning the Dialogue on the e-Transformation: Behavior Analysis' First Massive Open Online Course (MOOC).

PubMed

Rehfeldt, Ruth Anne; Jung, Heidi L; Aguirre, Angelica; Nichols, Jane L; Root, William B

2016-03-01

The e-Transformation in higher education, in which Massive Open Online Courses (MOOCs) are playing a pivotal role, has had an impact on the modality in which behavior analysis is taught. In this paper, we survey the history and implications of online education including MOOCs and describe the implementation and results for the discipline's first MOOC, delivered at Southern Illinois University in spring 2015. Implications for the globalization and free access of higher education are discussed, as well as the parallel between MOOCs and Skinner's teaching machines.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth; Geveci, Berk

2014-11-01

The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based off what is known as the visualization pipeline. In the pipelinemore » model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.« less
A Fast Algorithm for Massively Parallel, Long-Term, Simulation of Complex Molecular Dynamics Systems

NASA Technical Reports Server (NTRS)

Jaramillo-Botero, Andres; Goddard, William A, III; Fijany, Amir

1997-01-01

The advances in theory and computing technology over the last decade have led to enormous progress in applying atomistic molecular dynamics (MD) methods to the characterization, prediction, and design of chemical, biological, and material systems,.
Branched Polymers for Enhancing Polymer Gel Strength and Toughness

DTIC Science & Technology

2013-02-01

Molecular Massively Parallel Simulator ( LAMMPS ) program and the stress-strain relations were calculated with varying strain-rates (figure 6). A...Acronyms ARL U.S. Army Research Laboratory D3 hexamethylcyclotrisiloxane FTIR Fourier transform infrared GPC gel permeation chromatography LAMMPS
DOE Office of Scientific and Technical Information (OSTI.GOV)

SmartImport.py is a Python source-code file that implements a replacement for the standard Python module importer. The code is derived from knee.py, a file in the standard Python diestribution , and adds functionality to improve the performance of Python module imports in massively parallel contexts.
Image segmentation by iterative parallel region growing with application to data compression and image analysis

NASA Technical Reports Server (NTRS)

Tilton, James C.

1988-01-01

Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
Low profile, highly configurable, current sharing paralleled wide band gap power device power module

DOEpatents

McPherson, Brice; Killeen, Peter D.; Lostetter, Alex; Shaw, Robert; Passmore, Brandon; Hornberger, Jared; Berry, Tony M

2016-08-23

A power module with multiple equalized parallel power paths supporting multiple parallel bare die power devices constructed with low inductance equalized current paths for even current sharing and clean switching events. Wide low profile power contacts provide low inductance, short current paths, and large conductor cross section area provides for massive current carrying. An internal gate & source kelvin interconnection substrate is provided with individual ballast resistors and simple bolted construction. Gate drive connectors are provided on either left or right size of the module. The module is configurable as half bridge, full bridge, common source, and common drain topologies.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kononenko, Oleksiy

2015-03-26

ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.

Biomimetic Models for An Ecological Approach to Massively-Deployed Sensor Networks

NASA Technical Reports Server (NTRS)

Jones, Kennie H.; Lodding, Kenneth N.; Olariu, Stephan; Wilson, Larry; Xin, Chunsheng

2005-01-01

Promises of ubiquitous control of the physical environment by massively-deployed wireless sensor networks open avenues for new applications that will redefine the way we live and work. Due to small size and low cost of sensor devices, visionaries promise systems enabled by deployment of massive numbers of sensors ubiquitous throughout our environment working in concert. Recent research has concentrated on developing techniques for performing relatively simple tasks with minimal energy expense, assuming some form of centralized control. Unfortunately, centralized control is not conducive to parallel activities and does not scale to massive size networks. Execution of simple tasks in sparse networks will not lead to the sophisticated applications predicted. We propose a new way of looking at massively-deployed sensor networks, motivated by lessons learned from the way biological ecosystems are organized. We demonstrate that in such a model, fully distributed data aggregation can be performed in a scalable fashion in massively deployed sensor networks, where motes operate on local information, making local decisions that are aggregated across the network to achieve globally-meaningful effects. We show that such architectures may be used to facilitate communication and synchronization in a fault-tolerant manner, while balancing workload and required energy expenditure throughout the network.
Land-Grant Colleges and American Engineers: Redefining Professional and Vocational Engineering Education in the American Midwest, 1862-1917

ERIC Educational Resources Information Center

Nienkamp, Paul

2010-01-01

During the twentieth-century, American engineers harnessed the atom, sent men to the moon, and literally reshaped the world. They re-routed rivers to create giant hydroelectric dams, created a massive and interconnected highway system, and designed skyscrapers, jets, computers, and the internet. As a modern profession, engineering boasted strong…
Dual compile strategy for parallel heterogeneous execution

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, Tyler Barratt; Perry, James Thomas

2012-06-01

The purpose of the Dual Compile Strategy is to increase our trust in the Compute Engine during its execution of instructions. This is accomplished by introducing a heterogeneous Monitor Engine that checks the execution of the Compute Engine. This leads to the production of a second and custom set of instructions designed for monitoring the execution of the Compute Engine at runtime. This use of multiple engines differs from redundancy in that one engine is working on the application while the other engine is monitoring and checking in parallel instead of both applications (and engines) performing the same work atmore » the same time.« less
Component-based integration of chemistry and optimization software.

PubMed

Kenny, Joseph P; Benson, Steven J; Alexeev, Yuri; Sarich, Jason; Janssen, Curtis L; McInnes, Lois Curfman; Krishnan, Manojkumar; Nieplocha, Jarek; Jurrus, Elizabeth; Fahlstrom, Carl; Windus, Theresa L

2004-11-15

Typical scientific software designs make rigid assumptions regarding programming language and data structures, frustrating software interoperability and scientific collaboration. Component-based software engineering is an emerging approach to managing the increasing complexity of scientific software. Component technology facilitates code interoperability and reuse. Through the adoption of methodology and tools developed by the Common Component Architecture Forum, we have developed a component architecture for molecular structure optimization. Using the NWChem and Massively Parallel Quantum Chemistry packages, we have produced chemistry components that provide capacity for energy and energy derivative evaluation. We have constructed geometry optimization applications by integrating the Toolkit for Advanced Optimization, Portable Extensible Toolkit for Scientific Computation, and Global Arrays packages, which provide optimization and linear algebra capabilities. We present a brief overview of the component development process and a description of abstract interfaces for chemical optimizations. The components conforming to these abstract interfaces allow the construction of applications using different chemistry and mathematics packages interchangeably. Initial numerical results for the component software demonstrate good performance, and highlight potential research enabled by this platform.
A review of GPU-based medical image reconstruction.

PubMed

Després, Philippe; Jia, Xun

2017-10-01

Tomographic image reconstruction is a computationally demanding task, even more so when advanced models are used to describe a more complete and accurate picture of the image formation process. Such advanced modeling and reconstruction algorithms can lead to better images, often with less dose, but at the price of long calculation times that are hardly compatible with clinical workflows. Fortunately, reconstruction tasks can often be executed advantageously on Graphics Processing Units (GPUs), which are exploited as massively parallel computational engines. This review paper focuses on recent developments made in GPU-based medical image reconstruction, from a CT, PET, SPECT, MRI and US perspective. Strategies and approaches to get the most out of GPUs in image reconstruction are presented as well as innovative applications arising from an increased computing capacity. The future of GPU-based image reconstruction is also envisioned, based on current trends in high-performance computing. Copyright © 2017 Associazione Italiana di Fisica Medica. Published by Elsevier Ltd. All rights reserved.
Granular Simulation of NEO Anchoring

NASA Technical Reports Server (NTRS)

Mazhar, Hammad

2011-01-01

NASA is interested in designing a spacecraft capable of visiting a Near Earth Object (NEO), performing experiments, and then returning safely. Certain periods of this mission will require the spacecraft to remain stationary relative to the NEO. Such situations require an anchoring mechanism that is compact, easy to deploy and upon mission completion, easily removed. The design philosophy used in the project relies on the simulation capability of a multibody dynamics physics engine. On Earth it is difficult to create low gravity conditions and testing in low gravity environments, whether artificial or in space is costly and therefore not feasible. Through simulation, gravity can be controlled with great accuracy, making it ideally suited to analyze the problem at hand. Using Chrono::Engine [1], a simulation package capable of utilizing massively parallel GPU hardware, several validation experiments will be performed. Once there is sufficient confidence, modeling of the NEO regolith interaction will begin after which the anchor tests will be performed and analyzed. The outcome of this task is a study with an analysis of several different anchor designs, along with a recommendation on which anchor is better suited to the task of anchoring. With the anchors tested against a range of parameters relating to soil, environment and anchor penetration angles/velocities on a NEO.
A computational model of the human visual cortex

NASA Astrophysics Data System (ADS)

Albus, James S.

2008-04-01

The brain is first and foremost a control system that is capable of building an internal representation of the external world, and using this representation to make decisions, set goals and priorities, formulate plans, and control behavior with intent to achieve its goals. The computational model proposed here assumes that this internal representation resides in arrays of cortical columns. More specifically, it models each cortical hypercolumn together with its underlying thalamic nuclei as a Fundamental Computational Unit (FCU) consisting of a frame-like data structure (containing attributes and pointers) plus the computational processes and mechanisms required to maintain it. In sensory-processing areas of the brain, FCUs enable segmentation, grouping, and classification. Pointers stored in FCU frames link pixels and signals to objects and events in situations and episodes that are overlaid with meaning and emotional values. In behavior-generating areas of the brain, FCUs make decisions, set goals and priorities, generate plans, and control behavior. Pointers are used to define rules, grammars, procedures, plans, and behaviors. It is suggested that it may be possible to reverse engineer the human brain at the FCU level of fidelity using nextgeneration massively parallel computer hardware and software. Key Words: computational modeling, human cortex, brain modeling, reverse engineering the brain, image processing, perception, segmentation, knowledge representation
Memristor-based cellular nonlinear/neural network: design, analysis, and applications.

PubMed

Duan, Shukai; Hu, Xiaofang; Dong, Zhekang; Wang, Lidan; Mazumder, Pinaki

2015-06-01

Cellular nonlinear/neural network (CNN) has been recognized as a powerful massively parallel architecture capable of solving complex engineering problems by performing trillions of analog operations per second. The memristor was theoretically predicted in the late seventies, but it garnered nascent research interest due to the recent much-acclaimed discovery of nanocrossbar memories by engineers at the Hewlett-Packard Laboratory. The memristor is expected to be co-integrated with nanoscale CMOS technology to revolutionize conventional von Neumann as well as neuromorphic computing. In this paper, a compact CNN model based on memristors is presented along with its performance analysis and applications. In the new CNN design, the memristor bridge circuit acts as the synaptic circuit element and substitutes the complex multiplication circuit used in traditional CNN architectures. In addition, the negative differential resistance and nonlinear current-voltage characteristics of the memristor have been leveraged to replace the linear resistor in conventional CNNs. The proposed CNN design has several merits, for example, high density, nonvolatility, and programmability of synaptic weights. The proposed memristor-based CNN design operations for implementing several image processing functions are illustrated through simulation and contrasted with conventional CNNs. Monte-Carlo simulation has been used to demonstrate the behavior of the proposed CNN due to the variations in memristor synaptic weights.
A Linked-Cell Domain Decomposition Method for Molecular Dynamics Simulation on a Scalable Multiprocessor

DOE PAGES

Yang, L. H.; Brooks III, E. D.; Belak, J.

1992-01-01

A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evoles. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.
Reconstruction of coded aperture images

NASA Technical Reports Server (NTRS)

Bielefeld, Michael J.; Yin, Lo I.

1987-01-01

Balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM method has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM method in future coded-aperture experiments with the help of the MPP.
Function algorithms for MPP scientific subroutines, volume 1

NASA Technical Reports Server (NTRS)

Gouch, J. G.

1984-01-01

Design documentation and user documentation for function algorithms for the Massively Parallel Processor (MPP) are presented. The contract specifies development of MPP assembler instructions to perform the following functions: natural logarithm; exponential (e to the x power); square root; sine; cosine; and arctangent. To fulfill the requirements of the contract, parallel array and solar implementations for these functions were developed on the PDP11/34 Program Development and Management Unit (PDMU) that is resident at the MPP testbed installation located at the NASA Goddard facility.
GPU-completeness: theory and implications

NASA Astrophysics Data System (ADS)

Lin, I.-Jong

2011-01-01

This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a welldefined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.
Data intensive computing at Sandia.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wilson, Andrew T.

2010-09-01

Data-Intensive Computing is parallel computing where you design your algorithms and your software around efficient access and traversal of a data set; where hardware requirements are dictated by data size as much as by desired run times usually distilling compact results from massive data.
Optimization of Particle-in-Cell Codes on RISC Processors

NASA Technical Reports Server (NTRS)

Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.

1996-01-01

General strategies are developed to optimize particle-cell-codes written in Fortran for RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve efficiency of arithmetic pipelines.
Optimized Landing of Autonomous Unmanned Aerial Vehicle Swarms

DTIC Science & Technology

2012-06-01

understanding about the world. Examples of these emergent behaviors include construction of complex structures (e.g., hives, termite mounds), trends in economic...Sep. 2007. [16] M. Resnick, Turtles, Termites , and Traffic Jams: Explorations in Massively Parallel Microworlds. MIT Press, 1997. [Online]. Available
Constraint-Based Scheduling System

NASA Technical Reports Server (NTRS)

Zweben, Monte; Eskey, Megan; Stock, Todd; Taylor, Will; Kanefsky, Bob; Drascher, Ellen; Deale, Michael; Daun, Brian; Davis, Gene

1995-01-01

Report describes continuing development of software for constraint-based scheduling system implemented eventually on massively parallel computer. Based on machine learning as means of improving scheduling. Designed to learn when to change search strategy by analyzing search progress and learning general conditions under which resource bottleneck occurs.
The novel high-performance 3-D MT inverse solver

NASA Astrophysics Data System (ADS)

Kruglyakov, Mikhail; Geraskin, Alexey; Kuvshinov, Alexey

2016-04-01

We present novel, robust, scalable, and fast 3-D magnetotelluric (MT) inverse solver. The solver is written in multi-language paradigm to make it as efficient, readable and maintainable as possible. Separation of concerns and single responsibility concepts go through implementation of the solver. As a forward modelling engine a modern scalable solver extrEMe, based on contracting integral equation approach, is used. Iterative gradient-type (quasi-Newton) optimization scheme is invoked to search for (regularized) inverse problem solution, and adjoint source approach is used to calculate efficiently the gradient of the misfit. The inverse solver is able to deal with highly detailed and contrasting models, allows for working (separately or jointly) with any type of MT responses, and supports massive parallelization. Moreover, different parallelization strategies implemented in the code allow optimal usage of available computational resources for a given problem statement. To parameterize an inverse domain the so-called mask parameterization is implemented, which means that one can merge any subset of forward modelling cells in order to account for (usually) irregular distribution of observation sites. We report results of 3-D numerical experiments aimed at analysing the robustness, performance and scalability of the code. In particular, our computational experiments carried out at different platforms ranging from modern laptops to HPC Piz Daint (6th supercomputer in the world) demonstrate practically linear scalability of the code up to thousands of nodes.
Massively parallel sequencing analysis of mucinous ovarian carcinomas: genomic profiling and differential diagnoses.

PubMed

Mueller, Jennifer J; Schlappe, Brooke A; Kumar, Rahul; Olvera, Narciso; Dao, Fanny; Abu-Rustum, Nadeem; Aghajanian, Carol; DeLair, Deborah; Hussein, Yaser R; Soslow, Robert A; Levine, Douglas A; Weigelt, Britta

2018-05-21

Mucinous ovarian cancer (MOC) is a rare type of epithelial ovarian cancer resistant to standard chemotherapy regimens. We sought to characterize the repertoire of somatic mutations in MOCs and to define the contribution of massively parallel sequencing to the classification of tumors diagnosed as primary MOCs. Following gynecologic pathology and chart review, DNA samples obtained from primary MOCs and matched normal tissues/blood were subjected to whole-exome (n = 9) or massively parallel sequencing targeting 341 cancer genes (n = 15). Immunohistochemical analysis of estrogen receptor, progesterone receptor, PTEN, ARID1A/BAF250a, and the DNA mismatch (MMR) proteins MSH6 and PMS2 was performed for all cases. Mutational frequencies of MOCs were compared to those of high-grade serous ovarian cancers (HGSOCs) and mucinous tumors from other sites. MOCs were heterogeneous at the genetic level, frequently harboring TP53 (75%) mutations, KRAS (71%) mutations and/or CDKN2A/B homozygous deletions/mutations (33%). Although established criteria for diagnosis were employed, four cases harbored mutational and immunohistochemical profiles similar to those of endometrioid carcinomas, and one case for colorectal or endometrioid carcinoma. Significant differences in the frequencies of KRAS, TP53, CDKN2A, FBXW7, PIK3CA and/or APC mutations between the confirmed primary MOCs (n = 19) and HGSOCs, mucinous gastric and/or mucinous colorectal carcinomas were found, whereas no differences in the 341 genes studied between MOCs and mucinous pancreatic carcinomas were identified. Our findings suggest that the assessment of mutations affecting TP53, KRAS, PIK3CA, ARID1A and POLE, and DNA MMR protein expression may be used to further aid the diagnosis and treatment decision-making of primary MOC. Copyright © 2018 Elsevier Inc. All rights reserved.
Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder.

PubMed

Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E

2012-09-28

Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.
Identification of rare X-linked neuroligin variants by massively parallel sequencing in males with autism spectrum disorder

PubMed Central

2012-01-01

Background Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841

PARAVT: Parallel Voronoi tessellation code

NASA Astrophysics Data System (ADS)

González, R. E.

2016-10-01

In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
Parallel Computational Fluid Dynamics: Current Status and Future Requirements

NASA Technical Reports Server (NTRS)

Simon, Horst D.; VanDalsem, William R.; Dagum, Leonardo; Kutler, Paul (Technical Monitor)

1994-01-01

One or the key objectives of the Applied Research Branch in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Allies Research Center is the accelerated introduction of highly parallel machines into a full operational environment. In this report we discuss the performance results obtained from the implementation of some computational fluid dynamics (CFD) applications on the Connection Machine CM-2 and the Intel iPSC/860. We summarize some of the experiences made so far with the parallel testbed machines at the NAS Applied Research Branch. Then we discuss the long term computational requirements for accomplishing some of the grand challenge problems in computational aerosciences. We argue that only massively parallel machines will be able to meet these grand challenge requirements, and we outline the computer science and algorithm research challenges ahead.
Parallel dispatch: a new paradigm of electrical power system dispatch

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang

Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools are of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely the parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complexmore » power grids, extend system operators U+02BC capabilities, suggest optimal dispatch strategies, and to provide decision-making recommendations according to power system operational goals.« less
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.

PubMed

Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping

2018-04-27

A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Bioelectricity-AQA, one of the first MOOC courses in engineering.

PubMed

Barr, Roger C

2014-01-01

Bioelectricity-AQA was one of the first massively open online courses in engineering, having been given the first time via Coursera starting in September, 2012. This report provides some detail on its background, presentation, enrollment, and lessons learned.
Massively parallel and linear-scaling algorithm for second-order Moller–Plesset perturbation theory applied to the study of supramolecular wires

DOE PAGES

Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...

2016-11-16

Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalabilitymore » of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.« less
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit

PubMed Central

Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik

2013-01-01

Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.

PubMed

Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias

2011-01-01

The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
Sewage Reflects the Distriubtion of Human Faecal Lachnospiraceae

EPA Science Inventory

Faecal pollution contains a rich and diverse community of bacteria derived from animals and humans,many of which might serve as alternatives to the traditional enterococci and Escherichia coli faecal indicators. We used massively parallel sequencing (MPS)of the 16S rRNA gene to ...
Genetics Home Reference: medullary cystic kidney disease type 1

MedlinePlus

... They lead to the production of an altered protein. It is unclear how this change causes kidney disease. ... cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nat Genet. 2013 Mar;45(3):299-303. ...
Large-scale enrichment and discovery of gene-associated SNPs

USDA-ARS?s Scientific Manuscript database

With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...
Software Applications on the Peregrine System | High-Performance Computing

Science.gov Websites

programming and optimization. Gaussian Chemistry Program for calculating molecular electronic structure and Materials Science Open-source classical molecular dynamics program designed for massively parallel systems framework Q-Chem Chemistry ab initio quantum chemistry package for predictin molecular structures
Flow cytometry for enrichment and titration in massively parallel DNA sequencing

PubMed Central

Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim

2009-01-01

Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748
A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis

PubMed Central

2017-01-01

Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described. PMID:28542338
Automation of a Wave-Optics Simulation and Image Post-Processing Package on Riptide

NASA Astrophysics Data System (ADS)

Werth, M.; Lucas, J.; Thompson, D.; Abercrombie, M.; Holmes, R.; Roggemann, M.

Detailed wave-optics simulations and image post-processing algorithms are computationally expensive and benefit from the massively parallel hardware available at supercomputing facilities. We created an automated system that interfaces with the Maui High Performance Computing Center (MHPCC) Distributed MATLAB® Portal interface to submit massively parallel waveoptics simulations to the IBM iDataPlex (Riptide) supercomputer. This system subsequently postprocesses the output images with an improved version of physically constrained iterative deconvolution (PCID) and analyzes the results using a series of modular algorithms written in Python. With this architecture, a single person can simulate thousands of unique scenarios and produce analyzed, archived, and briefing-compatible output products with very little effort. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Massively parallel sequencing and the emergence of forensic genomics: Defining the policy and legal issues for law enforcement.

PubMed

Scudder, Nathan; McNevin, Dennis; Kelty, Sally F; Walsh, Simon J; Robertson, James

2018-03-01

Use of DNA in forensic science will be significantly influenced by new technology in coming years. Massively parallel sequencing and forensic genomics will hasten the broadening of forensic DNA analysis beyond short tandem repeats for identity towards a wider array of genetic markers, in applications as diverse as predictive phenotyping, ancestry assignment, and full mitochondrial genome analysis. With these new applications come a range of legal and policy implications, as forensic science touches on areas as diverse as 'big data', privacy and protected health information. Although these applications have the potential to make a more immediate and decisive forensic intelligence contribution to criminal investigations, they raise policy issues that will require detailed consideration if this potential is to be realised. The purpose of this paper is to identify the scope of the issues that will confront forensic and user communities. Copyright © 2017 The Chartered Society of Forensic Sciences. All rights reserved.
A technique for setting analytical thresholds in massively parallel sequencing-based forensic DNA analysis.

PubMed

Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi

2017-01-01

Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described.
Massively parallel de novo protein design for targeted therapeutics.

PubMed

Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J; Hicks, Derrick R; Vergara, Renan; Murapa, Patience; Bernard, Steffen M; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T; Koday, Merika T; Jenkins, Cody M; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M; Fernández-Velasco, D Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A; Fuller, Deborah H; Baker, David

2017-10-05

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics

NASA Astrophysics Data System (ADS)

Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David

2017-10-01

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics

PubMed Central

Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David

2018-01-01

De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37–43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing. PMID:28953867

Visualization of Pulsar Search Data

NASA Astrophysics Data System (ADS)

Foster, R. S.; Wolszczan, A.

1993-05-01

The search for periodic signals from rotating neutron stars or pulsars has been a computationally taxing problem to astronomers for more than twenty-five years. Over this time interval, increases in computational capability have allowed ever more sensitive searches, covering a larger parameter space. The volume of input data and the general presence of radio frequency interference typically produce numerous spurious signals. Visualization of the search output and enhanced real-time processing of significant candidate events allow the pulsar searcher to optimally processes and search for new radio pulsars. The pulsar search algorithm and visualization system presented in this paper currently runs on serial RISC based workstations, a traditional vector based super computer, and a massively parallel computer. A description of the serial software algorithm and its modifications for massively parallel computing are describe. The results of four successive searches for millisecond period radio pulsars using the Arecibo telescope at 430 MHz have resulted in the successful detection of new long-period and millisecond period radio pulsars.
Use of massively parallel pyrosequencing to evaluate the diversity of and selection on Plasmodium falciparum csp T-cell epitopes in Lilongwe, Malawi.

PubMed

Bailey, Jeffrey A; Mvalo, Tisungane; Aragam, Nagesh; Weiser, Matthew; Congdon, Seth; Kamwendo, Debbie; Martinson, Francis; Hoffman, Irving; Meshnick, Steven R; Juliano, Jonathan J

2012-08-15

The development of an effective malaria vaccine has been hampered by the genetic diversity of commonly used target antigens. This diversity has led to concerns about allele-specific immunity limiting the effectiveness of vaccines. Despite extensive genetic diversity of circumsporozoite protein (CS), the most successful malaria vaccine is RTS/S, a monovalent CS vaccine. By use of massively parallel pyrosequencing, we evaluated the diversity of CS haplotypes across the T-cell epitopes in parasites from Lilongwe, Malawi. We identified 57 unique parasite haplotypes from 100 participants. By use of ecological and molecular indexes of diversity, we saw no difference in the diversity of CS haplotypes between adults and children. We saw evidence of weak variant-specific selection within this region of CS, suggesting naturally acquired immunity does induce variant-specific selection on CS. Therefore, the impact of CS vaccines on variant frequencies with widespread implementation of vaccination requires further study.
Identification of the Bovine Arachnomelia Mutation by Massively Parallel Sequencing Implicates Sulfite Oxidase (SUOX) in Bone Development

PubMed Central

Drögemüller, Cord; Tetens, Jens; Sigurdsson, Snaevar; Gentile, Arcangelo; Testoni, Stefania; Lindblad-Toh, Kerstin; Leeb, Tosso

2010-01-01

Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development. PMID:20865119
Use of Massive Parallel Computing Libraries in the Context of Global Gravity Field Determination from Satellite Data

NASA Astrophysics Data System (ADS)

Brockmann, J. M.; Schuh, W.-D.

2011-07-01

The estimation of the global Earth's gravity field parametrized as a finite spherical harmonic series is computationally demanding. The computational effort depends on the one hand on the maximal resolution of the spherical harmonic expansion (i.e. the number of parameters to be estimated) and on the other hand on the number of observations (which are several millions for e.g. observations from the GOCE satellite missions). To circumvent these restrictions, a massive parallel software based on high-performance computing (HPC) libraries as ScaLAPACK, PBLAS and BLACS was designed in the context of GOCE HPF WP6000 and the GOCO consortium. A prerequisite for the use of these libraries is that all matrices are block-cyclic distributed on a processor grid comprised by a large number of (distributed memory) computers. Using this set of standard HPC libraries has the benefit that once the matrices are distributed across the computer cluster, a huge set of efficient and highly scalable linear algebra operations can be used.
Track finding in ATLAS using GPUs

NASA Astrophysics Data System (ADS)

Mattmann, J.; Schmitt, C.

2012-12-01

The reconstruction and simulation of collision events is a major task in modern HEP experiments involving several ten thousands of standard CPUs. On the other hand the graphics processors (GPUs) have become much more powerful and are by far outperforming the standard CPUs in terms of floating point operations due to their massive parallel approach. The usage of these GPUs could therefore significantly reduce the overall reconstruction time per event or allow for the usage of more sophisticated algorithms. In this paper the track finding in the ATLAS experiment will be used as an example on how the GPUs can be used in this context: the implementation on the GPU requires a change in the algorithmic flow to allow the code to work in the rather limited environment on the GPU in terms of memory, cache, and transfer speed from and to the GPU and to make use of the massive parallel computation. Both, the specific implementation of parts of the ATLAS track reconstruction chain and the performance improvements obtained will be discussed.
System and method for representing and manipulating three-dimensional objects on massively parallel architectures

DOEpatents

Karasick, M.S.; Strip, D.R.

1996-01-30

A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
Parallel k-means++ for Multiple Shared-Memory Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mackey, Patrick S.; Lewis, Robert R.

2016-09-22

In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varyingmore » data sizes.« less
Password Cracking Using Sony Playstations

NASA Astrophysics Data System (ADS)

Kleinhans, Hugo; Butts, Jonathan; Shenoi, Sujeet

Law enforcement agencies frequently encounter encrypted digital evidence for which the cryptographic keys are unknown or unavailable. Password cracking - whether it employs brute force or sophisticated cryptanalytic techniques - requires massive computational resources. This paper evaluates the benefits of using the Sony PlayStation 3 (PS3) to crack passwords. The PS3 offers massive computational power at relatively low cost. Moreover, multiple PS3 systems can be introduced easily to expand parallel processing when additional power is needed. This paper also describes a distributed framework designed to enable law enforcement agents to crack encrypted archives and applications in an efficient and cost-effective manner.
A massively asynchronous, parallel brain

PubMed Central

Zeki, Semir

2015-01-01

Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871
Bioelectricity-AQA, one of the first MOOC courses in Engineering

PubMed Central

Barr, Roger C.

2014-01-01

Bioelectricity-AQA was one of the first massively open online courses in engineering, having been given the first time via Coursera starting in September, 2012. This report provides some detail on its background, presentation, enrollment, and lessons learned. PMID:25570327
Low latency, high bandwidth data communications between compute nodes in a parallel computer

DOEpatents

Archer, Charles J.; Blocksome, Michael A.; Ratterman, Joseph D.; Smith, Brian E.

2010-11-02

Methods, parallel computers, and computer program products are disclosed for low latency, high bandwidth data communications between compute nodes in a parallel computer. Embodiments include receiving, by an origin direct memory access (`DMA`) engine of an origin compute node, data for transfer to a target compute node; sending, by the origin DMA engine of the origin compute node to a target DMA engine on the target compute node, a request to send (`RTS`) message; transferring, by the origin DMA engine, a predetermined portion of the data to the target compute node using memory FIFO operation; determining, by the origin DMA engine whether an acknowledgement of the RTS message has been received from the target DMA engine; if the an acknowledgement of the RTS message has not been received, transferring, by the origin DMA engine, another predetermined portion of the data to the target compute node using a memory FIFO operation; and if the acknowledgement of the RTS message has been received by the origin DMA engine, transferring, by the origin DMA engine, any remaining portion of the data to the target compute node using a direct put operation.
LSPRAY: Lagrangian Spray Solver for Applications With Parallel Computing and Unstructured Gas-Phase Flow Solvers

NASA Technical Reports Server (NTRS)

Raju, Manthena S.

1998-01-01

Sprays occur in a wide variety of industrial and power applications and in the processing of materials. A liquid spray is a phase flow with a gas as the continuous phase and a liquid as the dispersed phase (in the form of droplets or ligaments). Interactions between the two phases, which are coupled through exchanges of mass, momentum, and energy, can occur in different ways at different times and locations involving various thermal, mass, and fluid dynamic factors. An understanding of the flow, combustion, and thermal properties of a rapidly vaporizing spray requires careful modeling of the rate-controlling processes associated with the spray's turbulent transport, mixing, chemical kinetics, evaporation, and spreading rates, as well as other phenomena. In an attempt to advance the state-of-the-art in multidimensional numerical methods, we at the NASA Lewis Research Center extended our previous work on sprays to unstructured grids and parallel computing. LSPRAY, which was developed by M.S. Raju of Nyma, Inc., is designed to be massively parallel and could easily be coupled with any existing gas-phase flow and/or Monte Carlo probability density function (PDF) solver. The LSPRAY solver accommodates the use of an unstructured mesh with mixed triangular, quadrilateral, and/or tetrahedral elements in the gas-phase solvers. It is used specifically for fuel sprays within gas turbine combustors, but it has many other uses. The spray model used in LSPRAY provided favorable results when applied to stratified-charge rotary combustion (Wankel) engines and several other confined and unconfined spray flames. The source code will be available with the National Combustion Code (NCC) as a complete package.
A distributed query execution engine of big attributed graphs.

PubMed

Batarfi, Omar; Elshawi, Radwa; Fayoumi, Ayman; Barnawi, Ahmed; Sakr, Sherif

2016-01-01

A graph is a popular data model that has become pervasively used for modeling structural relationships between objects. In practice, in many real-world graphs, the graph vertices and edges need to be associated with descriptive attributes. Such type of graphs are referred to as attributed graphs. G-SPARQL has been proposed as an expressive language, with a centralized execution engine, for querying attributed graphs. G-SPARQL supports various types of graph querying operations including reachability, pattern matching and shortest path where any G-SPARQL query may include value-based predicates on the descriptive information (attributes) of the graph edges/vertices in addition to the structural predicates. In general, a main limitation of centralized systems is that their vertical scalability is always restricted by the physical limits of computer systems. This article describes the design, implementation in addition to the performance evaluation of DG-SPARQL, a distributed, hybrid and adaptive parallel execution engine of G-SPARQL queries. In this engine, the topology of the graph is distributed over the main memory of the underlying nodes while the graph data are maintained in a relational store which is replicated on the disk of each of the underlying nodes. DG-SPARQL evaluates parts of the query plan via SQL queries which are pushed to the underlying relational stores while other parts of the query plan, as necessary, are evaluated via indexless memory-based graph traversal algorithms. Our experimental evaluation shows the efficiency and the scalability of DG-SPARQL on querying massive attributed graph datasets in addition to its ability to outperform the performance of Apache Giraph, a popular distributed graph processing system, by orders of magnitudes.
Sort-Seq Approach to Engineering a Formaldehyde-Inducible Promoter for Dynamically Regulated Escherichia coli Growth on Methanol

PubMed Central

2017-01-01

Tight and tunable control of gene expression is a highly desirable goal in synthetic biology for constructing predictable gene circuits and achieving preferred phenotypes. Elucidating the sequence–function relationship of promoters is crucial for manipulating gene expression at the transcriptional level, particularly for inducible systems dependent on transcriptional regulators. Sort-seq methods employing fluorescence-activated cell sorting (FACS) and high-throughput sequencing allow for the quantitative analysis of sequence–function relationships in a robust and rapid way. Here we utilized a massively parallel sort-seq approach to analyze the formaldehyde-inducible Escherichia coli promoter (Pfrm) with single-nucleotide resolution. A library of mutated formaldehyde-inducible promoters was cloned upstream of gfp on a plasmid. The library was partitioned into bins via FACS on the basis of green fluorescent protein (GFP) expression level, and mutated promoters falling into each expression bin were identified with high-throughput sequencing. The resulting analysis identified two 19 base pair repressor binding sites, one upstream of the −35 RNA polymerase (RNAP) binding site and one overlapping with the −10 site, and assessed the relative importance of each position and base therein. Key mutations were identified for tuning expression levels and were used to engineer formaldehyde-inducible promoters with predictable activities. Engineered variants demonstrated up to 14-fold lower basal expression, 13-fold higher induced expression, and a 3.6-fold stronger response as indicated by relative dynamic range. Finally, an engineered formaldehyde-inducible promoter was employed to drive the expression of heterologous methanol assimilation genes and achieved increased biomass levels on methanol, a non-native substrate of E. coli. PMID:28463494
The engine design engine. A clustered computer platform for the aerodynamic inverse design and analysis of a full engine

NASA Technical Reports Server (NTRS)

Sanz, J.; Pischel, K.; Hubler, D.

1992-01-01

An application for parallel computation on a combined cluster of powerful workstations and supercomputers was developed. A Parallel Virtual Machine (PVM) is used as message passage language on a macro-tasking parallelization of the Aerodynamic Inverse Design and Analysis for a Full Engine computer code. The heterogeneous nature of the cluster is perfectly handled by the controlling host machine. Communication is established via Ethernet with the TCP/IP protocol over an open network. A reasonable overhead is imposed for internode communication, rendering an efficient utilization of the engaged processors. Perhaps one of the most interesting features of the system is its versatile nature, that permits the usage of the computational resources available that are experiencing less use at a given point in time.
Ray tracing on the MPP

NASA Technical Reports Server (NTRS)

Dorband, John E.

1987-01-01

Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
Multiplexed microsatellite recovery using massively parallel sequencing

Treesearch

T.N. Jennings; B.J. Knaus; T.D. Mullins; S.M. Haig; R.C. Cronn

2011-01-01

Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of...
DNA methylation profiling using HpaII tiny fragment enrichment by ligation-mediated PCR (HELP)

PubMed Central

Suzuki, Masako; Greally, John M.

2010-01-01

The HELP assay is a technique that allows genome-wide analysis of cytosine methylation. Here we describe the assay, its relative strengths and weaknesses, and the transition of the assay from a microarray to massively-parallel sequencing-based foundation. PMID:20434563
Genetics Home Reference: Ochoa syndrome

MedlinePlus

... Other researchers believe that a defective heparanase 2 protein may lead to problems with the development of the urinary tract or with muscle ... Peng W, Xu J, Li J, Owens KM, Bloom D, Innis JW. Exome capture and massively parallel sequencing identifies a novel HPSE2 mutation in a Saudi ...
Gene discovery using massively parallel pyrosequencing to develop ESTs for the flesh fly Sarcophaga crassipalpis

USDA-ARS?s Scientific Manuscript database

Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptom...

Medical applications for high-performance computers in SKIF-GRID network.

PubMed

Zhuchkov, Alexey; Tverdokhlebov, Nikolay

2009-01-01

The paper presents a set of software services for massive mammography image processing by using high-performance parallel computers of SKIF-family which are linked into a service-oriented grid-network. An experience of a prototype system implementation in two medical institutions is also described.
Data decomposition method for parallel polygon rasterization considering load balancing

NASA Astrophysics Data System (ADS)

Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun

2015-12-01

It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.
Terascale direct numerical simulations of turbulent combustion using S3D

NASA Astrophysics Data System (ADS)

Chen, J. H.; Choudhary, A.; de Supinski, B.; DeVries, M.; Hawkes, E. R.; Klasky, S.; Liao, W. K.; Ma, K. L.; Mellor-Crummey, J.; Podhorszki, N.; Sankaran, R.; Shende, S.; Yoo, C. S.

2009-01-01

Computational science is paramount to the understanding of underlying processes in internal combustion engines of the future that will utilize non-petroleum-based alternative fuels, including carbon-neutral biofuels, and burn in new combustion regimes that will attain high efficiency while minimizing emissions of particulates and nitrogen oxides. Next-generation engines will likely operate at higher pressures, with greater amounts of dilution and utilize alternative fuels that exhibit a wide range of chemical and physical properties. Therefore, there is a significant role for high-fidelity simulations, direct numerical simulations (DNS), specifically designed to capture key turbulence-chemistry interactions in these relatively uncharted combustion regimes, and in particular, that can discriminate the effects of differences in fuel properties. In DNS, all of the relevant turbulence and flame scales are resolved numerically using high-order accurate numerical algorithms. As a consequence terascale DNS are computationally intensive, require massive amounts of computing power and generate tens of terabytes of data. Recent results from terascale DNS of turbulent flames are presented here, illustrating its role in elucidating flame stabilization mechanisms in a lifted turbulent hydrogen/air jet flame in a hot air coflow, and the flame structure of a fuel-lean turbulent premixed jet flame. Computing at this scale requires close collaborations between computer and combustion scientists to provide optimized scaleable algorithms and software for terascale simulations, efficient collective parallel I/O, tools for volume visualization of multiscale, multivariate data and automating the combustion workflow. The enabling computer science, applied to combustion science, is also required in many other terascale physics and engineering simulations. In particular, performance monitoring is used to identify the performance of key kernels in the DNS code, S3D and especially memory intensive loops in the code. Through the careful application of loop transformations, data reuse in cache is exploited thereby reducing memory bandwidth needs, and hence, improving S3D's nodal performance. To enhance collective parallel I/O in S3D, an MPI-I/O caching design is used to construct a two-stage write-behind method for improving the performance of write-only operations. The simulations generate tens of terabytes of data requiring analysis. Interactive exploration of the simulation data is enabled by multivariate time-varying volume visualization. The visualization highlights spatial and temporal correlations between multiple reactive scalar fields using an intuitive user interface based on parallel coordinates and time histogram. Finally, an automated combustion workflow is designed using Kepler to manage large-scale data movement, data morphing, and archival and to provide a graphical display of run-time diagnostics.
Substructured multibody molecular dynamics.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grest, Gary Stephen; Stevens, Mark Jackson; Plimpton, Steven James

2006-11-01

We have enhanced our parallel molecular dynamics (MD) simulation software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator, lammps.sandia.gov) to include many new features for accelerated simulation including articulated rigid body dynamics via coupling to the Rensselaer Polytechnic Institute code POEMS (Parallelizable Open-source Efficient Multibody Software). We use new features of the LAMMPS software package to investigate rhodopsin photoisomerization, and water model surface tension and capillary waves at the vapor-liquid interface. Finally, we motivate the recipes of MD for practitioners and researchers in numerical analysis and computational mechanics.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D

NASA Technical Reports Server (NTRS)

Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)

1994-01-01

The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
On the conversion of infrared radiation from fission reactor-based photon engine into parallel beam

NASA Astrophysics Data System (ADS)

Gulevich, Andrey V.; Levchenko, Vladislav E.; Loginov, Nicolay I.; Kukharchuk, Oleg F.; Evtodiev, Denis A.; Zrodnikov, Anatoly V.

2002-01-01

The efficiency of infrared radiation conversion from photon engine based on fission reactor into parallel photon beam is discussed. Two different ways of doing that are considered. One of them is to use the parabolic mirror to convert of infrared radiation into parallel photon beam. The another one is based on the use of special lattice consisting of numerous light conductors. The experimental facility and some results are described. .
Satisfiability Test with Synchronous Simulated Annealing on the Fujitsu AP1000 Massively-Parallel Multiprocessor

NASA Technical Reports Server (NTRS)

Sohn, Andrew; Biswas, Rupak

1996-01-01

Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Quakefinder: A scalable data mining system for detecting earthquakes from space

DOE Office of Scientific and Technical Information (OSTI.GOV)

Stolorz, P.; Dean, C.

1996-12-31

We present an application of novel massively parallel datamining techniques to highly precise inference of important physical processes from remote sensing imagery. Specifically, we have developed and applied a system, Quakefinder, that automatically detects and measures tectonic activity in the earth`s crust by examination of satellite data. We have used Quakefinder to automatically map the direction and magnitude of ground displacements due to the 1992 Landers earthquake in Southern California, over a spatial region of several hundred square kilometers, at a resolution of 10 meters, to a (sub-pixel) precision of 1 meter. This is the first calculation that has evermore » been able to extract area-mapped information about 2D tectonic processes at this level of detail. We outline the architecture of the Quakefinder system, based upon a combination of techniques drawn from the fields of statistical inference, massively parallel computing and global optimization. We confirm the overall correctness of the procedure by comparison of our results with known locations of targeted faults obtained by careful and time-consuming field measurements. The system also performs knowledge discovery by indicating novel unexplained tectonic activity away from the primary faults that has never before been observed. We conclude by discussing the future potential of this data mining system in the broad context of studying subtle spatio-temporal processes within massive image streams.« less
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers

NASA Astrophysics Data System (ADS)

Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas

2017-04-01

Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in the recent years combine conventional multi-core CPU with GPU accelerators and provide an opportunity for manifold increase and computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide the interested external researchers the regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of logarithmic energy ratio and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
Tumor Genomic Profiling in Breast Cancer Patients Using Targeted Massively Parallel Sequencing

DTIC Science & Technology

2015-04-30

recently, we identified several novel alterations in in ER+ breast tumors, including translocations in ESR1 , the gene that encodes the estrogen receptor...modified our bait design to include genomic coordinates across select introns in ESR1 . In addition, two recent papers from the Broad Institute published
The Five Central Psychological Challenges Facing Effective Mobile Learning

ERIC Educational Resources Information Center

Terras, Melody M.; Ramsay, Judith

2012-01-01

Web 2.0 technology not only offers the opportunity of massively parallel interconnected networks that support the provision of information and communication anytime and anywhere but also offers immense opportunities for collaboration and sharing of user-generated content. This information-rich environment may support both formal and informal…
Associative Networks on a Massively Parallel Computer.

DTIC Science & Technology

1985-10-01

lgbt (as a group of numbers, in this case), but this only leads to sensible queries when a statistical function is applied: "What is the largest salary...34.*"* . •.,. 64 the siW~pe operations being used during ascend, each movement step costs the same as executing an operation
Parallel Hybrid Gas-Electric Geared Turbofan Engine Conceptual Design and Benefits Analysis

NASA Technical Reports Server (NTRS)

Lents, Charles; Hardin, Larry; Rheaume, Jonathan; Kohlman, Lee

2016-01-01

The conceptual design of a parallel gas-electric hybrid propulsion system for a conventional single aisle twin engine tube and wing vehicle has been developed. The study baseline vehicle and engine technology are discussed, followed by results of the hybrid propulsion system sizing and performance analysis. The weights analysis for the electric energy storage & conversion system and thermal management system is described. Finally, the potential system benefits are assessed.
On the suitability of the connection machine for direct particle simulation

NASA Technical Reports Server (NTRS)

Dagum, Leonard

1990-01-01

The algorithmic structure was examined of the vectorizable Stanford particle simulation (SPS) method and the structure is reformulated in data parallel form. Some of the SPS algorithms can be directly translated to data parallel, but several of the vectorizable algorithms have no direct data parallel equivalent. This requires the development of new, strictly data parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation and a master/slave algorithm is developed to minimize communication cost in large table look up. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of lack of a broad based tradition of data parallel programming. An important outcome of this work has been new data parallel algorithms specifically of use for direct particle simulation but which also expand the data parallel diction.
Flow of a Gas Turbine Engine Low-Pressure Subsystem Simulated

NASA Technical Reports Server (NTRS)

Veres, Joseph P.

1997-01-01

The NASA Lewis Research Center is managing a task to numerically simulate overnight, on a parallel computing testbed, the aerodynamic flow in the complete low-pressure subsystem (LPS) of a gas turbine engine. The model solves the three-dimensional Navier- Stokes flow equations through all the components within the LPS, as well as the external flow around the engine nacelle. The LPS modeling task is being performed by Allison Engine Company under the Small Engine Technology contract. The large computer simulation was evaluated on networked computer systems using 8, 16, and 32 processors, with the parallel computing efficiency reaching 75 percent when 16 processors were used.
Analysis and selection of optimal function implementations in massively parallel computer

DOEpatents

Archer, Charles Jens [Rochester, MN; Peters, Amanda [Rochester, MN; Ratterman, Joseph D [Rochester, MN

2011-05-31

An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
ORCA Project: Research on high-performance parallel computer programming environments. Final report, 1 Apr-31 Mar 90

DOE Office of Scientific and Technical Information (OSTI.GOV)

Snyder, L.; Notkin, D.; Adams, L.

1990-03-31

This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels; Composition of machine instruction, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph D. thesis was completed, onemore » book chapter, and six conference proceedings were published.« less
a Spatiotemporal Aggregation Query Method Using Multi-Thread Parallel Technique Based on Regional Division

NASA Astrophysics Data System (ADS)

Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.

2015-07-01

Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
cellGPU: Massively parallel simulations of dynamic vertex models

NASA Astrophysics Data System (ADS)

Sussman, Daniel M.

2017-10-01

Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
GPU COMPUTING FOR PARTICLE TRACKING

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nishimura, Hiroshi; Song, Kai; Muriki, Krishna

2011-03-25

This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) bring massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculationmore » of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processois (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands ot threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.« less

Development and characterization of hollow microprobe array as a potential tool for versatile and massively parallel manipulation of single cells.

PubMed

Nagai, Moeto; Oohara, Kiyotaka; Kato, Keita; Kawashima, Takahiro; Shibata, Takayuki

2015-04-01

Parallel manipulation of single cells is important for reconstructing in vivo cellular microenvironments and studying cell functions. To manipulate single cells and reconstruct their environments, development of a versatile manipulation tool is necessary. In this study, we developed an array of hollow probes using microelectromechanical systems fabrication technology and demonstrated the manipulation of single cells. We conducted a cell aspiration experiment with a glass pipette and modeled a cell using a standard linear solid model, which provided information for designing hollow stepped probes for minimally invasive single-cell manipulation. We etched a silicon wafer on both sides and formed through holes with stepped structures. The inner diameters of the holes were reduced by SiO2 deposition of plasma-enhanced chemical vapor deposition to trap cells on the tips. This fabrication process makes it possible to control the wall thickness, inner diameter, and outer diameter of the probes. With the fabricated probes, single cells were manipulated and placed in microwells at a single-cell level in a parallel manner. We studied the capture, release, and survival rates of cells at different suction and release pressures and found that the cell trapping rate was directly proportional to the suction pressure, whereas the release rate and viability decreased with increasing the suction pressure. The proposed manipulation system makes it possible to place cells in a well array and observe the adherence, spreading, culture, and death of the cells. This system has potential as a tool for massively parallel manipulation and for three-dimensional hetero cellular assays.
GPU: the biggest key processor for AI and parallel processing

NASA Astrophysics Data System (ADS)

Baji, Toru

2017-07-01

Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
Supersonic Transport Noise Reduction Technology Program - Phase 2. Volume 1

DTIC Science & Technology

1975-09-01

transport aircraft . In addition, PNL and EPNL con- tributions made by each major engine component ( jet , turbine , combustor and compressor) were... Turbine noise was studied using a J85 engine with massive Inlet suppressor and open nozzle to unmask the turbine . Second-stage turbine blade /nozzle...17. Kty Words (Suggnted by Author(tl) Jet Noise, High Velocity Suppression, Aircraft Engine Suppression, Turbomachlnery Noise, Hybrid Inlet
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.

PubMed

Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S

2017-10-01

An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT, or to design OCT systems with improved performance. Copyright © 2017 Elsevier B.V. All rights reserved.
Smart integrated microsystems: the energy efficiency challenge (Conference Presentation) (Plenary Presentation)

NASA Astrophysics Data System (ADS)

Benini, Luca

2017-06-01

The "internet of everything" envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. CMOS technology can still take us a long way toward this goal, but technology scaling is losing steam. Energy efficiency improvement will increasingly hinge on architecture, circuits, design techniques such as heterogeneous 3D integration, mixed-signal preprocessing, event-based approximate computing and non-Von-Neumann architectures for scalable acceleration.
THE ENGINES BEHIND SUPERNOVAE AND GAMMA-RAY BURSTS

DOE Office of Scientific and Technical Information (OSTI.GOV)

FRYER, CHRISTOPHER LEE

2007-01-23

The authors review the different engines behind supernova (SNe) and gamma-ray bursts (GRBs), focusing on those engines driving explosions in massive stars: core-collapse SNe and long-duration GRBs. Convection and rotation play important roles in the engines of both these explosions. They outline the basic physics and discuss the wide variety of ways scientists have proposed that this physics can affect the supernova explosion mechanism, concluding with a review of the current status in these fields.
Parallel VLSI architecture emulation and the organization of APSA/MPP

NASA Technical Reports Server (NTRS)

Odonnell, John T.

1987-01-01

The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
Patterning via optical saturable transitions

NASA Astrophysics Data System (ADS)

Cantu, Precious

For the past 40 years, optical lithography has been the patterning workhorse for the semiconductor industry. However, as integrated circuits have become more and more complex, and as device geometries shrink, more innovative methods are required to meet these needs. In the far-field, the smallest feature that can be generated with light is limited to approximately half the wavelength. This, so called far-field diffraction limit or the Abbe limit (after Prof. Ernst Abbe who first recognized this), effectively prevents the use of long-wavelength photons >300nm from patterning nanostructures <100nm. Even with a 193nm laser source and extremely complicated processing, patterns below ˜20nm are incredibly challenging to create. Sources with even shorter wavelengths can potentially be used. However, these tend be much more expensive and of much lower brightness, which in turn limits their patterning speed. Multi-photon reactions have been proposed to overcome the diffraction limit. However, these require very large intensities for modest gain in resolution. Moreover, the large intensities make it difficult to parallelize, thus limiting the patterning speed. In this dissertation, a novel nanopatterning technique using wavelength-selective small molecules that undergo single-photon reactions, enabling rapid top-down nanopatterning over large areas at low-light intensities, thereby allowing for the circumvention of the far-field diffraction barrier is developed and experimentally verified. This approach, which I refer to as Patterning via Optical Saturable Transitions (POST) has the potential for massive parallelism, enabling the creation of nanostructures and devices at a speed far surpassing what is currently possible with conventional optical lithographic techniques. The fundamental understanding of this technique goes beyond optical lithography in the semiconductor industry and is applicable to any area that requires the rapid patterning of large-area two or three-dimensional complex geometries. At a basic level, this research intertwines the fields of electrochemistry, material science, electrical engineering, optics, physics, and mechanical engineering with the goal of developing a novel super-resolution lithographic technique.
Dynamic file-access characteristics of a production parallel scientific workload

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1994-01-01

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Reed, D.A.; Grunwald, D.C.

The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of thee processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At themore » other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.« less
3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies

PubMed Central

Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco

2014-01-01

Nonlocal Means (NLM) algorithm is widely considered as a state-of-the-art denoising filter in many research fields. Its high computational complexity leads researchers to the development of parallel programming approaches and the use of massively parallel architectures such as the GPUs. In the recent years, the GPU devices had led to achieving reasonable running times by filtering, slice-by-slice, and 3D datasets with a 2D NLM algorithm. In our approach we design and implement a fully 3D NonLocal Means parallel approach, adopting different algorithm mapping strategies on GPU architecture and multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the usability of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver

NASA Technical Reports Server (NTRS)

Mavriplis, Dimitri J.

1998-01-01

A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
High throughput optical lithography by scanning a massive array of bowtie aperture antennas at near-field

PubMed Central

Wen, X.; Datta, A.; Traverso, L. M.; Pan, L.; Xu, X.; Moon, E. E.

2015-01-01

Optical lithography, the enabling process for defining features, has been widely used in semiconductor industry and many other nanotechnology applications. Advances of nanotechnology require developments of high-throughput optical lithography capabilities to overcome the optical diffraction limit and meet the ever-decreasing device dimensions. We report our recent experimental advancements to scale up diffraction unlimited optical lithography in a massive scale using the near field nanolithography capabilities of bowtie apertures. A record number of near-field optical elements, an array of 1,024 bowtie antenna apertures, are simultaneously employed to generate a large number of patterns by carefully controlling their working distances over the entire array using an optical gap metrology system. Our experimental results reiterated the ability of using massively-parallel near-field devices to achieve high-throughput optical nanolithography, which can be promising for many important nanotechnology applications such as computation, data storage, communication, and energy. PMID:26525906
Massive problem reports mining and analysis based parallelism for similar search

NASA Astrophysics Data System (ADS)

Zhou, Ya; Hu, Cailin; Xiong, Han; Wei, Xiafei; Li, Ling

2017-05-01

Massive problem reports and solutions accumulated over time and continuously collected in XML Spreadsheet (XMLSS) format from enterprises and organizations, which record a series of comprehensive description about problems that can help technicians to trace problems and their solutions. It's a significant and challenging issue to effectively manage and analyze these massive semi-structured data to provide similar problem solutions, decisions of immediate problem and assisting product optimization for users during hardware and software maintenance. For this purpose, we build a data management system to manage, mine and analyze these data search results that can be categorized and organized into several categories for users to quickly find out where their interesting results locate. Experiment results demonstrate that this system is better than traditional centralized management system on the performance and the adaptive capability of heterogeneous data greatly. Besides, because of re-extracting topics, it enables each cluster to be described more precise and reasonable.
Advanced propulsion system concept for hybrid vehicles

NASA Technical Reports Server (NTRS)

Bhate, S.; Chen, H.; Dochat, G.

1980-01-01

A series hybrid system, utilizing a free piston Stirling engine with a linear alternator, and a parallel hybrid system, incorporating a kinematic Stirling engine, are analyzed for various specified reference missions/vehicles ranging from a small two passenger commuter vehicle to a van. Parametric studies for each configuration, detail tradeoff studies to determine engine, battery and system definition, short term energy storage evaluation, and detail life cycle cost studies were performed. Results indicate that the selection of a parallel Stirling engine/electric, hybrid propulsion system can significantly reduce petroleum consumption by 70 percent over present conventional vehicles.
A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure

PubMed Central

Fontaine, Michael D.

2013-01-01

Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing. PMID:23766690
A massively parallel strategy for STR marker development, capture, and genotyping.

PubMed

Kistler, Logan; Johnson, Stephen M; Irwin, Mitchell T; Louis, Edward E; Ratan, Aakrosh; Perry, George H

2017-09-06

Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci-97.3-99.6% of STRs characterized with ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
A Cyber-ITS framework for massive traffic data analysis using cyber infrastructure.

PubMed

Xia, Yingjie; Hu, Jia; Fontaine, Michael D

2013-01-01

Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing.
Development of a parallel FE simulator for modeling the whole trans-scale failure process of rock from meso- to engineering-scale

NASA Astrophysics Data System (ADS)

Li, Gen; Tang, Chun-An; Liang, Zheng-Zhao

2017-01-01

Multi-scale high-resolution modeling of rock failure process is a powerful means in modern rock mechanics studies to reveal the complex failure mechanism and to evaluate engineering risks. However, multi-scale continuous modeling of rock, from deformation, damage to failure, has raised high requirements on the design, implementation scheme and computation capacity of the numerical software system. This study is aimed at developing the parallel finite element procedure, a parallel rock failure process analysis (RFPA) simulator that is capable of modeling the whole trans-scale failure process of rock. Based on the statistical meso-damage mechanical method, the RFPA simulator is able to construct heterogeneous rock models with multiple mechanical properties, deal with and represent the trans-scale propagation of cracks, in which the stress and strain fields are solved for the damage evolution analysis of representative volume element by the parallel finite element method (FEM) solver. This paper describes the theoretical basis of the approach and provides the details of the parallel implementation on a Windows - Linux interactive platform. A numerical model is built to test the parallel performance of FEM solver. Numerical simulations are then carried out on a laboratory-scale uniaxial compression test, and field-scale net fracture spacing and engineering-scale rock slope examples, respectively. The simulation results indicate that relatively high speedup and computation efficiency can be achieved by the parallel FEM solver with a reasonable boot process. In laboratory-scale simulation, the well-known physical phenomena, such as the macroscopic fracture pattern and stress-strain responses, can be reproduced. In field-scale simulation, the formation process of net fracture spacing from initiation, propagation to saturation can be revealed completely. In engineering-scale simulation, the whole progressive failure process of the rock slope can be well modeled. It is shown that the parallel FE simulator developed in this study is an efficient tool for modeling the whole trans-scale failure process of rock from meso- to engineering-scale.
The Cognitive Architecture for Chaining of Two Mental Operations

ERIC Educational Resources Information Center

Sackur, Jerome; Dehaene, Stanislas

2009-01-01

A simple view, which dates back to Turing, proposes that complex cognitive operations are composed of serially arranged elementary operations, each passing intermediate results to the next. However, whether and how such serial processing is achieved with a brain composed of massively parallel processors, remains an open question. Here, we study…

Meeting the challenges of non-referenced genome assembly from short-read sequence data

Treesearch

M. Parks; A. Liston; R. Cronn

2010-01-01

Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) due to the fact that sequence is returned in relatively short reads. Nonetheless, there is great...
Sleep EEG Changes during Adolescence: An Index of a Fundamental Brain Reorganization

ERIC Educational Resources Information Center

Feinberg, Irwin; Campbell, Ian G.

2010-01-01

Delta (1-4 Hz) EEG power in non-rapid eye movement (NREM) sleep declines massively during adolescence. This observation stimulated the hypothesis that during adolescence the human brain undergoes an extensive reorganization driven by synaptic elimination. The parallel declines in synaptic density, delta wave amplitude and cortical metabolic rate…
GRIZZLY

DOE Office of Scientific and Technical Information (OSTI.GOV)

2012-12-17

Grizzly is a simulation tool for assessing the effects of age-related degradation on systems, structures, and components of nuclear power plants. Grizzly is built on the MOOSE framework, and uses a Jacobian-free Newton Krylov method to obtain solutions to tightly coupled thermo-mechanical simulations. Grizzly runs on a wide range of hardware, from a single processor to massively parallel machines.
Recovery of Peripheral Nerve with Massive Loss Defect by Tissue Engineered Guiding Regenerative Gel

PubMed Central

Nevo, Zvi

2014-01-01

Objective. Guiding Regeneration Gel (GRG) was developed in response to the clinical need of improving treatment for peripheral nerve injuries and helping patients regenerate massive regional losses in peripheral nerves. The efficacy of GRG based on tissue engineering technology for the treatment of complete peripheral nerve injury with significant loss defect was investigated. Background. Many severe peripheral nerve injuries can only be treated through surgical reconstructive procedures. Such procedures are challenging, since functional recovery is slow and can be unsatisfactory. One of the most promising solutions already in clinical practice is synthetic nerve conduits connecting the ends of damaged nerve supporting nerve regeneration. However, this solution still does not enable recovery of massive nerve loss defect. The proposed technology is a biocompatible and biodegradable gel enhancing axonal growth and nerve regeneration. It is composed of a complex of substances comprising transparent, highly viscous gel resembling the extracellular matrix that is almost impermeable to liquids and gasses, flexible, elastic, malleable, and adaptable to various shapes and formats. Preclinical study on rat model of peripheral nerve injury showed that GRG enhanced nerve regeneration when placed in nerve conduits, enabling recovery of massive nerve loss, previously unbridgeable, and enabled nerve regeneration at least as good as with autologous nerve graft “gold standard” treatment. PMID:25105121
Tiger LDRD final report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Steich, D J; Brugger, S T; Kallman, J S

2000-02-01

This final report describes our efforts on the Three-Dimensional Massively Parallel CEM Technologies LDRD project (97-ERD-009). Significant need exists for more advanced time domain computational electromagnetics modeling. Bookkeeping details and modifying inflexible software constitute a vast majority of the effort required to address such needs. The required effort escalates rapidly as problem complexity increases. For example, hybrid meshes requiring hybrid numerics on massively parallel platforms (MPPs). This project attempts to alleviate the above limitations by investigating flexible abstractions for these numerical algorithms on MPPs using object-oriented methods, providing a programming environment insulating physics from bookkeeping. The three major design iterationsmore » during the project, known as TIGER-I to TIGER-III, are discussed. Each version of TIGER is briefly discussed along with lessons learned during the development and implementation. An Application Programming Interface (API) of the object-oriented interface for Tiger-III is included in three appendices. The three appendices contain the Utilities, Entity-Attribute, and Mesh libraries developed during the project. The API libraries represent a snapshot of our latest attempt at insulated the physics from the bookkeeping.« less
Familial retinoblastoma due to intronic LINE-1 insertion causes aberrant and noncanonical mRNA splicing of the RB1 gene.

PubMed

Rodríguez-Martín, Carlos; Cidre, Florencia; Fernández-Teijeiro, Ana; Gómez-Mariano, Gema; de la Vega, Leticia; Ramos, Patricia; Zaballos, Ángel; Monzón, Sara; Alonso, Javier

2016-05-01

Retinoblastoma (RB, MIM 180200) is the paradigm of hereditary cancer. Individuals harboring a constitutional mutation in one allele of the RB1 gene have a high predisposition to develop RB. Here, we present the first case of familial RB caused by a de novo insertion of a full-length long interspersed element-1 (LINE-1) into intron 14 of the RB1 gene that caused a highly heterogeneous splicing pattern of RB1 mRNA. LINE-1 insertion was inferred by mRNA studies and full-length sequenced by massive parallel sequencing. Some of the aberrant mRNAs were produced by noncanonical acceptor splice sites, a new finding that up to date has not been described to occur upon LINE-1 retrotransposition. Our results clearly show that RNA-based strategies have the potential to detect disease-causing transposon insertions. It also confirms that the incorporation of new genetic approaches, such as massive parallel sequencing, contributes to characterize at the sequence level these unique and exceptional genetic alterations.
Simultaneous mutation and copy number variation (CNV) detection by multiplex PCR-based GS-FLX sequencing.

PubMed

Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen

2009-03-01

We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.
Spectral Calculation of ICRF Wave Propagation and Heating in 2-D Using Massively Parallel Computers

NASA Astrophysics Data System (ADS)

Jaeger, E. F.; D'Azevedo, E.; Berry, L. A.; Carter, M. D.; Batchelor, D. B.

2000-10-01

Spectral calculations of ICRF wave propagation in plasmas have the natural advantage that they require no assumption regarding the smallness of the ion Larmor radius ρ relative to wavelength λ. Results are therefore applicable to all orders in k_bot ρ where k_bot = 2π/λ. But because all modes in the spectral representation are coupled, the solution requires inversion of a large dense matrix. In contrast, finite difference algorithms involve only matrices that are sparse and banded. Thus, spectral calculations of wave propagation and heating in tokamak plasmas have so far been limited to 1-D. In this paper, we extend the spectral method to 2-D by taking advantage of new matrix inversion techniques that utilize massively parallel computers. By spreading the dense matrix over 576 processors on the ORNL IBM RS/6000 SP supercomputer, we are able to solve up to 120,000 coupled complex equations requiring 230 GBytes of memory and achieving over 500 Gflops/sec. Initial results for ASDEX and NSTX will be presented using up to 200 modes in both the radial and vertical dimensions.
IFRD1 Is a Candidate Gene for SMNA on Chromosome 7q22-q23

PubMed Central

Brkanac, Zoran; Spencer, David; Shendure, Jay; Robertson, Peggy D.; Matsushita, Mark; Vu, Tiffany; Bird, Thomas D.; Olson, Maynard V.; Raskind, Wendy H.

2009-01-01

We have established strong linkage evidence that supports mapping autosomal-dominant sensory/motor neuropathy with ataxia (SMNA) to chromosome 7q22-q32. SMNA is a rare neurological disorder whose phenotype encompasses both the central and the peripheral nervous system. In order to identify a gene responsible for SMNA, we have undertaken a comprehensive genomic evaluation of the region of linkage, including evaluation for repeat expansion and small deletions or duplications, capillary sequencing of candidate genes, and massively parallel sequencing of all coding exons. We excluded repeat expansion and small deletions or duplications as causative, and through microarray-based hybrid capture and massively parallel short-read sequencing, we identified a nonsynonymous variant in the human interferon-related developmental regulator gene 1 (IFRD1) as a disease-causing candidate. Sequence conservation, animal models, and protein structure evaluation support the involvement of IFRD1 in SMNA. Mutation analysis of IFRD1 in additional patients with similar phenotypes is needed for demonstration of causality and further evaluation of its importance in neurological diseases. PMID:19409521
Performance Evaluation in Network-Based Parallel Computing

NASA Technical Reports Server (NTRS)

Dezhgosha, Kamyar

1996-01-01

Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine) which is a software system for linking clusters of machines. Second, a set of three basic applications were selected. The applications consist of a parallel search, a parallel sort, a parallel matrix multiplication. These application programs were implemented in C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes in many cases is the restricting factor to performance. That is, coarse-grain parallelism which requires less frequent communication between processes will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
Delivering better power: the role of simulation in reducing the environmental impact of aircraft engines.

PubMed

Menzies, Kevin

2014-08-13

The growth in simulation capability over the past 20 years has led to remarkable changes in the design process for gas turbines. The availability of relatively cheap computational power coupled to improvements in numerical methods and physical modelling in simulation codes have enabled the development of aircraft propulsion systems that are more powerful and yet more efficient than ever before. However, the design challenges are correspondingly greater, especially to reduce environmental impact. The simulation requirements to achieve a reduced environmental impact are described along with the implications of continued growth in available computational power. It is concluded that achieving the environmental goals will demand large-scale multi-disciplinary simulations requiring significantly increased computational power, to enable optimization of the airframe and propulsion system over the entire operational envelope. However even with massive parallelization, the limits imposed by communications latency will constrain the time required to achieve a solution, and therefore the position of such large-scale calculations in the industrial design process. © 2014 The Author(s) Published by the Royal Society. All rights reserved.
Notes on the KIVA-2 software and chemically reactive fluid mechanics

NASA Astrophysics Data System (ADS)

Holst, M. J.

1992-09-01

Working notes regarding the mechanics of chemically reactive fluids with sprays, and their numerical simulation with the KIVA-2 software are presented. KIVA-2 is a large FORTRAN program developed at Los Alamos National Laboratory for internal combustion engine simulation. It is our hope that these notes summarize some of the necessary background material in fluid mechanics and combustion, explain the numerical methods currently used in KIVA-2 and similar combustion codes, and provide an outline of the overall structure of KIVA-2 as a representative combustion program, in order to aid the researcher in the task of implementing KIVA-2 or a similar combustion code on a massively parallel computer. The notes are organized into three parts as follows. In Part 1, a brief introduction to continuum mechanics, to fluid mechanics, and to the mechanics of chemically reactive fluids with sprays is presented. In Part 2, a close look at the governing equations of KIVA-2 is taken, and the methods employed in the numerical solution of these equations is discussed. Some conclusions are drawn and some observations are made in Part 3.
Linking Chaotic Advection with Subsurface Biogeochemical Processes

NASA Astrophysics Data System (ADS)

Mays, D. C.; Freedman, V. L.; White, S. K.; Fang, Y.; Neupauer, R.

2017-12-01

This work investigates the extent to which groundwater flow kinematics drive subsurface biogeochemical processes. In terms of groundwater flow kinematics, we consider chaotic advection, whose essential ingredient is stretching and folding of plumes. Chaotic advection is appealing within the context of groundwater remediation because it has been shown to optimize plume spreading in the laminar flows characteristic of aquifers. In terms of subsurface biogeochemical processes, we consider an existing model for microbially-mediated reduction of relatively mobile uranium(VI) to relatively immobile uranium(IV) following injection of acetate into a floodplain aquifer beneath a former uranium mill in Rifle, Colorado. This model has been implemented in the reactive transport code eSTOMP, the massively parallel version of STOMP (Subsurface Transport Over Multiple Phases). This presentation will report preliminary numerical simulations in which the hydraulic boundary conditions in the eSTOMP model are manipulated to simulate chaotic advection resulting from engineered injection and extraction of water through a manifold of wells surrounding the plume of injected acetate. This approach provides an avenue to simulate the impact of chaotic advection within the existing framework of the eSTOMP code.
MCBooster: a library for fast Monte Carlo generation of phase-space decays on massively parallel platforms.

NASA Astrophysics Data System (ADS)

Alves Júnior, A. A.; Sokoloff, M. D.

2017-10-01

MCBooster is a header-only, C++11-compliant library that provides routines to generate and perform calculations on large samples of phase space Monte Carlo events. To achieve superior performance, MCBooster is capable to perform most of its calculations in parallel using CUDA- and OpenMP-enabled devices. MCBooster is built on top of the Thrust library and runs on Linux systems. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of performance in a variety of environments
In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett

Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general threedimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.
In-memory integration of existing software components for parallel adaptive unstructured mesh workflows

DOE PAGES

Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett; ...

2017-01-01

Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general threedimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.
Towards massively parallelized all-optical magnetic recording

NASA Astrophysics Data System (ADS)

Davies, C. S.; Janušonis, J.; Kimel, A. V.; Kirilyuk, A.; Tsukamoto, A.; Rasing, Th.; Tobey, R. I.

2018-06-01

We demonstrate an approach to parallel all-optical writing of magnetic domains using spatial and temporal interference of two ultrashort light pulses. We explore how the fluence and grating periodicity of the optical transient grating influence the size and uniformity of the written bits. Using a total incident optical energy of 3.5 μJ, we demonstrate the capability of simultaneously writing 102 spatially separated bits, each featuring a relevant lateral width of ˜1 μm. We discuss viable routes to extend this technique to write individually addressable, sub-diffraction-limited magnetic domains in a wide range of materials.
Parallelized seeded region growing using CUDA.

PubMed

Park, Seongjin; Lee, Jeongjin; Lee, Hyunna; Shin, Juneseuk; Seo, Jinwook; Lee, Kyoung Ho; Shin, Yeong-Gil; Kim, Bohyoung

2014-01-01

This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with intention to overcome the theoretical weakness of SRG algorithm of its computation time being directly proportional to the size of a segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, advocating that it can substantially assist the segmentation during massive CT screening tests.
Optical Interconnection Via Computer-Generated Holograms

NASA Technical Reports Server (NTRS)

Liu, Hua-Kuang; Zhou, Shaomin

1995-01-01

Method of free-space optical interconnection developed for data-processing applications like parallel optical computing, neural-network computing, and switching in optical communication networks. In method, multiple optical connections between multiple sources of light in one array and multiple photodetectors in another array made via computer-generated holograms in electrically addressed spatial light modulators (ESLMs). Offers potential advantages of massive parallelism, high space-bandwidth product, high time-bandwidth product, low power consumption, low cross talk, and low time skew. Also offers advantage of programmability with flexibility of reconfiguration, including variation of strengths of optical connections in real time.
The CP-PACS parallel computer

NASA Astrophysics Data System (ADS)

Ukawa, Akira

1998-05-01

The CP-PACS computer is a massively parallel computer consisting of 2048 processing units and having a peak speed of 614 GFLOPS and 128 GByte of main memory. It was developed over the four years from 1992 to 1996 at the Center for Computational Physics, University of Tsukuba, for large-scale numerical simulations in computational physics, especially those of lattice QCD. The CP-PACS computer has been in full operation for physics computations since October 1996. In this article we describe the chronology of the development, the hardware and software characteristics of the computer, and its performance for lattice QCD simulations.

Advanced miniature processing handware for ATR applications

NASA Technical Reports Server (NTRS)

Chao, Tien-Hsin (Inventor); Daud, Taher (Inventor); Thakoor, Anikumar (Inventor)

2003-01-01

A Hybrid Optoelectronic Neural Object Recognition System (HONORS), is disclosed, comprising two major building blocks: (1) an advanced grayscale optical correlator (OC) and (2) a massively parallel three-dimensional neural-processor. The optical correlator, with its inherent advantages in parallel processing and shift invariance, is used for target of interest (TOI) detection and segmentation. The three-dimensional neural-processor, with its robust neural learning capability, is used for target classification and identification. The hybrid optoelectronic neural object recognition system, with its powerful combination of optical processing and neural networks, enables real-time, large frame, automatic target recognition (ATR).
Vice President Pence Visits SLS Engineering Test Facility

NASA Image and Video Library

2017-09-25

The Vice President toured the SLS engineering facility where the engine section of the rocket’s massive core stage is undergoing a major stress test. The rocket’s four RS-25 engines and the two solid rocket boosters that attach to the SLS engine section will produce more than 8 million pounds of thrust to launch the Orion spacecraft beyond low-Earth orbit. More than 3,000 measurements using sensors installed on the test section will help ensure the core stage for all SLS missions can withstand the extreme forces of flight.
Done in 60 seconds- See a Massive Rocket Fuel Tank Built in A Minute

NASA Image and Video Library

2016-08-18

The 7.5-minute test conducted at NASA’s Stennis Space Center is part of a series of tests designed to put the upgraded former space shuttle engines through the rigorous temperature and pressure conditions they will experience during a launch. The tests also support the development of a new controller, or “brain,” for the engine, which monitors engine status and communicates between the rocket and the engine, relaying commands to the engine and transmitting data back to the rocket.
Parallel Algorithms for Groebner-Basis Reduction

DTIC Science & Technology

1987-09-25

22209 ELEMENT NO. NO. NO. ACCESSION NO. 11. TITLE (Include Security Classification) * PARALLEL ALGORITHMS FOR GROEBNER -BASIS REDUCTION 12. PERSONAL...All other editions are obsolete. Productivity Engineering in the UNIXt Environment p Parallel Algorithms for Groebner -Basis Reduction Technical Report
Scalable Parallel Density-based Clustering and Applications

NASA Astrophysics Data System (ADS)

Patwary, Mostofa Ali

2014-04-01

Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.

PubMed

Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei

2017-01-01

Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions.
Massive parallelization of a 3D finite difference electromagnetic forward solution using domain decomposition methods on multiple CUDA enabled GPUs

NASA Astrophysics Data System (ADS)

Schultz, A.

2010-12-01

3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
Toward 4D Nanoprinting with Tip-Induced Organic Surface Reactions.

PubMed

Carbonell, Carlos; Braunschweig, Adam B

2017-02-21

Future nanomanufacturing tools will prepare organic materials with complex four-dimensional (4D) structure, where the position (x, y, z) and chemical composition within a volume is controlled with sub-1 μm spatial resolution. Such tools could produce substrates that mimic biological interfaces, like the cell surface or the extracellular matrix, whose topology and chemical complexity combine to direct some of the most sophisticated biological events. The control of organic materials at the nanoscale-level of spatial resolution could revolutionize the assembly of next generation optical and electronic devices or substrates for tissue engineering or enable fundamental biological or material science investigations. Organic chemistry provides the requisite control over the orientation and position of matter within a nanoscale reference frame through the formation of new covalent bonds. Several challenges however preclude the integration of organic chemistry with conventional nanomanufacturing approaches, namely most nanolithography platforms would denature or destroy delicate organic and biologically active matter, confirming covalent bond formation at interfaces remains difficult, and finally, only a small handful of the reactions used to transform molecules in solution have been validated on surfaces. Thus, entirely new approaches, where organic transformations and spatial control are considered equally important contributors, are needed to create 4D organic nanoprinting platforms. This Account describes efforts from our group to reconcile nanolithography, and specifically massively parallel scanning probe lithography (SPL), with organic chemistry to further the goal of 4D organic nanoprinting. Massively parallel SPL involves arrays of elastomeric pyramids mounted onto piezoelectric actuators, and creates patterns with feature diameters below 50 nm by using the pyramidal tips for either the direct deposition of ink or the localized delivery of energy to a surface. While other groups have focused on tip and array architetctures, our efforts have been on exploring their use for localizing organic chemistry on surfaces with nanoscale spatial resolution in 3D. Herein we describe the use of massively parallel SPL to create covalently immobilized patterns of organic materials using thermal, catalytic, photochemical, and force-accelerated reactions. In doing so, we have developed a high-throughput protocol for confirming interfacial bond formation. These efforts have resulted in new opportunities for the preparation of glycan arrays, novel approaches for covalently patterning graphene, and a 3D nanoprinter by combining photochemical brush polymerizations with SPL. Achieving true 4D nanoprinting involves advances in surface chemistry and instrumentation development, and to this end 4D micropatterns were produced in a microfluidic photoreactor that can position polymers composed of different monomers within micrometer proximity. A substantial gap remains, however, between these current technologies and the future's 4D nanomanufacturing tools, but the marriage of SPL with organic chemistry is an important step toward this goal. As this field continues to mature we can expect bottom-up 4D nanomanufacturing to begin supplanting conventional top-down strategies for preparing electronics, bioarrays, and functional substrates. In addition, these new printing technologies may enable the preparation of synthetic targets, such as artificial biological interfaces, with a level of organic sophistication that is entirely unachievable using existing technologies.
Microresonator-based solitons for massively parallel coherent optical communications

NASA Astrophysics Data System (ADS)

Marin-Palomo, Pablo; Kemal, Juned N.; Karpov, Maxim; Kordts, Arne; Pfeifle, Joerg; Pfeiffer, Martin H. P.; Trocha, Philipp; Wolf, Stefan; Brasch, Victor; Anderson, Miles H.; Rosenberger, Ralf; Vijayan, Kovendhan; Freude, Wolfgang; Kippenberg, Tobias J.; Koos, Christian

2017-06-01

Solitons are waveforms that preserve their shape while propagating, as a result of a balance of dispersion and nonlinearity. Soliton-based data transmission schemes were investigated in the 1980s and showed promise as a way of overcoming the limitations imposed by dispersion of optical fibres. However, these approaches were later abandoned in favour of wavelength-division multiplexing schemes, which are easier to implement and offer improved scalability to higher data rates. Here we show that solitons could make a comeback in optical communications, not as a competitor but as a key element of massively parallel wavelength-division multiplexing. Instead of encoding data on the soliton pulse train itself, we use continuous-wave tones of the associated frequency comb as carriers for communication. Dissipative Kerr solitons (DKSs) (solitons that rely on a double balance of parametric gain and cavity loss, as well as dispersion and nonlinearity) are generated as continuously circulating pulses in an integrated silicon nitride microresonator via four-photon interactions mediated by the Kerr nonlinearity, leading to low-noise, spectrally smooth, broadband optical frequency combs. We use two interleaved DKS frequency combs to transmit a data stream of more than 50 terabits per second on 179 individual optical carriers that span the entire telecommunication C and L bands (centred around infrared telecommunication wavelengths of 1.55 micrometres). We also demonstrate coherent detection of a wavelength-division multiplexing data stream by using a pair of DKS frequency combs—one as a multi-wavelength light source at the transmitter and the other as the corresponding local oscillator at the receiver. This approach exploits the scalability of microresonator-based DKS frequency comb sources for massively parallel optical communications at both the transmitter and the receiver. Our results demonstrate the potential of these sources to replace the arrays of continuous-wave lasers that are currently used in high-speed communications. In combination with advanced spatial multiplexing schemes and highly integrated silicon photonic circuits, DKS frequency combs could bring chip-scale petabit-per-second transceivers into reach.
Microtubule Actin Crosslinking Factor 1 Regulates the Balbiani Body and Animal-Vegetal Polarity of the Zebrafish Oocyte

PubMed Central

Gupta, Tripti; Marlow, Florence L.; Ferriola, Deborah; Mackiewicz, Katarzyna; Dapprich, Johannes; Monos, Dimitri; Mullins, Mary C.

2010-01-01

Although of fundamental importance in developmental biology, the genetic basis for the symmetry breaking events that polarize the vertebrate oocyte and egg are largely unknown. In vertebrates, the first morphological asymmetry in the oocyte is the Balbiani body, a highly conserved, transient structure found in vertebrates and invertebrates including Drosophila, Xenopus, human, and mouse. We report the identification of the zebrafish magellan (mgn) mutant, which exhibits a novel enlarged Balbiani body phenotype and a disruption of oocyte polarity. To determine the molecular identity of the mgn gene, we positionally cloned the gene, employing a novel DNA capture method to target region-specific genomic DNA of 600 kb for massively parallel sequencing. Using this technique, we were able to enrich for the genomic region linked to our mutation within one week and then identify the mutation in mgn using massively parallel sequencing. This is one of the first successful uses of genomic DNA enrichment combined with massively parallel sequencing to determine the molecular identity of a gene associated with a mutant phenotype. We anticipate that the combination of these technologies will have wide applicability for the efficient identification of mutant genes in all organisms. We identified the mutation in mgn as a deletion in the coding sequence of the zebrafish microtubule actin crosslinking factor 1 (macf1) gene. macf1 is a member of the highly conserved spectraplakin family of cytoskeletal linker proteins, which play diverse roles in polarized cells such as neurons, muscle cells, and epithelial cells. In mgn mutants, the oocyte nucleus is mislocalized; and the Balbiani body, localized mRNAs, and organelles are absent from the periphery of the oocyte, consistent with a function for macf1 in nuclear anchoring and cortical localization. These data provide the first evidence for a role for spectraplakins in polarization of the vertebrate oocyte and egg. PMID:20808893
Microtubule actin crosslinking factor 1 regulates the Balbiani body and animal-vegetal polarity of the zebrafish oocyte.

PubMed

Gupta, Tripti; Marlow, Florence L; Ferriola, Deborah; Mackiewicz, Katarzyna; Dapprich, Johannes; Monos, Dimitri; Mullins, Mary C

2010-08-19

Although of fundamental importance in developmental biology, the genetic basis for the symmetry breaking events that polarize the vertebrate oocyte and egg are largely unknown. In vertebrates, the first morphological asymmetry in the oocyte is the Balbiani body, a highly conserved, transient structure found in vertebrates and invertebrates including Drosophila, Xenopus, human, and mouse. We report the identification of the zebrafish magellan (mgn) mutant, which exhibits a novel enlarged Balbiani body phenotype and a disruption of oocyte polarity. To determine the molecular identity of the mgn gene, we positionally cloned the gene, employing a novel DNA capture method to target region-specific genomic DNA of 600 kb for massively parallel sequencing. Using this technique, we were able to enrich for the genomic region linked to our mutation within one week and then identify the mutation in mgn using massively parallel sequencing. This is one of the first successful uses of genomic DNA enrichment combined with massively parallel sequencing to determine the molecular identity of a gene associated with a mutant phenotype. We anticipate that the combination of these technologies will have wide applicability for the efficient identification of mutant genes in all organisms. We identified the mutation in mgn as a deletion in the coding sequence of the zebrafish microtubule actin crosslinking factor 1 (macf1) gene. macf1 is a member of the highly conserved spectraplakin family of cytoskeletal linker proteins, which play diverse roles in polarized cells such as neurons, muscle cells, and epithelial cells. In mgn mutants, the oocyte nucleus is mislocalized; and the Balbiani body, localized mRNAs, and organelles are absent from the periphery of the oocyte, consistent with a function for macf1 in nuclear anchoring and cortical localization. These data provide the first evidence for a role for spectraplakins in polarization of the vertebrate oocyte and egg.
Tinker-HP: a massively parallel molecular dynamics package for multiscale simulations of large complex systems with advanced point dipole polarizable force fields.

PubMed

Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F; Harger, Matthew; Torabifard, Hedieh; Cisneros, G Andrés; Schnieders, Michael J; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y; Ponder, Jay W; Piquemal, Jean-Philip

2018-01-28

We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over a single-core computation is observed for the largest systems. The extension of the present CPU implementation of Tinker-HP to other computational platforms is discussed.
Evaluation of targeted exome sequencing for 28 protein-based blood group systems, including the homologous gene systems, for blood group genotyping.

PubMed

Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A

2017-04-01

Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.
EUPDF-II: An Eulerian Joint Scalar Monte Carlo PDF Module : User's Manual

NASA Technical Reports Server (NTRS)

Raju, M. S.; Liu, Nan-Suey (Technical Monitor)

2004-01-01

EUPDF-II provides the solution for the species and temperature fields based on an evolution equation for PDF (Probability Density Function) and it is developed mainly for application with sprays, combustion, parallel computing, and unstructured grids. It is designed to be massively parallel and could easily be coupled with any existing gas-phase CFD and spray solvers. The solver accommodates the use of an unstructured mesh with mixed elements of either triangular, quadrilateral, and/or tetrahedral type. The manual provides the user with an understanding of the various models involved in the PDF formulation, its code structure and solution algorithm, and various other issues related to parallelization and its coupling with other solvers. The source code of EUPDF-II will be available with National Combustion Code (NCC) as a complete package.
Plasmonic Nanoholes in a Multi-Channel Microarray Format for Parallel Kinetic Assays and Differential Sensing

PubMed Central

Im, Hyungsoon; Lesuffleur, Antoine; Lindquist, Nathan C.; Oh, Sang-Hyun

2009-01-01

We present nanohole arrays in a gold film integrated with a 6-channel microfluidic chip for parallel measurements of molecular binding kinetics. Surface plasmon resonance effects in the nanohole arrays enable real-time label-free measurements of molecular binding events in each channel, while adjacent negative reference channels can record measurement artifacts such as bulk solution index changes, temperature variations, or changing light absorption in the liquid. Using this platform, streptavidin-biotin specific binding kinetics are measured at various concentrations with negative controls. A high-density microarray of 252 biosensing pixels is also demonstrated with a packing density of 106 sensing elements/cm2, which can potentially be coupled with a massively parallel array of microfluidic channels for protein microarray applications. PMID:19284776
Query-Structure Based Web Page Indexing

DTIC Science & Technology

2012-11-01

the massive amount of data present on the web. In our third participation in the web track at TREC 2012, we explore the idea of building an...the ad-hoc and diversity task. 1 INTRODUCTION The rapid growth and massive quantities of data on the Internet have increased the importance and...complexity of information retrieval systems. The amount and the diversity of the web data introduce shortcomings in the way search engines rank their
Virtual earthquake engineering laboratory with physics-based degrading materials on parallel computers

NASA Astrophysics Data System (ADS)

Cho, In Ho

For the last few decades, we have obtained tremendous insight into underlying microscopic mechanisms of degrading quasi-brittle materials from persistent and near-saintly efforts in laboratories, and at the same time we have seen unprecedented evolution in computational technology such as massively parallel computers. Thus, time is ripe to embark on a novel approach to settle unanswered questions, especially for the earthquake engineering community, by harmoniously combining the microphysics mechanisms with advanced parallel computing technology. To begin with, it should be stressed that we placed a great deal of emphasis on preserving clear meaning and physical counterparts of all the microscopic material models proposed herein, since it is directly tied to the belief that by doing so, the more physical mechanisms we incorporate, the better prediction we can obtain. We departed from reviewing representative microscopic analysis methodologies, selecting out "fixed-type" multidirectional smeared crack model as the base framework for nonlinear quasi-brittle materials, since it is widely believed to best retain the physical nature of actual cracks. Microscopic stress functions are proposed by integrating well-received existing models to update normal stresses on the crack surfaces (three orthogonal surfaces are allowed to initiate herein) under cyclic loading. Unlike the normal stress update, special attention had to be paid to the shear stress update on the crack surfaces, due primarily to the well-known pathological nature of the fixed-type smeared crack model---spurious large stress transfer over the open crack under nonproportional loading. In hopes of exploiting physical mechanism to resolve this deleterious nature of the fixed crack model, a tribology-inspired three-dimensional (3d) interlocking mechanism has been proposed. Following the main trend of tribology (i.e., the science and engineering of interacting surfaces), we introduced the base fabric of solid particle-soft matrix to explain realistic interlocking over rough crack surfaces, and the adopted Gaussian distribution feeds random particle sizes to the entire domain. Validation against a well-documented rough crack experiment reveals promising accuracy of the proposed 3d interlocking model. A consumed energy-based damage model has been proposed for the weak correlation between the normal and shear stresses on the crack surfaces, and also for describing the nature of irrecoverable damage. Since the evaluation of the consumed energy is directly linked to the microscopic deformation, which can be efficiently tracked on the crack surfaces, the proposed damage model is believed to provide a more physical interpretation than existing damage mechanics, which fundamentally stem from mathematical derivation with few physical counterparts. Another novel point of the present work lies in the topological transition-based "smart" steel bar model, notably with evolving compressive buckling length. We presented a systematic framework of information flow between the key ingredients of composite materials (i.e., steel bar and its surrounding concrete elements). The smart steel model suggested can incorporate smooth transition during reversal loading, tensile rupture, early buckling after reversal from excessive tensile loading, and even compressive buckling. Especially, the buckling length is made to evolve according to the damage states of the surrounding elements of each bar, while all other dominant models leave the length unchanged. What lies behind all the aforementioned novel attempts is, of course, the problem-optimized parallel platform. In fact, the parallel computing in our field has been restricted to monotonic shock or blast loading with explicit algorithm which is characteristically feasible to be parallelized. In the present study, efficient parallelization strategies for the highly demanding implicit nonlinear finite element analysis (FEA) program for real-scale reinforced concrete (RC) structures under cyclic loading are proposed. Quantitative comparison of state-of-the-art parallel strategies, in terms of factorization, had been carried out, leading to the problem-optimized solver, which is successfully embracing the penalty method and banded nature. Particularly, the penalty method employed imparts considerable smoothness to the global response, which yields a practical superiority of the parallel triangular system solver over other advanced solvers such as parallel preconditioned conjugate gradient method. Other salient issues on parallelization are also addressed. The parallel platform established offers unprecedented access to simulations of real-scale structures, giving new understanding about the physics-based mechanisms adopted and probabilistic randomness at the entire system level. Particularly, the platform enables bold simulations of real-scale RC structures exposed to cyclic loading---H-shaped wall system and 4-story T-shaped wall system. The simulations show the desired capability of accurate prediction of global force-displacement responses, postpeak softening behavior, and compressive buckling of longitudinal steel bars. It is fascinating to see that intrinsic randomness of the 3d interlocking model appears to cause "localized" damage of the real-scale structures, which is consistent with reported observations in different fields such as granular media. Equipped with accuracy, stability and scalability as demonstrated so far, the parallel platform is believed to serve as a fertile ground for the introducing of further physical mechanisms into various research fields as well as the earthquake engineering community. In the near future, it can be further expanded to run in concert with reliable FEA programs such as FRAME3d or OPENSEES. Following the central notion of "multiscale" analysis technique, actual infrastructures exposed to extreme natural hazard can be successfully tackled by this next generation analysis tool---the harmonious union of the parallel platform and a general FEA program. At the same time, any type of experiments can be easily conducted by this "virtual laboratory."
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.

PubMed

Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-06-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆

PubMed Central

Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav

2017-01-01

An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation

PubMed Central

Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan

2014-01-01

Through reorganizing the execution order and optimizing the data structure, we proposed an efficient parallel framework for H.264/AVC encoder based on massively parallel architecture. We implemented the proposed framework by CUDA on NVIDIA's GPU. Not only the compute intensive components of the H.264 encoder are parallelized but also the control intensive components are realized effectively, such as CAVLC and deblocking filter. In addition, we proposed serial optimization methods, including the multiresolution multiwindow for motion estimation, multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and direction-priority deblocking filter. More than 96% of workload of H.264 encoder is offloaded to GPU. Experimental results show that the parallel implementation outperforms the serial program by 20 times of speedup ratio and satisfies the requirement of the real-time HD encoding of 30 fps. The loss of PSNR is from 0.14 dB to 0.77 dB, when keeping the same bitrate. Through the analysis to the kernels, we found that speedup ratios of the compute intensive algorithms are proportional with the computation power of the GPU. However, the performance of the control intensive parts (CAVLC) is much related to the memory bandwidth, which gives an insight for new architecture design. PMID:24757432

Some links on this page may take you to non-federal websites. Their policies may differ from this site.