Topical perspective on massive threading and parallelism.
Farber, Robert M
2011-09-01
Unquestionably computer architectures have undergone a recent and noteworthy paradigm shift that now delivers multi- and many-core systems with tens to many thousands of concurrent hardware processing elements per workstation or supercomputer node. GPGPU (General Purpose Graphics Processor Unit) technology in particular has attracted significant attention as new software development capabilities, namely CUDA (Compute Unified Device Architecture) and OpenCL™, have made it possible for students as well as small and large research organizations to achieve excellent speedup for many applications over more conventional computing architectures. The current scientific literature reflects this shift with numerous examples of GPGPU applications that have achieved one, two, and in some special cases, three-orders of magnitude increased computational performance through the use of massive threading to exploit parallelism. Multi-core architectures are also evolving quickly to exploit both massive-threading and massive-parallelism such as the 1.3 million threads Blue Waters supercomputer. The challenge confronting scientists in planning future experimental and theoretical research efforts--be they individual efforts with one computer or collaborative efforts proposing to use the largest supercomputers in the world is how to capitalize on these new massively threaded computational architectures--especially as not all computational problems will scale to massive parallelism. In particular, the costs associated with restructuring software (and potentially redesigning algorithms) to exploit the parallelism of these multi- and many-threaded machines must be considered along with application scalability and lifespan. This perspective is an overview of the current state of threading and parallelize with some insight into the future. Published by Elsevier Inc.
The Five Central Psychological Challenges Facing Effective Mobile Learning
ERIC Educational Resources Information Center
Terras, Melody M.; Ramsay, Judith
2012-01-01
Web 2.0 technology not only offers the opportunity of massively parallel interconnected networks that support the provision of information and communication anytime and anywhere but also offers immense opportunities for collaboration and sharing of user-generated content. This information-rich environment may support both formal and informal…
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-01-01
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate. PMID:27070606
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing.
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-04-07
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially the GPU based methods. In the classical GPU based imaging algorithm, GPU is employed to accelerate image processing by massive parallel computing, and CPU is only used to perform the auxiliary work such as data input/output (IO). However, the computing capability of CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed tasks partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. In the part of CPU parallel imaging, the advanced vector extension (AVX) method is firstly introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only the bottlenecks of memory limitation and frequent data transferring are broken, but also kinds of optimized strategies are applied, such as streaming, parallel pipeline and so on. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes the real-time imaging in that the imaging rate outperforms the raw data generation rate.
Human-Robot Teams Collaborating Socially Organizationally, and Culturally
2011-09-01
C., Myerscough, Graham, S., & Oldroyd, B. P. (2004). Honey Bee Nest Thermoregulation: Diversity Promotes Stability. Science, 305, 5682, 402-404...from biology tend to be very robust, stable, and adaptable. These systems tend to be massively parallel, with a many to one mapping of agents to...Kaminski, G., Chaline, N., & Jaisson, P. (2007). Individual experience alone can generate lasting division of labor in ants. Current Biology 17, 1308
2015-08-01
Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) Software by N Scott Weingarten and James P Larentzos Approved for...Massively Parallel Simulator ( LAMMPS ) Software by N Scott Weingarten Weapons and Materials Research Directorate, ARL James P Larentzos Engility...Shifted Periodic Boundary Conditions in the Large-Scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) Software 5a. CONTRACT NUMBER 5b
Ultrascalable petaflop parallel supercomputer
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Chiu, George [Cross River, NY; Cipolla, Thomas M [Katonah, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Hall, Shawn [Pleasantville, NY; Haring, Rudolf A [Cortlandt Manor, NY; Heidelberger, Philip [Cortlandt Manor, NY; Kopcsay, Gerard V [Yorktown Heights, NY; Ohmacht, Martin [Yorktown Heights, NY; Salapura, Valentina [Chappaqua, NY; Sugavanam, Krishnan [Mahopac, NY; Takken, Todd [Brewster, NY
2010-07-20
A massively parallel supercomputer of petaOPS-scale includes node architectures based upon System-On-a-Chip technology, where each processing node comprises a single Application Specific Integrated Circuit (ASIC) having up to four processing elements. The ASIC nodes are interconnected by multiple independent networks that optimally maximize the throughput of packet communications between nodes with minimal latency. The multiple networks may include three high-speed networks for parallel algorithm message passing including a Torus, collective network, and a Global Asynchronous network that provides global barrier and notification functions. These multiple independent networks may be collaboratively or independently utilized according to the needs or phases of an algorithm for optimizing algorithm processing performance. The use of a DMA engine is provided to facilitate message passing among the nodes without the expenditure of processing resources at the node.
The 2nd Symposium on the Frontiers of Massively Parallel Computations
NASA Technical Reports Server (NTRS)
Mills, Ronnie (Editor)
1988-01-01
Programming languages, computer graphics, neural networks, massively parallel computers, SIMD architecture, algorithms, digital terrain models, sort computation, simulation of charged particle transport on the massively parallel processor and image processing are among the topics discussed.
Has Research on Collaborative Learning Technologies Addressed Massiveness? A Literature Review
ERIC Educational Resources Information Center
Manathunga, Kalpani; Hernández-Leo, Davinia
2015-01-01
There is a growing interest in understanding to what extent innovative educational technologies can be used to support massive courses. Collaboration is one of the main desired elements in massive learning actions involving large communities of participants. Accumulated research in collaborative learning technologies has proposed and evaluated…
Massively parallel information processing systems for space applications
NASA Technical Reports Server (NTRS)
Schaefer, D. H.
1979-01-01
NASA is developing massively parallel systems for ultra high speed processing of digital image data collected by satellite borne instrumentation. Such systems contain thousands of processing elements. Work is underway on the design and fabrication of the 'Massively Parallel Processor', a ground computer containing 16,384 processing elements arranged in a 128 x 128 array. This computer uses existing technology. Advanced work includes the development of semiconductor chips containing thousands of feedthrough paths. Massively parallel image analog to digital conversion technology is also being developed. The goal is to provide compact computers suitable for real-time onboard processing of images.
Massively parallel algorithms for real-time wavefront control of a dense adaptive optics system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Fijany, A.; Milman, M.; Redding, D.
1994-12-31
In this paper massively parallel algorithms and architectures for real-time wavefront control of a dense adaptive optic system (SELENE) are presented. The authors have already shown that the computation of a near optimal control algorithm for SELENE can be reduced to the solution of a discrete Poisson equation on a regular domain. Although, this represents an optimal computation, due the large size of the system and the high sampling rate requirement, the implementation of this control algorithm poses a computationally challenging problem since it demands a sustained computational throughput of the order of 10 GFlops. They develop a novel algorithm,more » designated as Fast Invariant Imbedding algorithm, which offers a massive degree of parallelism with simple communication and synchronization requirements. Due to these features, this algorithm is significantly more efficient than other Fast Poisson Solvers for implementation on massively parallel architectures. The authors also discuss two massively parallel, algorithmically specialized, architectures for low-cost and optimal implementation of the Fast Invariant Imbedding algorithm.« less
RAMA: A file system for massively parallel computers
NASA Technical Reports Server (NTRS)
Miller, Ethan L.; Katz, Randy H.
1993-01-01
This paper describes a file system design for massively parallel computers which makes very efficient use of a few disks per processor. This overcomes the traditional I/O bottleneck of massively parallel machines by storing the data on disks within the high-speed interconnection network. In addition, the file system, called RAMA, requires little inter-node synchronization, removing another common bottleneck in parallel processor file systems. Support for a large tertiary storage system can easily be integrated in lo the file system; in fact, RAMA runs most efficiently when tertiary storage is used.
Parallel Logic Programming and Parallel Systems Software and Hardware
1989-07-29
Conference, Dallas TX. January 1985. (55) [Rous75] Roussel, P., "PROLOG: Manuel de Reference et d’Uilisation", Group d’ Intelligence Artificielle , Universite d...completed. Tools were provided for software development using artificial intelligence techniques. Al software for massively parallel architectures was...using artificial intelligence tech- niques. Al software for massively parallel architectures was started. 1. Introduction We describe research conducted
ERIC Educational Resources Information Center
Clarà, Marc; Kelly, Nick; Mauri, Teresa; Danaher, P. A.
2017-01-01
This paper explores the possibility that virtual communities of teachers with large numbers of members (referred to as "massive communities of teachers") can offer support to novice teachers by means of collaborative reflection. The paper examines and conceptualises some problems found in professional massive communities and proposes…
Design of a massively parallel computer using bit serial processing elements
NASA Technical Reports Server (NTRS)
Aburdene, Maurice F.; Khouri, Kamal S.; Piatt, Jason E.; Zheng, Jianqing
1995-01-01
A 1-bit serial processor designed for a parallel computer architecture is described. This processor is used to develop a massively parallel computational engine, with a single instruction-multiple data (SIMD) architecture. The computer is simulated and tested to verify its operation and to measure its performance for further development.
The EMCC / DARPA Massively Parallel Electromagnetic Scattering Project
NASA Technical Reports Server (NTRS)
Woo, Alex C.; Hill, Kueichien C.
1996-01-01
The Electromagnetic Code Consortium (EMCC) was sponsored by the Advanced Research Program Agency (ARPA) to demonstrate the effectiveness of massively parallel computing in large scale radar signature predictions. The EMCC/ARPA project consisted of three parts.
Box schemes and their implementation on the iPSC/860
NASA Technical Reports Server (NTRS)
Chattot, J. J.; Merriam, M. L.
1991-01-01
Research on algoriths for efficiently solving fluid flow problems on massively parallel computers is continued in the present paper. Attention is given to the implementation of a box scheme on the iPSC/860, a massively parallel computer with a peak speed of 10 Gflops and a memory of 128 Mwords. A domain decomposition approach to parallelism is used.
Scan line graphics generation on the massively parallel processor
NASA Technical Reports Server (NTRS)
Dorband, John E.
1988-01-01
Described here is how researchers implemented a scan line graphics generation algorithm on the Massively Parallel Processor (MPP). Pixels are computed in parallel and their results are applied to the Z buffer in large groups. To perform pixel value calculations, facilitate load balancing across the processors and apply the results to the Z buffer efficiently in parallel requires special virtual routing (sort computation) techniques developed by the author especially for use on single-instruction multiple-data (SIMD) architectures.
A massively parallel computational approach to coupled thermoelastic/porous gas flow problems
NASA Technical Reports Server (NTRS)
Shia, David; Mcmanus, Hugh L.
1995-01-01
A new computational scheme for coupled thermoelastic/porous gas flow problems is presented. Heat transfer, gas flow, and dynamic thermoelastic governing equations are expressed in fully explicit form, and solved on a massively parallel computer. The transpiration cooling problem is used as an example problem. The numerical solutions have been verified by comparison to available analytical solutions. Transient temperature, pressure, and stress distributions have been obtained. Small spatial oscillations in pressure and stress have been observed, which would be impractical to predict with previously available schemes. Comparisons between serial and massively parallel versions of the scheme have also been made. The results indicate that for small scale problems the serial and parallel versions use practically the same amount of CPU time. However, as the problem size increases the parallel version becomes more efficient than the serial version.
Bit-parallel arithmetic in a massively-parallel associative processor
NASA Technical Reports Server (NTRS)
Scherson, Isaac D.; Kramer, David A.; Alleyne, Brian D.
1992-01-01
A simple but powerful new architecture based on a classical associative processor model is presented. Algorithms for performing the four basic arithmetic operations both for integer and floating point operands are described. For m-bit operands, the proposed architecture makes it possible to execute complex operations in O(m) cycles as opposed to O(m exp 2) for bit-serial machines. A word-parallel, bit-parallel, massively-parallel computing system can be constructed using this architecture with VLSI technology. The operation of this system is demonstrated for the fast Fourier transform and matrix multiplication.
Increasing the reach of forensic genetics with massively parallel sequencing.
Budowle, Bruce; Schmedes, Sarah E; Wendt, Frank R
2017-09-01
The field of forensic genetics has made great strides in the analysis of biological evidence related to criminal and civil matters. More so, the discipline has set a standard of performance and quality in the forensic sciences. The advent of massively parallel sequencing will allow the field to expand its capabilities substantially. This review describes the salient features of massively parallel sequencing and how it can impact forensic genetics. The features of this technology offer increased number and types of genetic markers that can be analyzed, higher throughput of samples, and the capability of targeting different organisms, all by one unifying methodology. While there are many applications, three are described where massively parallel sequencing will have immediate impact: molecular autopsy, microbial forensics and differentiation of monozygotic twins. The intent of this review is to expose the forensic science community to the potential enhancements that have or are soon to arrive and demonstrate the continued expansion the field of forensic genetics and its service in the investigation of legal matters.
Massively parallel implementation of 3D-RISM calculation with volumetric 3D-FFT.
Maruyama, Yutaka; Yoshida, Norio; Tadano, Hiroto; Takahashi, Daisuke; Sato, Mitsuhisa; Hirata, Fumio
2014-07-05
A new three-dimensional reference interaction site model (3D-RISM) program for massively parallel machines combined with the volumetric 3D fast Fourier transform (3D-FFT) was developed, and tested on the RIKEN K supercomputer. The ordinary parallel 3D-RISM program has a limitation on the number of parallelizations because of the limitations of the slab-type 3D-FFT. The volumetric 3D-FFT relieves this limitation drastically. We tested the 3D-RISM calculation on the large and fine calculation cell (2048(3) grid points) on 16,384 nodes, each having eight CPU cores. The new 3D-RISM program achieved excellent scalability to the parallelization, running on the RIKEN K supercomputer. As a benchmark application, we employed the program, combined with molecular dynamics simulation, to analyze the oligomerization process of chymotrypsin Inhibitor 2 mutant. The results demonstrate that the massive parallel 3D-RISM program is effective to analyze the hydration properties of the large biomolecular systems. Copyright © 2014 Wiley Periodicals, Inc.
Parallel computing of a climate model on the dawn 1000 by domain decomposition method
NASA Astrophysics Data System (ADS)
Bi, Xunqiang
1997-12-01
In this paper the parallel computing of a grid-point nine-level atmospheric general circulation model on the Dawn 1000 is introduced. The model was developed by the Institute of Atmospheric Physics (IAP), Chinese Academy of Sciences (CAS). The Dawn 1000 is a MIMD massive parallel computer made by National Research Center for Intelligent Computer (NCIC), CAS. A two-dimensional domain decomposition method is adopted to perform the parallel computing. The potential ways to increase the speed-up ratio and exploit more resources of future massively parallel supercomputation are also discussed.
A sweep algorithm for massively parallel simulation of circuit-switched networks
NASA Technical Reports Server (NTRS)
Gaujal, Bruno; Greenberg, Albert G.; Nicol, David M.
1992-01-01
A new massively parallel algorithm is presented for simulating large asymmetric circuit-switched networks, controlled by a randomized-routing policy that includes trunk-reservation. A single instruction multiple data (SIMD) implementation is described, and corresponding experiments on a 16384 processor MasPar parallel computer are reported. A multiple instruction multiple data (MIMD) implementation is also described, and corresponding experiments on an Intel IPSC/860 parallel computer, using 16 processors, are reported. By exploiting parallelism, our algorithm increases the possible execution rate of such complex simulations by as much as an order of magnitude.
Multi-threading: A new dimension to massively parallel scientific computation
NASA Astrophysics Data System (ADS)
Nielsen, Ida M. B.; Janssen, Curtis L.
2000-06-01
Multi-threading is becoming widely available for Unix-like operating systems, and the application of multi-threading opens new ways for performing parallel computations with greater efficiency. We here briefly discuss the principles of multi-threading and illustrate the application of multi-threading for a massively parallel direct four-index transformation of electron repulsion integrals. Finally, other potential applications of multi-threading in scientific computing are outlined.
Neural Parallel Engine: A toolbox for massively parallel neural signal processing.
Tam, Wing-Kin; Yang, Zhi
2018-05-01
Large-scale neural recordings provide detailed information on neuronal activities and can help elicit the underlying neural mechanisms of the brain. However, the computational burden is also formidable when we try to process the huge data stream generated by such recordings. In this study, we report the development of Neural Parallel Engine (NPE), a toolbox for massively parallel neural signal processing on graphical processing units (GPUs). It offers a selection of the most commonly used routines in neural signal processing such as spike detection and spike sorting, including advanced algorithms such as exponential-component-power-component (EC-PC) spike detection and binary pursuit spike sorting. We also propose a new method for detecting peaks in parallel through a parallel compact operation. Our toolbox is able to offer a 5× to 110× speedup compared with its CPU counterparts depending on the algorithms. A user-friendly MATLAB interface is provided to allow easy integration of the toolbox into existing workflows. Previous efforts on GPU neural signal processing only focus on a few rudimentary algorithms, are not well-optimized and often do not provide a user-friendly programming interface to fit into existing workflows. There is a strong need for a comprehensive toolbox for massively parallel neural signal processing. A new toolbox for massively parallel neural signal processing has been created. It can offer significant speedup in processing signals from large-scale recordings up to thousands of channels. Copyright © 2018 Elsevier B.V. All rights reserved.
Chen, Yuanfang; Lee, Gyu Myoung; Shu, Lei; Crespi, Noel
2016-02-06
The development of an efficient and cost-effective solution to solve a complex problem (e.g., dynamic detection of toxic gases) is an important research issue in the industrial applications of the Internet of Things (IoT). An industrial intelligent ecosystem enables the collection of massive data from the various devices (e.g., sensor-embedded wireless devices) dynamically collaborating with humans. Effectively collaborative analytics based on the collected massive data from humans and devices is quite essential to improve the efficiency of industrial production/service. In this study, we propose a collaborative sensing intelligence (CSI) framework, combining collaborative intelligence and industrial sensing intelligence. The proposed CSI facilitates the cooperativity of analytics with integrating massive spatio-temporal data from different sources and time points. To deploy the CSI for achieving intelligent and efficient industrial production/service, the key challenges and open issues are discussed, as well.
Chen, Yuanfang; Lee, Gyu Myoung; Shu, Lei; Crespi, Noel
2016-01-01
The development of an efficient and cost-effective solution to solve a complex problem (e.g., dynamic detection of toxic gases) is an important research issue in the industrial applications of the Internet of Things (IoT). An industrial intelligent ecosystem enables the collection of massive data from the various devices (e.g., sensor-embedded wireless devices) dynamically collaborating with humans. Effectively collaborative analytics based on the collected massive data from humans and devices is quite essential to improve the efficiency of industrial production/service. In this study, we propose a collaborative sensing intelligence (CSI) framework, combining collaborative intelligence and industrial sensing intelligence. The proposed CSI facilitates the cooperativity of analytics with integrating massive spatio-temporal data from different sources and time points. To deploy the CSI for achieving intelligent and efficient industrial production/service, the key challenges and open issues are discussed, as well. PMID:26861345
The language parallel Pascal and other aspects of the massively parallel processor
NASA Technical Reports Server (NTRS)
Reeves, A. P.; Bruner, J. D.
1982-01-01
A high level language for the Massively Parallel Processor (MPP) was designed. This language, called Parallel Pascal, is described in detail. A description of the language design, a description of the intermediate language, Parallel P-Code, and details for the MPP implementation are included. Formal descriptions of Parallel Pascal and Parallel P-Code are given. A compiler was developed which converts programs in Parallel Pascal into the intermediate Parallel P-Code language. The code generator to complete the compiler for the MPP is being developed independently. A Parallel Pascal to Pascal translator was also developed. The architecture design for a VLSI version of the MPP was completed with a description of fault tolerant interconnection networks. The memory arrangement aspects of the MPP are discussed and a survey of other high level languages is given.
The factorization of large composite numbers on the MPP
NASA Technical Reports Server (NTRS)
Mckurdy, Kathy J.; Wunderlich, Marvin C.
1987-01-01
The continued fraction method for factoring large integers (CFRAC) was an ideal algorithm to be implemented on a massively parallel computer such as the Massively Parallel Processor (MPP). After much effort, the first 60 digit number was factored on the MPP using about 6 1/2 hours of array time. Although this result added about 10 digits to the size number that could be factored using CFRAC on a serial machine, it was already badly beaten by the implementation of Davis and Holdridge on the CRAY-1 using the quadratic sieve, an algorithm which is clearly superior to CFRAC for large numbers. An algorithm is illustrated which is ideally suited to the single instruction multiple data (SIMD) massively parallel architecture and some of the modifications which were needed in order to make the parallel implementation effective and efficient are described.
Matthew Parks; Richard Cronn; Aaron Liston
2009-01-01
We reconstruct the infrageneric phylogeny of Pinus from 37 nearly-complete chloroplast genomes (average 109 kilobases each of an approximately 120 kilobase genome) generated using multiplexed massively parallel sequencing. We found that 30/33 ingroup nodes resolved wlth > 95-percent bootstrap support; this is a substantial improvement relative...
Fast, Massively Parallel Data Processors
NASA Technical Reports Server (NTRS)
Heaton, Robert A.; Blevins, Donald W.; Davis, ED
1994-01-01
Proposed fast, massively parallel data processor contains 8x16 array of processing elements with efficient interconnection scheme and options for flexible local control. Processing elements communicate with each other on "X" interconnection grid with external memory via high-capacity input/output bus. This approach to conditional operation nearly doubles speed of various arithmetic operations.
Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures
NASA Technical Reports Server (NTRS)
Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.
1998-01-01
In this paper a new algorithm, designated as Fast Invariant Imbedding algorithm, for solution of Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other Fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate size problems.
Implementation of highly parallel and large scale GW calculations within the OpenAtom software
NASA Astrophysics Data System (ADS)
Ismail-Beigi, Sohrab
The need to describe electronic excitations with better accuracy than provided by band structures produced by Density Functional Theory (DFT) has been a long-term enterprise for the computational condensed matter and materials theory communities. In some cases, appropriate theoretical frameworks have existed for some time but have been difficult to apply widely due to computational cost. For example, the GW approximation incorporates a great deal of important non-local and dynamical electronic interaction effects but has been too computationally expensive for routine use in large materials simulations. OpenAtom is an open source massively parallel ab initiodensity functional software package based on plane waves and pseudopotentials (http://charm.cs.uiuc.edu/OpenAtom/) that takes advantage of the Charm + + parallel framework. At present, it is developed via a three-way collaboration, funded by an NSF SI2-SSI grant (ACI-1339804), between Yale (Ismail-Beigi), IBM T. J. Watson (Glenn Martyna) and the University of Illinois at Urbana Champaign (Laxmikant Kale). We will describe the project and our current approach towards implementing large scale GW calculations with OpenAtom. Potential applications of large scale parallel GW software for problems involving electronic excitations in semiconductor and/or metal oxide systems will be also be pointed out.
NASA Technical Reports Server (NTRS)
Dongarra, Jack (Editor); Messina, Paul (Editor); Sorensen, Danny C. (Editor); Voigt, Robert G. (Editor)
1990-01-01
Attention is given to such topics as an evaluation of block algorithm variants in LAPACK and presents a large-grain parallel sparse system solver, a multiprocessor method for the solution of the generalized Eigenvalue problem on an interval, and a parallel QR algorithm for iterative subspace methods on the CM2. A discussion of numerical methods includes the topics of asynchronous numerical solutions of PDEs on parallel computers, parallel homotopy curve tracking on a hypercube, and solving Navier-Stokes equations on the Cedar Multi-Cluster system. A section on differential equations includes a discussion of a six-color procedure for the parallel solution of elliptic systems using the finite quadtree structure, data parallel algorithms for the finite element method, and domain decomposition methods in aerodynamics. Topics dealing with massively parallel computing include hypercube vs. 2-dimensional meshes and massively parallel computation of conservation laws. Performance and tools are also discussed.
Proxy-equation paradigm: A strategy for massively parallel asynchronous computations
NASA Astrophysics Data System (ADS)
Mittal, Ankita; Girimaji, Sharath
2017-09-01
Massively parallel simulations of transport equation systems call for a paradigm change in algorithm development to achieve efficient scalability. Traditional approaches require time synchronization of processing elements (PEs), which severely restricts scalability. Relaxing synchronization requirement introduces error and slows down convergence. In this paper, we propose and develop a novel "proxy equation" concept for a general transport equation that (i) tolerates asynchrony with minimal added error, (ii) preserves convergence order and thus, (iii) expected to scale efficiently on massively parallel machines. The central idea is to modify a priori the transport equation at the PE boundaries to offset asynchrony errors. Proof-of-concept computations are performed using a one-dimensional advection (convection) diffusion equation. The results demonstrate the promise and advantages of the present strategy.
2013-08-01
potential for HMX / RDX (3, 9). ...................................................................................8 1 1. Purpose This work...6 dispersion and electrostatic interactions. Constants for the SB potential are given in table 1. 8 Table 1. SB potential for HMX / RDX (3, 9...modeling dislocations in the energetic molecular crystal RDX using the Large-Scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) molecular
Algorithms and programming tools for image processing on the MPP
NASA Technical Reports Server (NTRS)
Reeves, A. P.
1985-01-01
Topics addressed include: data mapping and rotational algorithms for the Massively Parallel Processor (MPP); Parallel Pascal language; documentation for the Parallel Pascal Development system; and a description of the Parallel Pascal language used on the MPP.
Parallelized direct execution simulation of message-passing parallel programs
NASA Technical Reports Server (NTRS)
Dickens, Phillip M.; Heidelberger, Philip; Nicol, David M.
1994-01-01
As massively parallel computers proliferate, there is growing interest in findings ways by which performance of massively parallel codes can be efficiently predicted. This problem arises in diverse contexts such as parallelizing computers, parallel performance monitoring, and parallel algorithm development. In this paper we describe one solution where one directly executes the application code, but uses a discrete-event simulator to model details of the presumed parallel machine such as operating system and communication network behavior. Because this approach is computationally expensive, we are interested in its own parallelization specifically the parallelization of the discrete-event simulator. We describe methods suitable for parallelized direct execution simulation of message-passing parallel programs, and report on the performance of such a system, Large Application Parallel Simulation Environment (LAPSE), we have built on the Intel Paragon. On all codes measured to date, LAPSE predicts performance well typically within 10 percent relative error. Depending on the nature of the application code, we have observed low slowdowns (relative to natively executing code) and high relative speedups using up to 64 processors.
The architecture of tomorrow's massively parallel computer
NASA Technical Reports Server (NTRS)
Batcher, Ken
1987-01-01
Goodyear Aerospace delivered the Massively Parallel Processor (MPP) to NASA/Goddard in May 1983, over three years ago. Ever since then, Goodyear has tried to look in a forward direction. There is always some debate as to which way is forward when it comes to supercomputer architecture. Improvements to the MPP's massively parallel architecture are discussed in the areas of data I/O, memory capacity, connectivity, and indirect (or local) addressing. In I/O, transfer rates up to 640 megabytes per second can be achieved. There are devices that can supply the data and accept it at this rate. The memory capacity can be increased up to 128 megabytes in the ARU and over a gigabyte in the staging memory. For connectivity, there are several different kinds of multistage networks that should be considered.
NASA Technical Reports Server (NTRS)
Reinsch, K. G. (Editor); Schmidt, W. (Editor); Ecer, A. (Editor); Haeuser, Jochem (Editor); Periaux, J. (Editor)
1992-01-01
A conference was held on parallel computational fluid dynamics and produced related papers. Topics discussed in these papers include: parallel implicit and explicit solvers for compressible flow, parallel computational techniques for Euler and Navier-Stokes equations, grid generation techniques for parallel computers, and aerodynamic simulation om massively parallel systems.
NASA Technical Reports Server (NTRS)
Barnden, John; Srinivas, Kankanahalli
1990-01-01
Symbol manipulation as used in traditional Artificial Intelligence has been criticized by neural net researchers for being excessively inflexible and sequential. On the other hand, the application of neural net techniques to the types of high-level cognitive processing studied in traditional artificial intelligence presents major problems as well. A promising way out of this impasse is to build neural net models that accomplish massively parallel case-based reasoning. Case-based reasoning, which has received much attention recently, is essentially the same as analogy-based reasoning, and avoids many of the problems leveled at traditional artificial intelligence. Further problems are avoided by doing many strands of case-based reasoning in parallel, and by implementing the whole system as a neural net. In addition, such a system provides an approach to some aspects of the problems of noise, uncertainty and novelty in reasoning systems. The current neural net system (Conposit), which performs standard rule-based reasoning, is being modified into a massively parallel case-based reasoning version.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Performance of the Heavy Flavor Tracker (HFT) detector in star experiment at RHIC
NASA Astrophysics Data System (ADS)
Alruwaili, Manal
With the growing technology, the number of the processors is becoming massive. Current supercomputer processing will be available on desktops in the next decade. For mass scale application software development on massive parallel computing available on desktops, existing popular languages with large libraries have to be augmented with new constructs and paradigms that exploit massive parallel computing and distributed memory models while retaining the user-friendliness. Currently, available object oriented languages for massive parallel computing such as Chapel, X10 and UPC++ exploit distributed computing, data parallel computing and thread-parallelism at the process level in the PGAS (Partitioned Global Address Space) memory model. However, they do not incorporate: 1) any extension at for object distribution to exploit PGAS model; 2) the programs lack the flexibility of migrating or cloning an object between places to exploit load balancing; and 3) lack the programming paradigms that will result from the integration of data and thread-level parallelism and object distribution. In the proposed thesis, I compare different languages in PGAS model; propose new constructs that extend C++ with object distribution and object migration; and integrate PGAS based process constructs with these extensions on distributed objects. Object cloning and object migration. Also a new paradigm MIDD (Multiple Invocation Distributed Data) is presented when different copies of the same class can be invoked, and work on different elements of a distributed data concurrently using remote method invocations. I present new constructs, their grammar and their behavior. The new constructs have been explained using simple programs utilizing these constructs.
Network Analysis of an Emergent Massively Collaborative Creation on Video Sharing Website
NASA Astrophysics Data System (ADS)
Hamasaki, Masahiro; Takeda, Hideaki; Nishimura, Takuichi
The Web technology enables numerous people to collaborate in creation. We designate it as massively collaborative creation via the Web. As an example of massively collaborative creation, we particularly examine video development on Nico Nico Douga, which is a video sharing website that is popular in Japan. We specifically examine videos on Hatsune Miku, a version of a singing synthesizer application software that has inspired not only song creation but also songwriting, illustration, and video editing. As described herein, creators of interact to create new contents through their social network. In this paper, we analyzed the process of developing thousands of videos based on creators' social networks and investigate relationships among creation activity and social networks. The social network reveals interesting features. Creators generate large and sparse social networks including some centralized communities, and such centralized community's members shared special tags. Different categories of creators have different roles in evolving the network, e.g., songwriters gather more links than other categories, implying that they are triggers to network evolution.
Role of APOE Isoforms in the Pathogenesis of TBI induced Alzheimer’s Disease
2016-10-01
deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel sequencing...demonstrate that the lack of Abca1 increases amyloid plaques and decreased APOE protein levels in AD-model mice. In this proposal we will test the hypothesis...injury, inflammatory reaction, transcriptome, high throughput massive parallel sequencing, mRNA-seq., behavioral testing, memory impairment, recovery 3
2010-10-14
High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and Massively Parallel Sequencing...Venezuelan equine encephalitis virus (VEEV) genome. We initially used a capillary electrophoresis method to gain insight into the role of the VEEV...Smith JM, Schmaljohn CS (2010) High-Resolution Functional Mapping of the Venezuelan Equine Encephalitis Virus Genome by Insertional Mutagenesis and
Efficient, massively parallel eigenvalue computation
NASA Technical Reports Server (NTRS)
Huo, Yan; Schreiber, Robert
1993-01-01
In numerical simulations of disordered electronic systems, one of the most common approaches is to diagonalize random Hamiltonian matrices and to study the eigenvalues and eigenfunctions of a single electron in the presence of a random potential. An effort to implement a matrix diagonalization routine for real symmetric dense matrices on massively parallel SIMD computers, the Maspar MP-1 and MP-2 systems, is described. Results of numerical tests and timings are also presented.
2012-10-01
using the open-source code Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ) (http://lammps.sandia.gov) (23). The commercial...parameters are proprietary and cannot be ported to the LAMMPS 4 simulation code. In our molecular dynamics simulations at the atomistic resolution, we...IBI iterative Boltzmann inversion LAMMPS Large-scale Atomic/Molecular Massively Parallel Simulator MAPS Materials Processes and Simulations MS
Massively parallel GPU-accelerated minimization of classical density functional theory
NASA Astrophysics Data System (ADS)
Stopper, Daniel; Roth, Roland
2017-08-01
In this paper, we discuss the ability to numerically minimize the grand potential of hard disks in two-dimensional and of hard spheres in three-dimensional space within the framework of classical density functional and fundamental measure theory on modern graphics cards. Our main finding is that a massively parallel minimization leads to an enormous performance gain in comparison to standard sequential minimization schemes. Furthermore, the results indicate that in complex multi-dimensional situations, a heavy parallel minimization of the grand potential seems to be mandatory in order to reach a reasonable balance between accuracy and computational cost.
A new parallel-vector finite element analysis software on distributed-memory computers
NASA Technical Reports Server (NTRS)
Qin, Jiangning; Nguyen, Duc T.
1993-01-01
A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Template based parallel checkpointing in a massively parallel computer system
Archer, Charles Jens [Rochester, MN; Inglett, Todd Alan [Rochester, MN
2009-01-13
A method and apparatus for a template based parallel checkpoint save for a massively parallel super computer system using a parallel variation of the rsync protocol, and network broadcast. In preferred embodiments, the checkpoint data for each node is compared to a template checkpoint file that resides in the storage and that was previously produced. Embodiments herein greatly decrease the amount of data that must be transmitted and stored for faster checkpointing and increased efficiency of the computer system. Embodiments are directed to a parallel computer system with nodes arranged in a cluster with a high speed interconnect that can perform broadcast communication. The checkpoint contains a set of actual small data blocks with their corresponding checksums from all nodes in the system. The data blocks may be compressed using conventional non-lossy data compression algorithms to further reduce the overall checkpoint size.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Not Available
An account of the Caltech Concurrent Computation Program (C{sup 3}P), a five year project that focused on answering the question: Can parallel computers be used to do large-scale scientific computations '' As the title indicates, the question is answered in the affirmative, by implementing numerous scientific applications on real parallel computers and doing computations that produced new scientific results. In the process of doing so, C{sup 3}P helped design and build several new computers, designed and implemented basic system software, developed algorithms for frequently used mathematical computations on massively parallel machines, devised performance models and measured the performance of manymore » computers, and created a high performance computing facility based exclusively on parallel computers. While the initial focus of C{sup 3}P was the hypercube architecture developed by C. Seitz, many of the methods developed and lessons learned have been applied successfully on other massively parallel architectures.« less
Reconstructing evolutionary trees in parallel for massive sequences.
Zou, Quan; Wan, Shixiang; Zeng, Xiangxiang; Ma, Zhanshan Sam
2017-12-14
Building the evolutionary trees for massive unaligned DNA sequences is challenging and crucial. However, reconstructing evolutionary tree for ultra-large sequences is hard. Massive multiple sequence alignment is also challenging and time/space consuming. Hadoop and Spark are developed recently, which bring spring light for the classical computational biology problems. In this paper, we tried to solve the multiple sequence alignment and evolutionary reconstruction in parallel. HPTree, which is developed in this paper, can deal with big DNA sequence files quickly. It works well on the >1GB files, and gets better performance than other evolutionary reconstruction tools. Users could use HPTree for reonstructing evolutioanry trees on the computer clusters or cloud platform (eg. Amazon Cloud). HPTree could help on population evolution research and metagenomics analysis. In this paper, we employ the Hadoop and Spark platform and design an evolutionary tree reconstruction software tool for unaligned massive DNA sequences. Clustering and multiple sequence alignment are done in parallel. Neighbour-joining model was employed for the evolutionary tree building. We opened our software together with source codes via http://lab.malab.cn/soft/HPtree/ .
Load balancing for massively-parallel soft-real-time systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hailperin, M.
1988-09-01
Global load balancing, if practical, would allow the effective use of massively-parallel ensemble architectures for large soft-real-problems. The challenge is to replace quick global communications, which is impractical in a massively-parallel system, with statistical techniques. In this vein, the author proposes a novel approach to decentralized load balancing based on statistical time-series analysis. Each site estimates the system-wide average load using information about past loads of individual sites and attempts to equal that average. This estimation process is practical because the soft-real-time systems of interest naturally exhibit loads that are periodic, in a statistical sense akin to seasonality in econometrics.more » It is shown how this load-characterization technique can be the foundation for a load-balancing system in an architecture employing cut-through routing and an efficient multicast protocol.« less
Evaluation of massively parallel sequencing for forensic DNA methylation profiling.
Richards, Rebecca; Patel, Jayshree; Stevenson, Kate; Harbison, SallyAnn
2018-05-11
Epigenetics is an emerging area of interest in forensic science. DNA methylation, a type of epigenetic modification, can be applied to chronological age estimation, identical twin differentiation and body fluid identification. However, there is not yet an agreed, established methodology for targeted detection and analysis of DNA methylation markers in forensic research. Recently a massively parallel sequencing-based approach has been suggested. The use of massively parallel sequencing is well established in clinical epigenetics and is emerging as a new technology in the forensic field. This review investigates the potential benefits, limitations and considerations of this technique for the analysis of DNA methylation in a forensic context. The importance of a robust protocol, regardless of the methodology used, that minimises potential sources of bias is highlighted. This article is protected by copyright. All rights reserved. This article is protected by copyright. All rights reserved.
NASA Technical Reports Server (NTRS)
Logan, Terry G.
1994-01-01
The purpose of this study is to investigate the performance of the integral equation computations using numerical source field-panel method in a massively parallel processing (MPP) environment. A comparative study of computational performance of the MPP CM-5 computer and conventional Cray-YMP supercomputer for a three-dimensional flow problem is made. A serial FORTRAN code is converted into a parallel CM-FORTRAN code. Some performance results are obtained on CM-5 with 32, 62, 128 nodes along with those on Cray-YMP with a single processor. The comparison of the performance indicates that the parallel CM-FORTRAN code near or out-performs the equivalent serial FORTRAN code for some cases.
Supercomputing on massively parallel bit-serial architectures
NASA Technical Reports Server (NTRS)
Iobst, Ken
1985-01-01
Research on the Goodyear Massively Parallel Processor (MPP) suggests that high-level parallel languages are practical and can be designed with powerful new semantics that allow algorithms to be efficiently mapped to the real machines. For the MPP these semantics include parallel/associative array selection for both dense and sparse matrices, variable precision arithmetic to trade accuracy for speed, micro-pipelined train broadcast, and conditional branching at the processing element (PE) control unit level. The preliminary design of a FORTRAN-like parallel language for the MPP has been completed and is being used to write programs to perform sparse matrix array selection, min/max search, matrix multiplication, Gaussian elimination on single bit arrays and other generic algorithms. A description is given of the MPP design. Features of the system and its operation are illustrated in the form of charts and diagrams.
Zhang, Hong; Zapol, Peter; Dixon, David A.; ...
2015-11-17
The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Hong; Zapol, Peter; Dixon, David A.
The Shift-and-invert parallel spectral transformations (SIPs), a computational approach to solve sparse eigenvalue problems, is developed for massively parallel architectures with exceptional parallel scalability and robustness. The capabilities of SIPs are demonstrated by diagonalization of density-functional based tight-binding (DFTB) Hamiltonian and overlap matrices for single-wall metallic carbon nanotubes, diamond nanowires, and bulk diamond crystals. The largest (smallest) example studied is a 128,000 (2000) atom nanotube for which ~330,000 (~5600) eigenvalues and eigenfunctions are obtained in ~190 (~5) seconds when parallelized over 266,144 (16,384) Blue Gene/Q cores. Weak scaling and strong scaling of SIPs are analyzed and the performance of SIPsmore » is compared with other novel methods. Different matrix ordering methods are investigated to reduce the cost of the factorization step, which dominates the time-to-solution at the strong scaling limit. As a result, a parallel implementation of assembling the density matrix from the distributed eigenvectors is demonstrated.« less
A Novel Implementation of Massively Parallel Three Dimensional Monte Carlo Radiation Transport
NASA Astrophysics Data System (ADS)
Robinson, P. B.; Peterson, J. D. L.
2005-12-01
The goal of our summer project was to implement the difference formulation for radiation transport into Cosmos++, a multidimensional, massively parallel, magneto hydrodynamics code for astrophysical applications (Peter Anninos - AX). The difference formulation is a new method for Symbolic Implicit Monte Carlo thermal transport (Brooks and Szöke - PAT). Formerly, simultaneous implementation of fully implicit Monte Carlo radiation transport in multiple dimensions on multiple processors had not been convincingly demonstrated. We found that a combination of the difference formulation and the inherent structure of Cosmos++ makes such an implementation both accurate and straightforward. We developed a "nearly nearest neighbor physics" technique to allow each processor to work independently, even with a fully implicit code. This technique coupled with the increased accuracy of an implicit Monte Carlo solution and the efficiency of parallel computing systems allows us to demonstrate the possibility of massively parallel thermal transport. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract No. W-7405-Eng-48
Hinds, Joanne M; Payne, Stephen J
2018-04-01
Collaborative inhibition is a phenomenon where collaborating groups experience a decrement in recall when interacting with others. Despite this, collaboration has been found to improve subsequent individual recall. We explore these effects in semantic recall, which is seldom studied in collaborative retrieval. We also examine "parallel CMC", a synchronous form of computer-mediated communication that has previously been found to improve collaborative recall [Hinds, J. M., & Payne, S. J. (2016). Collaborative inhibition and semantic recall: Improving collaboration through computer-mediated communication. Applied Cognitive Psychology, 30(4), 554-565]. Sixty three triads completed a semantic recall task, which involved generating words beginning with "PO" or "HE" across three recall trials, in one of three retrieval conditions: Individual-Individual-Individual (III), Face-to-face-Face-to-Face-Individual (FFI) and Parallel-Parallel-Individual (PPI). Collaborative inhibition was present across both collaborative conditions. Individual recall in Recall 3 was higher when participants had previously collaborated in comparison to recalling three times individually. There was no difference between face-to-face and parallel CMC recall, however subsidiary analyses of instance repetitions and subjective organisation highlighted differences in group members' approaches to recall in terms of organisation and attention to others' contributions. We discuss the implications of these findings in relation to retrieval strategy disruption.
NASA Astrophysics Data System (ADS)
Yu, Leiming; Nina-Paravecino, Fanny; Kaeli, David; Fang, Qianqian
2018-01-01
We present a highly scalable Monte Carlo (MC) three-dimensional photon transport simulation platform designed for heterogeneous computing systems. Through the development of a massively parallel MC algorithm using the Open Computing Language framework, this research extends our existing graphics processing unit (GPU)-accelerated MC technique to a highly scalable vendor-independent heterogeneous computing environment, achieving significantly improved performance and software portability. A number of parallel computing techniques are investigated to achieve portable performance over a wide range of computing hardware. Furthermore, multiple thread-level and device-level load-balancing strategies are developed to obtain efficient simulations using multiple central processing units and GPUs.
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.; ...
2016-09-18
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Settgast, Randolph R.; Fu, Pengcheng; Walsh, Stuart D. C.
This study describes a fully coupled finite element/finite volume approach for simulating field-scale hydraulically driven fractures in three dimensions, using massively parallel computing platforms. The proposed method is capable of capturing realistic representations of local heterogeneities, layering and natural fracture networks in a reservoir. A detailed description of the numerical implementation is provided, along with numerical studies comparing the model with both analytical solutions and experimental results. The results demonstrate the effectiveness of the proposed method for modeling large-scale problems involving hydraulically driven fractures in three dimensions.
Using CLIPS in the domain of knowledge-based massively parallel programming
NASA Technical Reports Server (NTRS)
Dvorak, Jiri J.
1994-01-01
The Program Development Environment (PDE) is a tool for massively parallel programming of distributed-memory architectures. Adopting a knowledge-based approach, the PDE eliminates the complexity introduced by parallel hardware with distributed memory and offers complete transparency in respect of parallelism exploitation. The knowledge-based part of the PDE is realized in CLIPS. Its principal task is to find an efficient parallel realization of the application specified by the user in a comfortable, abstract, domain-oriented formalism. A large collection of fine-grain parallel algorithmic skeletons, represented as COOL objects in a tree hierarchy, contains the algorithmic knowledge. A hybrid knowledge base with rule modules and procedural parts, encoding expertise about application domain, parallel programming, software engineering, and parallel hardware, enables a high degree of automation in the software development process. In this paper, important aspects of the implementation of the PDE using CLIPS and COOL are shown, including the embedding of CLIPS with C++-based parts of the PDE. The appropriateness of the chosen approach and of the CLIPS language for knowledge-based software engineering are discussed.
Collaborative Calibrated Peer Assessment in Massive Open Online Courses
ERIC Educational Resources Information Center
Boudria, Asma; Lafifi, Yacine; Bordjiba, Yamina
2018-01-01
The free nature and open access courses in the Massive Open Online Courses (MOOC) allow the facilities of disseminating information for a large number of participants. However, the "massive" propriety can generate many pedagogical problems, such as the assessment of learners, which is considered as the major difficulty facing in the…
Massively parallel sparse matrix function calculations with NTPoly
NASA Astrophysics Data System (ADS)
Dawson, William; Nakajima, Takahito
2018-04-01
We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max
1999-01-01
A massively parallel ensemble Kalman filter (EnKF)is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP)ls quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This inplementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor process the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the ENKF are also examined.
NASA Technical Reports Server (NTRS)
1994-01-01
CESDIS, the Center of Excellence in Space Data and Information Sciences was developed jointly by NASA, Universities Space Research Association (USRA), and the University of Maryland in 1988 to focus on the design of advanced computing techniques and data systems to support NASA Earth and space science research programs. CESDIS is operated by USRA under contract to NASA. The Director, Associate Director, Staff Scientists, and administrative staff are located on-site at NASA's Goddard Space Flight Center in Greenbelt, Maryland. The primary CESDIS mission is to increase the connection between computer science and engineering research programs at colleges and universities and NASA groups working with computer applications in Earth and space science. Research areas of primary interest at CESDIS include: 1) High performance computing, especially software design and performance evaluation for massively parallel machines; 2) Parallel input/output and data storage systems for high performance parallel computers; 3) Data base and intelligent data management systems for parallel computers; 4) Image processing; 5) Digital libraries; and 6) Data compression. CESDIS funds multiyear projects at U. S. universities and colleges. Proposals are accepted in response to calls for proposals and are selected on the basis of peer reviews. Funds are provided to support faculty and graduate students working at their home institutions. Project personnel visit Goddard during academic recess periods to attend workshops, present seminars, and collaborate with NASA scientists on research projects. Additionally, CESDIS takes on specific research tasks of shorter duration for computer science research requested by NASA Goddard scientists.
Parallels between a Collaborative Research Process and the Middle Level Philosophy
ERIC Educational Resources Information Center
Dever, Robin; Ross, Diane; Miller, Jennifer; White, Paula; Jones, Karen
2014-01-01
The characteristics of the middle level philosophy as described in This We Believe closely parallel the collaborative research process. The journey of one research team is described in relationship to these characteristics. The collaborative process includes strengths such as professional relationships, professional development, courageous…
A fast ultrasonic simulation tool based on massively parallel implementations
NASA Astrophysics Data System (ADS)
Lambert, Jason; Rougeron, Gilles; Lacassagne, Lionel; Chatillon, Sylvain
2014-02-01
This paper presents a CIVA optimized ultrasonic inspection simulation tool, which takes benefit of the power of massively parallel architectures: graphical processing units (GPU) and multi-core general purpose processors (GPP). This tool is based on the classical approach used in CIVA: the interaction model is based on Kirchoff, and the ultrasonic field around the defect is computed by the pencil method. The model has been adapted and parallelized for both architectures. At this stage, the configurations addressed by the tool are : multi and mono-element probes, planar specimens made of simple isotropic materials, planar rectangular defects or side drilled holes of small diameter. Validations on the model accuracy and performances measurements are presented.
Ordered fast Fourier transforms on a massively parallel hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Tong, Charles; Swarztrauber, Paul N.
1991-01-01
The present evaluation of alternative, massively parallel hypercube processor-applicable designs for ordered radix-2 decimation-in-frequency FFT algorithms gives attention to the reduction of computation time-dominating communication. A combination of the order and computational phases of the FFT is accordingly employed, in conjunction with sequence-to-processor maps which reduce communication. Two orderings, 'standard' and 'cyclic', in which the order of the transform is the same as that of the input sequence, can be implemented with ease on the Connection Machine (where orderings are determined by geometries and priorities. A parallel method for trigonometric coefficient computation is presented which does not employ trigonometric functions or interprocessor communication.
O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...
1995-01-01
Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. Wemore » have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.« less
Lee, Anthony; Yau, Christopher; Giles, Michael B.; Doucet, Arnaud; Holmes, Christopher C.
2011-01-01
We present a case-study on the utility of graphics cards to perform massively parallel simulation of advanced Monte Carlo methods. Graphics cards, containing multiple Graphics Processing Units (GPUs), are self-contained parallel computational devices that can be housed in conventional desktop and laptop computers and can be thought of as prototypes of the next generation of many-core processors. For certain classes of population-based Monte Carlo algorithms they offer massively parallel simulation, with the added advantage over conventional distributed multi-core processors that they are cheap, easily accessible, easy to maintain, easy to code, dedicated local devices with low power consumption. On a canonical set of stochastic simulation examples including population-based Markov chain Monte Carlo methods and Sequential Monte Carlo methods, we nd speedups from 35 to 500 fold over conventional single-threaded computer code. Our findings suggest that GPUs have the potential to facilitate the growth of statistical modelling into complex data rich domains through the availability of cheap and accessible many-core computation. We believe the speedup we observe should motivate wider use of parallelizable simulation methods and greater methodological attention to their design. PMID:22003276
Analysis of multigrid methods on massively parallel computers: Architectural implications
NASA Technical Reports Server (NTRS)
Matheson, Lesley R.; Tarjan, Robert E.
1993-01-01
We study the potential performance of multigrid algorithms running on massively parallel computers with the intent of discovering whether presently envisioned machines will provide an efficient platform for such algorithms. We consider the domain parallel version of the standard V cycle algorithm on model problems, discretized using finite difference techniques in two and three dimensions on block structured grids of size 10(exp 6) and 10(exp 9), respectively. Our models of parallel computation were developed to reflect the computing characteristics of the current generation of massively parallel multicomputers. These models are based on an interconnection network of 256 to 16,384 message passing, 'workstation size' processors executing in an SPMD mode. The first model accomplishes interprocessor communications through a multistage permutation network. The communication cost is a logarithmic function which is similar to the costs in a variety of different topologies. The second model allows single stage communication costs only. Both models were designed with information provided by machine developers and utilize implementation derived parameters. With the medium grain parallelism of the current generation and the high fixed cost of an interprocessor communication, our analysis suggests an efficient implementation requires the machine to support the efficient transmission of long messages, (up to 1000 words) or the high initiation cost of a communication must be significantly reduced through an alternative optimization technique. Furthermore, with variable length message capability, our analysis suggests the low diameter multistage networks provide little or no advantage over a simple single stage communications network.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Fast I/O for Massively Parallel Applications
NASA Technical Reports Server (NTRS)
OKeefe, Matthew T.
1996-01-01
The two primary goals for this report were the design, contruction and modeling of parallel disk arrays for scientific visualization and animation, and a study of the IO requirements of highly parallel applications. In addition, further work in parallel display systems required to project and animate the very high-resolution frames resulting from our supercomputing simulations in ocean circulation and compressible gas dynamics.
NASA Technical Reports Server (NTRS)
Lou, John; Ferraro, Robert; Farrara, John; Mechoso, Carlos
1996-01-01
An analysis is presented of several factors influencing the performance of a parallel implementation of the UCLA atmospheric general circulation model (AGCM) on massively parallel computer systems. Several modificaitons to the original parallel AGCM code aimed at improving its numerical efficiency, interprocessor communication cost, load-balance and issues affecting single-node code performance are discussed.
LDRD final report on massively-parallel linear programming : the parPCx system.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar
2005-02-01
This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
Coordinating the Commons: Diversity & Dynamics in Open Collaborations
ERIC Educational Resources Information Center
Morgan, Jonathan T.
2013-01-01
The success of Wikipedia demonstrates that open collaboration can be an effective model for organizing geographically-distributed volunteers to perform complex, sustained work at a massive scale. However, Wikipedia's history also demonstrates some of the challenges that large, long-term open collaborations face: the core community of Wikipedia…
Block iterative restoration of astronomical images with the massively parallel processor
NASA Technical Reports Server (NTRS)
Heap, Sara R.; Lindler, Don J.
1987-01-01
A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
A Massively Parallel Code for Polarization Calculations
NASA Astrophysics Data System (ADS)
Akiyama, Shizuka; Höflich, Peter
2001-03-01
We present an implementation of our Monte-Carlo radiation transport method for rapidly expanding, NLTE atmospheres for massively parallel computers which utilizes both the distributed and shared memory models. This allows us to take full advantage of the fast communication and low latency inherent to nodes with multiple CPUs, and to stretch the limits of scalability with the number of nodes compared to a version which is based on the shared memory model. Test calculations on a local 20-node Beowulf cluster with dual CPUs showed an improved scalability by about 40%.
Routing performance analysis and optimization within a massively parallel computer
Archer, Charles Jens; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen
2013-04-16
An apparatus, program product and method optimize the operation of a massively parallel computer system by, in part, receiving actual performance data concerning an application executed by the plurality of interconnected nodes, and analyzing the actual performance data to identify an actual performance pattern. A desired performance pattern may be determined for the application, and an algorithm may be selected from among a plurality of algorithms stored within a memory, the algorithm being configured to achieve the desired performance pattern based on the actual performance data.
NASA Technical Reports Server (NTRS)
Manohar, Mareboyana; Tilton, James C.
1994-01-01
A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.
Regional-scale calculation of the LS factor using parallel processing
NASA Astrophysics Data System (ADS)
Liu, Kai; Tang, Guoan; Jiang, Ling; Zhu, A.-Xing; Yang, Jianyi; Song, Xiaodong
2015-05-01
With the increase of data resolution and the increasing application of USLE over large areas, the existing serial implementation of algorithms for computing the LS factor is becoming a bottleneck. In this paper, a parallel processing model based on message passing interface (MPI) is presented for the calculation of the LS factor, so that massive datasets at a regional scale can be processed efficiently. The parallel model contains algorithms for calculating flow direction, flow accumulation, drainage network, slope, slope length and the LS factor. According to the existence of data dependence, the algorithms are divided into local algorithms and global algorithms. Parallel strategy are designed according to the algorithm characters including the decomposition method for maintaining the integrity of the results, optimized workflow for reducing the time taken for exporting the unnecessary intermediate data and a buffer-communication-computation strategy for improving the communication efficiency. Experiments on a multi-node system show that the proposed parallel model allows efficient calculation of the LS factor at a regional scale with a massive dataset.
Tough2{_}MP: A parallel version of TOUGH2
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Wu, Yu-Shu; Ding, Chris
2003-04-09
TOUGH2{_}MP is a massively parallel version of TOUGH2. It was developed for running on distributed-memory parallel computers to simulate large simulation problems that may not be solved by the standard, single-CPU TOUGH2 code. The new code implements an efficient massively parallel scheme, while preserving the full capacity and flexibility of the original TOUGH2 code. The new software uses the METIS software package for grid partitioning and AZTEC software package for linear-equation solving. The standard message-passing interface is adopted for communication among processors. Numerical performance of the current version code has been tested on CRAY-T3E and IBM RS/6000 SP platforms. Inmore » addition, the parallel code has been successfully applied to real field problems of multi-million-cell simulations for three-dimensional multiphase and multicomponent fluid and heat flow, as well as solute transport. In this paper, we will review the development of the TOUGH2{_}MP, and discuss the basic features, modules, and their applications.« less
Massively parallel processor computer
NASA Technical Reports Server (NTRS)
Fung, L. W. (Inventor)
1983-01-01
An apparatus for processing multidimensional data with strong spatial characteristics, such as raw image data, characterized by a large number of parallel data streams in an ordered array is described. It comprises a large number (e.g., 16,384 in a 128 x 128 array) of parallel processing elements operating simultaneously and independently on single bit slices of a corresponding array of incoming data streams under control of a single set of instructions. Each of the processing elements comprises a bidirectional data bus in communication with a register for storing single bit slices together with a random access memory unit and associated circuitry, including a binary counter/shift register device, for performing logical and arithmetical computations on the bit slices, and an I/O unit for interfacing the bidirectional data bus with the data stream source. The massively parallel processor architecture enables very high speed processing of large amounts of ordered parallel data, including spatial translation by shifting or sliding of bits vertically or horizontally to neighboring processing elements.
Compact holographic optical neural network system for real-time pattern recognition
NASA Astrophysics Data System (ADS)
Lu, Taiwei; Mintzer, David T.; Kostrzewski, Andrew A.; Lin, Freddie S.
1996-08-01
One of the important characteristics of artificial neural networks is their capability for massive interconnection and parallel processing. Recently, specialized electronic neural network processors and VLSI neural chips have been introduced in the commercial market. The number of parallel channels they can handle is limited because of the limited parallel interconnections that can be implemented with 1D electronic wires. High-resolution pattern recognition problems can require a large number of neurons for parallel processing of an image. This paper describes a holographic optical neural network (HONN) that is based on high- resolution volume holographic materials and is capable of performing massive 3D parallel interconnection of tens of thousands of neurons. A HONN with more than 16,000 neurons packaged in an attache case has been developed. Rotation- shift-scale-invariant pattern recognition operations have been demonstrated with this system. System parameters such as the signal-to-noise ratio, dynamic range, and processing speed are discussed.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian; Brightwell, Ronald B.; Grant, Ryan
This report presents a specification for the Portals 4 networ k programming interface. Portals 4 is intended to allow scalable, high-performance network communication betwee n nodes of a parallel computing system. Portals 4 is well suited to massively parallel processing and embedded syste ms. Portals 4 represents an adaption of the data movement layer developed for massively parallel processing platfor ms, such as the 4500-node Intel TeraFLOPS machine. Sandia's Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4 is tarmore » geted to the next generation of machines employing advanced network interface architectures that support enh anced offload capabilities.« less
The Portals 4.0 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2012-11-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities.« less
Scalable isosurface visualization of massive datasets on commodity off-the-shelf clusters
Bajaj, Chandrajit
2009-01-01
Tomographic imaging and computer simulations are increasingly yielding massive datasets. Interactive and exploratory visualizations have rapidly become indispensable tools to study large volumetric imaging and simulation data. Our scalable isosurface visualization framework on commodity off-the-shelf clusters is an end-to-end parallel and progressive platform, from initial data access to the final display. Interactive browsing of extracted isosurfaces is made possible by using parallel isosurface extraction, and rendering in conjunction with a new specialized piece of image compositing hardware called Metabuffer. In this paper, we focus on the back end scalability by introducing a fully parallel and out-of-core isosurface extraction algorithm. It achieves scalability by using both parallel and out-of-core processing and parallel disks. It statically partitions the volume data to parallel disks with a balanced workload spectrum, and builds I/O-optimal external interval trees to minimize the number of I/O operations of loading large data from disk. We also describe an isosurface compression scheme that is efficient for progress extraction, transmission and storage of isosurfaces. PMID:19756231
DOE Office of Scientific and Technical Information (OSTI.GOV)
Sofronov, I.D.; Voronin, B.L.; Butnev, O.I.
1997-12-31
The aim of the work performed is to develop a 3D parallel program for numerical calculation of gas dynamics problem with heat conductivity on distributed memory computational systems (CS), satisfying the condition of numerical result independence from the number of processors involved. Two basically different approaches to the structure of massive parallel computations have been developed. The first approach uses the 3D data matrix decomposition reconstructed at temporal cycle and is a development of parallelization algorithms for multiprocessor CS with shareable memory. The second approach is based on using a 3D data matrix decomposition not reconstructed during a temporal cycle.more » The program was developed on 8-processor CS MP-3 made in VNIIEF and was adapted to a massive parallel CS Meiko-2 in LLNL by joint efforts of VNIIEF and LLNL staffs. A large number of numerical experiments has been carried out with different number of processors up to 256 and the efficiency of parallelization has been evaluated in dependence on processor number and their parameters.« less
Massively parallel multicanonical simulations
NASA Astrophysics Data System (ADS)
Gross, Jonathan; Zierenberg, Johannes; Weigel, Martin; Janke, Wolfhard
2018-03-01
Generalized-ensemble Monte Carlo simulations such as the multicanonical method and similar techniques are among the most efficient approaches for simulations of systems undergoing discontinuous phase transitions or with rugged free-energy landscapes. As Markov chain methods, they are inherently serial computationally. It was demonstrated recently, however, that a combination of independent simulations that communicate weight updates at variable intervals allows for the efficient utilization of parallel computational resources for multicanonical simulations. Implementing this approach for the many-thread architecture provided by current generations of graphics processing units (GPUs), we show how it can be efficiently employed with of the order of 104 parallel walkers and beyond, thus constituting a versatile tool for Monte Carlo simulations in the era of massively parallel computing. We provide the fully documented source code for the approach applied to the paradigmatic example of the two-dimensional Ising model as starting point and reference for practitioners in the field.
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data
NASA Astrophysics Data System (ADS)
Li, Z.; Hodgson, M.; Li, W.
2016-12-01
Light detection and ranging (LiDAR) technologies have proven efficiency to quickly obtain very detailed Earth surface data for a large spatial extent. Such data is important for scientific discoveries such as Earth and ecological sciences and natural disasters and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies received notable success on parallel processing of LiDAR data to these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerable Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework is evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references on developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
Thought Leaders during Crises in Massive Social Networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Corley, Courtney D.; Farber, Robert M.; Reynolds, William
The vast amount of social media data that can be gathered from the internet coupled with workflows that utilize both commodity systems and massively parallel supercomputers, such as the Cray XMT, open new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges along with metrics and massively parallel algorithms that exhibit near-linear scalability according to number of processors. The challenge lies in making this massive data and analysis comprehensible to an analyst and end-users that require actionable knowledge tomore » carry out their duties. Simply stated, we have developed language and content agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.« less
NASA Astrophysics Data System (ADS)
Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves
2009-03-01
This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
Institutional Collaboration on MOOCs in Education--A Literature Review
ERIC Educational Resources Information Center
Nortvig, Anne-Mette; Christiansen, René B.
2017-01-01
This literature review seeks to outline the state of the art regarding collaboration between educational institutions on Massive Open Online Courses (MOOCs) launched in Europe and in the US for the past 10 years. The review explores enablers and barriers that influence national institutional MOOC collaboration, and looks into how existing…
Collaborative Virtual Gaming Worlds in Higher Education
ERIC Educational Resources Information Center
Whitton, Nicola; Hollins, Paul
2008-01-01
There is growing interest in the use of virtual gaming worlds in education, supported by the increased use of multi-user virtual environments (MUVEs) and massively multi-player online role-playing games (MMORPGs) for collaborative learning. However, this paper argues that collaborative gaming worlds have been in use much longer and are much wider…
Open-Source Development of the Petascale Reactive Flow and Transport Code PFLOTRAN
NASA Astrophysics Data System (ADS)
Hammond, G. E.; Andre, B.; Bisht, G.; Johnson, T.; Karra, S.; Lichtner, P. C.; Mills, R. T.
2013-12-01
Open-source software development has become increasingly popular in recent years. Open-source encourages collaborative and transparent software development and promotes unlimited free redistribution of source code to the public. Open-source development is good for science as it reveals implementation details that are critical to scientific reproducibility, but generally excluded from journal publications. In addition, research funds that would have been spent on licensing fees can be redirected to code development that benefits more scientists. In 2006, the developers of PFLOTRAN open-sourced their code under the U.S. Department of Energy SciDAC-II program. Since that time, the code has gained popularity among code developers and users from around the world seeking to employ PFLOTRAN to simulate thermal, hydraulic, mechanical and biogeochemical processes in the Earth's surface/subsurface environment. PFLOTRAN is a massively-parallel subsurface reactive multiphase flow and transport simulator designed from the ground up to run efficiently on computing platforms ranging from the laptop to leadership-class supercomputers, all from a single code base. The code employs domain decomposition for parallelism and is founded upon the well-established and open-source parallel PETSc and HDF5 frameworks. PFLOTRAN leverages modern Fortran (i.e. Fortran 2003-2008) in its extensible object-oriented design. The use of this progressive, yet domain-friendly programming language has greatly facilitated collaboration in the code's software development. Over the past year, PFLOTRAN's top-level data structures were refactored as Fortran classes (i.e. extendible derived types) to improve the flexibility of the code, ease the addition of new process models, and enable coupling to external simulators. For instance, PFLOTRAN has been coupled to the parallel electrical resistivity tomography code E4D to enable hydrogeophysical inversion while the same code base can be used as a third-party library to provide hydrologic flow, energy transport, and biogeochemical capability to the community land model, CLM, part of the open-source community earth system model (CESM) for climate. In this presentation, the advantages and disadvantages of open source software development in support of geoscience research at government laboratories, universities, and the private sector are discussed. Since the code is open-source (i.e. it's transparent and readily available to competitors), the PFLOTRAN team's development strategy within a competitive research environment is presented. Finally, the developers discuss their approach to object-oriented programming and the leveraging of modern Fortran in support of collaborative geoscience research as the Fortran standard evolves among compiler vendors.
Roever, Stefan
2012-01-01
A massively parallel, low cost molecular analysis platform will dramatically change the nature of protein, molecular and genomics research, DNA sequencing, and ultimately, molecular diagnostics. An integrated circuit (IC) with 264 sensors was fabricated using standard CMOS semiconductor processing technology. Each of these sensors is individually controlled with precision analog circuitry and is capable of single molecule measurements. Under electronic and software control, the IC was used to demonstrate the feasibility of creating and detecting lipid bilayers and biological nanopores using wild type α-hemolysin. The ability to dynamically create bilayers over each of the sensors will greatly accelerate pore development and pore mutation analysis. In addition, the noise performance of the IC was measured to be 30fA(rms). With this noise performance, single base detection of DNA was demonstrated using α-hemolysin. The data shows that a single molecule, electrical detection platform using biological nanopores can be operationalized and can ultimately scale to millions of sensors. Such a massively parallel platform will revolutionize molecular analysis and will completely change the field of molecular diagnostics in the future.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wylie, Brian Neil; Moreland, Kenneth D.
Graphs are a vital way of organizing data with complex correlations. A good visualization of a graph can fundamentally change human understanding of the data. Consequently, there is a rich body of work on graph visualization. Although there are many techniques that are effective on small to medium sized graphs (tens of thousands of nodes), there is a void in the research for visualizing massive graphs containing millions of nodes. Sandia is one of the few entities in the world that has the means and motivation to handle data on such a massive scale. For example, homeland security generates graphsmore » from prolific media sources such as television, telephone, and the Internet. The purpose of this project is to provide the groundwork for visualizing such massive graphs. The research provides for two major feature gaps: a parallel, interactive visualization framework and scalable algorithms to make the framework usable to a practical application. Both the frameworks and algorithms are designed to run on distributed parallel computers, which are already available at Sandia. Some features are integrated into the ThreatView{trademark} application and future work will integrate further parallel algorithms.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rutland, Christopher J.
2009-04-26
The Terascale High-Fidelity Simulations of Turbulent Combustion (TSTC) project is a multi-university collaborative effort to develop a high-fidelity turbulent reacting flow simulation capability utilizing terascale, massively parallel computer technology. The main paradigm of the approach is direct numerical simulation (DNS) featuring the highest temporal and spatial accuracy, allowing quantitative observations of the fine-scale physics found in turbulent reacting flows as well as providing a useful tool for development of sub-models needed in device-level simulations. Under this component of the TSTC program the simulation code named S3D, developed and shared with coworkers at Sandia National Laboratories, has been enhanced with newmore » numerical algorithms and physical models to provide predictive capabilities for turbulent liquid fuel spray dynamics. Major accomplishments include improved fundamental understanding of mixing and auto-ignition in multi-phase turbulent reactant mixtures and turbulent fuel injection spray jets.« less
The portals 4.0.1 network programming interface.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Barrett, Brian W.; Brightwell, Ronald Brian; Pedretti, Kevin
2013-04-01
This report presents a specification for the Portals 4.0 network programming interface. Portals 4.0 is intended to allow scalable, high-performance network communication between nodes of a parallel computing system. Portals 4.0 is well suited to massively parallel processing and embedded systems. Portals 4.0 represents an adaption of the data movement layer developed for massively parallel processing platforms, such as the 4500-node Intel TeraFLOPS machine. Sandias Cplant cluster project motivated the development of Version 3.0, which was later extended to Version 3.3 as part of the Cray Red Storm machine and XT line. Version 4.0 is targeted to the next generationmore » of machines employing advanced network interface architectures that support enhanced offload capabilities. 3« less
Merlin - Massively parallel heterogeneous computing
NASA Technical Reports Server (NTRS)
Wittie, Larry; Maples, Creve
1989-01-01
Hardware and software for Merlin, a new kind of massively parallel computing system, are described. Eight computers are linked as a 300-MIPS prototype to develop system software for a larger Merlin network with 16 to 64 nodes, totaling 600 to 3000 MIPS. These working prototypes help refine a mapped reflective memory technique that offers a new, very general way of linking many types of computer to form supercomputers. Processors share data selectively and rapidly on a word-by-word basis. Fast firmware virtual circuits are reconfigured to match topological needs of individual application programs. Merlin's low-latency memory-sharing interfaces solve many problems in the design of high-performance computing systems. The Merlin prototypes are intended to run parallel programs for scientific applications and to determine hardware and software needs for a future Teraflops Merlin network.
Massively parallel quantum computer simulator
NASA Astrophysics Data System (ADS)
De Raedt, K.; Michielsen, K.; De Raedt, H.; Trieu, B.; Arnold, G.; Richter, M.; Lippert, Th.; Watanabe, H.; Ito, N.
2007-01-01
We describe portable software to simulate universal quantum computers on massive parallel computers. We illustrate the use of the simulation software by running various quantum algorithms on different computer architectures, such as a IBM BlueGene/L, a IBM Regatta p690+, a Hitachi SR11000/J1, a Cray X1E, a SGI Altix 3700 and clusters of PCs running Windows XP. We study the performance of the software by simulating quantum computers containing up to 36 qubits, using up to 4096 processors and up to 1 TB of memory. Our results demonstrate that the simulator exhibits nearly ideal scaling as a function of the number of processors and suggest that the simulation software described in this paper may also serve as benchmark for testing high-end parallel computers.
Archer, Charles Jens [Rochester, MN; Musselman, Roy Glenn [Rochester, MN; Peters, Amanda [Rochester, MN; Pinnow, Kurt Walter [Rochester, MN; Swartz, Brent Allen [Chippewa Falls, WI; Wallenfelt, Brian Paul [Eden Prairie, MN
2011-10-04
A massively parallel nodal computer system periodically collects and broadcasts usage data for an internal communications network. A node sending data over the network makes a global routing determination using the network usage data. Preferably, network usage data comprises an N-bit usage value for each output buffer associated with a network link. An optimum routing is determined by summing the N-bit values associated with each link through which a data packet must pass, and comparing the sums associated with different possible routes.
Scalable Visual Analytics of Massive Textual Datasets
DOE Office of Scientific and Technical Information (OSTI.GOV)
Krishnan, Manoj Kumar; Bohn, Shawn J.; Cowley, Wendy E.
2007-04-01
This paper describes the first scalable implementation of text processing engine used in Visual Analytics tools. These tools aid information analysts in interacting with and understanding large textual information content through visual interfaces. By developing parallel implementation of the text processing engine, we enabled visual analytics tools to exploit cluster architectures and handle massive dataset. The paper describes key elements of our parallelization approach and demonstrates virtually linear scaling when processing multi-gigabyte data sets such as Pubmed. This approach enables interactive analysis of large datasets beyond capabilities of existing state-of-the art visual analytics tools.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-03-16
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Each node implements a respective routing strategy for routing data through the network, the routing strategies not necessarily being the same in every node. The routing strategies implemented in the nodes are dynamically adjusted during application execution to shift network workload as required. Preferably, adjustment of routing policies in selective nodes is performed at synchronization points. The network may be dynamically monitored, and routing strategies adjusted according to detected network conditions.
Parallel processing architecture for H.264 deblocking filter on multi-core platforms
NASA Astrophysics Data System (ADS)
Prasad, Durga P.; Sonachalam, Sekar; Kunchamwar, Mangesh K.; Gunupudi, Nageswara Rao
2012-03-01
Massively parallel computing (multi-core) chips offer outstanding new solutions that satisfy the increasing demand for high resolution and high quality video compression technologies such as H.264. Such solutions not only provide exceptional quality but also efficiency, low power, and low latency, previously unattainable in software based designs. While custom hardware and Application Specific Integrated Circuit (ASIC) technologies may achieve lowlatency, low power, and real-time performance in some consumer devices, many applications require a flexible and scalable software-defined solution. The deblocking filter in H.264 encoder/decoder poses difficult implementation challenges because of heavy data dependencies and the conditional nature of the computations. Deblocking filter implementations tend to be fixed and difficult to reconfigure for different needs. The ability to scale up for higher quality requirements such as 10-bit pixel depth or a 4:2:2 chroma format often reduces the throughput of a parallel architecture designed for lower feature set. A scalable architecture for deblocking filtering, created with a massively parallel processor based solution, means that the same encoder or decoder will be deployed in a variety of applications, at different video resolutions, for different power requirements, and at higher bit-depths and better color sub sampling patterns like YUV, 4:2:2, or 4:4:4 formats. Low power, software-defined encoders/decoders may be implemented using a massively parallel processor array, like that found in HyperX technology, with 100 or more cores and distributed memory. The large number of processor elements allows the silicon device to operate more efficiently than conventional DSP or CPU technology. This software programing model for massively parallel processors offers a flexible implementation and a power efficiency close to that of ASIC solutions. This work describes a scalable parallel architecture for an H.264 compliant deblocking filter for multi core platforms such as HyperX technology. Parallel techniques such as parallel processing of independent macroblocks, sub blocks, and pixel row level are examined in this work. The deblocking architecture consists of a basic cell called deblocking filter unit (DFU) and dependent data buffer manager (DFM). The DFU can be used in several instances, catering to different performance needs the DFM serves the data required for the different number of DFUs, and also manages all the neighboring data required for future data processing of DFUs. This approach achieves the scalability, flexibility, and performance excellence required in deblocking filters.
NASA Astrophysics Data System (ADS)
Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo
2017-08-01
We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
NASA Astrophysics Data System (ADS)
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A.; Oliveira, Micael J. T.; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G.; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A. L.
2012-06-01
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Andrade, Xavier; Alberdi-Rodriguez, Joseba; Strubbe, David A; Oliveira, Micael J T; Nogueira, Fernando; Castro, Alberto; Muguerza, Javier; Arruabarrena, Agustin; Louie, Steven G; Aspuru-Guzik, Alán; Rubio, Angel; Marques, Miguel A L
2012-06-13
Octopus is a general-purpose density-functional theory (DFT) code, with a particular emphasis on the time-dependent version of DFT (TDDFT). In this paper we present the ongoing efforts to achieve the parallelization of octopus. We focus on the real-time variant of TDDFT, where the time-dependent Kohn-Sham equations are directly propagated in time. This approach has great potential for execution in massively parallel systems such as modern supercomputers with thousands of processors and graphics processing units (GPUs). For harvesting the potential of conventional supercomputers, the main strategy is a multi-level parallelization scheme that combines the inherent scalability of real-time TDDFT with a real-space grid domain-partitioning approach. A scalable Poisson solver is critical for the efficiency of this scheme. For GPUs, we show how using blocks of Kohn-Sham states provides the required level of data parallelism and that this strategy is also applicable for code optimization on standard processors. Our results show that real-time TDDFT, as implemented in octopus, can be the method of choice for studying the excited states of large molecular systems in modern parallel architectures.
Parallel Tensor Compression for Large-Scale Scientific Data.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kolda, Tamara G.; Ballard, Grey; Austin, Woody Nathan
As parallel computing trends towards the exascale, scientific data produced by high-fidelity simulations are growing increasingly massive. For instance, a simulation on a three-dimensional spatial grid with 512 points per dimension that tracks 64 variables per grid point for 128 time steps yields 8 TB of data. By viewing the data as a dense five way tensor, we can compute a Tucker decomposition to find inherent low-dimensional multilinear structure, achieving compression ratios of up to 10000 on real-world data sets with negligible loss in accuracy. So that we can operate on such massive data, we present the first-ever distributed memorymore » parallel implementation for the Tucker decomposition, whose key computations correspond to parallel linear algebra operations, albeit with nonstandard data layouts. Our approach specifies a data distribution for tensors that avoids any tensor data redistribution, either locally or in parallel. We provide accompanying analysis of the computation and communication costs of the algorithms. To demonstrate the compression and accuracy of the method, we apply our approach to real-world data sets from combustion science simulations. We also provide detailed performance results, including parallel performance in both weak and strong scaling experiments.« less
NASA Astrophysics Data System (ADS)
Neff, John A.
1989-12-01
Experiments originating from Gestalt psychology have shown that representing information in a symbolic form provides a more effective means to understanding. Computer scientists have been struggling for the last two decades to determine how best to create, manipulate, and store collections of symbolic structures. In the past, much of this struggling led to software innovations because that was the path of least resistance. For example, the development of heuristics for organizing the searching through knowledge bases was much less expensive than building massively parallel machines that could search in parallel. That is now beginning to change with the emergence of parallel architectures which are showing the potential for handling symbolic structures. This paper will review the relationships between symbolic computing and parallel computing architectures, and will identify opportunities for optics to significantly impact the performance of such computing machines. Although neural networks are an exciting subset of massively parallel computing structures, this paper will not touch on this area since it is receiving a great deal of attention in the literature. That is, the concepts presented herein do not consider the distributed representation of knowledge.
Massively Parallel DNA Sequencing Facilitates Diagnosis of Patients with Usher Syndrome Type 1
Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-ichi
2014-01-01
Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance. PMID:24618850
Massively parallel DNA sequencing facilitates diagnosis of patients with Usher syndrome type 1.
Yoshimura, Hidekane; Iwasaki, Satoshi; Nishio, Shin-Ya; Kumakawa, Kozo; Tono, Tetsuya; Kobayashi, Yumiko; Sato, Hiroaki; Nagai, Kyoko; Ishikawa, Kotaro; Ikezono, Tetsuo; Naito, Yasushi; Fukushima, Kunihiro; Oshikawa, Chie; Kimitsuki, Takashi; Nakanishi, Hiroshi; Usami, Shin-Ichi
2014-01-01
Usher syndrome is an autosomal recessive disorder manifesting hearing loss, retinitis pigmentosa and vestibular dysfunction, and having three clinical subtypes. Usher syndrome type 1 is the most severe subtype due to its profound hearing loss, lack of vestibular responses, and retinitis pigmentosa that appears in prepuberty. Six of the corresponding genes have been identified, making early diagnosis through DNA testing possible, with many immediate and several long-term advantages for patients and their families. However, the conventional genetic techniques, such as direct sequence analysis, are both time-consuming and expensive. Targeted exon sequencing of selected genes using the massively parallel DNA sequencing technology will potentially enable us to systematically tackle previously intractable monogenic disorders and improve molecular diagnosis. Using this technique combined with direct sequence analysis, we screened 17 unrelated Usher syndrome type 1 patients and detected probable pathogenic variants in the 16 of them (94.1%) who carried at least one mutation. Seven patients had the MYO7A mutation (41.2%), which is the most common type in Japanese. Most of the mutations were detected by only the massively parallel DNA sequencing. We report here four patients, who had probable pathogenic mutations in two different Usher syndrome type 1 genes, and one case of MYO7A/PCDH15 digenic inheritance. This is the first report of Usher syndrome mutation analysis using massively parallel DNA sequencing and the frequency of Usher syndrome type 1 genes in Japanese. Mutation screening using this technique has the power to quickly identify mutations of many causative genes while maintaining cost-benefit performance. In addition, the simultaneous mutation analysis of large numbers of genes is useful for detecting mutations in different genes that are possibly disease modifiers or of digenic inheritance.
A hierarchical, automated target recognition algorithm for a parallel analog processor
NASA Technical Reports Server (NTRS)
Woodward, Gail; Padgett, Curtis
1997-01-01
A hierarchical approach is described for an automated target recognition (ATR) system, VIGILANTE, that uses a massively parallel, analog processor (3DANN). The 3DANN processor is capable of performing 64 concurrent inner products of size 1x4096 every 250 nanoseconds.
Real-time electron dynamics for massively parallel excited-state simulations
NASA Astrophysics Data System (ADS)
Andrade, Xavier
The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...
2017-11-14
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
NASA Astrophysics Data System (ADS)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.
2017-11-01
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
NASA Technical Reports Server (NTRS)
Treinish, Lloyd A.; Gough, Michael L.; Wildenhain, W. David
1987-01-01
The capability was developed of rapidly producing visual representations of large, complex, multi-dimensional space and earth sciences data sets via the implementation of computer graphics modeling techniques on the Massively Parallel Processor (MPP) by employing techniques recently developed for typically non-scientific applications. Such capabilities can provide a new and valuable tool for the understanding of complex scientific data, and a new application of parallel computing via the MPP. A prototype system with such capabilities was developed and integrated into the National Space Science Data Center's (NSSDC) Pilot Climate Data System (PCDS) data-independent environment for computer graphics data display to provide easy access to users. While developing these capabilities, several problems had to be solved independently of the actual use of the MPP, all of which are outlined.
A cost-effective methodology for the design of massively-parallel VLSI functional units
NASA Technical Reports Server (NTRS)
Venkateswaran, N.; Sriram, G.; Desouza, J.
1993-01-01
In this paper we propose a generalized methodology for the design of cost-effective massively-parallel VLSI Functional Units. This methodology is based on a technique of generating and reducing a massive bit-array on the mask-programmable PAcube VLSI array. This methodology unifies (maintains identical data flow and control) the execution of complex arithmetic functions on PAcube arrays. It is highly regular, expandable and uniform with respect to problem-size and wordlength, thereby reducing the communication complexity. The memory-functional unit interface is regular and expandable. Using this technique functional units of dedicated processors can be mask-programmed on the naked PAcube arrays, reducing the turn-around time. The production cost of such dedicated processors can be drastically reduced since the naked PAcube arrays can be mass-produced. Analysis of the the performance of functional units designed by our method yields promising results.
A CFD Heterogeneous Parallel Solver Based on Collaborating CPU and GPU
NASA Astrophysics Data System (ADS)
Lai, Jianqi; Tian, Zhengyu; Li, Hua; Pan, Sha
2018-03-01
Since Graphic Processing Unit (GPU) has a strong ability of floating-point computation and memory bandwidth for data parallelism, it has been widely used in the areas of common computing such as molecular dynamics (MD), computational fluid dynamics (CFD) and so on. The emergence of compute unified device architecture (CUDA), which reduces the complexity of compiling program, brings the great opportunities to CFD. There are three different modes for parallel solution of NS equations: parallel solver based on CPU, parallel solver based on GPU and heterogeneous parallel solver based on collaborating CPU and GPU. As we can see, GPUs are relatively rich in compute capacity but poor in memory capacity and the CPUs do the opposite. We need to make full use of the GPUs and CPUs, so a CFD heterogeneous parallel solver based on collaborating CPU and GPU has been established. Three cases are presented to analyse the solver’s computational accuracy and heterogeneous parallel efficiency. The numerical results agree well with experiment results, which demonstrate that the heterogeneous parallel solver has high computational precision. The speedup on a single GPU is more than 40 for laminar flow, it decreases for turbulent flow, but it still can reach more than 20. What’s more, the speedup increases as the grid size becomes larger.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mellor-Crummey, John
The PIPER project set out to develop methodologies and software for measurement, analysis, attribution, and presentation of performance data for extreme-scale systems. Goals of the project were to support analysis of massive multi-scale parallelism, heterogeneous architectures, multi-faceted performance concerns, and to support both post-mortem performance analysis to identify program features that contribute to problematic performance and on-line performance analysis to drive adaptation. This final report summarizes the research and development activity at Rice University as part of the PIPER project. Producing a complete suite of performance tools for exascale platforms during the course of this project was impossible since bothmore » hardware and software for exascale systems is still a moving target. For that reason, the project focused broadly on the development of new techniques for measurement and analysis of performance on modern parallel architectures, enhancements to HPCToolkit’s software infrastructure to support our research goals or use on sophisticated applications, engaging developers of multithreaded runtimes to explore how support for tools should be integrated into their designs, engaging operating system developers with feature requests for enhanced monitoring support, engaging vendors with requests that they add hardware measure- ment capabilities and software interfaces needed by tools as they design new components of HPC platforms including processors, accelerators and networks, and finally collaborations with partners interested in using HPCToolkit to analyze and tune scalable parallel applications.« less
Parallel computing on Unix workstation arrays
NASA Astrophysics Data System (ADS)
Reale, F.; Bocchino, F.; Sciortino, S.
1994-12-01
We have tested arrays of general-purpose Unix workstations used as MIMD systems for massive parallel computations. In particular we have solved numerically a demanding test problem with a 2D hydrodynamic code, generally developed to study astrophysical flows, by exucuting it on arrays either of DECstations 5000/200 on Ethernet LAN, or of DECstations 3000/400, equipped with powerful Alpha processors, on FDDI LAN. The code is appropriate for data-domain decomposition, and we have used a library for parallelization previously developed in our Institute, and easily extended to work on Unix workstation arrays by using the PVM software toolset. We have compared the parallel efficiencies obtained on arrays of several processors to those obtained on a dedicated MIMD parallel system, namely a Meiko Computing Surface (CS-1), equipped with Intel i860 processors. We discuss the feasibility of using non-dedicated parallel systems and conclude that the convenience depends essentially on the size of the computational domain as compared to the relative processor power and network bandwidth. We point out that for future perspectives a parallel development of processor and network technology is important, and that the software still offers great opportunities of improvement, especially in terms of latency times in the message-passing protocols. In conditions of significant gain in terms of speedup, such workstation arrays represent a cost-effective approach to massive parallel computations.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-04-27
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a final destination. The default routing strategy is altered responsive to detection of overutilization of a particular path of one or more links, and at least some traffic is re-routed by distributing the traffic among multiple paths (which may include the default path). An alternative path may require a greater number of link traversals to reach the destination node.
A Massively Parallel Bayesian Approach to Planetary Protection Trajectory Analysis and Design
NASA Technical Reports Server (NTRS)
Wallace, Mark S.
2015-01-01
The NASA Planetary Protection Office has levied a requirement that the upper stage of future planetary launches have a less than 10(exp -4) chance of impacting Mars within 50 years after launch. A brute-force approach requires a decade of computer time to demonstrate compliance. By using a Bayesian approach and taking advantage of the demonstrated reliability of the upper stage, the required number of fifty-year propagations can be massively reduced. By spreading the remaining embarrassingly parallel Monte Carlo simulations across multiple computers, compliance can be demonstrated in a reasonable time frame. The method used is described here.
Systems and methods for rapid processing and storage of data
Stalzer, Mark A.
2017-01-24
Systems and methods of building massively parallel computing systems using low power computing complexes in accordance with embodiments of the invention are disclosed. A massively parallel computing system in accordance with one embodiment of the invention includes at least one Solid State Blade configured to communicate via a high performance network fabric. In addition, each Solid State Blade includes a processor configured to communicate with a plurality of low power computing complexes interconnected by a router, and each low power computing complex includes at least one general processing core, an accelerator, an I/O interface, and cache memory and is configured to communicate with non-volatile solid state memory.
Neupauerová, Jana; Grečmalová, Dagmar; Seeman, Pavel; Laššuthová, Petra
2016-05-01
We describe a patient with early onset severe axonal Charcot-Marie-Tooth disease (CMT2) with dominant inheritance, in whom Sanger sequencing failed to detect a mutation in the mitofusin 2 (MFN2) gene because of a single nucleotide polymorphism (rs2236057) under the PCR primer sequence. The severe early onset phenotype and the family history with severely affected mother (died after delivery) was very suggestive of CMT2A and this suspicion was finally confirmed by a MFN2 mutation. The mutation p.His361Tyr was later detected in the patient by massively parallel sequencing with a gene panel for hereditary neuropathies. According to this information, new primers for amplification and sequencing were designed which bind away from the polymorphic sites of the patient's DNA. Sanger sequencing with these new primers then confirmed the heterozygous mutation in the MFN2 gene in this patient. This case report shows that massively parallel sequencing may in some rare cases be more sensitive than Sanger sequencing and highlights the importance of accurate primer design which requires special attention. © 2016 John Wiley & Sons Ltd/University College London.
Hosokawa, Masahito; Nishikawa, Yohei; Kogawa, Masato; Takeyama, Haruko
2017-07-12
Massively parallel single-cell genome sequencing is required to further understand genetic diversities in complex biological systems. Whole genome amplification (WGA) is the first step for single-cell sequencing, but its throughput and accuracy are insufficient in conventional reaction platforms. Here, we introduce single droplet multiple displacement amplification (sd-MDA), a method that enables massively parallel amplification of single cell genomes while maintaining sequence accuracy and specificity. Tens of thousands of single cells are compartmentalized in millions of picoliter droplets and then subjected to lysis and WGA by passive droplet fusion in microfluidic channels. Because single cells are isolated in compartments, their genomes are amplified to saturation without contamination. This enables the high-throughput acquisition of contamination-free and cell specific sequence reads from single cells (21,000 single-cells/h), resulting in enhancement of the sequence data quality compared to conventional methods. This method allowed WGA of both single bacterial cells and human cancer cells. The obtained sequencing coverage rivals those of conventional techniques with superior sequence quality. In addition, we also demonstrate de novo assembly of uncultured soil bacteria and obtain draft genomes from single cell sequencing. This sd-MDA is promising for flexible and scalable use in single-cell sequencing.
Comparing Learner Community Behavior in Multiple Presentations of a Massive Open Online Course
ERIC Educational Resources Information Center
Gallagher, Silvia Elena; Savage, Timothy
2015-01-01
Massive Online Open Courses (MOOCs) can create large scale communities of learners who collaborate, interact and discuss learning materials and activities. MOOCs are often delivered multiple times with similar content to different cohorts of learners. However, research into the differences of learner communication, behavior and expectation between…
Comparing Learner Community Behavior in Multiple Presentations of a Massive Open Online Course
ERIC Educational Resources Information Center
Gallagher, Silvia Elena; Savage, Timothy
2016-01-01
Massive Online Open Courses (MOOCs) can create large scale communities of learners who collaborate, interact and discuss learning materials and activities. MOOCs are often delivered multiple times with similar content to different cohorts of learners. However, research into the differences of learner communication, behavior and expectation between…
Leadership in MMOGs: A Field of Research on Virtual Teams
ERIC Educational Resources Information Center
Mysirlaki, Sofia; Paraskeva, Fotini
2012-01-01
As our need for collaboration constantly grows, new tools have emerged to connect us in social networks, supporting the development of online communities, such as online games and virtual worlds. MMOGs (Massively Multiplayer Online Games) and MMORPGs (Massively Multiplayer Online Role-Playing Games) are complex systems, in which players are…
Remix as Professional Learning: Educators' Iterative Literacy Practice in CLMOOC
ERIC Educational Resources Information Center
Smith, Anna; West-Puckett, Stephanie; Cantrill, Christina; Zamora, Mia
2016-01-01
The Connected Learning Massive Open Online Collaboration (CLMOOC) is an online professional development experience designed as an openly networked, production-centered, participatory learning collaboration for educators. Addressing the paucity of research that investigates learning processes in MOOC experiences, this paper examines the situated…
Aerodynamic simulation on massively parallel systems
NASA Technical Reports Server (NTRS)
Haeuser, Jochem; Simon, Horst D.
1992-01-01
This paper briefly addresses the computational requirements for the analysis of complete configurations of aircraft and spacecraft currently under design to be used for advanced transportation in commercial applications as well as in space flight. The discussion clearly shows that massively parallel systems are the only alternative which is both cost effective and on the other hand can provide the necessary TeraFlops, needed to satisfy the narrow design margins of modern vehicles. It is assumed that the solution of the governing physical equations, i.e., the Navier-Stokes equations which may be complemented by chemistry and turbulence models, is done on multiblock grids. This technique is situated between the fully structured approach of classical boundary fitted grids and the fully unstructured tetrahedra grids. A fully structured grid best represents the flow physics, while the unstructured grid gives best geometrical flexibility. The multiblock grid employed is structured within a block, but completely unstructured on the block level. While a completely unstructured grid is not straightforward to parallelize, the above mentioned multiblock grid is inherently parallel, in particular for multiple instruction multiple datastream (MIMD) machines. In this paper guidelines are provided for setting up or modifying an existing sequential code so that a direct parallelization on a massively parallel system is possible. Results are presented for three parallel systems, namely the Intel hypercube, the Ncube hypercube, and the FPS 500 system. Some preliminary results for an 8K CM2 machine will also be mentioned. The code run is the two dimensional grid generation module of Grid, which is a general two dimensional and three dimensional grid generation code for complex geometries. A system of nonlinear Poisson equations is solved. This code is also a good testcase for complex fluid dynamics codes, since the same datastructures are used. All systems provided good speedups, but message passing MIMD systems seem to be best suited for large miltiblock applications.
A Survey of Parallel Computing
1988-07-01
Evaluating Two Massively Parallel Machines. Communications of the ACM .9, , , 176 BIBLIOGRAPHY 29, 8 (August), pp. 752-758. Gajski , D.D., Padua, D.A., Kuck...Computer Architecture, edited by Gajski , D. D., Milutinovic, V. M. Siegel, H. J. and Furht, B. P. IEEE Computer Society Press, Washington, D.C., pp. 387-407
Biologically inspired collision avoidance system for unmanned vehicles
NASA Astrophysics Data System (ADS)
Ortiz, Fernando E.; Graham, Brett; Spagnoli, Kyle; Kelmelis, Eric J.
2009-05-01
In this project, we collaborate with researchers in the neuroscience department at the University of Delaware to develop an Field Programmable Gate Array (FPGA)-based embedded computer, inspired by the brains of small vertebrates (fish). The mechanisms of object detection and avoidance in fish have been extensively studied by our Delaware collaborators. The midbrain optic tectum is a biological multimodal navigation controller capable of processing input from all senses that convey spatial information, including vision, audition, touch, and lateral-line (water current sensing in fish). Unfortunately, computational complexity makes these models too slow for use in real-time applications. These simulations are run offline on state-of-the-art desktop computers, presenting a gap between the application and the target platform: a low-power embedded device. EM Photonics has expertise in developing of high-performance computers based on commodity platforms such as graphic cards (GPUs) and FPGAs. FPGAs offer (1) high computational power, low power consumption and small footprint (in line with typical autonomous vehicle constraints), and (2) the ability to implement massively-parallel computational architectures, which can be leveraged to closely emulate biological systems. Combining UD's brain modeling algorithms and the power of FPGAs, this computer enables autonomous navigation in complex environments, and further types of onboard neural processing in future applications.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
Parallel Preconditioning for CFD Problems on the CM-5
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)
1994-01-01
Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
A biconjugate gradient type algorithm on massively parallel architectures
NASA Technical Reports Server (NTRS)
Freund, Roland W.; Hochbruck, Marlis
1991-01-01
The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
Massive parallelization of serial inference algorithms for a complex generalized linear model
Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David
2014-01-01
Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
Pandya, Tara M.; Johnson, Seth R.; Evans, Thomas M.; ...
2015-12-21
This paper discusses the implementation, capabilities, and validation of Shift, a massively parallel Monte Carlo radiation transport package developed and maintained at Oak Ridge National Laboratory. It has been developed to scale well from laptop to small computing clusters to advanced supercomputers. Special features of Shift include hybrid capabilities for variance reduction such as CADIS and FW-CADIS, and advanced parallel decomposition and tally methods optimized for scalability on supercomputing architectures. Shift has been validated and verified against various reactor physics benchmarks and compares well to other state-of-the-art Monte Carlo radiation transport codes such as MCNP5, CE KENO-VI, and OpenMC. Somemore » specific benchmarks used for verification and validation include the CASL VERA criticality test suite and several Westinghouse AP1000 ® problems. These benchmark and scaling studies show promising results.« less
Crystal MD: The massively parallel molecular dynamics software for metal with BCC structure
NASA Astrophysics Data System (ADS)
Hu, Changjun; Bai, He; He, Xinfu; Zhang, Boyao; Nie, Ningming; Wang, Xianmeng; Ren, Yingwen
2017-02-01
Material irradiation effect is one of the most important keys to use nuclear power. However, the lack of high-throughput irradiation facility and knowledge of evolution process, lead to little understanding of the addressed issues. With the help of high-performance computing, we could make a further understanding of micro-level-material. In this paper, a new data structure is proposed for the massively parallel simulation of the evolution of metal materials under irradiation environment. Based on the proposed data structure, we developed the new molecular dynamics software named Crystal MD. The simulation with Crystal MD achieved over 90% parallel efficiency in test cases, and it takes more than 25% less memory on multi-core clusters than LAMMPS and IMD, which are two popular molecular dynamics simulation software. Using Crystal MD, a two trillion particles simulation has been performed on Tianhe-2 cluster.
ERIC Educational Resources Information Center
Voulgari, Iro; Komis, Vassilis
2010-01-01
Although there is strong evidence that massively multiplayer online games (MMOGs) constitute environments of social interactions and effective learning, we currently lack the tools for investigating the effectiveness of the social networks emerging as well as the cognitive aspects and knowledge acquisition such environments involve. Within this…
Studying the Effectiveness of Multi-User Immersive Environments for Collaborative Evaluation Tasks
ERIC Educational Resources Information Center
Lorenzo, Carlos-Miguel; Sicilia, Miguel Angel; Sanchez, Salvador
2012-01-01
Massively Multiuser On-line Learning (MMOL) Platforms, often called "virtual learning worlds", constitute a still unexplored context for communication-enhanced learning, where synchronous communication skills in an explicit social setting enhance the potential of effective collaboration. In this paper, we report on an experimental study of…
ERIC Educational Resources Information Center
Wiggins, Andrea
2012-01-01
Citizen science projects involve the public with scientists in collaborative research. Information and communication technologies for citizen science can enable massive virtual collaborations based on voluntary contributions by diverse participants. As the popularity of citizen science increases, scientists need a more thorough understanding of…
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2014-02-28
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations.
Lagardère, Louis; Lipparini, Filippo; Polack, Étienne; Stamm, Benjamin; Cancès, Éric; Schnieders, Michael; Ren, Pengyu; Maday, Yvon; Piquemal, Jean-Philip
2015-01-01
In this paper, we present a scalable and efficient implementation of point dipole-based polarizable force fields for molecular dynamics (MD) simulations with periodic boundary conditions (PBC). The Smooth Particle-Mesh Ewald technique is combined with two optimal iterative strategies, namely, a preconditioned conjugate gradient solver and a Jacobi solver in conjunction with the Direct Inversion in the Iterative Subspace for convergence acceleration, to solve the polarization equations. We show that both solvers exhibit very good parallel performances and overall very competitive timings in an energy-force computation needed to perform a MD step. Various tests on large systems are provided in the context of the polarizable AMOEBA force field as implemented in the newly developed Tinker-HP package which is the first implementation for a polarizable model making large scale experiments for massively parallel PBC point dipole models possible. We show that using a large number of cores offers a significant acceleration of the overall process involving the iterative methods within the context of spme and a noticeable improvement of the memory management giving access to very large systems (hundreds of thousands of atoms) as the algorithm naturally distributes the data on different cores. Coupled with advanced MD techniques, gains ranging from 2 to 3 orders of magnitude in time are now possible compared to non-optimized, sequential implementations giving new directions for polarizable molecular dynamics in periodic boundary conditions using massively parallel implementations. PMID:26512230
Grindon, Christina; Harris, Sarah; Evans, Tom; Novik, Keir; Coveney, Peter; Laughton, Charles
2004-07-15
Molecular modelling played a central role in the discovery of the structure of DNA by Watson and Crick. Today, such modelling is done on computers: the more powerful these computers are, the more detailed and extensive can be the study of the dynamics of such biological macromolecules. To fully harness the power of modern massively parallel computers, however, we need to develop and deploy algorithms which can exploit the structure of such hardware. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a scalable molecular dynamics code including long-range Coulomb interactions, which has been specifically designed to function efficiently on parallel platforms. Here we describe the implementation of the AMBER98 force field in LAMMPS and its validation for molecular dynamics investigations of DNA structure and flexibility against the benchmark of results obtained with the long-established code AMBER6 (Assisted Model Building with Energy Refinement, version 6). Extended molecular dynamics simulations on the hydrated DNA dodecamer d(CTTTTGCAAAAG)(2), which has previously been the subject of extensive dynamical analysis using AMBER6, show that it is possible to obtain excellent agreement in terms of static, dynamic and thermodynamic parameters between AMBER6 and LAMMPS. In comparison with AMBER6, LAMMPS shows greatly improved scalability in massively parallel environments, opening up the possibility of efficient simulations of order-of-magnitude larger systems and/or for order-of-magnitude greater simulation times.
NASA Astrophysics Data System (ADS)
Hayano, Akira; Ishii, Eiichi
2016-10-01
This study investigates the mechanical relationship between bedding-parallel and bedding-oblique faults in a Neogene massive siliceous mudstone at the site of the Horonobe Underground Research Laboratory (URL) in Hokkaido, Japan, on the basis of observations of drill-core recovered from pilot boreholes and fracture mapping on shaft and gallery walls. Four bedding-parallel faults with visible fault gouge, named respectively the MM Fault, the Last MM Fault, the S1 Fault, and the S2 Fault (stratigraphically, from the highest to the lowest), were observed in two pilot boreholes (PB-V01 and SAB-1). The distribution of the bedding-parallel faults at 350 m depth in the Horonobe URL indicates that these faults are spread over at least several tens of meters in parallel along a bedding plane. The observation that the bedding-oblique fault displaces the Last MM fault is consistent with the previous interpretation that the bedding- oblique faults formed after the bedding-parallel faults. In addition, the bedding-parallel faults terminate near the MM and S1 faults, indicating that the bedding-parallel faults with visible fault gouge act to terminate the propagation of younger bedding-oblique faults. In particular, the MM and S1 faults, which have a relatively thick fault gouge, appear to have had a stronger control on the propagation of bedding-oblique faults than did the Last MM fault, which has a relatively thin fault gouge.
3-D readout-electronics packaging for high-bandwidth massively paralleled imager
Kwiatkowski, Kris; Lyke, James
2007-12-18
Dense, massively parallel signal processing electronics are co-packaged behind associated sensor pixels. Microchips containing a linear or bilinear arrangement of photo-sensors, together with associated complex electronics, are integrated into a simple 3-D structure (a "mirror cube"). An array of photo-sensitive cells are disposed on a stacked CMOS chip's surface at a 45.degree. angle from light reflecting mirror surfaces formed on a neighboring CMOS chip surface. Image processing electronics are held within the stacked CMOS chip layers. Electrical connections couple each of said stacked CMOS chip layers and a distribution grid, the connections for distributing power and signals to components associated with each stacked CSMO chip layer.
Scalable load balancing for massively parallel distributed Monte Carlo particle transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
O'Brien, M. J.; Brantley, P. S.; Joy, K. I.
2013-07-01
In order to run computer simulations efficiently on massively parallel computers with hundreds of thousands or millions of processors, care must be taken that the calculation is load balanced across the processors. Examining the workload of every processor leads to an unscalable algorithm, with run time at least as large as O(N), where N is the number of processors. We present a scalable load balancing algorithm, with run time 0(log(N)), that involves iterated processor-pair-wise balancing steps, ultimately leading to a globally balanced workload. We demonstrate scalability of the algorithm up to 2 million processors on the Sequoia supercomputer at Lawrencemore » Livermore National Laboratory. (authors)« less
Gooding, Thomas Michael; McCarthy, Patrick Joseph
2010-03-02
A data collector for a massively parallel computer system obtains call-return stack traceback data for multiple nodes by retrieving partial call-return stack traceback data from each node, grouping the nodes in subsets according to the partial traceback data, and obtaining further call-return stack traceback data from a representative node or nodes of each subset. Preferably, the partial data is a respective instruction address from each node, nodes having identical instruction address being grouped together in the same subset. Preferably, a single node of each subset is chosen and full stack traceback data is retrieved from the call-return stack within the chosen node.
Gooding, Thomas Michael [Rochester, MN
2011-04-19
An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.
Estimating water flow through a hillslope using the massively parallel processor
NASA Technical Reports Server (NTRS)
Devaney, Judy E.; Camillo, P. J.; Gurney, R. J.
1988-01-01
A new two-dimensional model of water flow in a hillslope has been implemented on the Massively Parallel Processor at the Goddard Space Flight Center. Flow in the soil both in the saturated and unsaturated zones, evaporation and overland flow are all modelled, and the rainfall rates are allowed to vary spatially. Previous models of this type had always been very limited computationally. This model takes less than a minute to model all the components of the hillslope water flow for a day. The model can now be used in sensitivity studies to specify which measurements should be taken and how accurate they should be to describe such flows for environmental studies.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Chiang, Patrick
2014-01-31
The research goal of this CAREER proposal is to develop energy-efficient, VLSI interconnect circuits and systems that will facilitate future massively-parallel, high-performance computing. Extreme-scale computing will exhibit massive parallelism on multiple vertical levels, from thou sands of computational units on a single processor to thousands of processors in a single data center. Unfortunately, the energy required to communicate between these units at every level (on chip, off-chip, off-rack) will be the critical limitation to energy efficiency. Therefore, the PI's career goal is to become a leading researcher in the design of energy-efficient VLSI interconnect for future computing systems.
De novo assembly of human genomes with massively parallel short read sequencing.
Li, Ruiqiang; Zhu, Hongmei; Ruan, Jue; Qian, Wubin; Fang, Xiaodong; Shi, Zhongbin; Li, Yingrui; Li, Shengting; Shan, Gao; Kristiansen, Karsten; Li, Songgang; Yang, Huanming; Wang, Jian; Wang, Jun
2010-02-01
Next-generation massively parallel DNA sequencing technologies provide ultrahigh throughput at a substantially lower unit data cost; however, the data are very short read length sequences, making de novo assembly extremely challenging. Here, we describe a novel method for de novo assembly of large genomes from short read sequences. We successfully assembled both the Asian and African human genome sequences, achieving an N50 contig size of 7.4 and 5.9 kilobases (kb) and scaffold of 446.3 and 61.9 kb, respectively. The development of this de novo short read assembly method creates new opportunities for building reference sequences and carrying out accurate analyses of unexplored genomes in a cost-effective way.
Parallel computations and control of adaptive structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Alvin, Kenneth F.; Belvin, W. Keith; Chong, K. P. (Editor); Liu, S. C. (Editor); Li, J. C. (Editor)
1991-01-01
The equations of motion for structures with adaptive elements for vibration control are presented for parallel computations to be used as a software package for real-time control of flexible space structures. A brief introduction of the state-of-the-art parallel computational capability is also presented. Time marching strategies are developed for an effective use of massive parallel mapping, partitioning, and the necessary arithmetic operations. An example is offered for the simulation of control-structure interaction on a parallel computer and the impact of the approach presented for applications in other disciplines than aerospace industry is assessed.
A Generic Mesh Data Structure with Parallel Applications
ERIC Educational Resources Information Center
Cochran, William Kenneth, Jr.
2009-01-01
High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…
DOE Office of Scientific and Technical Information (OSTI.GOV)
Uhr, L.
1987-01-01
This book is written by research scientists involved in the development of massively parallel, but hierarchically structured, algorithms, architectures, and programs for image processing, pattern recognition, and computer vision. The book gives an integrated picture of the programs and algorithms that are being developed, and also of the multi-computer hardware architectures for which these systems are designed.
DICE/ColDICE: 6D collisionless phase space hydrodynamics using a lagrangian tesselation
NASA Astrophysics Data System (ADS)
Sousbie, Thierry
2018-01-01
DICE is a C++ template library designed to solve collisionless fluid dynamics in 6D phase space using massively parallel supercomputers via an hybrid OpenMP/MPI parallelization. ColDICE, based on DICE, implements a cosmological and physical VLASOV-POISSON solver for cold systems such as dark matter (CDM) dynamics.
A Review of Lightweight Thread Approaches for High Performance Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Castello, Adrian; Pena, Antonio J.; Seo, Sangmin
High-level, directive-based solutions are becoming the programming models (PMs) of the multi/many-core architectures. Several solutions relying on operating system (OS) threads perfectly work with a moderate number of cores. However, exascale systems will spawn hundreds of thousands of threads in order to exploit their massive parallel architectures and thus conventional OS threads are too heavy for that purpose. Several lightweight thread (LWT) libraries have recently appeared offering lighter mechanisms to tackle massive concurrency. In order to examine the suitability of LWTs in high-level runtimes, we develop a set of microbenchmarks consisting of commonlyfound patterns in current parallel codes. Moreover, wemore » study the semantics offered by some LWT libraries in order to expose the similarities between different LWT application programming interfaces. This study reveals that a reduced set of LWT functions can be sufficient to cover the common parallel code patterns and that those LWT libraries perform better than OS threads-based solutions in cases where task and nested parallelism are becoming more popular with new architectures.« less
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis to distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors, as well as high massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience made at the NAS facility at NASA Ames Research Center. Over the last five years at NAS massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside with traditional vector supercomputers such as the Cray Y-MP and C90.
Takano, Yu; Nakata, Kazuto; Yonezawa, Yasushige; Nakamura, Haruki
2016-05-05
A massively parallel program for quantum mechanical-molecular mechanical (QM/MM) molecular dynamics simulation, called Platypus (PLATform for dYnamic Protein Unified Simulation), was developed to elucidate protein functions. The speedup and the parallelization ratio of Platypus in the QM and QM/MM calculations were assessed for a bacteriochlorophyll dimer in the photosynthetic reaction center (DIMER) on the K computer, a massively parallel computer achieving 10 PetaFLOPs with 705,024 cores. Platypus exhibited the increase in speedup up to 20,000 core processors at the HF/cc-pVDZ and B3LYP/cc-pVDZ, and up to 10,000 core processors by the CASCI(16,16)/6-31G** calculations. We also performed excited QM/MM-MD simulations on the chromophore of Sirius (SIRIUS) in water. Sirius is a pH-insensitive and photo-stable ultramarine fluorescent protein. Platypus accelerated on-the-fly excited-state QM/MM-MD simulations for SIRIUS in water, using over 4000 core processors. In addition, it also succeeded in 50-ps (200,000-step) on-the-fly excited-state QM/MM-MD simulations for the SIRIUS in water. © 2016 The Authors. Journal of Computational Chemistry Published by Wiley Periodicals, Inc.
Connecting and Collaborating within and beyond a Massive Open Online Course
ERIC Educational Resources Information Center
Pytash, Kristine E.; Hicks, Troy; Ferdig, Richard E.
2016-01-01
In this article, we draw on the experiences of two young adults, Beth and Jamie (pseudonyms), who participated in a connectivist Massive Open Online Course (cMOOC) to explore how adolescents can become active members in a community of learners and the digital literacy practices that support this entry. We argue that Beth and Jamie engaged in…
Hasson, Uri; Skipper, Jeremy I; Wilde, Michael J; Nusbaum, Howard C; Small, Steven L
2008-01-15
The increasingly complex research questions addressed by neuroimaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on storage of binary or text files. We then describe in detail the implementation of such a system and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data.
Global Ocean Prediction with the HYbrid Coordinate Ocean Model, HYCOM
NASA Astrophysics Data System (ADS)
Chassignet, E.
A broad partnership of institutions is collaborating in developing and demonstrating the performance and application of eddy-resolving, real-time global and Atlantic ocean prediction systems using the the HYbrid Coordinate Ocean Model (HYCOM). These systems will be transitioned for operational use by both the U.S. Navy at the Naval Oceanographic Office (NAVOCEANO), Stennis Space Center, MS, and the Fleet Numerical Meteorology and Oceanography Centre (FNMOC), Monterey, CA, and by NOAA at the National Centers for Environmental Prediction (NCEP), Washington, D.C. These systems will run efficiently on a variety of massively parallel computers and will include sophisticated data assimilation techniques for assimilation of satellite altimeter sea surface height and sea surface temperature as well as in situ temperature, salinity, and float displacement. The Partnership addresses the Global Ocean Data Assimilation Experiment (GODAE) goals of three-dimensional (3D) depiction of the ocean state at fine resolution in real-time and provision of boundary conditions for coastal and regional models. An overview of the effort will be presented.
Component-based integration of chemistry and optimization software.
Kenny, Joseph P; Benson, Steven J; Alexeev, Yuri; Sarich, Jason; Janssen, Curtis L; McInnes, Lois Curfman; Krishnan, Manojkumar; Nieplocha, Jarek; Jurrus, Elizabeth; Fahlstrom, Carl; Windus, Theresa L
2004-11-15
Typical scientific software designs make rigid assumptions regarding programming language and data structures, frustrating software interoperability and scientific collaboration. Component-based software engineering is an emerging approach to managing the increasing complexity of scientific software. Component technology facilitates code interoperability and reuse. Through the adoption of methodology and tools developed by the Common Component Architecture Forum, we have developed a component architecture for molecular structure optimization. Using the NWChem and Massively Parallel Quantum Chemistry packages, we have produced chemistry components that provide capacity for energy and energy derivative evaluation. We have constructed geometry optimization applications by integrating the Toolkit for Advanced Optimization, Portable Extensible Toolkit for Scientific Computation, and Global Arrays packages, which provide optimization and linear algebra capabilities. We present a brief overview of the component development process and a description of abstract interfaces for chemical optimizations. The components conforming to these abstract interfaces allow the construction of applications using different chemistry and mathematics packages interchangeably. Initial numerical results for the component software demonstrate good performance, and highlight potential research enabled by this platform.
2010 Award for Outstanding Doctoral Thesis Research in Biological Physics Talk: How the Genome Folds
NASA Astrophysics Data System (ADS)
Lieberman-Aiden, Erez
2011-03-01
I describe Hi-C, a novel technology for probing the three-dimensional architecture of whole genomes by coupling proximity-based ligation with massively parallel sequencing. Working with collaborators at the Broad Institute and UMass Medical School, we used Hi-C to construct spatial proximity maps of the human genome at a resolution of 1Mb. These maps confirm the presence of chromosome territories and the spatial proximity of small, gene-rich chromosomes. We identified an additional level of genome organization that is characterized by the spatial segregation of open and closed chromatin to form two genome-wide compartments. At the megabase scale, the chromatin conformation is consistent with a fractal globule, a knot-free conformation that enables maximally dense packing while preserving the ability to easily fold and unfold any genomic locus. The fractal globule is distinct from the more commonly used globular equilibrium model. Our results demonstrate the power of Hi-C to map the dynamic conformations of whole genomes.
Hasson, Uri; Skipper, Jeremy I.; Wilde, Michael J.; Nusbaum, Howard C.; Small, Steven L.
2007-01-01
The increasingly complex research questions addressed by neuroimaging research impose substantial demands on computational infrastructures. These infrastructures need to support management of massive amounts of data in a way that affords rapid and precise data analysis, to allow collaborative research, and to achieve these aims securely and with minimum management overhead. Here we present an approach that overcomes many current limitations in data analysis and data sharing. This approach is based on open source database management systems that support complex data queries as an integral part of data analysis, flexible data sharing, and parallel and distributed data processing using cluster computing and Grid computing resources. We assess the strengths of these approaches as compared to current frameworks based on storage of binary or text files. We then describe in detail the implementation of such a system and provide a concrete description of how it was used to enable a complex analysis of fMRI time series data. PMID:17964812
Global computing for bioinformatics.
Loewe, Laurence
2002-12-01
Global computing, the collaboration of idle PCs via the Internet in a SETI@home style, emerges as a new way of massive parallel multiprocessing with potentially enormous CPU power. Its relations to the broader, fast-moving field of Grid computing are discussed without attempting a review of the latter. This review (i) includes a short table of milestones in global computing history, (ii) lists opportunities global computing offers for bioinformatics, (iii) describes the structure of problems well suited for such an approach, (iv) analyses the anatomy of successful projects and (v) points to existing software frameworks. Finally, an evaluation of the various costs shows that global computing indeed has merit, if the problem to be solved is already coded appropriately and a suitable global computing framework can be found. Then, either significant amounts of computing power can be recruited from the general public, or--if employed in an enterprise-wide Intranet for security reasons--idle desktop PCs can substitute for an expensive dedicated cluster.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Onishi, Yasuo
Four Japan Atomic Energy Agency (JAEA) researchers visited Pacific Northwest National Laboratory (PNNL) for seven working days and have evaluated the suitability and adaptability of FLESCOT to a JAEA’s supercomputer system to effectively simulate cesium behavior in dam reservoirs, river mouths, and coastal areas in Fukushima contaminated by the Fukushima Daiichi nuclear accident. PNNL showed the following to JAEA visitors during the seven-working day period: FLESCOT source code; User’s manual; FLESCOT description – Program structure – Algorism – Solver – Boundary condition handling – Data definition – Input and output methods – How to run. During the visit, JAEA hadmore » access to FLESCOT to run with an input data set to evaluate the capacity and feasibility of adapting it to a JAEA super computer with massive parallel processors. As a part of this evaluation, PNNL ran FLESCOT for sample cases of the contaminant migration simulation to further describe FLESCOT in action. JAEA and PNNL researchers also evaluated time spent for each subroutine of FLESCOT, and the JAEA researcher implemented some initial parallelization schemes to FLESCOT. Based on this code evaluation, JAEA and PNNL determined that FLESCOT is: applicable to Fukushima lakes/dam reservoirs, river mouth areas, and coastal water; and feasible to implement parallelization for the JAEA supercomputer. In addition, PNNL and JAEA researchers discussed molecular modeling approaches on cesium adsorption mechanisms to enhance the JAEA molecular modeling activities. PNNL and JAEA also discussed specific collaboration of molecular and computational modeling activities.« less
Zhu, Xiang; Zhang, Dianwen
2013-01-01
We present a fast, accurate and robust parallel Levenberg-Marquardt minimization optimizer, GPU-LMFit, which is implemented on graphics processing unit for high performance scalable parallel model fitting processing. GPU-LMFit can provide a dramatic speed-up in massive model fitting analyses to enable real-time automated pixel-wise parametric imaging microscopy. We demonstrate the performance of GPU-LMFit for the applications in superresolution localization microscopy and fluorescence lifetime imaging microscopy. PMID:24130785
NASA Astrophysics Data System (ADS)
Plaza, Antonio; Chang, Chein-I.; Plaza, Javier; Valencia, David
2006-05-01
The incorporation of hyperspectral sensors aboard airborne/satellite platforms is currently producing a nearly continual stream of multidimensional image data, and this high data volume has soon introduced new processing challenges. The price paid for the wealth spatial and spectral information available from hyperspectral sensors is the enormous amounts of data that they generate. Several applications exist, however, where having the desired information calculated quickly enough for practical use is highly desirable. High computing performance of algorithm analysis is particularly important in homeland defense and security applications, in which swift decisions often involve detection of (sub-pixel) military targets (including hostile weaponry, camouflage, concealment, and decoys) or chemical/biological agents. In order to speed-up computational performance of hyperspectral imaging algorithms, this paper develops several fast parallel data processing techniques. Techniques include four classes of algorithms: (1) unsupervised classification, (2) spectral unmixing, and (3) automatic target recognition, and (4) onboard data compression. A massively parallel Beowulf cluster (Thunderhead) at NASA's Goddard Space Flight Center in Maryland is used to measure parallel performance of the proposed algorithms. In order to explore the viability of developing onboard, real-time hyperspectral data compression algorithms, a Xilinx Virtex-II field programmable gate array (FPGA) is also used in experiments. Our quantitative and comparative assessment of parallel techniques and strategies may help image analysts in selection of parallel hyperspectral algorithms for specific applications.
CMOS VLSI Layout and Verification of a SIMD Computer
NASA Technical Reports Server (NTRS)
Zheng, Jianqing
1996-01-01
A CMOS VLSI layout and verification of a 3 x 3 processor parallel computer has been completed. The layout was done using the MAGIC tool and the verification using HSPICE. Suggestions for expanding the computer into a million processor network are presented. Many problems that might be encountered when implementing a massively parallel computer are discussed.
ERIC Educational Resources Information Center
Puonti, Anne
2004-01-01
Economic-crime investigation in Finland is in transition from hierarchically organized, sequential collaboration between authorities toward parallel, interorganizational collaboration. This article describes the tools used and developed for managing the new collaborative economic-crime-investigation process. The challenge is to find…
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-11-16
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. An automated routing strategy routes packets through one or more intermediate nodes of the network to reach a destination. Some packets are constrained to be routed through respective designated transporter nodes, the automated routing strategy determining a path from a respective source node to a respective transporter node, and from a respective transporter node to a respective destination node. Preferably, the source node chooses a routing policy from among multiple possible choices, and that policy is followed by all intermediate nodes. The use of transporter nodes allows greater flexibility in routing.
Parallel design patterns for a low-power, software-defined compressed video encoder
NASA Astrophysics Data System (ADS)
Bruns, Michael W.; Hunt, Martin A.; Prasad, Durga; Gunupudi, Nageswara R.; Sonachalam, Sekar
2011-06-01
Video compression algorithms such as H.264 offer much potential for parallel processing that is not always exploited by the technology of a particular implementation. Consumer mobile encoding devices often achieve real-time performance and low power consumption through parallel processing in Application Specific Integrated Circuit (ASIC) technology, but many other applications require a software-defined encoder. High quality compression features needed for some applications such as 10-bit sample depth or 4:2:2 chroma format often go beyond the capability of a typical consumer electronics device. An application may also need to efficiently combine compression with other functions such as noise reduction, image stabilization, real time clocks, GPS data, mission/ESD/user data or software-defined radio in a low power, field upgradable implementation. Low power, software-defined encoders may be implemented using a massively parallel memory-network processor array with 100 or more cores and distributed memory. The large number of processor elements allow the silicon device to operate more efficiently than conventional DSP or CPU technology. A dataflow programming methodology may be used to express all of the encoding processes including motion compensation, transform and quantization, and entropy coding. This is a declarative programming model in which the parallelism of the compression algorithm is expressed as a hierarchical graph of tasks with message communication. Data parallel and task parallel design patterns are supported without the need for explicit global synchronization control. An example is described of an H.264 encoder developed for a commercially available, massively parallel memorynetwork processor device.
Ordered fast fourier transforms on a massively parallel hypercube multiprocessor
NASA Technical Reports Server (NTRS)
Tong, Charles; Swarztrauber, Paul N.
1989-01-01
Design alternatives for ordered Fast Fourier Transformation (FFT) algorithms were examined on massively parallel hypercube multiprocessors such as the Connection Machine. Particular emphasis is placed on reducing communication which is known to dominate the overall computing time. To this end, the order and computational phases of the FFT were combined, and the sequence to processor maps that reduce communication were used. The class of ordered transforms is expanded to include any FFT in which the order of the transform is the same as that of the input sequence. Two such orderings are examined, namely, standard-order and A-order which can be implemented with equal ease on the Connection Machine where orderings are determined by geometries and priorities. If the sequence has N = 2 exp r elements and the hypercube has P = 2 exp d processors, then a standard-order FFT can be implemented with d + r/2 + 1 parallel transmissions. An A-order sequence can be transformed with 2d - r/2 parallel transmissions which is r - d + 1 fewer than the standard order. A parallel method for computing the trigonometric coefficients is presented that does not use trigonometric functions or interprocessor communication. A performance of 0.9 GFLOPS was obtained for an A-order transform on the Connection Machine.
Liwo, Adam; Ołdziej, Stanisław; Czaplewski, Cezary; Kleinerman, Dana S.; Blood, Philip; Scheraga, Harold A.
2010-01-01
We report the implementation of our united-residue UNRES force field for simulations of protein structure and dynamics with massively parallel architectures. In addition to coarse-grained parallelism already implemented in our previous work, in which each conformation was treated by a different task, we introduce a fine-grained level in which energy and gradient evaluation are split between several tasks. The Message Passing Interface (MPI) libraries have been utilized to construct the parallel code. The parallel performance of the code has been tested on a professional Beowulf cluster (Xeon Quad Core), a Cray XT3 supercomputer, and two IBM BlueGene/P supercomputers with canonical and replica-exchange molecular dynamics. With IBM BlueGene/P, about 50 % efficiency and 120-fold speed-up of the fine-grained part was achieved for a single trajectory of a 767-residue protein with use of 256 processors/trajectory. Because of averaging over the fast degrees of freedom, UNRES provides an effective 1000-fold speed-up compared to the experimental time scale and, therefore, enables us to effectively carry out millisecond-scale simulations of proteins with 500 and more amino-acid residues in days of wall-clock time. PMID:20305729
Optimisation of a parallel ocean general circulation model
NASA Astrophysics Data System (ADS)
Beare, M. I.; Stevens, D. P.
1997-10-01
This paper presents the development of a general-purpose parallel ocean circulation model, for use on a wide range of computer platforms, from traditional scalar machines to workstation clusters and massively parallel processors. Parallelism is provided, as a modular option, via high-level message-passing routines, thus hiding the technical intricacies from the user. An initial implementation highlights that the parallel efficiency of the model is adversely affected by a number of factors, for which optimisations are discussed and implemented. The resulting ocean code is portable and, in particular, allows science to be achieved on local workstations that could otherwise only be undertaken on state-of-the-art supercomputers.
Graphics Processing Unit Assisted Thermographic Compositing
NASA Technical Reports Server (NTRS)
Ragasa, Scott; McDougal, Matthew; Russell, Sam
2012-01-01
Objective: To develop a software application utilizing general purpose graphics processing units (GPUs) for the analysis of large sets of thermographic data. Background: Over the past few years, an increasing effort among scientists and engineers to utilize the GPU in a more general purpose fashion is allowing for supercomputer level results at individual workstations. As data sets grow, the methods to work them grow at an equal, and often great, pace. Certain common computations can take advantage of the massively parallel and optimized hardware constructs of the GPU to allow for throughput that was previously reserved for compute clusters. These common computations have high degrees of data parallelism, that is, they are the same computation applied to a large set of data where the result does not depend on other data elements. Signal (image) processing is one area were GPUs are being used to greatly increase the performance of certain algorithms and analysis techniques. Technical Methodology/Approach: Apply massively parallel algorithms and data structures to the specific analysis requirements presented when working with thermographic data sets.
Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi
2016-08-05
The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Particle simulation of plasmas on the massively parallel processor
NASA Technical Reports Server (NTRS)
Gledhill, I. M. A.; Storey, L. R. O.
1987-01-01
Particle simulations, in which collective phenomena in plasmas are studied by following the self consistent motions of many discrete particles, involve several highly repetitive sets of calculations that are readily adaptable to SIMD parallel processing. A fully electromagnetic, relativistic plasma simulation for the massively parallel processor is described. The particle motions are followed in 2 1/2 dimensions on a 128 x 128 grid, with periodic boundary conditions. The two dimensional simulation space is mapped directly onto the processor network; a Fast Fourier Transform is used to solve the field equations. Particle data are stored according to an Eulerian scheme, i.e., the information associated with each particle is moved from one local memory to another as the particle moves across the spatial grid. The method is applied to the study of the nonlinear development of the whistler instability in a magnetospheric plasma model, with an anisotropic electron temperature. The wave distribution function is included as a new diagnostic to allow simulation results to be compared with satellite observations.
A Massively Parallel Computational Method of Reading Index Files for SOAPsnv.
Zhu, Xiaoqian; Peng, Shaoliang; Liu, Shaojie; Cui, Yingbo; Gu, Xiang; Gao, Ming; Fang, Lin; Fang, Xiaodong
2015-12-01
SOAPsnv is the software used for identifying the single nucleotide variation in cancer genes. However, its performance is yet to match the massive amount of data to be processed. Experiments reveal that the main performance bottleneck of SOAPsnv software is the pileup algorithm. The original pileup algorithm's I/O process is time-consuming and inefficient to read input files. Moreover, the scalability of the pileup algorithm is also poor. Therefore, we designed a new algorithm, named BamPileup, aiming to improve the performance of sequential read, and the new pileup algorithm implemented a parallel read mode based on index. Using this method, each thread can directly read the data start from a specific position. The results of experiments on the Tianhe-2 supercomputer show that, when reading data in a multi-threaded parallel I/O way, the processing time of algorithm is reduced to 3.9 s and the application program can achieve a speedup up to 100×. Moreover, the scalability of the new algorithm is also satisfying.
Quantum supercharger library: hyper-parallelism of the Hartree-Fock method.
Fernandes, Kyle D; Renison, C Alicia; Naidoo, Kevin J
2015-07-05
We present here a set of algorithms that completely rewrites the Hartree-Fock (HF) computations common to many legacy electronic structure packages (such as GAMESS-US, GAMESS-UK, and NWChem) into a massively parallel compute scheme that takes advantage of hardware accelerators such as Graphical Processing Units (GPUs). The HF compute algorithm is core to a library of routines that we name the Quantum Supercharger Library (QSL). We briefly evaluate the QSL's performance and report that it accelerates a HF 6-31G Self-Consistent Field (SCF) computation by up to 20 times for medium sized molecules (such as a buckyball) when compared with mature Central Processing Unit algorithms available in the legacy codes in regular use by researchers. It achieves this acceleration by massive parallelization of the one- and two-electron integrals and optimization of the SCF and Direct Inversion in the Iterative Subspace routines through the use of GPU linear algebra libraries. © 2015 Wiley Periodicals, Inc. © 2015 Wiley Periodicals, Inc.
ERIC Educational Resources Information Center
Junjie, Zhou
2017-01-01
The purpose of this paper was to investigate what factors influence learners' continuance intention in massive open online courses (MOOCs) for online collaborative learning. An extended expectation confirmation model (ECM) was adopted as the theoretical foundation. A total of 435 valid samples were collected in mainland China and structural…
ERIC Educational Resources Information Center
Lee, Mark J. W.; Eustace, Ken; Fellows, Geoff; Bytheway, Allan; Irving, Leah
2005-01-01
This paper reports on the first stage of a project to develop and test the use of massively multiplayer online role playing games (MMORPGs) for promoting computer supported collaborative learning through instructional gaming in the high school classroom. Teachers and students of English and Science at Swan View Senior High School, Western…
Applications of Parallel Process HiMAP for Large Scale Multidisciplinary Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Potsdam, Mark; Rodriguez, David; Kwak, Dochay (Technical Monitor)
2000-01-01
HiMAP is a three level parallel middleware that can be interfaced to a large scale global design environment for code independent, multidisciplinary analysis using high fidelity equations. Aerospace technology needs are rapidly changing. Computational tools compatible with the requirements of national programs such as space transportation are needed. Conventional computation tools are inadequate for modern aerospace design needs. Advanced, modular computational tools are needed, such as those that incorporate the technology of massively parallel processors (MPP).
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Kwak, Dochan (Technical Monitor)
2002-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel supercomputers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Development and Applications of a Modular Parallel Process for Large Scale Fluid/Structures Problems
NASA Technical Reports Server (NTRS)
Guruswamy, Guru P.; Byun, Chansup; Kwak, Dochan (Technical Monitor)
2001-01-01
A modular process that can efficiently solve large scale multidisciplinary problems using massively parallel super computers is presented. The process integrates disciplines with diverse physical characteristics by retaining the efficiency of individual disciplines. Computational domain independence of individual disciplines is maintained using a meta programming approach. The process integrates disciplines without affecting the combined performance. Results are demonstrated for large scale aerospace problems on several supercomputers. The super scalability and portability of the approach is demonstrated on several parallel computers.
Adaptive parallel logic networks
NASA Technical Reports Server (NTRS)
Martinez, Tony R.; Vidal, Jacques J.
1988-01-01
Adaptive, self-organizing concurrent systems (ASOCS) that combine self-organization with massive parallelism for such applications as adaptive logic devices, robotics, process control, and system malfunction management, are presently discussed. In ASOCS, an adaptive network composed of many simple computing elements operating in combinational and asynchronous fashion is used and problems are specified by presenting if-then rules to the system in the form of Boolean conjunctions. During data processing, which is a different operational phase from adaptation, the network acts as a parallel hardware circuit.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P
DOE Office of Scientific and Technical Information (OSTI.GOV)
Candel, A.; Kabel, A.; Lee, L.
In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
Experience in highly parallel processing using DAP
NASA Technical Reports Server (NTRS)
Parkinson, D.
1987-01-01
Distributed Array Processors (DAP) have been in day to day use for ten years and a large amount of user experience has been gained. The profile of user applications is similar to that of the Massively Parallel Processor (MPP) working group. Experience has shown that contrary to expectations, highly parallel systems provide excellent performance on so-called dirty problems such as the physics part of meteorological codes. The reasons for this observation are discussed. The arguments against replacing bit processors with floating point processors are also discussed.
Dynamic load balancing of applications
Wheat, Stephen R.
1997-01-01
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated.
Practical aspects of prestack depth migration with finite differences
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ober, C.C.; Oldfield, R.A.; Womble, D.E.
1997-07-01
Finite-difference, prestack, depth migrations offers significant improvements over Kirchhoff methods in imaging near or under salt structures. The authors have implemented a finite-difference prestack depth migration algorithm for use on massively parallel computers which is discussed. The image quality of the finite-difference scheme has been investigated and suggested improvements are discussed. In this presentation, the authors discuss an implicit finite difference migration code, called Salvo, that has been developed through an ACTI (Advanced Computational Technology Initiative) joint project. This code is designed to be efficient on a variety of massively parallel computers. It takes advantage of both frequency and spatialmore » parallelism as well as the use of nodes dedicated to data input/output (I/O). Besides giving an overview of the finite-difference algorithm and some of the parallelism techniques used, migration results using both Kirchhoff and finite-difference migration will be presented and compared. The authors start out with a very simple Cartoon model where one can intuitively see the multiple travel paths and some of the potential problems that will be encountered with Kirchhoff migration. More complex synthetic models as well as results from actual seismic data from the Gulf of Mexico will be shown.« less
Massively Parallel Dantzig-Wolfe Decomposition Applied to Traffic Flow Scheduling
NASA Technical Reports Server (NTRS)
Rios, Joseph Lucio; Ross, Kevin
2009-01-01
Optimal scheduling of air traffic over the entire National Airspace System is a computationally difficult task. To speed computation, Dantzig-Wolfe decomposition is applied to a known linear integer programming approach for assigning delays to flights. The optimization model is proven to have the block-angular structure necessary for Dantzig-Wolfe decomposition. The subproblems for this decomposition are solved in parallel via independent computation threads. Experimental evidence suggests that as the number of subproblems/threads increases (and their respective sizes decrease), the solution quality, convergence, and runtime improve. A demonstration of this is provided by using one flight per subproblem, which is the finest possible decomposition. This results in thousands of subproblems and associated computation threads. This massively parallel approach is compared to one with few threads and to standard (non-decomposed) approaches in terms of solution quality and runtime. Since this method generally provides a non-integral (relaxed) solution to the original optimization problem, two heuristics are developed to generate an integral solution. Dantzig-Wolfe followed by these heuristics can provide a near-optimal (sometimes optimal) solution to the original problem hundreds of times faster than standard (non-decomposed) approaches. In addition, when massive decomposition is employed, the solution is shown to be more likely integral, which obviates the need for an integerization step. These results indicate that nationwide, real-time, high fidelity, optimal traffic flow scheduling is achievable for (at least) 3 hour planning horizons.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-02-15
We present a new lock-free parallel algorithm for computing betweenness centralityof massive small-world networks. With minor changes to the data structures, ouralgorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in HPCS SSCA#2, a benchmark extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the Threadstorm processor, and a single-socket Sun multicore server with the UltraSPARC T2 processor. For a small-world network of 134 millionmore » vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)
Madduri, Kamesh; Ediger, David; Jiang, Karl
2009-05-29
We present a new lock-free parallel algorithm for computing betweenness centrality of massive small-world networks. With minor changes to the data structures, our algorithm also achieves better spatial cache locality compared to previous approaches. Betweenness centrality is a key algorithm kernel in the HPCS SSCA#2 Graph Analysis benchmark, which has been extensively used to evaluate the performance of emerging high-performance computing architectures for graph-theoretic computations. We design optimized implementations of betweenness centrality and the SSCA#2 benchmark for two hardware multithreaded systems: a Cray XMT system with the ThreadStorm processor, and a single-socket Sun multicore server with the UltraSparc T2 processor.more » For a small-world network of 134 million vertices and 1.073 billion edges, the 16-processor XMT system and the 8-core Sun Fire T5120 server achieve TEPS scores (an algorithmic performance count for the SSCA#2 benchmark) of 160 million and 90 million respectively, which corresponds to more than a 2X performance improvement over the previous parallel implementations. To better characterize the performance of these multithreaded systems, we correlate the SSCA#2 performance results with data from the memory-intensive STREAM and RandomAccess benchmarks. Finally, we demonstrate the applicability of our implementation to analyze massive real-world datasets by computing approximate betweenness centrality for a large-scale IMDb movie-actor network.« less
Learning Quantitative Sequence-Function Relationships from Massively Parallel Experiments
NASA Astrophysics Data System (ADS)
Atwal, Gurinder S.; Kinney, Justin B.
2016-03-01
A fundamental aspect of biological information processing is the ubiquity of sequence-function relationships—functions that map the sequence of DNA, RNA, or protein to a biochemically relevant activity. Most sequence-function relationships in biology are quantitative, but only recently have experimental techniques for effectively measuring these relationships been developed. The advent of such "massively parallel" experiments presents an exciting opportunity for the concepts and methods of statistical physics to inform the study of biological systems. After reviewing these recent experimental advances, we focus on the problem of how to infer parametric models of sequence-function relationships from the data produced by these experiments. Specifically, we retrace and extend recent theoretical work showing that inference based on mutual information, not the standard likelihood-based approach, is often necessary for accurately learning the parameters of these models. Closely connected with this result is the emergence of "diffeomorphic modes"—directions in parameter space that are far less constrained by data than likelihood-based inference would suggest. Analogous to Goldstone modes in physics, diffeomorphic modes arise from an arbitrarily broken symmetry of the inference problem. An analytically tractable model of a massively parallel experiment is then described, providing an explicit demonstration of these fundamental aspects of statistical inference. This paper concludes with an outlook on the theoretical and computational challenges currently facing studies of quantitative sequence-function relationships.
NASA Astrophysics Data System (ADS)
Zhao, Feng; Frietman, Edward E. E.; Han, Zhong; Chen, Ray T.
1999-04-01
A characteristic feature of a conventional von Neumann computer is that computing power is delivered by a single processing unit. Although increasing the clock frequency improves the performance of the computer, the switching speed of the semiconductor devices and the finite speed at which electrical signals propagate along the bus set the boundaries. Architectures containing large numbers of nodes can solve this performance dilemma, with the comment that main obstacles in designing such systems are caused by difficulties to come up with solutions that guarantee efficient communications among the nodes. Exchanging data becomes really a bottleneck should al nodes be connected by a shared resource. Only optics, due to its inherent parallelism, could solve that bottleneck. Here, we explore a multi-faceted free space image distributor to be used in optical interconnects in massively parallel processing. In this paper, physical and optical models of the image distributor are focused on from diffraction theory of light wave to optical simulations. the general features and the performance of the image distributor are also described. The new structure of an image distributor and the simulations for it are discussed. From the digital simulation and experiment, it is found that the multi-faceted free space image distributing technique is quite suitable for free space optical interconnection in massively parallel processing and new structure of the multifaceted free space image distributor would perform better.
Computer Sciences and Data Systems, volume 2
NASA Technical Reports Server (NTRS)
1987-01-01
Topics addressed include: data storage; information network architecture; VHSIC technology; fiber optics; laser applications; distributed processing; spaceborne optical disk controller; massively parallel processors; and advanced digital SAR processors.
Design considerations for parallel graphics libraries
NASA Technical Reports Server (NTRS)
Crockett, Thomas W.
1994-01-01
Applications which run on parallel supercomputers are often characterized by massive datasets. Converting these vast collections of numbers to visual form has proven to be a powerful aid to comprehension. For a variety of reasons, it may be desirable to provide this visual feedback at runtime. One way to accomplish this is to exploit the available parallelism to perform graphics operations in place. In order to do this, we need appropriate parallel rendering algorithms and library interfaces. This paper provides a tutorial introduction to some of the issues which arise in designing parallel graphics libraries and their underlying rendering algorithms. The focus is on polygon rendering for distributed memory message-passing systems. We illustrate our discussion with examples from PGL, a parallel graphics library which has been developed on the Intel family of parallel systems.
Directions in parallel programming: HPF, shared virtual memory and object parallelism in pC++
NASA Technical Reports Server (NTRS)
Bodin, Francois; Priol, Thierry; Mehrotra, Piyush; Gannon, Dennis
1994-01-01
Fortran and C++ are the dominant programming languages used in scientific computation. Consequently, extensions to these languages are the most popular for programming massively parallel computers. We discuss two such approaches to parallel Fortran and one approach to C++. The High Performance Fortran Forum has designed HPF with the intent of supporting data parallelism on Fortran 90 applications. HPF works by asking the user to help the compiler distribute and align the data structures with the distributed memory modules in the system. Fortran-S takes a different approach in which the data distribution is managed by the operating system and the user provides annotations to indicate parallel control regions. In the case of C++, we look at pC++ which is based on a concurrent aggregate parallel model.
Archer, Charles Jens; Musselman, Roy Glenn; Peters, Amanda; Pinnow, Kurt Walter; Swartz, Brent Allen; Wallenfelt, Brian Paul
2010-11-23
A massively parallel computer system contains an inter-nodal communications network of node-to-node links. Nodes vary a choice of routing policy for routing data in the network in a semi-random manner, so that similarly situated packets are not always routed along the same path. Semi-random variation of the routing policy tends to avoid certain local hot spots of network activity, which might otherwise arise using more consistent routing determinations. Preferably, the originating node chooses a routing policy for a packet, and all intermediate nodes in the path route the packet according to that policy. Policies may be rotated on a round-robin basis, selected by generating a random number, or otherwise varied.
Phase space simulation of collisionless stellar systems on the massively parallel processor
NASA Technical Reports Server (NTRS)
White, Richard L.
1987-01-01
A numerical technique for solving the collisionless Boltzmann equation describing the time evolution of a self gravitating fluid in phase space was implemented on the Massively Parallel Processor (MPP). The code performs calculations for a two dimensional phase space grid (with one space and one velocity dimension). Some results from calculations are presented. The execution speed of the code is comparable to the speed of a single processor of a Cray-XMP. Advantages and disadvantages of the MPP architecture for this type of problem are discussed. The nearest neighbor connectivity of the MPP array does not pose a significant obstacle. Future MPP-like machines should have much more local memory and easier access to staging memory and disks in order to be effective for this type of problem.
Applications of massively parallel computers in telemetry processing
NASA Technical Reports Server (NTRS)
El-Ghazawi, Tarek A.; Pritchard, Jim; Knoble, Gordon
1994-01-01
Telemetry processing refers to the reconstruction of full resolution raw instrumentation data with artifacts, of space and ground recording and transmission, removed. Being the first processing phase of satellite data, this process is also referred to as level-zero processing. This study is aimed at investigating the use of massively parallel computing technology in providing level-zero processing to spaceflights that adhere to the recommendations of the Consultative Committee on Space Data Systems (CCSDS). The workload characteristics, of level-zero processing, are used to identify processing requirements in high-performance computing systems. An example of level-zero functions on a SIMD MPP, such as the MasPar, is discussed. The requirements in this paper are based in part on the Earth Observing System (EOS) Data and Operation System (EDOS).
Genetic heterogeneity of RPMI-8402, a T-acute lymphoblastic leukemia cell line
STOCZYNSKA-FIDELUS, EWELINA; PIASKOWSKI, SYLWESTER; PAWLOWSKA, ROZA; SZYBKA, MALGORZATA; PECIAK, JOANNA; HULAS-BIGOSZEWSKA, KRYSTYNA; WINIECKA-KLIMEK, MARTA; RIESKE, PIOTR
2016-01-01
Thorough examination of genetic heterogeneity of cell lines is uncommon. In order to address this issue, the present study analyzed the genetic heterogeneity of RPMI-8402, a T-acute lymphoblastic leukemia (T-ALL) cell line. For this purpose, traditional techniques such as fluorescence in situ hybridization and immunocytochemistry were used, in addition to more advanced techniques, including cell sorting, Sanger sequencing and massive parallel sequencing. The results indicated that the RPMI-8402 cell line consists of several genetically different cell subpopulations. Furthermore, massive parallel sequencing of RPMI-8402 provided insight into the evolution of T-ALL carcinogenesis, since this cell line exhibited the genetic heterogeneity typical of T-ALL. Therefore, the use of cell lines for drug testing in future studies may aid the progress of anticancer drug research. PMID:26870252
Big data mining analysis method based on cloud computing
NASA Astrophysics Data System (ADS)
Cai, Qing Qiu; Cui, Hong Gang; Tang, Hao
2017-08-01
Information explosion era, large data super-large, discrete and non-(semi) structured features have gone far beyond the traditional data management can carry the scope of the way. With the arrival of the cloud computing era, cloud computing provides a new technical way to analyze the massive data mining, which can effectively solve the problem that the traditional data mining method cannot adapt to massive data mining. This paper introduces the meaning and characteristics of cloud computing, analyzes the advantages of using cloud computing technology to realize data mining, designs the mining algorithm of association rules based on MapReduce parallel processing architecture, and carries out the experimental verification. The algorithm of parallel association rule mining based on cloud computing platform can greatly improve the execution speed of data mining.
Hybrid massively parallel fast sweeping method for static Hamilton-Jacobi equations
NASA Astrophysics Data System (ADS)
Detrixhe, Miles; Gibou, Frédéric
2016-10-01
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton-Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling, and show state-of-the-art speedup values for the fast sweeping method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lichtner, Peter C.; Hammond, Glenn E.; Lu, Chuan
PFLOTRAN solves a system of generally nonlinear partial differential equations describing multi-phase, multicomponent and multiscale reactive flow and transport in porous materials. The code is designed to run on massively parallel computing architectures as well as workstations and laptops (e.g. Hammond et al., 2011). Parallelization is achieved through domain decomposition using the PETSc (Portable Extensible Toolkit for Scientific Computation) libraries for the parallelization framework (Balay et al., 1997). PFLOTRAN has been developed from the ground up for parallel scalability and has been run on up to 218 processor cores with problem sizes up to 2 billion degrees of freedom. Writtenmore » in object oriented Fortran 90, the code requires the latest compilers compatible with Fortran 2003. At the time of this writing this requires gcc 4.7.x, Intel 12.1.x and PGC compilers. As a requirement of running problems with a large number of degrees of freedom, PFLOTRAN allows reading input data that is too large to fit into memory allotted to a single processor core. The current limitation to the problem size PFLOTRAN can handle is the limitation of the HDF5 file format used for parallel IO to 32 bit integers. Noting that 2 32 = 4; 294; 967; 296, this gives an estimate of the maximum problem size that can be currently run with PFLOTRAN. Hopefully this limitation will be remedied in the near future.« less
The Use of MOOC as a Means of Creating a Collaborative Learning Environment in a Blended CLIL Course
ERIC Educational Resources Information Center
Titova, Svetlana
2017-01-01
The objective of this action research is to work out the possible ways of Massive Open Online Course (MOOC) integration in a blended Content and Language Integrated Learning (CLIL) course to create an authentic online collaborative community. The theoretical framework of the intervention is based on current MOOC theories, connectivism, and the…
Dynamic Imbalance Would Counter Offcenter Thrust
NASA Technical Reports Server (NTRS)
Mccanna, Jason
1994-01-01
Dynamic imbalance generated by offcenter thrust on rotating body eliminated by shifting some of mass of body to generate opposing dynamic imbalance. Technique proposed originally for spacecraft including massive crew module connected via long, lightweight intermediate structure to massive engine module, such that artificial gravitation in crew module generated by rotating spacecraft around axis parallel to thrust generated by engine. Also applicable to dynamic balancing of rotating terrestrial equipment to which offcenter forces applied.
ASSET: Analysis of Sequences of Synchronous Events in Massively Parallel Spike Trains
Canova, Carlos; Denker, Michael; Gerstein, George; Helias, Moritz
2016-01-01
With the ability to observe the activity from large numbers of neurons simultaneously using modern recording technologies, the chance to identify sub-networks involved in coordinated processing increases. Sequences of synchronous spike events (SSEs) constitute one type of such coordinated spiking that propagates activity in a temporally precise manner. The synfire chain was proposed as one potential model for such network processing. Previous work introduced a method for visualization of SSEs in massively parallel spike trains, based on an intersection matrix that contains in each entry the degree of overlap of active neurons in two corresponding time bins. Repeated SSEs are reflected in the matrix as diagonal structures of high overlap values. The method as such, however, leaves the task of identifying these diagonal structures to visual inspection rather than to a quantitative analysis. Here we present ASSET (Analysis of Sequences of Synchronous EvenTs), an improved, fully automated method which determines diagonal structures in the intersection matrix by a robust mathematical procedure. The method consists of a sequence of steps that i) assess which entries in the matrix potentially belong to a diagonal structure, ii) cluster these entries into individual diagonal structures and iii) determine the neurons composing the associated SSEs. We employ parallel point processes generated by stochastic simulations as test data to demonstrate the performance of the method under a wide range of realistic scenarios, including different types of non-stationarity of the spiking activity and different correlation structures. Finally, the ability of the method to discover SSEs is demonstrated on complex data from large network simulations with embedded synfire chains. Thus, ASSET represents an effective and efficient tool to analyze massively parallel spike data for temporal sequences of synchronous activity. PMID:27420734
Dynamic load balancing of applications
Wheat, S.R.
1997-05-13
An application-level method for dynamically maintaining global load balance on a parallel computer, particularly on massively parallel MIMD computers is disclosed. Global load balancing is achieved by overlapping neighborhoods of processors, where each neighborhood performs local load balancing. The method supports a large class of finite element and finite difference based applications and provides an automatic element management system to which applications are easily integrated. 13 figs.
Computational methods and software systems for dynamics and control of large space structures
NASA Technical Reports Server (NTRS)
Park, K. C.; Felippa, C. A.; Farhat, C.; Pramono, E.
1990-01-01
Two key areas of crucial importance to the computer-based simulation of large space structures are discussed. The first area involves multibody dynamics (MBD) of flexible space structures, with applications directed to deployment, construction, and maneuvering. The second area deals with advanced software systems, with emphasis on parallel processing. The latest research thrust in the second area involves massively parallel computers.
DGDFT: A massively parallel method for large scale density functional theory calculations.
Hu, Wei; Lin, Lin; Yang, Chao
2015-09-28
We describe a massively parallel implementation of the recently developed discontinuous Galerkin density functional theory (DGDFT) method, for efficient large-scale Kohn-Sham DFT based electronic structure calculations. The DGDFT method uses adaptive local basis (ALB) functions generated on-the-fly during the self-consistent field iteration to represent the solution to the Kohn-Sham equations. The use of the ALB set provides a systematic way to improve the accuracy of the approximation. By using the pole expansion and selected inversion technique to compute electron density, energy, and atomic forces, we can make the computational complexity of DGDFT scale at most quadratically with respect to the number of electrons for both insulating and metallic systems. We show that for the two-dimensional (2D) phosphorene systems studied here, using 37 basis functions per atom allows us to reach an accuracy level of 1.3 × 10(-4) Hartree/atom in terms of the error of energy and 6.2 × 10(-4) Hartree/bohr in terms of the error of atomic force, respectively. DGDFT can achieve 80% parallel efficiency on 128,000 high performance computing cores when it is used to study the electronic structure of 2D phosphorene systems with 3500-14 000 atoms. This high parallel efficiency results from a two-level parallelization scheme that we will describe in detail.
Ocean Modeling and Visualization on Massively Parallel Computer
NASA Technical Reports Server (NTRS)
Chao, Yi; Li, P. Peggy; Wang, Ping; Katz, Daniel S.; Cheng, Benny N.
1997-01-01
Climate modeling is one of the grand challenges of computational science, and ocean modeling plays an important role in both understanding the current climatic conditions and predicting future climate change.
Moskalev, Evgeny A; Frohnauer, Judith; Merkelbach-Bruse, Sabine; Schildhaus, Hans-Ulrich; Dimmler, Arno; Schubert, Thomas; Boltze, Carsten; König, Helmut; Fuchs, Florian; Sirbu, Horia; Rieker, Ralf J; Agaimy, Abbas; Hartmann, Arndt; Haller, Florian
2014-06-01
Recurrent gene fusions of anaplastic lymphoma receptor tyrosine kinase (ALK) and echinoderm microtubule-associated protein-like 4 (EML4) have been recently identified in ∼5% of non-small cell lung cancers (NSCLCs) and are targets for selective tyrosine kinase inhibitors. While fluorescent in situ hybridization (FISH) is the current gold standard for detection of EML4-ALK rearrangements, several limitations exist including high costs, time-consuming evaluation and somewhat equivocal interpretation of results. In contrast, targeted massive parallel sequencing has been introduced as a powerful method for simultaneous and sensitive detection of multiple somatic mutations even in limited biopsies, and is currently evolving as the method of choice for molecular diagnostic work-up of NSCLCs. We developed a novel approach for indirect detection of EML4-ALK rearrangements based on 454 massive parallel sequencing after reverse transcription and subsequent multiplex amplification (multiplex ALK RNA-seq) which takes advantage of unbalanced expression of the 5' and 3' ALK mRNA regions. Two lung cancer cell lines and a selected series of 32 NSCLC samples including 11 cases with EML4-ALK rearrangement were analyzed with this novel approach in comparison to ALK FISH, ALK qRT-PCR and EML4-ALK RT-PCR. The H2228 cell line with known EML4-ALK rearrangement showed 171 and 729 reads for 5' and 3' ALK regions, respectively, demonstrating a clearly unbalanced expression pattern. In contrast, the H1299 cell line with ALK wildtype status displayed no reads for both ALK regions. Considering a threshold of 100 reads for 3' ALK region as indirect indicator of EML4-ALK rearrangement, there was 100% concordance between the novel multiplex ALK RNA-seq approach and ALK FISH among all 32 NSCLC samples. Multiplex ALK RNA-seq is a sensitive and specific method for indirect detection of EML4-ALK rearrangements, and can be easily implemented in panel based molecular diagnostic work-up of NSCLCs by massive parallel sequencing. Copyright © 2014 Elsevier Ireland Ltd. All rights reserved.
NASA Astrophysics Data System (ADS)
Ford, Eric B.; Dindar, Saleh; Peters, Jorg
2015-08-01
The realism of astrophysical simulations and statistical analyses of astronomical data are set by the available computational resources. Thus, astronomers and astrophysicists are constantly pushing the limits of computational capabilities. For decades, astronomers benefited from massive improvements in computational power that were driven primarily by increasing clock speeds and required relatively little attention to details of the computational hardware. For nearly a decade, increases in computational capabilities have come primarily from increasing the degree of parallelism, rather than increasing clock speeds. Further increases in computational capabilities will likely be led by many-core architectures such as Graphical Processing Units (GPUs) and Intel Xeon Phi. Successfully harnessing these new architectures, requires significantly more understanding of the hardware architecture, cache hierarchy, compiler capabilities and network network characteristics.I will provide an astronomer's overview of the opportunities and challenges provided by modern many-core architectures and elastic cloud computing. The primary goal is to help an astronomical audience understand what types of problems are likely to yield more than order of magnitude speed-ups and which problems are unlikely to parallelize sufficiently efficiently to be worth the development time and/or costs.I will draw on my experience leading a team in developing the Swarm-NG library for parallel integration of large ensembles of small n-body systems on GPUs, as well as several smaller software projects. I will share lessons learned from collaborating with computer scientists, including both technical and soft skills. Finally, I will discuss the challenges of training the next generation of astronomers to be proficient in this new era of high-performance computing, drawing on experience teaching a graduate class on High-Performance Scientific Computing for Astrophysics and organizing a 2014 advanced summer school on Bayesian Computing for Astronomical Data Analysis with support of the Penn State Center for Astrostatistics and Institute for CyberScience.
Speeding up parallel processing
NASA Technical Reports Server (NTRS)
Denning, Peter J.
1988-01-01
In 1967 Amdahl expressed doubts about the ultimate utility of multiprocessors. The formulation, now called Amdahl's law, became part of the computing folklore and has inspired much skepticism about the ability of the current generation of massively parallel processors to efficiently deliver all their computing power to programs. The widely publicized recent results of a group at Sandia National Laboratory, which showed speedup on a 1024 node hypercube of over 500 for three fixed size problems and over 1000 for three scalable problems, have convincingly challenged this bit of folklore and have given new impetus to parallel scientific computing.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Brown, William Michael; Plimpton, Steven James; Wang, Peng
2010-03-01
LAMMPS is a classical molecular dynamics code, and an acronym for Large-scale Atomic/Molecular Massively Parallel Simulator. LAMMPS has potentials for soft materials (biomolecules, polymers) and solid-state materials (metals, semiconductors) and coarse-grained or mesoscopic systems. It can be used to model atoms or, more generically, as a parallel particle simulator at the atomic, meso, or continuum scale. LAMMPS runs on single processors or in parallel using message-passing techniques and a spatial-decomposition of the simulation domain. The code is designed to be easy to modify or extend with new functionality.
AdiosStMan: Parallelizing Casacore Table Data System using Adaptive IO System
NASA Astrophysics Data System (ADS)
Wang, R.; Harris, C.; Wicenec, A.
2016-07-01
In this paper, we investigate the Casacore Table Data System (CTDS) used in the casacore and CASA libraries, and methods to parallelize it. CTDS provides a storage manager plugin mechanism for third-party developers to design and implement their own CTDS storage managers. Having this in mind, we looked into various storage backend techniques that can possibly enable parallel I/O for CTDS by implementing new storage managers. After carrying on benchmarks showing the excellent parallel I/O throughput of the Adaptive IO System (ADIOS), we implemented an ADIOS based parallel CTDS storage manager. We then applied the CASA MSTransform frequency split task to verify the ADIOS Storage Manager. We also ran a series of performance tests to examine the I/O throughput in a massively parallel scenario.
Dubinett - Targeted Sequencing 2012 — EDRN Public Portal
we propose to use targeted massively parallel DNA sequencing to identify somatic alterations within mutational hotspots in matched sets of primary lung tumors, premalignant lesions, and adjacent,histologically normal lung tissue.
ERIC Educational Resources Information Center
Rosé, Carolyn Penstein; Ferschke, Oliver
2016-01-01
This article offers a vision for technology supported collaborative and discussion-based learning at scale. It begins with historical work in the area of tutorial dialogue systems. It traces the history of that area of the field of Artificial Intelligence in Education as it has made an impact on the field of Computer-Supported Collaborative…
NASA Astrophysics Data System (ADS)
Newman, Gregory A.; Commer, Michael
2009-07-01
Three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical conductivity mapping of potential offshore oil and gas reservoirs. The imaging technology employs controlled source electromagnetic (CSEM) and magnetotelluric (MT) fields and treats geological media exhibiting transverse anisotropy. Moreover when combined with established seismic methods, direct imaging of reservoir fluids is possible. Because of the size of the 3D conductivity imaging problem, strategies are required exploiting computational parallelism and optimal meshing. The algorithm thus developed has been shown to scale to tens of thousands of processors. In one imaging experiment, 32,768 tasks/processors on the IBM Watson Research Blue Gene/L supercomputer were successfully utilized. Over a 24 hour period we were able to image a large scale field data set that previously required over four months of processing time on distributed clusters based on Intel or AMD processors utilizing 1024 tasks on an InfiniBand fabric. Electrical conductivity imaging using massively parallel computational resources produces results that cannot be obtained otherwise and are consistent with timeframes required for practical exploration problems.
NASA Astrophysics Data System (ADS)
Calafiura, Paolo; Leggett, Charles; Seuster, Rolf; Tsulaia, Vakhtang; Van Gemmeren, Peter
2015-12-01
AthenaMP is a multi-process version of the ATLAS reconstruction, simulation and data analysis framework Athena. By leveraging Linux fork and copy-on-write mechanisms, it allows for sharing of memory pages between event processors running on the same compute node with little to no change in the application code. Originally targeted to optimize the memory footprint of reconstruction jobs, AthenaMP has demonstrated that it can reduce the memory usage of certain configurations of ATLAS production jobs by a factor of 2. AthenaMP has also evolved to become the parallel event-processing core of the recently developed ATLAS infrastructure for fine-grained event processing (Event Service) which allows the running of AthenaMP inside massively parallel distributed applications on hundreds of compute nodes simultaneously. We present the architecture of AthenaMP, various strategies implemented by AthenaMP for scheduling workload to worker processes (for example: Shared Event Queue and Shared Distributor of Event Tokens) and the usage of AthenaMP in the diversity of ATLAS event processing workloads on various computing resources: Grid, opportunistic resources and HPC.
Multi-mode sensor processing on a dynamically reconfigurable massively parallel processor array
NASA Astrophysics Data System (ADS)
Chen, Paul; Butts, Mike; Budlong, Brad; Wasson, Paul
2008-04-01
This paper introduces a novel computing architecture that can be reconfigured in real time to adapt on demand to multi-mode sensor platforms' dynamic computational and functional requirements. This 1 teraOPS reconfigurable Massively Parallel Processor Array (MPPA) has 336 32-bit processors. The programmable 32-bit communication fabric provides streamlined inter-processor connections with deterministically high performance. Software programmability, scalability, ease of use, and fast reconfiguration time (ranging from microseconds to milliseconds) are the most significant advantages over FPGAs and DSPs. This paper introduces the MPPA architecture, its programming model, and methods of reconfigurability. An MPPA platform for reconfigurable computing is based on a structural object programming model. Objects are software programs running concurrently on hundreds of 32-bit RISC processors and memories. They exchange data and control through a network of self-synchronizing channels. A common application design pattern on this platform, called a work farm, is a parallel set of worker objects, with one input and one output stream. Statically configured work farms with homogeneous and heterogeneous sets of workers have been used in video compression and decompression, network processing, and graphics applications.
GPU-accelerated Tersoff potentials for massively parallel Molecular Dynamics simulations
NASA Astrophysics Data System (ADS)
Nguyen, Trung Dac
2017-03-01
The Tersoff potential is one of the empirical many-body potentials that has been widely used in simulation studies at atomic scales. Unlike pair-wise potentials, the Tersoff potential involves three-body terms, which require much more arithmetic operations and data dependency. In this contribution, we have implemented the GPU-accelerated version of several variants of the Tersoff potential for LAMMPS, an open-source massively parallel Molecular Dynamics code. Compared to the existing MPI implementation in LAMMPS, the GPU implementation exhibits a better scalability and offers a speedup of 2.2X when run on 1000 compute nodes on the Titan supercomputer. On a single node, the speedup ranges from 2.0 to 8.0 times, depending on the number of atoms per GPU and hardware configurations. The most notable features of our GPU-accelerated version include its design for MPI/accelerator heterogeneous parallelism, its compatibility with other functionalities in LAMMPS, its ability to give deterministic results and to support both NVIDIA CUDA- and OpenCL-enabled accelerators. Our implementation is now part of the GPU package in LAMMPS and accessible for public use.
Progress report on PIXIE3D, a fully implicit 3D extended MHD solver
NASA Astrophysics Data System (ADS)
Chacon, Luis
2008-11-01
Recently, invited talk at DPP07 an optimal, massively parallel implicit algorithm for 3D resistive magnetohydrodynamics (PIXIE3D) was demonstrated. Excellent algorithmic and parallel results were obtained with up to 4096 processors and 138 million unknowns. While this is a remarkable result, further developments are still needed for PIXIE3D to become a 3D extended MHD production code in general geometries. In this poster, we present an update on the status of PIXIE3D on several fronts. On the physics side, we will describe our progress towards the full Braginskii model, including: electron Hall terms, anisotropic heat conduction, and gyroviscous corrections. Algorithmically, we will discuss progress towards a robust, optimal, nonlinear solver for arbitrary geometries, including preconditioning for the new physical effects described, the implementation of a coarse processor-grid solver (to maintain optimal algorithmic performance for an arbitrarily large number of processors in massively parallel computations), and of a multiblock capability to deal with complicated geometries. L. Chac'on, Phys. Plasmas 15, 056103 (2008);
Towards implementation of cellular automata in Microbial Fuel Cells.
Tsompanas, Michail-Antisthenis I; Adamatzky, Andrew; Sirakoulis, Georgios Ch; Greenman, John; Ieropoulos, Ioannis
2017-01-01
The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway's Game of Life as the 'benchmark' CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions-compared to silicon circuitry-between the different states during computation.
Towards implementation of cellular automata in Microbial Fuel Cells
Adamatzky, Andrew; Sirakoulis, Georgios Ch.; Greenman, John; Ieropoulos, Ioannis
2017-01-01
The Microbial Fuel Cell (MFC) is a bio-electrochemical transducer converting waste products into electricity using microbial communities. Cellular Automaton (CA) is a uniform array of finite-state machines that update their states in discrete time depending on states of their closest neighbors by the same rule. Arrays of MFCs could, in principle, act as massive-parallel computing devices with local connectivity between elementary processors. We provide a theoretical design of such a parallel processor by implementing CA in MFCs. We have chosen Conway’s Game of Life as the ‘benchmark’ CA because this is the most popular CA which also exhibits an enormously rich spectrum of patterns. Each cell of the Game of Life CA is realized using two MFCs. The MFCs are linked electrically and hydraulically. The model is verified via simulation of an electrical circuit demonstrating equivalent behaviours. The design is a first step towards future implementations of fully autonomous biological computing devices with massive parallelism. The energy independence of such devices counteracts their somewhat slow transitions—compared to silicon circuitry—between the different states during computation. PMID:28498871
Hybrid massively parallel fast sweeping method for static Hamilton–Jacobi equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Detrixhe, Miles, E-mail: mdetrixhe@engineering.ucsb.edu; University of California Santa Barbara, Santa Barbara, CA, 93106; Gibou, Frédéric, E-mail: fgibou@engineering.ucsb.edu
The fast sweeping method is a popular algorithm for solving a variety of static Hamilton–Jacobi equations. Fast sweeping algorithms for parallel computing have been developed, but are severely limited. In this work, we present a multilevel, hybrid parallel algorithm that combines the desirable traits of two distinct parallel methods. The fine and coarse grained components of the algorithm take advantage of heterogeneous computer architecture common in high performance computing facilities. We present the algorithm and demonstrate its effectiveness on a set of example problems including optimal control, dynamic games, and seismic wave propagation. We give results for convergence, parallel scaling,more » and show state-of-the-art speedup values for the fast sweeping method.« less
Transmissive Nanohole Arrays for Massively-Parallel Optical Biosensing
2015-01-01
A high-throughput optical biosensing technique is proposed and demonstrated. This hybrid technique combines optical transmission of nanoholes with colorimetric silver staining. The size and spacing of the nanoholes are chosen so that individual nanoholes can be independently resolved in massive parallel using an ordinary transmission optical microscope, and, in place of determining a spectral shift, the brightness of each nanohole is recorded to greatly simplify the readout. Each nanohole then acts as an independent sensor, and the blocking of nanohole optical transmission by enzymatic silver staining defines the specific detection of a biological agent. Nearly 10000 nanoholes can be simultaneously monitored under the field of view of a typical microscope. As an initial proof of concept, biotinylated lysozyme (biotin-HEL) was used as a model analyte, giving a detection limit as low as 0.1 ng/mL. PMID:25530982
Large-eddy simulations of compressible convection on massively parallel computers. [stellar physics
NASA Technical Reports Server (NTRS)
Xie, Xin; Toomre, Juri
1993-01-01
We report preliminary implementation of the large-eddy simulation (LES) technique in 2D simulations of compressible convection carried out on the CM-2 massively parallel computer. The convective flow fields in our simulations possess structures similar to those found in a number of direct simulations, with roll-like flows coherent across the entire depth of the layer that spans several density scale heights. Our detailed assessment of the effects of various subgrid scale (SGS) terms reveals that they may affect the gross character of convection. Yet, somewhat surprisingly, we find that our LES solutions, and another in which the SGS terms are turned off, only show modest differences. The resulting 2D flows realized here are rather laminar in character, and achieving substantial turbulence may require stronger forcing and less dissipation.
Contextual classification on the massively parallel processor
NASA Technical Reports Server (NTRS)
Tilton, James C.
1987-01-01
Classifiers are often used to produce land cover maps from multispectral Earth observation imagery. Conventionally, these classifiers have been designed to exploit the spectral information contained in the imagery. Very few classifiers exploit the spatial information content of the imagery, and the few that do rarely exploit spatial information content in conjunction with spectral and/or temporal information. A contextual classifier that exploits spatial and spectral information in combination through a general statistical approach was studied. Early test results obtained from an implementation of the classifier on a VAX-11/780 minicomputer were encouraging, but they are of limited meaning because they were produced from small data sets. An implementation of the contextual classifier is presented on the Massively Parallel Processor (MPP) at Goddard that for the first time makes feasible the testing of the classifier on large data sets.
Brett, Maggie; McPherson, John; Zang, Zhi Jiang; Lai, Angeline; Tan, Ee-Shien; Ng, Ivy; Ong, Lai-Choo; Cham, Breana; Tan, Patrick; Rozen, Steve; Tan, Ene-Choo
2014-01-01
Developmental delay and/or intellectual disability (DD/ID) affects 1–3% of all children. At least half of these are thought to have a genetic etiology. Recent studies have shown that massively parallel sequencing (MPS) using a targeted gene panel is particularly suited for diagnostic testing for genetically heterogeneous conditions. We report on our experiences with using massively parallel sequencing of a targeted gene panel of 355 genes for investigating the genetic etiology of eight patients with a wide range of phenotypes including DD/ID, congenital anomalies and/or autism spectrum disorder. Targeted sequence enrichment was performed using the Agilent SureSelect Target Enrichment Kit and sequenced on the Illumina HiSeq2000 using paired-end reads. For all eight patients, 81–84% of the targeted regions achieved read depths of at least 20×, with average read depths overlapping targets ranging from 322× to 798×. Causative variants were successfully identified in two of the eight patients: a nonsense mutation in the ATRX gene and a canonical splice site mutation in the L1CAM gene. In a third patient, a canonical splice site variant in the USP9X gene could likely explain all or some of her clinical phenotypes. These results confirm the value of targeted MPS for investigating DD/ID in children for diagnostic purposes. However, targeted gene MPS was less likely to provide a genetic diagnosis for children whose phenotype includes autism. PMID:24690944
Eduardoff, M; Gross, T E; Santos, C; de la Puente, M; Ballard, D; Strobl, C; Børsting, C; Morling, N; Fusco, L; Hussing, C; Egyed, B; Souto, L; Uacyisrael, J; Syndercombe Court, D; Carracedo, Á; Lareu, M V; Schneider, P M; Parson, W; Phillips, C; Parson, W; Phillips, C
2016-07-01
The EUROFORGEN Global ancestry-informative SNP (AIM-SNPs) panel is a forensic multiplex of 128 markers designed to differentiate an individual's ancestry from amongst the five continental population groups of Africa, Europe, East Asia, Native America, and Oceania. A custom multiplex of AmpliSeq™ PCR primers was designed for the Global AIM-SNPs to perform massively parallel sequencing using the Ion PGM™ system. This study assessed individual SNP genotyping precision using the Ion PGM™, the forensic sensitivity of the multiplex using dilution series, degraded DNA plus simple mixtures, and the ancestry differentiation power of the final panel design, which required substitution of three original ancestry-informative SNPs with alternatives. Fourteen populations that had not been previously analyzed were genotyped using the custom multiplex and these studies allowed assessment of genotyping performance by comparison of data across five laboratories. Results indicate a low level of genotyping error can still occur from sequence misalignment caused by homopolymeric tracts close to the target SNP, despite careful scrutiny of candidate SNPs at the design stage. Such sequence misalignment required the exclusion of component SNP rs2080161 from the Global AIM-SNPs panel. However, the overall genotyping precision and sensitivity of this custom multiplex indicates the Ion PGM™ assay for the Global AIM-SNPs is highly suitable for forensic ancestry analysis with massively parallel sequencing. Copyright © 2016 Elsevier Ireland Ltd. All rights reserved.
Waugh, Caryll; Cromer, Deborah; Grimm, Andrew; Chopra, Abha; Mallal, Simon; Davenport, Miles; Mak, Johnson
2015-04-09
Massive, parallel sequencing is a potent tool for dissecting the regulation of biological processes by revealing the dynamics of the cellular RNA profile under different conditions. Similarly, massive, parallel sequencing can be used to reveal the complexity of viral quasispecies that are often found in the RNA virus infected host. However, the production of cDNA libraries for next-generation sequencing (NGS) necessitates the reverse transcription of RNA into cDNA and the amplification of the cDNA template using PCR, which may introduce artefact in the form of phantom nucleic acids species that can bias the composition and interpretation of original RNA profiles. Using HIV as a model we have characterised the major sources of error during the conversion of viral RNA to cDNA, namely excess RNA template and the RNaseH activity of the polymerase enzyme, reverse transcriptase. In addition we have analysed the effect of PCR cycle on detection of recombinants and assessed the contribution of transfection of highly similar plasmid DNA to the formation of recombinant species during the production of our control viruses. We have identified RNA template concentrations, RNaseH activity of reverse transcriptase, and PCR conditions as key parameters that must be carefully optimised to minimise chimeric artefacts. Using our optimised RT-PCR conditions, in combination with our modified PCR amplification procedure, we have developed a reliable technique for accurate determination of RNA species using NGS technology.
Sierra Structural Dynamics User's Notes
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reese, Garth M.
2015-10-19
Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
Reverse time migration: A seismic processing application on the connection machine
NASA Technical Reports Server (NTRS)
Fiebrich, Rolf-Dieter
1987-01-01
The implementation of a reverse time migration algorithm on the Connection Machine, a massively parallel computer is described. Essential architectural features of this machine as well as programming concepts are presented. The data structures and parallel operations for the implementation of the reverse time migration algorithm are described. The algorithm matches the Connection Machine architecture closely and executes almost at the peak performance of this machine.
Massively-Parallel Architectures for Automatic Recognition of Visual Speech Signals
1988-10-12
Secusrity Clamifieation, Nlassively-Parallel Architectures for Automa ic Recognitio of Visua, Speech Signals 12. PERSONAL AUTHOR(S) Terrence J...characteristics of speech from tJhe, visual speech signals. Neural networks have been trained on a database of vowels. The rqw images of faces , aligned and...images of faces , aligned and preprocessed, were used as input to these network which were trained to estimate the corresponding envelope of the
Solving Navier-Stokes equations on a massively parallel processor; The 1 GFLOP performance
DOE Office of Scientific and Technical Information (OSTI.GOV)
Saati, A.; Biringen, S.; Farhat, C.
This paper reports on experience in solving large-scale fluid dynamics problems on the Connection Machine model CM-2. The authors have implemented a parallel version of the MacCormack scheme for the solution of the Navier-Stokes equations. By using triad floating point operations and reducing the number of interprocessor communications, they have achieved a sustained performance rate of 1.42 GFLOPS.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Munday, Lynn Brendon; Day, David M.; Bunting, Gregory
Sierra/SD provides a massively parallel implementation of structural dynamics finite element analysis, required for high fidelity, validated models used in modal, vibration, static and shock analysis of weapons systems. This document provides a users guide to the input for Sierra/SD. Details of input specifications for the different solution types, output options, element types and parameters are included. The appendices contain detailed examples, and instructions for running the software on parallel platforms.
NASA Technical Reports Server (NTRS)
Morgan, Philip E.
2004-01-01
This final report contains reports of research related to the tasks "Scalable High Performance Computing: Direct and Lark-Eddy Turbulent FLow Simulations Using Massively Parallel Computers" and "Devleop High-Performance Time-Domain Computational Electromagnetics Capability for RCS Prediction, Wave Propagation in Dispersive Media, and Dual-Use Applications. The discussion of Scalable High Performance Computing reports on three objectives: validate, access scalability, and apply two parallel flow solvers for three-dimensional Navier-Stokes flows; develop and validate a high-order parallel solver for Direct Numerical Simulations (DNS) and Large Eddy Simulation (LES) problems; and Investigate and develop a high-order Reynolds averaged Navier-Stokes turbulence model. The discussion of High-Performance Time-Domain Computational Electromagnetics reports on five objectives: enhancement of an electromagnetics code (CHARGE) to be able to effectively model antenna problems; utilize lessons learned in high-order/spectral solution of swirling 3D jets to apply to solving electromagnetics project; transition a high-order fluids code, FDL3DI, to be able to solve Maxwell's Equations using compact-differencing; develop and demonstrate improved radiation absorbing boundary conditions for high-order CEM; and extend high-order CEM solver to address variable material properties. The report also contains a review of work done by the systems engineer.
Representing and computing regular languages on massively parallel networks
DOE Office of Scientific and Technical Information (OSTI.GOV)
Miller, M.I.; O'Sullivan, J.A.; Boysam, B.
1991-01-01
This paper proposes a general method for incorporating rule-based constraints corresponding to regular languages into stochastic inference problems, thereby allowing for a unified representation of stochastic and syntactic pattern constraints. The authors' approach first established the formal connection of rules to Chomsky grammars, and generalizes the original work of Shannon on the encoding of rule-based channel sequences to Markov chains of maximum entropy. This maximum entropy probabilistic view leads to Gibb's representations with potentials which have their number of minima growing at precisely the exponential rate that the language of deterministically constrained sequences grow. These representations are coupled to stochasticmore » diffusion algorithms, which sample the language-constrained sequences by visiting the energy minima according to the underlying Gibbs' probability law. The coupling to stochastic search methods yields the all-important practical result that fully parallel stochastic cellular automata may be derived to generate samples from the rule-based constraint sets. The production rules and neighborhood state structure of the language of sequences directly determines the necessary connection structures of the required parallel computing surface. Representations of this type have been mapped to the DAP-510 massively-parallel processor consisting of 1024 mesh-connected bit-serial processing elements for performing automated segmentation of electron-micrograph images.« less
NASA Astrophysics Data System (ADS)
Sun, Rui; Xiao, Heng
2016-04-01
With the growth of available computational resource, CFD-DEM (computational fluid dynamics-discrete element method) becomes an increasingly promising and feasible approach for the study of sediment transport. Several existing CFD-DEM solvers are applied in chemical engineering and mining industry. However, a robust CFD-DEM solver for the simulation of sediment transport is still desirable. In this work, the development of a three-dimensional, massively parallel, and open-source CFD-DEM solver SediFoam is detailed. This solver is built based on open-source solvers OpenFOAM and LAMMPS. OpenFOAM is a CFD toolbox that can perform three-dimensional fluid flow simulations on unstructured meshes; LAMMPS is a massively parallel DEM solver for molecular dynamics. Several validation tests of SediFoam are performed using cases of a wide range of complexities. The results obtained in the present simulations are consistent with those in the literature, which demonstrates the capability of SediFoam for sediment transport applications. In addition to the validation test, the parallel efficiency of SediFoam is studied to test the performance of the code for large-scale and complex simulations. The parallel efficiency tests show that the scalability of SediFoam is satisfactory in the simulations using up to O(107) particles.
Cost-effective GPU-grid for genome-wide epistasis calculations.
Pütz, B; Kam-Thong, T; Karbalai, N; Altmann, A; Müller-Myhsok, B
2013-01-01
Until recently, genotype studies were limited to the investigation of single SNP effects due to the computational burden incurred when studying pairwise interactions of SNPs. However, some genetic effects as simple as coloring (in plants and animals) cannot be ascribed to a single locus but only understood when epistasis is taken into account [1]. It is expected that such effects are also found in complex diseases where many genes contribute to the clinical outcome of affected individuals. Only recently have such problems become feasible computationally. The inherently parallel structure of the problem makes it a perfect candidate for massive parallelization on either grid or cloud architectures. Since we are also dealing with confidential patient data, we were not able to consider a cloud-based solution but had to find a way to process the data in-house and aimed to build a local GPU-based grid structure. Sequential epistatsis calculations were ported to GPU using CUDA at various levels. Parallelization on the CPU was compared to corresponding GPU counterparts with regards to performance and cost. A cost-effective solution was created by combining custom-built nodes equipped with relatively inexpensive consumer-level graphics cards with highly parallel GPUs in a local grid. The GPU method outperforms current cluster-based systems on a price/performance criterion, as a single GPU shows speed performance comparable up to 200 CPU cores. The outlined approach will work for problems that easily lend themselves to massive parallelization. Code for various tasks has been made available and ongoing development of tools will further ease the transition from sequential to parallel algorithms.
Magnetic Fields Versus Gravity
NASA Astrophysics Data System (ADS)
Hensley, Kerry
2018-04-01
Deep within giant molecular clouds, hidden by dense gas and dust, stars form. Unprecedented data from the Atacama Large Millimeter/submillimeter Array (ALMA) reveal the intricate magnetic structureswoven throughout one of the most massive star-forming regions in the Milky Way.How Stars Are BornThe Horsehead Nebulasdense column of gas and dust is opaque to visible light, but this infrared image reveals the young stars hidden in the dust. [NASA/ESA/Hubble Heritage Team]Simple theory dictates that when a dense clump of molecular gas becomes massive enough that its self-gravity overwhelms the thermal pressure of the cloud, the gas collapses and forms a star. In reality, however, star formation is more complicated than a simple give and take between gravity and pressure. Thedusty molecular gas in stellar nurseries is permeated with magnetic fields, which are thought to impede the inward pull of gravity and slow the rate of star formation.How can we learn about the magnetic fields of distant objects? One way is by measuring dust polarization. An elongated dust grain will tend to align itself with its short axis parallel to the direction of the magnetic field. This systematic alignment of the dust grains along the magnetic field lines polarizes the dust grains emission perpendicular to the local magnetic field. This allows us to infer the direction of the magnetic field from the direction of polarization.Magnetic field orientations for protostars e2 and e8 derived from Submillimeter Array observations (panels a through c) and ALMA observations (panels d and e). Click to enlarge. [Adapted from Koch et al. 2018]Tracing Magnetic FieldsPatrick Koch (Academia Sinica, Taiwan) and collaborators used high-sensitivity ALMA observations of dust polarization to learn more about the magnetic field morphology of Milky Way star-forming region W51. W51 is one of the largest star-forming regions in our galaxy, home to high-mass protostars e2, e8, and North.The ALMA observations reveal polarized emission toward all three sources. By extracting the magnetic field orientations from the polarization vectors, Koch and collaborators found that the molecular cloud contains an ordered magnetic field with never-before-seen structures. Several small clumps on the perimeter of the massive star-forming cores exhibit comet-shaped magnetic field structures, which could indicate that these smaller cores are being pulled toward the more massive cores.These findings hint that the magnetic field structure can tell us about the flow of material within star-forming regions key to understanding the nature of star formation itself.Maps of sin for two of the protostars (e2 and e8) and their surroundings. [Adapted from Koch et al. 2018]Guiding Star FormationDo the magnetic fields in W51 help or hinder star formation? To explore this question,Koch and collaborators introduced the quantity sin , where is the angle between the local gravity and the local magnetic field.When the angle between gravity and the magnetic field is small (sin 0), the magnetic field has little effect on the collapse of the cloud. If gravity and the magnetic field are perpendicular (sin 1), the magnetic field can slow the infall of gas and inhibit star formation.Based on this parameter, Koch and collaborators identified narrow channels where gravity acts unimpeded by the magnetic field. These magnetic channels may funnel gas toward the dense cores and aid the star-formation process.The authors observations demonstrate just one example of the broad realm ALMAs polarimetry capabilities have opened to discovery. These and future observations of dust polarization will continue to reveal more about the delicate magnetic structure within molecular clouds, furtherilluminating the role that magnetic fields play in star formation.CitationPatrick M. Koch et al 2018 ApJ 855 39. doi:10.3847/1538-4357/aaa4c1
Interprofessional Practice and Education in Health Care: Their Relevance to School Psychology
ERIC Educational Resources Information Center
Margison, Judith A.; Shore, Bruce M.
2009-01-01
Calls for increased collaborative practices in school psychology parallel similar advances in the realm of health care. This article overviews the concepts associated with collaborative practice in school psychology and in health care (e.g., interaction, teamwork, and collaboration) and discusses how the literature emerging from interprofessional…
2013-11-21
Fanconi Anemia; Autosomal or Sex Linked Recessive Genetic Disease; Bone Marrow Hematopoiesis Failure, Multiple Congenital Abnormalities, and Susceptibility to Neoplastic Diseases.; Hematopoiesis Maintainance.
2015-11-03
scale optical projection system powered by spatial light modulators, such as digital micro-mirror device ( DMD ). Figure 4 shows the parallel lithography ...1Scientific RepoRts | 5:16192 | DOi: 10.1038/srep16192 www.nature.com/scientificreports High throughput optical lithography by scanning a massive...array of bowtie aperture antennas at near-field X. Wen1,2,3,*, A. Datta1,*, L. M. Traverso1, L. Pan1, X. Xu1 & E. E. Moon4 Optical lithography , the
DooSo6: Easy Collaboration over Shared Projects
NASA Astrophysics Data System (ADS)
Ignat, Claudia-Lavinia; Oster, Gérald; Molli, Pascal
Existing tools for supporting parallel work feature some disadvantages that prevent them to be widely used. Very often they require a complex installation and creation of accounts for all group members. Users need to learn and deal with complex commands for efficiently using these collaborative tools. Some tools require users to abandon their favourite editors and impose them to use a certain co-authorship application. In this paper, we propose the DooSo6 collaboration tool that offers support for parallel work, requires no installation, no creation of accounts and that is easy to use, users being able to continue working with their favourite editors. User authentication is achieved by means of a capability-based mechanism.
High-Performance Compute Infrastructure in Astronomy: 2020 Is Only Months Away
NASA Astrophysics Data System (ADS)
Berriman, B.; Deelman, E.; Juve, G.; Rynge, M.; Vöckler, J. S.
2012-09-01
By 2020, astronomy will be awash with as much as 60 PB of public data. Full scientific exploitation of such massive volumes of data will require high-performance computing on server farms co-located with the data. Development of this computing model will be a community-wide enterprise that has profound cultural and technical implications. Astronomers must be prepared to develop environment-agnostic applications that support parallel processing. The community must investigate the applicability and cost-benefit of emerging technologies such as cloud computing to astronomy, and must engage the Computer Science community to develop science-driven cyberinfrastructure such as workflow schedulers and optimizers. We report here the results of collaborations between a science center, IPAC, and a Computer Science research institute, ISI. These collaborations may be considered pathfinders in developing a high-performance compute infrastructure in astronomy. These collaborations investigated two exemplar large-scale science-driver workflow applications: 1) Calculation of an infrared atlas of the Galactic Plane at 18 different wavelengths by placing data from multiple surveys on a common plate scale and co-registering all the pixels; 2) Calculation of an atlas of periodicities present in the public Kepler data sets, which currently contain 380,000 light curves. These products have been generated with two workflow applications, written in C for performance and designed to support parallel processing on multiple environments and platforms, but with different compute resource needs: the Montage image mosaic engine is I/O-bound, and the NASA Star and Exoplanet Database periodogram code is CPU-bound. Our presentation will report cost and performance metrics and lessons-learned for continuing development. Applicability of Cloud Computing: Commercial Cloud providers generally charge for all operations, including processing, transfer of input and output data, and for storage of data, and so the costs of running applications vary widely according to how they use resources. The cloud is well suited to processing CPU-bound (and memory bound) workflows such as the periodogram code, given the relatively low cost of processing in comparison with I/O operations. I/O-bound applications such as Montage perform best on high-performance clusters with fast networks and parallel file-systems. Science-driven Cyberinfrastructure: Montage has been widely used as a driver application to develop workflow management services, such as task scheduling in distributed environments, designing fault tolerance techniques for job schedulers, and developing workflow orchestration techniques. Running Parallel Applications Across Distributed Cloud Environments: Data processing will eventually take place in parallel distributed across cyber infrastructure environments having different architectures. We have used the Pegasus Work Management System (WMS) to successfully run applications across three very different environments: TeraGrid, OSG (Open Science Grid), and FutureGrid. Provisioning resources across different grids and clouds (also referred to as Sky Computing), involves establishing a distributed environment, where issues of, e.g, remote job submission, data management, and security need to be addressed. This environment also requires building virtual machine images that can run in different environments. Usually, each cloud provides basic images that can be customized with additional software and services. In most of our work, we provisioned compute resources using a custom application, called Wrangler. Pegasus WMS abstracts the architectures of the compute environments away from the end-user, and can be considered a first-generation tool suitable for scientists to run their applications on disparate environments.
The Climate Data Analytic Services (CDAS) Framework.
NASA Astrophysics Data System (ADS)
Maxwell, T. P.; Duffy, D.
2016-12-01
Faced with unprecedented growth in climate data volume and demand, NASA has developed the Climate Data Analytic Services (CDAS) framework. This framework enables scientists to execute data processing workflows combining common analysis operations in a high performance environment close to the massive data stores at NASA. The data is accessed in standard (NetCDF, HDF, etc.) formats in a POSIX file system and processed using vetted climate data analysis tools (ESMF, CDAT, NCO, etc.). A dynamic caching architecture enables interactive response times. CDAS utilizes Apache Spark for parallelization and a custom array framework for processing huge datasets within limited memory spaces. CDAS services are accessed via a WPS API being developed in collaboration with the ESGF Compute Working Team to support server-side analytics for ESGF. The API can be accessed using either direct web service calls, a python script, a unix-like shell client, or a javascript-based web application. Client packages in python, scala, or javascript contain everything needed to make CDAS requests. The CDAS architecture brings together the tools, data storage, and high-performance computing required for timely analysis of large-scale data sets, where the data resides, to ultimately produce societal benefits. It is is currently deployed at NASA in support of the Collaborative REAnalysis Technical Environment (CREATE) project, which centralizes numerous global reanalysis datasets onto a single advanced data analytics platform. This service permits decision makers to investigate climate changes around the globe, inspect model trends and variability, and compare multiple reanalysis datasets.
Role of APOE Isoforms in the Pathogenesis of TBI Induced Alzheimer’s Disease
2015-10-01
global deletion, APOE targeted replacement, complex breeding, CCI model optimization, mRNA library generation, high throughput massive parallel ...ATP binding cassette transporter A1 (ABCA1) is a lipid transporter that controls the generation of HDL in plasma and ApoE-containing lipoproteins in... parallel sequencing, mRNA-seq, behavioral testing, mem- ory impairement, recovery. 3 Overall Project Summary During the reported period, we have been able
Applications and accuracy of the parallel diagonal dominant algorithm
NASA Technical Reports Server (NTRS)
Sun, Xian-He
1993-01-01
The Parallel Diagonal Dominant (PDD) algorithm is a highly efficient, ideally scalable tridiagonal solver. In this paper, a detailed study of the PDD algorithm is given. First the PDD algorithm is introduced. Then the algorithm is extended to solve periodic tridiagonal systems. A variant, the reduced PDD algorithm, is also proposed. Accuracy analysis is provided for a class of tridiagonal systems, the symmetric, and anti-symmetric Toeplitz tridiagonal systems. Implementation results show that the analysis gives a good bound on the relative error, and the algorithm is a good candidate for the emerging massively parallel machines.
Scalability and Portability of Two Parallel Implementations of ADI
NASA Technical Reports Server (NTRS)
Phung, Thanh; VanderWijngaart, Rob F.
1994-01-01
Two domain decompositions for the implementation of the NAS Scalar Penta-diagonal Parallel Benchmark on MIMD systems are investigated, namely transposition and multi-partitioning. Hardware platforms considered are the Intel iPSC/860 and Paragon XP/S-15, and clusters of SGI workstations on ethernet, communicating through PVM. It is found that the multi-partitioning strategy offers the kind of coarse granularity that allows scaling up to hundreds of processors on a massively parallel machine. Moreover, efficiency is retained when the code is ported verbatim (save message passing syntax) to a PVM environment on a modest size cluster of workstations.
Nuclide Depletion Capabilities in the Shift Monte Carlo Code
Davidson, Gregory G.; Pandya, Tara M.; Johnson, Seth R.; ...
2017-12-21
A new depletion capability has been developed in the Exnihilo radiation transport code suite. This capability enables massively parallel domain-decomposed coupling between the Shift continuous-energy Monte Carlo solver and the nuclide depletion solvers in ORIGEN to perform high-performance Monte Carlo depletion calculations. This paper describes this new depletion capability and discusses its various features, including a multi-level parallel decomposition, high-order transport-depletion coupling, and energy-integrated power renormalization. Several test problems are presented to validate the new capability against other Monte Carlo depletion codes, and the parallel performance of the new capability is analyzed.
High temporal resolution mapping of seismic noise sources using heterogeneous supercomputers
NASA Astrophysics Data System (ADS)
Gokhberg, Alexey; Ermert, Laura; Paitz, Patrick; Fichtner, Andreas
2017-04-01
Time- and space-dependent distribution of seismic noise sources is becoming a key ingredient of modern real-time monitoring of various geo-systems. Significant interest in seismic noise source maps with high temporal resolution (days) is expected to come from a number of domains, including natural resources exploration, analysis of active earthquake fault zones and volcanoes, as well as geothermal and hydrocarbon reservoir monitoring. Currently, knowledge of noise sources is insufficient for high-resolution subsurface monitoring applications. Near-real-time seismic data, as well as advanced imaging methods to constrain seismic noise sources have recently become available. These methods are based on the massive cross-correlation of seismic noise records from all available seismic stations in the region of interest and are therefore very computationally intensive. Heterogeneous massively parallel supercomputing systems introduced in the recent years combine conventional multi-core CPU with GPU accelerators and provide an opportunity for manifold increase and computing performance. Therefore, these systems represent an efficient platform for implementation of a noise source mapping solution. We present the first results of an ongoing research project conducted in collaboration with the Swiss National Supercomputing Centre (CSCS). The project aims at building a service that provides seismic noise source maps for Central Europe with high temporal resolution (days to few weeks depending on frequency and data availability). The service is hosted on the CSCS computing infrastructure; all computationally intensive processing is performed on the massively parallel heterogeneous supercomputer "Piz Daint". The solution architecture is based on the Application-as-a-Service concept in order to provide the interested external researchers the regular access to the noise source maps. The solution architecture includes the following sub-systems: (1) data acquisition responsible for collecting, on a periodic basis, raw seismic records from the European seismic networks, (2) high-performance noise source mapping application responsible for generation of source maps using cross-correlation of seismic records, (3) back-end infrastructure for the coordination of various tasks and computations, (4) front-end Web interface providing the service to the end-users and (5) data repository. The noise mapping application is composed of four principal modules: (1) pre-processing of raw data, (2) massive cross-correlation, (3) post-processing of correlation data based on computation of logarithmic energy ratio and (4) generation of source maps from post-processed data. Implementation of the solution posed various challenges, in particular, selection of data sources and transfer protocols, automation and monitoring of daily data downloads, ensuring the required data processing performance, design of a general service oriented architecture for coordination of various sub-systems, and engineering an appropriate data storage solution. The present pilot version of the service implements noise source maps for Switzerland. Extension of the solution to Central Europe is planned for the next project phase.
Simulation of an array-based neural net model
NASA Technical Reports Server (NTRS)
Barnden, John A.
1987-01-01
Research in cognitive science suggests that much of cognition involves the rapid manipulation of complex data structures. However, it is very unclear how this could be realized in neural networks or connectionist systems. A core question is: how could the interconnectivity of items in an abstract-level data structure be neurally encoded? The answer appeals mainly to positional relationships between activity patterns within neural arrays, rather than directly to neural connections in the traditional way. The new method was initially devised to account for abstract symbolic data structures, but it also supports cognitively useful spatial analogue, image-like representations. As the neural model is based on massive, uniform, parallel computations over 2D arrays, the massively parallel processor is a convenient tool for simulation work, although there are complications in using the machine to the fullest advantage. An MPP Pascal simulation program for a small pilot version of the model is running.
Computations on the massively parallel processor at the Goddard Space Flight Center
NASA Technical Reports Server (NTRS)
Strong, James P.
1991-01-01
Described are four significant algorithms implemented on the massively parallel processor (MPP) at the Goddard Space Flight Center. Two are in the area of image analysis. Of the other two, one is a mathematical simulation experiment and the other deals with the efficient transfer of data between distantly separated processors in the MPP array. The first algorithm presented is the automatic determination of elevations from stereo pairs. The second algorithm solves mathematical logistic equations capable of producing both ordered and chaotic (or random) solutions. This work can potentially lead to the simulation of artificial life processes. The third algorithm is the automatic segmentation of images into reasonable regions based on some similarity criterion, while the fourth is an implementation of a bitonic sort of data which significantly overcomes the nearest neighbor interconnection constraints on the MPP for transferring data between distant processors.
The CAnadian NIRISS Unbiased Cluster Survey (CANUCS)
NASA Astrophysics Data System (ADS)
Ravindranath, Swara; NIRISS GTO Team
2017-06-01
CANUCS GTO program is a JWST spectroscopy and imaging survey of five massive galaxy clusters and ten parallel fields using the NIRISS low-resolution grisms, NIRCam imaging and NIRSpec multi-object spectroscopy. The primary goal is to understand the evolution of low mass galaxies across cosmic time. The resolved emission line maps and line ratios for many galaxies, with some at resolution of 100pc via the magnification by gravitational lensing will enable determining the spatial distribution of star formation, dust and metals. Other science goals include the detection and characterization of galaxies within the reionization epoch, using multiply-imaged lensed galaxies to constrain cluster mass distributions and dark matter substructure, and understanding star-formation suppression in the most massive galaxy clusters. In this talk I will describe the science goals of the CANUCS program. The proposed prime and parallel observations will be presented with details of the implementation of the observation strategy using JWST proposal planning tools.
The MasPar MP-1 As a Computer Arithmetic Laboratory
Anuta, Michael A.; Lozier, Daniel W.; Turner, Peter R.
1996-01-01
This paper is a blueprint for the use of a massively parallel SIMD computer architecture for the simulation of various forms of computer arithmetic. The particular system used is a DEC/MasPar MP-1 with 4096 processors in a square array. This architecture has many advantages for such simulations due largely to the simplicity of the individual processors. Arithmetic operations can be spread across the processor array to simulate a hardware chip. Alternatively they may be performed on individual processors to allow simulation of a massively parallel implementation of the arithmetic. Compromises between these extremes permit speed-area tradeoffs to be examined. The paper includes a description of the architecture and its features. It then summarizes some of the arithmetic systems which have been, or are to be, implemented. The implementation of the level-index and symmetric level-index, LI and SLI, systems is described in some detail. An extensive bibliography is included. PMID:27805123
You, Yanqin; Sun, Yan; Li, Xuchao; Li, Yali; Wei, Xiaoming; Chen, Fang; Ge, Huijuan; Lan, Zhangzhang; Zhu, Qian; Tang, Ying; Wang, Shujuan; Gao, Ya; Jiang, Fuman; Song, Jiaping; Shi, Quan; Zhu, Xuan; Mu, Feng; Dong, Wei; Gao, Vince; Jiang, Hui; Yi, Xin; Wang, Wei; Gao, Zhiying
2014-08-01
This article demonstrates a prominent noninvasive prenatal approach to assist the clinical diagnosis of a single-gene disorder disease, maple syrup urine disease, using targeted sequencing knowledge from the affected family. The method reported here combines novel mutant discovery in known genes by targeted massively parallel sequencing with noninvasive prenatal testing. By applying this new strategy, we successfully revealed novel mutations in the gene BCKDHA (Ex2_4dup and c.392A>G) in this Chinese family and developed a prenatal haplotype-assisted approach to noninvasively detect the genotype of the fetus (transmitted from both parents). This is the first report of integration of targeted sequencing and noninvasive prenatal testing into clinical practice. Our study has demonstrated that this massively parallel sequencing-based strategy can potentially be used for single-gene disorder diagnosis in the future.
Laurie, Matthew T; Bertout, Jessica A; Taylor, Sean D; Burton, Joshua N; Shendure, Jay A; Bielas, Jason H
2013-08-01
Due to the high cost of failed runs and suboptimal data yields, quantification and determination of fragment size range are crucial steps in the library preparation process for massively parallel sequencing (or next-generation sequencing). Current library quality control methods commonly involve quantification using real-time quantitative PCR and size determination using gel or capillary electrophoresis. These methods are laborious and subject to a number of significant limitations that can make library calibration unreliable. Herein, we propose and test an alternative method for quality control of sequencing libraries using droplet digital PCR (ddPCR). By exploiting a correlation we have discovered between droplet fluorescence and amplicon size, we achieve the joint quantification and size determination of target DNA with a single ddPCR assay. We demonstrate the accuracy and precision of applying this method to the preparation of sequencing libraries.
Hu, Peng; Fabyanic, Emily; Kwon, Deborah Y; Tang, Sheng; Zhou, Zhaolan; Wu, Hao
2017-12-07
Massively parallel single-cell RNA sequencing can precisely resolve cellular diversity in a high-throughput manner at low cost, but unbiased isolation of intact single cells from complex tissues such as adult mammalian brains is challenging. Here, we integrate sucrose-gradient-assisted purification of nuclei with droplet microfluidics to develop a highly scalable single-nucleus RNA-seq approach (sNucDrop-seq), which is free of enzymatic dissociation and nucleus sorting. By profiling ∼18,000 nuclei isolated from cortical tissues of adult mice, we demonstrate that sNucDrop-seq not only accurately reveals neuronal and non-neuronal subtype composition with high sensitivity but also enables in-depth analysis of transient transcriptional states driven by neuronal activity, at single-cell resolution, in vivo. Copyright © 2017 Elsevier Inc. All rights reserved.
ls1 mardyn: The Massively Parallel Molecular Dynamics Code for Large Systems.
Niethammer, Christoph; Becker, Stefan; Bernreuther, Martin; Buchholz, Martin; Eckhardt, Wolfgang; Heinecke, Alexander; Werth, Stephan; Bungartz, Hans-Joachim; Glass, Colin W; Hasse, Hans; Vrabec, Jadran; Horsch, Martin
2014-10-14
The molecular dynamics simulation code ls1 mardyn is presented. It is a highly scalable code, optimized for massively parallel execution on supercomputing architectures and currently holds the world record for the largest molecular simulation with over four trillion particles. It enables the application of pair potentials to length and time scales that were previously out of scope for molecular dynamics simulation. With an efficient dynamic load balancing scheme, it delivers high scalability even for challenging heterogeneous configurations. Presently, multicenter rigid potential models based on Lennard-Jones sites, point charges, and higher-order polarities are supported. Due to its modular design, ls1 mardyn can be extended to new physical models, methods, and algorithms, allowing future users to tailor it to suit their respective needs. Possible applications include scenarios with complex geometries, such as fluids at interfaces, as well as nonequilibrium molecular dynamics simulation of heat and mass transfer.
Gole, Jeff; Gore, Athurva; Richards, Andrew; Chiu, Yu-Jui; Fung, Ho-Lim; Bushman, Diane; Chiang, Hsin-I; Chun, Jerold; Lo, Yu-Hwa; Zhang, Kun
2013-01-01
Genome sequencing of single cells has a variety of applications, including characterizing difficult-to-culture microorganisms and identifying somatic mutations in single cells from mammalian tissues. A major hurdle in this process is the bias in amplifying the genetic material from a single cell, a procedure known as polymerase cloning. Here we describe the microwell displacement amplification system (MIDAS), a massively parallel polymerase cloning method in which single cells are randomly distributed into hundreds to thousands of nanoliter wells and simultaneously amplified for shotgun sequencing. MIDAS reduces amplification bias because polymerase cloning occurs in physically separated nanoliter-scale reactors, facilitating the de novo assembly of near-complete microbial genomes from single E. coli cells. In addition, MIDAS allowed us to detect single-copy number changes in primary human adult neurons at 1–2 Mb resolution. MIDAS will further the characterization of genomic diversity in many heterogeneous cell populations. PMID:24213699
The development of a massive open online course during the 2014-15 Ebola virus disease epidemic.
Evans, Dabney P; Luffy, Samantha M; Parisi, Stephanie; Del Rio, Carlos
2017-09-01
Timely training was urgently needed at the onset of the 2014 Ebola virus disease epidemic. Massive open online courses (MOOCs) have grown in popularity, though little is known about their utility in time-sensitive situations, including infectious disease outbreaks. We created the first English language massive open online course on Ebola virus disease. Designed by a team representing various units of Emory University and six partner institutions, the six module course was aimed at a global general audience but also relevant for health care professionals. Over 7,000 learners from 170 countries participated in the initial course offering. More than a third of learners were from emerging economies, including seven percent from Africa, and another 13% from countries outside the United States who received individuals requiring treatment for Ebola virus disease. Creating and producing the first English language MOOC on EVD in a short time period required effective collaboration and strong coordination between subject matter and course development experts from Emory. Through these collaborative efforts, the development team was able to provide urgently needed training and educational materials while the epidemic of EVD continued to radiate through West Africa. Copyright © 2017 Elsevier Inc. All rights reserved.
Advancing migratory bird conservation and management by using radar: An interagency collaboration
Ruth, Janet M.; Barrow, Wylie C.; Sojda, Richard S.; Dawson, Deanna K.; Diehl, Robert H.; Manville, Albert; Green, Michael T.; Krueper, David J.; Johnston, Scott
2005-01-01
Many technical issues make this work difficult, including complex data structures, massive data sets, digital recognition of birds, large areas not covered by weather radar, and model validation; however, progress will only be furthered by tackling the challenge. The new coalition will meets its goals by: (1) facilitating a productive collaboration with NOAA, Department of the Interior bureaus, state wildlife agencies, universities, power companies, and other potential partners; (2) building and strengthening scientific capabilities within USGS; (3) addressing key migratory bird management issues; and (4) ensuring full funding for the collaborative effort.
Hao, Bibo; Sun, Wen; Yu, Yiqin; Li, Jing; Hu, Gang; Xie, Guotong
2016-01-01
Recent advances in cloud computing and machine learning made it more convenient for researchers to gain insights from massive healthcare data, while performing analyses on healthcare data in current practice still lacks efficiency for researchers. What's more, collaborating among different researchers and sharing analysis results are challenging issues. In this paper, we developed a practice to make analytics process collaborative and analysis results reproducible by exploiting and extending Jupyter Notebook. After applying this practice in our use cases, we can perform analyses and deliver results with less efforts in shorter time comparing to our previous practice.
Tsui, Nancy B. Y.; Jiang, Peiyong; Chow, Katherine C. K.; Su, Xiaoxi; Leung, Tak Y.; Sun, Hao; Chan, K. C. Allen; Chiu, Rossa W. K.; Lo, Y. M. Dennis
2012-01-01
Background Fetal DNA in maternal urine, if present, would be a valuable source of fetal genetic material for noninvasive prenatal diagnosis. However, the existence of fetal DNA in maternal urine has remained controversial. The issue is due to the lack of appropriate technology to robustly detect the potentially highly degraded fetal DNA in maternal urine. Methodology We have used massively parallel paired-end sequencing to investigate cell-free DNA molecules in maternal urine. Catheterized urine samples were collected from seven pregnant women during the third trimester of pregnancies. We detected fetal DNA by identifying sequenced reads that contained fetal-specific alleles of the single nucleotide polymorphisms. The sizes of individual urinary DNA fragments were deduced from the alignment positions of the paired reads. We measured the fractional fetal DNA concentration as well as the size distributions of fetal and maternal DNA in maternal urine. Principal Findings Cell-free fetal DNA was detected in five of the seven maternal urine samples, with the fractional fetal DNA concentrations ranged from 1.92% to 4.73%. Fetal DNA became undetectable in maternal urine after delivery. The total urinary cell-free DNA molecules were less intact when compared with plasma DNA. Urinary fetal DNA fragments were very short, and the most dominant fetal sequences were between 29 bp and 45 bp in length. Conclusions With the use of massively parallel sequencing, we have confirmed the existence of transrenal fetal DNA in maternal urine, and have shown that urinary fetal DNA was heavily degraded. PMID:23118982
Energy-efficient STDP-based learning circuits with memristor synapses
NASA Astrophysics Data System (ADS)
Wu, Xinyu; Saxena, Vishal; Campbell, Kristy A.
2014-05-01
It is now accepted that the traditional von Neumann architecture, with processor and memory separation, is ill suited to process parallel data streams which a mammalian brain can efficiently handle. Moreover, researchers now envision computing architectures which enable cognitive processing of massive amounts of data by identifying spatio-temporal relationships in real-time and solving complex pattern recognition problems. Memristor cross-point arrays, integrated with standard CMOS technology, are expected to result in massively parallel and low-power Neuromorphic computing architectures. Recently, significant progress has been made in spiking neural networks (SNN) which emulate data processing in the cortical brain. These architectures comprise of a dense network of neurons and the synapses formed between the axons and dendrites. Further, unsupervised or supervised competitive learning schemes are being investigated for global training of the network. In contrast to a software implementation, hardware realization of these networks requires massive circuit overhead for addressing and individually updating network weights. Instead, we employ bio-inspired learning rules such as the spike-timing-dependent plasticity (STDP) to efficiently update the network weights locally. To realize SNNs on a chip, we propose to use densely integrating mixed-signal integrate-andfire neurons (IFNs) and cross-point arrays of memristors in back-end-of-the-line (BEOL) of CMOS chips. Novel IFN circuits have been designed to drive memristive synapses in parallel while maintaining overall power efficiency (<1 pJ/spike/synapse), even at spike rate greater than 10 MHz. We present circuit design details and simulation results of the IFN with memristor synapses, its response to incoming spike trains and STDP learning characterization.
Massively parallel cis-regulatory analysis in the mammalian central nervous system
Shen, Susan Q.; Myers, Connie A.; Hughes, Andrew E.O.; Byrne, Leah C.; Flannery, John G.; Corbo, Joseph C.
2016-01-01
Cis-regulatory elements (CREs, e.g., promoters and enhancers) regulate gene expression, and variants within CREs can modulate disease risk. Next-generation sequencing has enabled the rapid generation of genomic data that predict the locations of CREs, but a bottleneck lies in functionally interpreting these data. To address this issue, massively parallel reporter assays (MPRAs) have emerged, in which barcoded reporter libraries are introduced into cells, and the resulting barcoded transcripts are quantified by next-generation sequencing. Thus far, MPRAs have been largely restricted to assaying short CREs in a limited repertoire of cultured cell types. Here, we present two advances that extend the biological relevance and applicability of MPRAs. First, we adapt exome capture technology to instead capture candidate CREs, thereby tiling across the targeted regions and markedly increasing the length of CREs that can be readily assayed. Second, we package the library into adeno-associated virus (AAV), thereby allowing delivery to target organs in vivo. As a proof of concept, we introduce a capture library of about 46,000 constructs, corresponding to roughly 3500 DNase I hypersensitive (DHS) sites, into the mouse retina by ex vivo plasmid electroporation and into the mouse cerebral cortex by in vivo AAV injection. We demonstrate tissue-specific cis-regulatory activity of DHSs and provide examples of high-resolution truncation mutation analysis for multiplex parsing of CREs. Our approach should enable massively parallel functional analysis of a wide range of CREs in any organ or species that can be infected by AAV, such as nonhuman primates and human stem cell–derived organoids. PMID:26576614
Massively Parallel Simulations of Diffusion in Dense Polymeric Structures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Faulon, Jean-Loup, Wilcox, R.T.
1997-11-01
An original computational technique to generate close-to-equilibrium dense polymeric structures is proposed. Diffusion of small gases are studied on the equilibrated structures using massively parallel molecular dynamics simulations running on the Intel Teraflops (9216 Pentium Pro processors) and Intel Paragon(1840 processors). Compared to the current state-of-the-art equilibration methods this new technique appears to be faster by some orders of magnitude.The main advantage of the technique is that one can circumvent the bottlenecks in configuration space that inhibit relaxation in molecular dynamics simulations. The technique is based on the fact that tetravalent atoms (such as carbon and silicon) fit in themore » center of a regular tetrahedron and that regular tetrahedrons can be used to mesh the three-dimensional space. Thus, the problem of polymer equilibration described by continuous equations in molecular dynamics is reduced to a discrete problem where solutions are approximated by simple algorithms. Practical modeling applications include the constructing of butyl rubber and ethylene-propylene-dimer-monomer (EPDM) models for oxygen and water diffusion calculations. Butyl and EPDM are used in O-ring systems and serve as sealing joints in many manufactured objects. Diffusion coefficients of small gases have been measured experimentally on both polymeric systems, and in general the diffusion coefficients in EPDM are an order of magnitude larger than in butyl. In order to better understand the diffusion phenomena, 10, 000 atoms models were generated and equilibrated for butyl and EPDM. The models were submitted to a massively parallel molecular dynamics simulation to monitor the trajectories of the diffusing species.« less
NASA Technical Reports Server (NTRS)
Lyster, P. M.; Liewer, P. C.; Decyk, V. K.; Ferraro, R. D.
1995-01-01
A three-dimensional electrostatic particle-in-cell (PIC) plasma simulation code has been developed on coarse-grain distributed-memory massively parallel computers with message passing communications. Our implementation is the generalization to three-dimensions of the general concurrent particle-in-cell (GCPIC) algorithm. In the GCPIC algorithm, the particle computation is divided among the processors using a domain decomposition of the simulation domain. In a three-dimensional simulation, the domain can be partitioned into one-, two-, or three-dimensional subdomains ("slabs," "rods," or "cubes") and we investigate the efficiency of the parallel implementation of the push for all three choices. The present implementation runs on the Intel Touchstone Delta machine at Caltech; a multiple-instruction-multiple-data (MIMD) parallel computer with 512 nodes. We find that the parallel efficiency of the push is very high, with the ratio of communication to computation time in the range 0.3%-10.0%. The highest efficiency (> 99%) occurs for a large, scaled problem with 64(sup 3) particles per processing node (approximately 134 million particles of 512 nodes) which has a push time of about 250 ns per particle per time step. We have also developed expressions for the timing of the code which are a function of both code parameters (number of grid points, particles, etc.) and machine-dependent parameters (effective FLOP rate, and the effective interprocessor bandwidths for the communication of particles and grid points). These expressions can be used to estimate the performance of scaled problems--including those with inhomogeneous plasmas--to other parallel machines once the machine-dependent parameters are known.
Solving very large, sparse linear systems on mesh-connected parallel computers
NASA Technical Reports Server (NTRS)
Opsahl, Torstein; Reif, John
1987-01-01
The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
Karasick, Michael S.; Strip, David R.
1996-01-01
A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modelling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modelling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modelling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication.
An Overview of Mesoscale Modeling Software for Energetic Materials Research
2010-03-01
12 2.9 Large-scale Atomic/Molecular Massively Parallel Simulator ( LAMMPS ...13 Table 10. LAMMPS summary...extensive reviews, lectures and workshops are available on multiscale modeling of materials applications (76-78). • Multi-phase mixtures of
GPU computing in medical physics: a review.
Pratx, Guillem; Xing, Lei
2011-05-01
The graphics processing unit (GPU) has emerged as a competitive platform for computing massively parallel problems. Many computing applications in medical physics can be formulated as data-parallel tasks that exploit the capabilities of the GPU for reducing processing times. The authors review the basic principles of GPU computing as well as the main performance optimization techniques, and survey existing applications in three areas of medical physics, namely image reconstruction, dose calculation and treatment plan optimization, and image processing.
NASA Astrophysics Data System (ADS)
Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul
2017-03-01
We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Massive parallel 3D PIC simulation of negative ion extraction
NASA Astrophysics Data System (ADS)
Revel, Adrien; Mochalskyy, Serhiy; Montellano, Ivar Mauricio; Wünderlich, Dirk; Fantz, Ursel; Minea, Tiberiu
2017-09-01
The 3D PIC-MCC code ONIX is dedicated to modeling Negative hydrogen/deuterium Ion (NI) extraction and co-extraction of electrons from radio-frequency driven, low pressure plasma sources. It provides valuable insight on the complex phenomena involved in the extraction process. In previous calculations, a mesh size larger than the Debye length was used, implying numerical electron heating. Important steps have been achieved in terms of computation performance and parallelization efficiency allowing successful massive parallel calculations (4096 cores), imperative to resolve the Debye length. In addition, the numerical algorithms have been improved in terms of grid treatment, i.e., the electric field near the complex geometry boundaries (plasma grid) is calculated more accurately. The revised model preserves the full 3D treatment, but can take advantage of a highly refined mesh. ONIX was used to investigate the role of the mesh size, the re-injection scheme for lost particles (extracted or wall absorbed), and the electron thermalization process on the calculated extracted current and plasma characteristics. It is demonstrated that all numerical schemes give the same NI current distribution for extracted ions. Concerning the electrons, the pair-injection technique is found well-adapted to simulate the sheath in front of the plasma grid.
GPAW - massively parallel electronic structure calculations with Python-based software.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Enkovaara, J.; Romero, N.; Shende, S.
2011-01-01
Electronic structure calculations are a widely used tool in materials science and large consumer of supercomputing resources. Traditionally, the software packages for these kind of simulations have been implemented in compiled languages, where Fortran in its different versions has been the most popular choice. While dynamic, interpreted languages, such as Python, can increase the effciency of programmer, they cannot compete directly with the raw performance of compiled languages. However, by using an interpreted language together with a compiled language, it is possible to have most of the productivity enhancing features together with a good numerical performance. We have used thismore » approach in implementing an electronic structure simulation software GPAW using the combination of Python and C programming languages. While the chosen approach works well in standard workstations and Unix environments, massively parallel supercomputing systems can present some challenges in porting, debugging and profiling the software. In this paper we describe some details of the implementation and discuss the advantages and challenges of the combined Python/C approach. We show that despite the challenges it is possible to obtain good numerical performance and good parallel scalability with Python based software.« less
NASA Technical Reports Server (NTRS)
Sobieszczanski-Sobieski, Jaroslaw
1998-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate a radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimization (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behavior by interaction of a large number of very simple models may be an inspiration for the above algorithms, the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should be now, even though the widespread availability of massively parallel processing is still a few years away.
Cazzaniga, Paolo; Nobile, Marco S.; Besozzi, Daniela; Bellini, Matteo; Mauri, Giancarlo
2014-01-01
The introduction of general-purpose Graphics Processing Units (GPUs) is boosting scientific applications in Bioinformatics, Systems Biology, and Computational Biology. In these fields, the use of high-performance computing solutions is motivated by the need of performing large numbers of in silico analysis to study the behavior of biological systems in different conditions, which necessitate a computing power that usually overtakes the capability of standard desktop computers. In this work we present coagSODA, a CUDA-powered computational tool that was purposely developed for the analysis of a large mechanistic model of the blood coagulation cascade (BCC), defined according to both mass-action kinetics and Hill functions. coagSODA allows the execution of parallel simulations of the dynamics of the BCC by automatically deriving the system of ordinary differential equations and then exploiting the numerical integration algorithm LSODA. We present the biological results achieved with a massive exploration of perturbed conditions of the BCC, carried out with one-dimensional and bi-dimensional parameter sweep analysis, and show that GPU-accelerated parallel simulations of this model can increase the computational performances up to a 181× speedup compared to the corresponding sequential simulations. PMID:25025072
Massively Multithreaded Maxflow for Image Segmentation on the Cray XMT-2
Bokhari, Shahid H.; Çatalyürek, Ümit V.; Gurcan, Metin N.
2014-01-01
SUMMARY Image segmentation is a very important step in the computerized analysis of digital images. The maxflow mincut approach has been successfully used to obtain minimum energy segmentations of images in many fields. Classical algorithms for maxflow in networks do not directly lend themselves to efficient parallel implementations on contemporary parallel processors. We present the results of an implementation of Goldberg-Tarjan preflow-push algorithm on the Cray XMT-2 massively multithreaded supercomputer. This machine has hardware support for 128 threads in each physical processor, a uniformly accessible shared memory of up to 4 TB and hardware synchronization for each 64 bit word. It is thus well-suited to the parallelization of graph theoretic algorithms, such as preflow-push. We describe the implementation of the preflow-push code on the XMT-2 and present the results of timing experiments on a series of synthetically generated as well as real images. Our results indicate very good performance on large images and pave the way for practical applications of this machine architecture for image analysis in a production setting. The largest images we have run are 320002 pixels in size, which are well beyond the largest previously reported in the literature. PMID:25598745
NASA Technical Reports Server (NTRS)
Sobieszczanski-Sobieski, Jaroslaw
1999-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
A massively asynchronous, parallel brain.
Zeki, Semir
2015-05-19
Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously--with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain.
NASA Technical Reports Server (NTRS)
Banks, Daniel W.; Laflin, Brenda E. Gile; Kemmerly, Guy T.; Campbell, Bryan A.
1999-01-01
The paper identifies speed, agility, human interface, generation of sensitivity information, task decomposition, and data transmission (including storage) as important attributes for a computer environment to have in order to support engineering design effectively. It is argued that when examined in terms of these attributes the presently available environment can be shown to be inadequate. A radical improvement is needed, and it may be achieved by combining new methods that have recently emerged from multidisciplinary design optimisation (MDO) with massively parallel processing computer technology. The caveat is that, for successful use of that technology in engineering computing, new paradigms for computing will have to be developed - specifically, innovative algorithms that are intrinsically parallel so that their performance scales up linearly with the number of processors. It may be speculated that the idea of simulating a complex behaviour by interaction of a large number of very simple models may be an inspiration for the above algorithms; the cellular automata are an example. Because of the long lead time needed to develop and mature new paradigms, development should begin now, even though the widespread availability of massively parallel processing is still a few years away.
Suppressing correlations in massively parallel simulations of lattice models
NASA Astrophysics Data System (ADS)
Kelling, Jeffrey; Ódor, Géza; Gemming, Sibylle
2017-11-01
For lattice Monte Carlo simulations parallelization is crucial to make studies of large systems and long simulation time feasible, while sequential simulations remain the gold-standard for correlation-free dynamics. Here, various domain decomposition schemes are compared, concluding with one which delivers virtually correlation-free simulations on GPUs. Extensive simulations of the octahedron model for 2 + 1 dimensional Kardar-Parisi-Zhang surface growth, which is very sensitive to correlation in the site-selection dynamics, were performed to show self-consistency of the parallel runs and agreement with the sequential algorithm. We present a GPU implementation providing a speedup of about 30 × over a parallel CPU implementation on a single socket and at least 180 × with respect to the sequential reference.
The remote sensing image segmentation mean shift algorithm parallel processing based on MapReduce
NASA Astrophysics Data System (ADS)
Chen, Xi; Zhou, Liqing
2015-12-01
With the development of satellite remote sensing technology and the remote sensing image data, traditional remote sensing image segmentation technology cannot meet the massive remote sensing image processing and storage requirements. This article put cloud computing and parallel computing technology in remote sensing image segmentation process, and build a cheap and efficient computer cluster system that uses parallel processing to achieve MeanShift algorithm of remote sensing image segmentation based on the MapReduce model, not only to ensure the quality of remote sensing image segmentation, improved split speed, and better meet the real-time requirements. The remote sensing image segmentation MeanShift algorithm parallel processing algorithm based on MapReduce shows certain significance and a realization of value.
Line-drawing algorithms for parallel machines
NASA Technical Reports Server (NTRS)
Pang, Alex T.
1990-01-01
The fact that conventional line-drawing algorithms, when applied directly on parallel machines, can lead to very inefficient codes is addressed. It is suggested that instead of modifying an existing algorithm for a parallel machine, a more efficient implementation can be produced by going back to the invariants in the definition. Popular line-drawing algorithms are compared with two alternatives; distance to a line (a point is on the line if sufficiently close to it) and intersection with a line (a point on the line if an intersection point). For massively parallel single-instruction-multiple-data (SIMD) machines (with thousands of processors and up), the alternatives provide viable line-drawing algorithms. Because of the pixel-per-processor mapping, their performance is independent of the line length and orientation.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Rehfeldt, Ruth Anne; Jung, Heidi L; Aguirre, Angelica; Nichols, Jane L; Root, William B
2016-03-01
The e-Transformation in higher education, in which Massive Open Online Courses (MOOCs) are playing a pivotal role, has had an impact on the modality in which behavior analysis is taught. In this paper, we survey the history and implications of online education including MOOCs and describe the implementation and results for the discipline's first MOOC, delivered at Southern Illinois University in spring 2015. Implications for the globalization and free access of higher education are discussed, as well as the parallel between MOOCs and Skinner's teaching machines.
A Pervasive Parallel Processing Framework for Data Visualization and Analysis at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth; Geveci, Berk
2014-11-01
The evolution of the computing world from teraflop to petaflop has been relatively effortless, with several of the existing programming models scaling effectively to the petascale. The migration to exascale, however, poses considerable challenges. All industry trends infer that the exascale machine will be built using processors containing hundreds to thousands of cores per chip. It can be inferred that efficient concurrency on exascale machines requires a massive amount of concurrent threads, each performing many operations on a localized piece of data. Currently, visualization libraries and applications are based off what is known as the visualization pipeline. In the pipelinemore » model, algorithms are encapsulated as filters with inputs and outputs. These filters are connected by setting the output of one component to the input of another. Parallelism in the visualization pipeline is achieved by replicating the pipeline for each processing thread. This works well for today’s distributed memory parallel computers but cannot be sustained when operating on processors with thousands of cores. Our project investigates a new visualization framework designed to exhibit the pervasive parallelism necessary for extreme scale machines. Our framework achieves this by defining algorithms in terms of worklets, which are localized stateless operations. Worklets are atomic operations that execute when invoked unlike filters, which execute when a pipeline request occurs. The worklet design allows execution on a massive amount of lightweight threads with minimal overhead. Only with such fine-grained parallelism can we hope to fill the billions of threads we expect will be necessary for efficient computation on an exascale machine.« less
A Fast Algorithm for Massively Parallel, Long-Term, Simulation of Complex Molecular Dynamics Systems
NASA Technical Reports Server (NTRS)
Jaramillo-Botero, Andres; Goddard, William A, III; Fijany, Amir
1997-01-01
The advances in theory and computing technology over the last decade have led to enormous progress in applying atomistic molecular dynamics (MD) methods to the characterization, prediction, and design of chemical, biological, and material systems,.
Branched Polymers for Enhancing Polymer Gel Strength and Toughness
2013-02-01
Molecular Massively Parallel Simulator ( LAMMPS ) program and the stress-strain relations were calculated with varying strain-rates (figure 6). A...Acronyms ARL U.S. Army Research Laboratory D3 hexamethylcyclotrisiloxane FTIR Fourier transform infrared GPC gel permeation chromatography LAMMPS
DOE Office of Scientific and Technical Information (OSTI.GOV)
SmartImport.py is a Python source-code file that implements a replacement for the standard Python module importer. The code is derived from knee.py, a file in the standard Python diestribution , and adds functionality to improve the performance of Python module imports in massively parallel contexts.
NASA Technical Reports Server (NTRS)
Tilton, James C.
1988-01-01
Image segmentation can be a key step in data compression and image analysis. However, the segmentation results produced by most previous approaches to region growing are suspect because they depend on the order in which portions of the image are processed. An iterative parallel segmentation algorithm avoids this problem by performing globally best merges first. Such a segmentation approach, and two implementations of the approach on NASA's Massively Parallel Processor (MPP) are described. Application of the segmentation approach to data compression and image analysis is then described, and results of such application are given for a LANDSAT Thematic Mapper image.
Low profile, highly configurable, current sharing paralleled wide band gap power device power module
McPherson, Brice; Killeen, Peter D.; Lostetter, Alex; Shaw, Robert; Passmore, Brandon; Hornberger, Jared; Berry, Tony M
2016-08-23
A power module with multiple equalized parallel power paths supporting multiple parallel bare die power devices constructed with low inductance equalized current paths for even current sharing and clean switching events. Wide low profile power contacts provide low inductance, short current paths, and large conductor cross section area provides for massive current carrying. An internal gate & source kelvin interconnection substrate is provided with individual ballast resistors and simple bolted construction. Gate drive connectors are provided on either left or right size of the module. The module is configurable as half bridge, full bridge, common source, and common drain topologies.
Progress on the Multiphysics Capabilities of the Parallel Electromagnetic ACE3P Simulation Suite
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kononenko, Oleksiy
2015-03-26
ACE3P is a 3D parallel simulation suite that is being developed at SLAC National Accelerator Laboratory. Effectively utilizing supercomputer resources, ACE3P has become a key tool for the coupled electromagnetic, thermal and mechanical research and design of particle accelerators. Based on the existing finite-element infrastructure, a massively parallel eigensolver is developed for modal analysis of mechanical structures. It complements a set of the multiphysics tools in ACE3P and, in particular, can be used for the comprehensive study of microphonics in accelerating cavities ensuring the operational reliability of a particle accelerator.
Biomimetic Models for An Ecological Approach to Massively-Deployed Sensor Networks
NASA Technical Reports Server (NTRS)
Jones, Kennie H.; Lodding, Kenneth N.; Olariu, Stephan; Wilson, Larry; Xin, Chunsheng
2005-01-01
Promises of ubiquitous control of the physical environment by massively-deployed wireless sensor networks open avenues for new applications that will redefine the way we live and work. Due to small size and low cost of sensor devices, visionaries promise systems enabled by deployment of massive numbers of sensors ubiquitous throughout our environment working in concert. Recent research has concentrated on developing techniques for performing relatively simple tasks with minimal energy expense, assuming some form of centralized control. Unfortunately, centralized control is not conducive to parallel activities and does not scale to massive size networks. Execution of simple tasks in sparse networks will not lead to the sophisticated applications predicted. We propose a new way of looking at massively-deployed sensor networks, motivated by lessons learned from the way biological ecosystems are organized. We demonstrate that in such a model, fully distributed data aggregation can be performed in a scalable fashion in massively deployed sensor networks, where motes operate on local information, making local decisions that are aggregated across the network to achieve globally-meaningful effects. We show that such architectures may be used to facilitate communication and synchronization in a fault-tolerant manner, while balancing workload and required energy expenditure throughout the network.
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Yang, L. H.; Brooks III, E. D.; Belak, J.
1992-01-01
A molecular dynamics algorithm for performing large-scale simulations using the Parallel C Preprocessor (PCP) programming paradigm on the BBN TC2000, a massively parallel computer, is discussed. The algorithm uses a linked-cell data structure to obtain the near neighbors of each atom as time evoles. Each processor is assigned to a geometric domain containing many subcells and the storage for that domain is private to the processor. Within this scheme, the interdomain (i.e., interprocessor) communication is minimized.
Reconstruction of coded aperture images
NASA Technical Reports Server (NTRS)
Bielefeld, Michael J.; Yin, Lo I.
1987-01-01
Balanced correlation method and the Maximum Entropy Method (MEM) were implemented to reconstruct a laboratory X-ray source as imaged by a Uniformly Redundant Array (URA) system. Although the MEM method has advantages over the balanced correlation method, it is computationally time consuming because of the iterative nature of its solution. Massively Parallel Processing, with its parallel array structure is ideally suited for such computations. These preliminary results indicate that it is possible to use the MEM method in future coded-aperture experiments with the help of the MPP.
Function algorithms for MPP scientific subroutines, volume 1
NASA Technical Reports Server (NTRS)
Gouch, J. G.
1984-01-01
Design documentation and user documentation for function algorithms for the Massively Parallel Processor (MPP) are presented. The contract specifies development of MPP assembler instructions to perform the following functions: natural logarithm; exponential (e to the x power); square root; sine; cosine; and arctangent. To fulfill the requirements of the contract, parallel array and solar implementations for these functions were developed on the PDP11/34 Program Development and Management Unit (PDMU) that is resident at the MPP testbed installation located at the NASA Goddard facility.
ERIC Educational Resources Information Center
Gardner, Maureen B., Ed.
This document presents the Maryland Collaborative for Teacher Preparation (MCTP) faculty's reviews on instructional issues of different disciplines. Contents include: (1) "The Maryland Collaborative for Teacher Preparation"; (2) "Guiding Principles: New Thinking in Mathematics and Science Teaching"; (3) "Introduction: Parallel Journeys of Risk and…
User's Guide for TOUGH2-MP - A Massively Parallel Version of the TOUGH2 Code
DOE Office of Scientific and Technical Information (OSTI.GOV)
Earth Sciences Division; Zhang, Keni; Zhang, Keni
TOUGH2-MP is a massively parallel (MP) version of the TOUGH2 code, designed for computationally efficient parallel simulation of isothermal and nonisothermal flows of multicomponent, multiphase fluids in one, two, and three-dimensional porous and fractured media. In recent years, computational requirements have become increasingly intensive in large or highly nonlinear problems for applications in areas such as radioactive waste disposal, CO2 geological sequestration, environmental assessment and remediation, reservoir engineering, and groundwater hydrology. The primary objective of developing the parallel-simulation capability is to significantly improve the computational performance of the TOUGH2 family of codes. The particular goal for the parallel simulator ismore » to achieve orders-of-magnitude improvement in computational time for models with ever-increasing complexity. TOUGH2-MP is designed to perform parallel simulation on multi-CPU computational platforms. An earlier version of TOUGH2-MP (V1.0) was based on the TOUGH2 Version 1.4 with EOS3, EOS9, and T2R3D modules, a software previously qualified for applications in the Yucca Mountain project, and was designed for execution on CRAY T3E and IBM SP supercomputers. The current version of TOUGH2-MP (V2.0) includes all fluid property modules of the standard version TOUGH2 V2.0. It provides computationally efficient capabilities using supercomputers, Linux clusters, or multi-core PCs, and also offers many user-friendly features. The parallel simulator inherits all process capabilities from V2.0 together with additional capabilities for handling fractured media from V1.4. This report provides a quick starting guide on how to set up and run the TOUGH2-MP program for users with a basic knowledge of running the (standard) version TOUGH2 code, The report also gives a brief technical description of the code, including a discussion of parallel methodology, code structure, as well as mathematical and numerical methods used. To familiarize users with the parallel code, illustrative sample problems are presented.« less
GPU-completeness: theory and implications
NASA Astrophysics Data System (ADS)
Lin, I.-Jong
2011-01-01
This paper formalizes a major insight into a class of algorithms that relate parallelism and performance. The purpose of this paper is to define a class of algorithms that trades off parallelism for quality of result (e.g. visual quality, compression rate), and we propose a similar method for algorithmic classification based on NP-Completeness techniques, applied toward parallel acceleration. We will define this class of algorithm as "GPU-Complete" and will postulate the necessary properties of the algorithms for admission into this class. We will also formally relate his algorithmic space and imaging algorithms space. This concept is based upon our experience in the print production area where GPUs (Graphic Processing Units) have shown a substantial cost/performance advantage within the context of HPdelivered enterprise services and commercial printing infrastructure. While CPUs and GPUs are converging in their underlying hardware and functional blocks, their system behaviors are clearly distinct in many ways: memory system design, programming paradigms, and massively parallel SIMD architecture. There are applications that are clearly suited to each architecture: for CPU: language compilation, word processing, operating systems, and other applications that are highly sequential in nature; for GPU: video rendering, particle simulation, pixel color conversion, and other problems clearly amenable to massive parallelization. While GPUs establishing themselves as a second, distinct computing architecture from CPUs, their end-to-end system cost/performance advantage in certain parts of computation inform the structure of algorithms and their efficient parallel implementations. While GPUs are merely one type of architecture for parallelization, we show that their introduction into the design space of printing systems demonstrate the trade-offs against competing multi-core, FPGA, and ASIC architectures. While each architecture has its own optimal application, we believe that the selection of architecture can be defined in terms of properties of GPU-Completeness. For a welldefined subset of algorithms, GPU-Completeness is intended to connect the parallelism, algorithms and efficient architectures into a unified framework to show that multiple layers of parallel implementation are guided by the same underlying trade-off.
Investigations into Gravitational Wave Emission from Compact Body Inspiral into Massive Black Holes
NASA Technical Reports Server (NTRS)
Hughes, Scott A.
2005-01-01
In contrast to year 1 (when much of the activity associated with this grant focused upon developing our group at MIT), year 2 was a period of very focused attention on research problems. We made significant progress developing relativistic waveforms for the extreme mass ratio inspiral problem; we have pushed forward a formalism our group developed for mapping the spacetimes of massive compact objects; and, in collaboration with the Caltech group, we began to develop a framework for addressing issues in LISA data analysis for extreme mass ratio systems.
Data intensive computing at Sandia.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wilson, Andrew T.
2010-09-01
Data-Intensive Computing is parallel computing where you design your algorithms and your software around efficient access and traversal of a data set; where hardware requirements are dictated by data size as much as by desired run times usually distilling compact results from massive data.
Optimization of Particle-in-Cell Codes on RISC Processors
NASA Technical Reports Server (NTRS)
Decyk, Viktor K.; Karmesin, Steve Roy; Boer, Aeint de; Liewer, Paulette C.
1996-01-01
General strategies are developed to optimize particle-cell-codes written in Fortran for RISC processors which are commonly used on massively parallel computers. These strategies include data reorganization to improve cache utilization and code reorganization to improve efficiency of arithmetic pipelines.
Optimized Landing of Autonomous Unmanned Aerial Vehicle Swarms
2012-06-01
understanding about the world. Examples of these emergent behaviors include construction of complex structures (e.g., hives, termite mounds), trends in economic...Sep. 2007. [16] M. Resnick, Turtles, Termites , and Traffic Jams: Explorations in Massively Parallel Microworlds. MIT Press, 1997. [Online]. Available
Constraint-Based Scheduling System
NASA Technical Reports Server (NTRS)
Zweben, Monte; Eskey, Megan; Stock, Todd; Taylor, Will; Kanefsky, Bob; Drascher, Ellen; Deale, Michael; Daun, Brian; Davis, Gene
1995-01-01
Report describes continuing development of software for constraint-based scheduling system implemented eventually on massively parallel computer. Based on machine learning as means of improving scheduling. Designed to learn when to change search strategy by analyzing search progress and learning general conditions under which resource bottleneck occurs.
Chen, Mingyang; Stott, Amanda C; Li, Shenggang; Dixon, David A
2012-04-01
A robust metadata database called the Collaborative Chemistry Database Tool (CCDBT) for massive amounts of computational chemistry raw data has been designed and implemented. It performs data synchronization and simultaneously extracts the metadata. Computational chemistry data in various formats from different computing sources, software packages, and users can be parsed into uniform metadata for storage in a MySQL database. Parsing is performed by a parsing pyramid, including parsers written for different levels of data types and sets created by the parser loader after loading parser engines and configurations. Copyright © 2011 Elsevier Inc. All rights reserved.
Mueller, Jennifer J; Schlappe, Brooke A; Kumar, Rahul; Olvera, Narciso; Dao, Fanny; Abu-Rustum, Nadeem; Aghajanian, Carol; DeLair, Deborah; Hussein, Yaser R; Soslow, Robert A; Levine, Douglas A; Weigelt, Britta
2018-05-21
Mucinous ovarian cancer (MOC) is a rare type of epithelial ovarian cancer resistant to standard chemotherapy regimens. We sought to characterize the repertoire of somatic mutations in MOCs and to define the contribution of massively parallel sequencing to the classification of tumors diagnosed as primary MOCs. Following gynecologic pathology and chart review, DNA samples obtained from primary MOCs and matched normal tissues/blood were subjected to whole-exome (n = 9) or massively parallel sequencing targeting 341 cancer genes (n = 15). Immunohistochemical analysis of estrogen receptor, progesterone receptor, PTEN, ARID1A/BAF250a, and the DNA mismatch (MMR) proteins MSH6 and PMS2 was performed for all cases. Mutational frequencies of MOCs were compared to those of high-grade serous ovarian cancers (HGSOCs) and mucinous tumors from other sites. MOCs were heterogeneous at the genetic level, frequently harboring TP53 (75%) mutations, KRAS (71%) mutations and/or CDKN2A/B homozygous deletions/mutations (33%). Although established criteria for diagnosis were employed, four cases harbored mutational and immunohistochemical profiles similar to those of endometrioid carcinomas, and one case for colorectal or endometrioid carcinoma. Significant differences in the frequencies of KRAS, TP53, CDKN2A, FBXW7, PIK3CA and/or APC mutations between the confirmed primary MOCs (n = 19) and HGSOCs, mucinous gastric and/or mucinous colorectal carcinomas were found, whereas no differences in the 341 genes studied between MOCs and mucinous pancreatic carcinomas were identified. Our findings suggest that the assessment of mutations affecting TP53, KRAS, PIK3CA, ARID1A and POLE, and DNA MMR protein expression may be used to further aid the diagnosis and treatment decision-making of primary MOC. Copyright © 2018 Elsevier Inc. All rights reserved.
Steinberg, Karyn Meltz; Ramachandran, Dhanya; Patel, Viren C; Shetty, Amol C; Cutler, David J; Zwick, Michael E
2012-09-28
Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3' UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects.
2012-01-01
Background Autism spectrum disorder (ASD) is highly heritable, but the genetic risk factors for it remain largely unknown. Although structural variants with large effect sizes may explain up to 15% ASD, genome-wide association studies have failed to uncover common single nucleotide variants with large effects on phenotype. The focus within ASD genetics is now shifting to the examination of rare sequence variants of modest effect, which is most often achieved via exome selection and sequencing. This strategy has indeed identified some rare candidate variants; however, the approach does not capture the full spectrum of genetic variation that might contribute to the phenotype. Methods We surveyed two loci with known rare variants that contribute to ASD, the X-linked neuroligin genes by performing massively parallel Illumina sequencing of the coding and noncoding regions from these genes in males from families with multiplex autism. We annotated all variant sites and functionally tested a subset to identify other rare mutations contributing to ASD susceptibility. Results We found seven rare variants at evolutionary conserved sites in our study population. Functional analyses of the three 3’ UTR variants did not show statistically significant effects on the expression of NLGN3 and NLGN4X. In addition, we identified two NLGN3 intronic variants located within conserved transcription factor binding sites that could potentially affect gene regulation. Conclusions These data demonstrate the power of massively parallel, targeted sequencing studies of affected individuals for identifying rare, potentially disease-contributing variation. However, they also point out the challenges and limitations of current methods of direct functional testing of rare variants and the difficulties of identifying alleles with modest effects. PMID:23020841
PARAVT: Parallel Voronoi tessellation code
NASA Astrophysics Data System (ADS)
González, R. E.
2016-10-01
In this study, we present a new open source code for massive parallel computation of Voronoi tessellations (VT hereafter) in large data sets. The code is focused for astrophysical purposes where VT densities and neighbors are widely used. There are several serial Voronoi tessellation codes, however no open source and parallel implementations are available to handle the large number of particles/galaxies in current N-body simulations and sky surveys. Parallelization is implemented under MPI and VT using Qhull library. Domain decomposition takes into account consistent boundary computation between tasks, and includes periodic conditions. In addition, the code computes neighbors list, Voronoi density, Voronoi cell volume, density gradient for each particle, and densities on a regular grid. Code implementation and user guide are publicly available at https://github.com/regonzar/paravt.
Parallel Computational Fluid Dynamics: Current Status and Future Requirements
NASA Technical Reports Server (NTRS)
Simon, Horst D.; VanDalsem, William R.; Dagum, Leonardo; Kutler, Paul (Technical Monitor)
1994-01-01
One or the key objectives of the Applied Research Branch in the Numerical Aerodynamic Simulation (NAS) Systems Division at NASA Allies Research Center is the accelerated introduction of highly parallel machines into a full operational environment. In this report we discuss the performance results obtained from the implementation of some computational fluid dynamics (CFD) applications on the Connection Machine CM-2 and the Intel iPSC/860. We summarize some of the experiences made so far with the parallel testbed machines at the NAS Applied Research Branch. Then we discuss the long term computational requirements for accomplishing some of the grand challenge problems in computational aerosciences. We argue that only massively parallel machines will be able to meet these grand challenge requirements, and we outline the computer science and algorithm research challenges ahead.
Parallel dispatch: a new paradigm of electrical power system dispatch
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Jun Jason; Wang, Fei-Yue; Wang, Qiang
Modern power systems are evolving into sociotechnical systems with massive complexity, whose real-time operation and dispatch go beyond human capability. Thus, the need for developing and applying new intelligent power system dispatch tools are of great practical significance. In this paper, we introduce the overall business model of power system dispatch, the top level design approach of an intelligent dispatch system, and the parallel intelligent technology with its dispatch applications. We expect that a new dispatch paradigm, namely the parallel dispatch, can be established by incorporating various intelligent technologies, especially the parallel intelligent technology, to enable secure operation of complexmore » power grids, extend system operators U+02BC capabilities, suggest optimal dispatch strategies, and to provide decision-making recommendations according to power system operational goals.« less
Data and Workflow Management Challenges in Global Adjoint Tomography
NASA Astrophysics Data System (ADS)
Lei, W.; Ruan, Y.; Smith, J. A.; Modrak, R. T.; Orsvuran, R.; Krischer, L.; Chen, Y.; Balasubramanian, V.; Hill, J.; Turilli, M.; Bozdag, E.; Lefebvre, M. P.; Jha, S.; Tromp, J.
2017-12-01
It is crucial to take the complete physics of wave propagation into account in seismic tomography to further improve the resolution of tomographic images. The adjoint method is an efficient way of incorporating 3D wave simulations in seismic tomography. However, global adjoint tomography is computationally expensive, requiring thousands of wavefield simulations and massive data processing. Through our collaboration with the Oak Ridge National Laboratory (ORNL) computing group and an allocation on Titan, ORNL's GPU-accelerated supercomputer, we are now performing our global inversions by assimilating waveform data from over 1,000 earthquakes. The first challenge we encountered is dealing with the sheer amount of seismic data. Data processing based on conventional data formats and processing tools (such as SAC), which are not designed for parallel systems, becomes our major bottleneck. To facilitate the data processing procedures, we designed the Adaptive Seismic Data Format (ASDF) and developed a set of Python-based processing tools to replace legacy FORTRAN-based software. These tools greatly enhance reproducibility and accountability while taking full advantage of highly parallel system and showing superior scaling on modern computational platforms. The second challenge is that the data processing workflow contains more than 10 sub-procedures, making it delicate to handle and prone to human mistakes. To reduce human intervention as much as possible, we are developing a framework specifically designed for seismic inversion based on the state-of-the art workflow management research, specifically the Ensemble Toolkit (EnTK), in collaboration with the RADICAL team from Rutgers University. Using the initial developments of the EnTK, we are able to utilize the full computing power of the data processing cluster RHEA at ORNL while keeping human interaction to a minimum and greatly reducing the data processing time. Thanks to all the improvements, we are now able to perform iterations fast enough on more than a 1,000 earthquakes dataset. Starting from model GLAD-M15 (Bozdag et al., 2016), an elastic 3D model with a transversely isotropic upper mantle, we have successfully performed 5 iterations. Our goal is to finish 10 iterations, i.e., generating GLAD M25* by the end of this year.
ParaBTM: A Parallel Processing Framework for Biomedical Text Mining on Supercomputers.
Xing, Yuting; Wu, Chengkun; Yang, Xi; Wang, Wei; Zhu, En; Yin, Jianping
2018-04-27
A prevailing way of extracting valuable information from biomedical literature is to apply text mining methods on unstructured texts. However, the massive amount of literature that needs to be analyzed poses a big data challenge to the processing efficiency of text mining. In this paper, we address this challenge by introducing parallel processing on a supercomputer. We developed paraBTM, a runnable framework that enables parallel text mining on the Tianhe-2 supercomputer. It employs a low-cost yet effective load balancing strategy to maximize the efficiency of parallel processing. We evaluated the performance of paraBTM on several datasets, utilizing three types of named entity recognition tasks as demonstration. Results show that, in most cases, the processing efficiency can be greatly improved with parallel processing, and the proposed load balancing strategy is simple and effective. In addition, our framework can be readily applied to other tasks of biomedical text mining besides NER.
Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT)
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Dean N.
2011-07-20
This report summarizes work carried out by the Ultra-scale Visualization Climate Data Analysis Tools (UV-CDAT) Team for the period of January 1, 2011 through June 30, 2011. It discusses highlights, overall progress, period goals, and collaborations and lists papers and presentations. To learn more about our project, please visit our UV-CDAT website (URL: http://uv-cdat.org). This report will be forwarded to the program manager for the Department of Energy (DOE) Office of Biological and Environmental Research (BER), national and international collaborators and stakeholders, and to researchers working on a wide range of other climate model, reanalysis, and observation evaluation activities. Themore » UV-CDAT executive committee consists of Dean N. Williams of Lawrence Livermore National Laboratory (LLNL); Dave Bader and Galen Shipman of Oak Ridge National Laboratory (ORNL); Phil Jones and James Ahrens of Los Alamos National Laboratory (LANL), Claudio Silva of Polytechnic Institute of New York University (NYU-Poly); and Berk Geveci of Kitware, Inc. The UV-CDAT team consists of researchers and scientists with diverse domain knowledge whose home institutions also include the National Aeronautics and Space Administration (NASA) and the University of Utah. All work is accomplished under DOE open-source guidelines and in close collaboration with the project's stakeholders, domain researchers, and scientists. Working directly with BER climate science analysis projects, this consortium will develop and deploy data and computational resources useful to a wide variety of stakeholders, including scientists, policymakers, and the general public. Members of this consortium already collaborate with other institutions and universities in researching data discovery, management, visualization, workflow analysis, and provenance. The UV-CDAT team will address the following high-level visualization requirements: (1) Alternative parallel streaming statistics and analysis pipelines - Data parallelism, Task parallelism, Visualization parallelism; (2) Optimized parallel input/output (I/O); (3) Remote interactive execution; (4) Advanced intercomparison visualization; (5) Data provenance processing and capture; and (6) Interfaces for scientists - Workflow data analysis and visualization construction tools, and Visualization interfaces.« less
Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...
2016-11-16
Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalabilitymore » of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.« less
GROMACS 4.5: a high-throughput and highly parallel open source molecular simulation toolkit
Pronk, Sander; Páll, Szilárd; Schulz, Roland; Larsson, Per; Bjelkmar, Pär; Apostolov, Rossen; Shirts, Michael R.; Smith, Jeremy C.; Kasson, Peter M.; van der Spoel, David; Hess, Berk; Lindahl, Erik
2013-01-01
Motivation: Molecular simulation has historically been a low-throughput technique, but faster computers and increasing amounts of genomic and structural data are changing this by enabling large-scale automated simulation of, for instance, many conformers or mutants of biomolecules with or without a range of ligands. At the same time, advances in performance and scaling now make it possible to model complex biomolecular interaction and function in a manner directly testable by experiment. These applications share a need for fast and efficient software that can be deployed on massive scale in clusters, web servers, distributed computing or cloud resources. Results: Here, we present a range of new simulation algorithms and features developed during the past 4 years, leading up to the GROMACS 4.5 software package. The software now automatically handles wide classes of biomolecules, such as proteins, nucleic acids and lipids, and comes with all commonly used force fields for these molecules built-in. GROMACS supports several implicit solvent models, as well as new free-energy algorithms, and the software now uses multithreading for efficient parallelization even on low-end systems, including windows-based workstations. Together with hand-tuned assembly kernels and state-of-the-art parallelization, this provides extremely high performance and cost efficiency for high-throughput as well as massively parallel simulations. Availability: GROMACS is an open source and free software available from http://www.gromacs.org. Contact: erik.lindahl@scilifelab.se Supplementary information: Supplementary data are available at Bioinformatics online. PMID:23407358
Design of multiple sequence alignment algorithms on parallel, distributed memory supercomputers.
Church, Philip C; Goscinski, Andrzej; Holt, Kathryn; Inouye, Michael; Ghoting, Amol; Makarychev, Konstantin; Reumann, Matthias
2011-01-01
The challenge of comparing two or more genomes that have undergone recombination and substantial amounts of segmental loss and gain has recently been addressed for small numbers of genomes. However, datasets of hundreds of genomes are now common and their sizes will only increase in the future. Multiple sequence alignment of hundreds of genomes remains an intractable problem due to quadratic increases in compute time and memory footprint. To date, most alignment algorithms are designed for commodity clusters without parallelism. Hence, we propose the design of a multiple sequence alignment algorithm on massively parallel, distributed memory supercomputers to enable research into comparative genomics on large data sets. Following the methodology of the sequential progressiveMauve algorithm, we design data structures including sequences and sorted k-mer lists on the IBM Blue Gene/P supercomputer (BG/P). Preliminary results show that we can reduce the memory footprint so that we can potentially align over 250 bacterial genomes on a single BG/P compute node. We verify our results on a dataset of E.coli, Shigella and S.pneumoniae genomes. Our implementation returns results matching those of the original algorithm but in 1/2 the time and with 1/4 the memory footprint for scaffold building. In this study, we have laid the basis for multiple sequence alignment of large-scale datasets on a massively parallel, distributed memory supercomputer, thus enabling comparison of hundreds instead of a few genome sequences within reasonable time.
Sewage Reflects the Distriubtion of Human Faecal Lachnospiraceae
Faecal pollution contains a rich and diverse community of bacteria derived from animals and humans,many of which might serve as alternatives to the traditional enterococci and Escherichia coli faecal indicators. We used massively parallel sequencing (MPS)of the 16S rRNA gene to ...
Genetics Home Reference: medullary cystic kidney disease type 1
... They lead to the production of an altered protein. It is unclear how this change causes kidney disease. ... cystic kidney disease type 1 lie in a large VNTR in MUC1 missed by massively parallel sequencing. Nat Genet. 2013 Mar;45(3):299-303. ...
Large-scale enrichment and discovery of gene-associated SNPs
USDA-ARS?s Scientific Manuscript database
With the recent advent of massively parallel pyrosequencing by 454 Life Sciences it has become feasible to cost-effectively identify numerous single nucleotide polymorphisms (SNPs) within the recombinogenic regions of the maize (Zea mays L.) genome. We developed a modified version of hypomethylated...
Software Applications on the Peregrine System | High-Performance Computing
programming and optimization. Gaussian Chemistry Program for calculating molecular electronic structure and Materials Science Open-source classical molecular dynamics program designed for massively parallel systems framework Q-Chem Chemistry ab initio quantum chemistry package for predictin molecular structures
Flow cytometry for enrichment and titration in massively parallel DNA sequencing
Sandberg, Julia; Ståhl, Patrik L.; Ahmadian, Afshin; Bjursell, Magnus K.; Lundeberg, Joakim
2009-01-01
Massively parallel DNA sequencing is revolutionizing genomics research throughout the life sciences. However, the reagent costs and labor requirements in current sequencing protocols are still substantial, although improvements are continuously being made. Here, we demonstrate an effective alternative to existing sample titration protocols for the Roche/454 system using Fluorescence Activated Cell Sorting (FACS) technology to determine the optimal DNA-to-bead ratio prior to large-scale sequencing. Our method, which eliminates the need for the costly pilot sequencing of samples during titration is capable of rapidly providing accurate DNA-to-bead ratios that are not biased by the quantification and sedimentation steps included in current protocols. Moreover, we demonstrate that FACS sorting can be readily used to highly enrich fractions of beads carrying template DNA, with near total elimination of empty beads and no downstream sacrifice of DNA sequencing quality. Automated enrichment by FACS is a simple approach to obtain pure samples for bead-based sequencing systems, and offers an efficient, low-cost alternative to current enrichment protocols. PMID:19304748
2017-01-01
Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described. PMID:28542338
Automation of a Wave-Optics Simulation and Image Post-Processing Package on Riptide
NASA Astrophysics Data System (ADS)
Werth, M.; Lucas, J.; Thompson, D.; Abercrombie, M.; Holmes, R.; Roggemann, M.
Detailed wave-optics simulations and image post-processing algorithms are computationally expensive and benefit from the massively parallel hardware available at supercomputing facilities. We created an automated system that interfaces with the Maui High Performance Computing Center (MHPCC) Distributed MATLAB® Portal interface to submit massively parallel waveoptics simulations to the IBM iDataPlex (Riptide) supercomputer. This system subsequently postprocesses the output images with an improved version of physically constrained iterative deconvolution (PCID) and analyzes the results using a series of modular algorithms written in Python. With this architecture, a single person can simulate thousands of unique scenarios and produce analyzed, archived, and briefing-compatible output products with very little effort. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions, and/or findings expressed are those of the author(s) and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.
Scudder, Nathan; McNevin, Dennis; Kelty, Sally F; Walsh, Simon J; Robertson, James
2018-03-01
Use of DNA in forensic science will be significantly influenced by new technology in coming years. Massively parallel sequencing and forensic genomics will hasten the broadening of forensic DNA analysis beyond short tandem repeats for identity towards a wider array of genetic markers, in applications as diverse as predictive phenotyping, ancestry assignment, and full mitochondrial genome analysis. With these new applications come a range of legal and policy implications, as forensic science touches on areas as diverse as 'big data', privacy and protected health information. Although these applications have the potential to make a more immediate and decisive forensic intelligence contribution to criminal investigations, they raise policy issues that will require detailed consideration if this potential is to be realised. The purpose of this paper is to identify the scope of the issues that will confront forensic and user communities. Copyright © 2017 The Chartered Society of Forensic Sciences. All rights reserved.
Young, Brian; King, Jonathan L; Budowle, Bruce; Armogida, Luigi
2017-01-01
Amplicon (targeted) sequencing by massively parallel sequencing (PCR-MPS) is a potential method for use in forensic DNA analyses. In this application, PCR-MPS may supplement or replace other instrumental analysis methods such as capillary electrophoresis and Sanger sequencing for STR and mitochondrial DNA typing, respectively. PCR-MPS also may enable the expansion of forensic DNA analysis methods to include new marker systems such as single nucleotide polymorphisms (SNPs) and insertion/deletions (indels) that currently are assayable using various instrumental analysis methods including microarray and quantitative PCR. Acceptance of PCR-MPS as a forensic method will depend in part upon developing protocols and criteria that define the limitations of a method, including a defensible analytical threshold or method detection limit. This paper describes an approach to establish objective analytical thresholds suitable for multiplexed PCR-MPS methods. A definition is proposed for PCR-MPS method background noise, and an analytical threshold based on background noise is described.
Massively parallel de novo protein design for targeted therapeutics.
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J; Hicks, Derrick R; Vergara, Renan; Murapa, Patience; Bernard, Steffen M; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T; Koday, Merika T; Jenkins, Cody M; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M; Fernández-Velasco, D Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A; Fuller, Deborah H; Baker, David
2017-10-05
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Massively parallel de novo protein design for targeted therapeutics
NASA Astrophysics Data System (ADS)
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David
2017-10-01
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37-43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing.
Buenrostro, Jason D.; Chircus, Lauren M.; Araya, Carlos L.; Layton, Curtis J.; Chang, Howard Y.; Snyder, Michael P.; Greenleaf, William J.
2015-01-01
RNA-protein interactions drive fundamental biological processes and are targets for molecular engineering, yet quantitative and comprehensive understanding of the sequence determinants of affinity remains limited. Here we repurpose a high-throughput sequencing instrument to quantitatively measure binding and dissociation of MS2 coat protein to >107 RNA targets generated on a flow-cell surface by in situ transcription and inter-molecular tethering of RNA to DNA. We decompose the binding energy contributions from primary and secondary RNA structure, finding that differences in affinity are often driven by sequence-specific changes in association rates. By analyzing the biophysical constraints and modeling mutational paths describing the molecular evolution of MS2 from low- to high-affinity hairpins, we quantify widespread molecular epistasis, and a long-hypothesized structure-dependent preference for G:U base pairs over C:A intermediates in evolutionary trajectories. Our results suggest that quantitative analysis of RNA on a massively parallel array (RNAMaP) relationships across molecular variants. PMID:24727714
Massively parallel de novo protein design for targeted therapeutics
Chevalier, Aaron; Silva, Daniel-Adriano; Rocklin, Gabriel J.; Hicks, Derrick R.; Vergara, Renan; Murapa, Patience; Bernard, Steffen M.; Zhang, Lu; Lam, Kwok-Ho; Yao, Guorui; Bahl, Christopher D.; Miyashita, Shin-Ichiro; Goreshnik, Inna; Fuller, James T.; Koday, Merika T.; Jenkins, Cody M.; Colvin, Tom; Carter, Lauren; Bohn, Alan; Bryan, Cassie M.; Fernández-Velasco, D. Alejandro; Stewart, Lance; Dong, Min; Huang, Xuhui; Jin, Rongsheng; Wilson, Ian A.; Fuller, Deborah H.; Baker, David
2018-01-01
De novo protein design holds promise for creating small stable proteins with shapes customized to bind therapeutic targets. We describe a massively parallel approach for designing, manufacturing and screening mini-protein binders, integrating large-scale computational design, oligonucleotide synthesis, yeast display screening and next-generation sequencing. We designed and tested 22,660 mini-proteins of 37–43 residues that target influenza haemagglutinin and botulinum neurotoxin B, along with 6,286 control sequences to probe contributions to folding and binding, and identified 2,618 high-affinity binders. Comparison of the binding and non-binding design sets, which are two orders of magnitude larger than any previously investigated, enabled the evaluation and improvement of the computational model. Biophysical characterization of a subset of the binder designs showed that they are extremely stable and, unlike antibodies, do not lose activity after exposure to high temperatures. The designs elicit little or no immune response and provide potent prophylactic and therapeutic protection against influenza, even after extensive repeated dosing. PMID:28953867
Visualization of Pulsar Search Data
NASA Astrophysics Data System (ADS)
Foster, R. S.; Wolszczan, A.
1993-05-01
The search for periodic signals from rotating neutron stars or pulsars has been a computationally taxing problem to astronomers for more than twenty-five years. Over this time interval, increases in computational capability have allowed ever more sensitive searches, covering a larger parameter space. The volume of input data and the general presence of radio frequency interference typically produce numerous spurious signals. Visualization of the search output and enhanced real-time processing of significant candidate events allow the pulsar searcher to optimally processes and search for new radio pulsars. The pulsar search algorithm and visualization system presented in this paper currently runs on serial RISC based workstations, a traditional vector based super computer, and a massively parallel computer. A description of the serial software algorithm and its modifications for massively parallel computing are describe. The results of four successive searches for millisecond period radio pulsars using the Arecibo telescope at 430 MHz have resulted in the successful detection of new long-period and millisecond period radio pulsars.
Bailey, Jeffrey A; Mvalo, Tisungane; Aragam, Nagesh; Weiser, Matthew; Congdon, Seth; Kamwendo, Debbie; Martinson, Francis; Hoffman, Irving; Meshnick, Steven R; Juliano, Jonathan J
2012-08-15
The development of an effective malaria vaccine has been hampered by the genetic diversity of commonly used target antigens. This diversity has led to concerns about allele-specific immunity limiting the effectiveness of vaccines. Despite extensive genetic diversity of circumsporozoite protein (CS), the most successful malaria vaccine is RTS/S, a monovalent CS vaccine. By use of massively parallel pyrosequencing, we evaluated the diversity of CS haplotypes across the T-cell epitopes in parasites from Lilongwe, Malawi. We identified 57 unique parasite haplotypes from 100 participants. By use of ecological and molecular indexes of diversity, we saw no difference in the diversity of CS haplotypes between adults and children. We saw evidence of weak variant-specific selection within this region of CS, suggesting naturally acquired immunity does induce variant-specific selection on CS. Therefore, the impact of CS vaccines on variant frequencies with widespread implementation of vaccination requires further study.
Drögemüller, Cord; Tetens, Jens; Sigurdsson, Snaevar; Gentile, Arcangelo; Testoni, Stefania; Lindblad-Toh, Kerstin; Leeb, Tosso
2010-01-01
Arachnomelia is a monogenic recessive defect of skeletal development in cattle. The causative mutation was previously mapped to a ∼7 Mb interval on chromosome 5. Here we show that array-based sequence capture and massively parallel sequencing technology, combined with the typical family structure in livestock populations, facilitates the identification of the causative mutation. We re-sequenced the entire critical interval in a healthy partially inbred cow carrying one copy of the critical chromosome segment in its ancestral state and one copy of the same segment with the arachnomelia mutation, and we detected a single heterozygous position. The genetic makeup of several partially inbred cattle provides extremely strong support for the causality of this mutation. The mutation represents a single base insertion leading to a premature stop codon in the coding sequence of the SUOX gene and is perfectly associated with the arachnomelia phenotype. Our findings suggest an important role for sulfite oxidase in bone development. PMID:20865119
NASA Astrophysics Data System (ADS)
Brockmann, J. M.; Schuh, W.-D.
2011-07-01
The estimation of the global Earth's gravity field parametrized as a finite spherical harmonic series is computationally demanding. The computational effort depends on the one hand on the maximal resolution of the spherical harmonic expansion (i.e. the number of parameters to be estimated) and on the other hand on the number of observations (which are several millions for e.g. observations from the GOCE satellite missions). To circumvent these restrictions, a massive parallel software based on high-performance computing (HPC) libraries as ScaLAPACK, PBLAS and BLACS was designed in the context of GOCE HPF WP6000 and the GOCO consortium. A prerequisite for the use of these libraries is that all matrices are block-cyclic distributed on a processor grid comprised by a large number of (distributed memory) computers. Using this set of standard HPC libraries has the benefit that once the matrices are distributed across the computer cluster, a huge set of efficient and highly scalable linear algebra operations can be used.
Track finding in ATLAS using GPUs
NASA Astrophysics Data System (ADS)
Mattmann, J.; Schmitt, C.
2012-12-01
The reconstruction and simulation of collision events is a major task in modern HEP experiments involving several ten thousands of standard CPUs. On the other hand the graphics processors (GPUs) have become much more powerful and are by far outperforming the standard CPUs in terms of floating point operations due to their massive parallel approach. The usage of these GPUs could therefore significantly reduce the overall reconstruction time per event or allow for the usage of more sophisticated algorithms. In this paper the track finding in the ATLAS experiment will be used as an example on how the GPUs can be used in this context: the implementation on the GPU requires a change in the algorithmic flow to allow the code to work in the rather limited environment on the GPU in terms of memory, cache, and transfer speed from and to the GPU and to make use of the massive parallel computation. Both, the specific implementation of parts of the ATLAS track reconstruction chain and the performance improvements obtained will be discussed.
WebViz:A Web-based Collaborative Interactive Visualization System for large-Scale Data Sets
NASA Astrophysics Data System (ADS)
Yuen, D. A.; McArthur, E.; Weiss, R. M.; Zhou, J.; Yao, B.
2010-12-01
WebViz is a web-based application designed to conduct collaborative, interactive visualizations of large data sets for multiple users, allowing researchers situated all over the world to utilize the visualization services offered by the University of Minnesota’s Laboratory for Computational Sciences and Engineering (LCSE). This ongoing project has been built upon over the last 3 1/2 years .The motivation behind WebViz lies primarily with the need to parse through an increasing amount of data produced by the scientific community as a result of larger and faster multicore and massively parallel computers coming to the market, including the use of general purpose GPU computing. WebViz allows these large data sets to be visualized online by anyone with an account. The application allows users to save time and resources by visualizing data ‘on the fly’, wherever he or she may be located. By leveraging AJAX via the Google Web Toolkit (http://code.google.com/webtoolkit/), we are able to provide users with a remote, web portal to LCSE's (http://www.lcse.umn.edu) large-scale interactive visualization system already in place at the University of Minnesota. LCSE’s custom hierarchical volume rendering software provides high resolution visualizations on the order of 15 million pixels and has been employed for visualizing data primarily from simulations in astrophysics to geophysical fluid dynamics . In the current version of WebViz, we have implemented a highly extensible back-end framework built around HTTP "server push" technology. The web application is accessible via a variety of devices including netbooks, iPhones, and other web and javascript-enabled cell phones. Features in the current version include the ability for users to (1) securely login (2) launch multiple visualizations (3) conduct collaborative visualization sessions (4) delegate control aspects of a visualization to others and (5) engage in collaborative chats with other users within the user interface of the web application. These features are all in addition to a full range of essential visualization functions including 3-D camera and object orientation, position manipulation, time-stepping control, and custom color/alpha mapping.
PoPLAR: Portal for Petascale Lifescience Applications and Research
2013-01-01
Background We are focusing specifically on fast data analysis and retrieval in bioinformatics that will have a direct impact on the quality of human health and the environment. The exponential growth of data generated in biology research, from small atoms to big ecosystems, necessitates an increasingly large computational component to perform analyses. Novel DNA sequencing technologies and complementary high-throughput approaches--such as proteomics, genomics, metabolomics, and meta-genomics--drive data-intensive bioinformatics. While individual research centers or universities could once provide for these applications, this is no longer the case. Today, only specialized national centers can deliver the level of computing resources required to meet the challenges posed by rapid data growth and the resulting computational demand. Consequently, we are developing massively parallel applications to analyze the growing flood of biological data and contribute to the rapid discovery of novel knowledge. Methods The efforts of previous National Science Foundation (NSF) projects provided for the generation of parallel modules for widely used bioinformatics applications on the Kraken supercomputer. We have profiled and optimized the code of some of the scientific community's most widely used desktop and small-cluster-based applications, including BLAST from the National Center for Biotechnology Information (NCBI), HMMER, and MUSCLE; scaled them to tens of thousands of cores on high-performance computing (HPC) architectures; made them robust and portable to next-generation architectures; and incorporated these parallel applications in science gateways with a web-based portal. Results This paper will discuss the various developmental stages, challenges, and solutions involved in taking bioinformatics applications from the desktop to petascale with a front-end portal for very-large-scale data analysis in the life sciences. Conclusions This research will help to bridge the gap between the rate of data generation and the speed at which scientists can study this data. The ability to rapidly analyze data at such a large scale is having a significant, direct impact on science achieved by collaborators who are currently using these tools on supercomputers. PMID:23902523
Karasick, M.S.; Strip, D.R.
1996-01-30
A parallel computing system is described that comprises a plurality of uniquely labeled, parallel processors, each processor capable of modeling a three-dimensional object that includes a plurality of vertices, faces and edges. The system comprises a front-end processor for issuing a modeling command to the parallel processors, relating to a three-dimensional object. Each parallel processor, in response to the command and through the use of its own unique label, creates a directed-edge (d-edge) data structure that uniquely relates an edge of the three-dimensional object to one face of the object. Each d-edge data structure at least includes vertex descriptions of the edge and a description of the one face. As a result, each processor, in response to the modeling command, operates upon a small component of the model and generates results, in parallel with all other processors, without the need for processor-to-processor intercommunication. 8 figs.
Parallel k-means++ for Multiple Shared-Memory Architectures
DOE Office of Scientific and Technical Information (OSTI.GOV)
Mackey, Patrick S.; Lewis, Robert R.
2016-09-22
In recent years k-means++ has become a popular initialization technique for improved k-means clustering. To date, most of the work done to improve its performance has involved parallelizing algorithms that are only approximations of k-means++. In this paper we present a parallelization of the exact k-means++ algorithm, with a proof of its correctness. We develop implementations for three distinct shared-memory architectures: multicore CPU, high performance GPU, and the massively multithreaded Cray XMT platform. We demonstrate the scalability of the algorithm on each platform. In addition we present a visual approach for showing which platform performed k-means++ the fastest for varyingmore » data sizes.« less
Password Cracking Using Sony Playstations
NASA Astrophysics Data System (ADS)
Kleinhans, Hugo; Butts, Jonathan; Shenoi, Sujeet
Law enforcement agencies frequently encounter encrypted digital evidence for which the cryptographic keys are unknown or unavailable. Password cracking - whether it employs brute force or sophisticated cryptanalytic techniques - requires massive computational resources. This paper evaluates the benefits of using the Sony PlayStation 3 (PS3) to crack passwords. The PS3 offers massive computational power at relatively low cost. Moreover, multiple PS3 systems can be introduced easily to expand parallel processing when additional power is needed. This paper also describes a distributed framework designed to enable law enforcement agents to crack encrypted archives and applications in an efficient and cost-effective manner.
A massively asynchronous, parallel brain
Zeki, Semir
2015-01-01
Whether the visual brain uses a parallel or a serial, hierarchical, strategy to process visual signals, the end result appears to be that different attributes of the visual scene are perceived asynchronously—with colour leading form (orientation) by 40 ms and direction of motion by about 80 ms. Whatever the neural root of this asynchrony, it creates a problem that has not been properly addressed, namely how visual attributes that are perceived asynchronously over brief time windows after stimulus onset are bound together in the longer term to give us a unified experience of the visual world, in which all attributes are apparently seen in perfect registration. In this review, I suggest that there is no central neural clock in the (visual) brain that synchronizes the activity of different processing systems. More likely, activity in each of the parallel processing-perceptual systems of the visual brain is reset independently, making of the brain a massively asynchronous organ, just like the new generation of more efficient computers promise to be. Given the asynchronous operations of the brain, it is likely that the results of activities in the different processing-perceptual systems are not bound by physiological interactions between cells in the specialized visual areas, but post-perceptually, outside the visual brain. PMID:25823871
PREPping Students for Authentic Science
ERIC Educational Resources Information Center
Dolan, Erin L.; Lally, David J.; Brooks, Eric; Tax, Frans E.
2008-01-01
In this article, the authors describe a large-scale research collaboration, the Partnership for Research and Education in Plants (PREP), which has capitalized on publicly available databases that contain massive amounts of biological information; stock centers that house and distribute inexpensive organisms with different genotypes; and the…
NASA Technical Reports Server (NTRS)
Dorband, John E.
1987-01-01
Generating graphics to faithfully represent information can be a computationally intensive task. A way of using the Massively Parallel Processor to generate images by ray tracing is presented. This technique uses sort computation, a method of performing generalized routing interspersed with computation on a single-instruction-multiple-data (SIMD) computer.
Multiplexed microsatellite recovery using massively parallel sequencing
T.N. Jennings; B.J. Knaus; T.D. Mullins; S.M. Haig; R.C. Cronn
2011-01-01
Conservation and management of natural populations requires accurate and inexpensive genotyping methods. Traditional microsatellite, or simple sequence repeat (SSR), marker analysis remains a popular genotyping method because of the comparatively low cost of marker development, ease of analysis and high power of genotype discrimination. With the availability of...
DNA methylation profiling using HpaII tiny fragment enrichment by ligation-mediated PCR (HELP)
Suzuki, Masako; Greally, John M.
2010-01-01
The HELP assay is a technique that allows genome-wide analysis of cytosine methylation. Here we describe the assay, its relative strengths and weaknesses, and the transition of the assay from a microarray to massively-parallel sequencing-based foundation. PMID:20434563
Genetics Home Reference: Ochoa syndrome
... Other researchers believe that a defective heparanase 2 protein may lead to problems with the development of the urinary tract or with muscle ... Peng W, Xu J, Li J, Owens KM, Bloom D, Innis JW. Exome capture and massively parallel sequencing identifies a novel HPSE2 mutation in a Saudi ...
USDA-ARS?s Scientific Manuscript database
Flesh flies in the genus Sarcophaga are important models for investigating endocrinology, diapause, cold hardiness, reproduction, and immunity. Despite the prominence of Sarcophaga flesh flies as models for insect physiology and biochemistry, and in forensic studies, little genomic or transcriptom...
Medical applications for high-performance computers in SKIF-GRID network.
Zhuchkov, Alexey; Tverdokhlebov, Nikolay
2009-01-01
The paper presents a set of software services for massive mammography image processing by using high-performance parallel computers of SKIF-family which are linked into a service-oriented grid-network. An experience of a prototype system implementation in two medical institutions is also described.
Data decomposition method for parallel polygon rasterization considering load balancing
NASA Astrophysics Data System (ADS)
Zhou, Chen; Chen, Zhenjie; Liu, Yongxue; Li, Feixue; Cheng, Liang; Zhu, A.-xing; Li, Manchun
2015-12-01
It is essential to adopt parallel computing technology to rapidly rasterize massive polygon data. In parallel rasterization, it is difficult to design an effective data decomposition method. Conventional methods ignore load balancing of polygon complexity in parallel rasterization and thus fail to achieve high parallel efficiency. In this paper, a novel data decomposition method based on polygon complexity (DMPC) is proposed. First, four factors that possibly affect the rasterization efficiency were investigated. Then, a metric represented by the boundary number and raster pixel number in the minimum bounding rectangle was developed to calculate the complexity of each polygon. Using this metric, polygons were rationally allocated according to the polygon complexity, and each process could achieve balanced loads of polygon complexity. To validate the efficiency of DMPC, it was used to parallelize different polygon rasterization algorithms and tested on different datasets. Experimental results showed that DMPC could effectively parallelize polygon rasterization algorithms. Furthermore, the implemented parallel algorithms with DMPC could achieve good speedup ratios of at least 15.69 and generally outperformed conventional decomposition methods in terms of parallel efficiency and load balancing. In addition, the results showed that DMPC exhibited consistently better performance for different spatial distributions of polygons.
Spring, Hannah
2016-03-01
In a wired, virtual and information rich society, MOOCs (Massive Open Online Courses) are leading us into a brave new world in which their key role is to support lifelong networked learning. This feature looks at the broad role of MOOCs and considers them within the context of health, and health librarianship. In particular, it provides examples of where health librarians have developed MOOCs and what opportunities there are in the future for health librarians to collaborate in the development and delivery of health MOOCs. H.S. © 2016 Health Libraries Group.
Substructured multibody molecular dynamics.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Grest, Gary Stephen; Stevens, Mark Jackson; Plimpton, Steven James
2006-11-01
We have enhanced our parallel molecular dynamics (MD) simulation software LAMMPS (Large-scale Atomic/Molecular Massively Parallel Simulator, lammps.sandia.gov) to include many new features for accelerated simulation including articulated rigid body dynamics via coupling to the Rensselaer Polytechnic Institute code POEMS (Parallelizable Open-source Efficient Multibody Software). We use new features of the LAMMPS software package to investigate rhodopsin photoisomerization, and water model surface tension and capillary waves at the vapor-liquid interface. Finally, we motivate the recipes of MD for practitioners and researchers in numerical analysis and computational mechanics.
High Performance Programming Using Explicit Shared Memory Model on the Cray T3D
NASA Technical Reports Server (NTRS)
Saini, Subhash; Simon, Horst D.; Lasinski, T. A. (Technical Monitor)
1994-01-01
The Cray T3D is the first-phase system in Cray Research Inc.'s (CRI) three-phase massively parallel processing program. In this report we describe the architecture of the T3D, as well as the CRAFT (Cray Research Adaptive Fortran) programming model, and contrast it with PVM, which is also supported on the T3D We present some performance data based on the NAS Parallel Benchmarks to illustrate both architectural and software features of the T3D.
Faculty and Community Collaboration in Sustained Community-University Engagement Partnerships
ERIC Educational Resources Information Center
Allen, Angela Danyell
2009-01-01
This dissertation is a qualitative case study of the factors of collaboration between faculty and community partners in sustained community-university engagement partnerships at a public research university in the Midwest. Based on secondary data from an annual, online, mixed-method survey of faculty-reported engagement activity, parallel yet…
Tuning and History: A Personal Overview
ERIC Educational Resources Information Center
Isaacs, Ann Katherine
2017-01-01
The text places Tuning History in the context of the rapidly developing international collaboration among historians which began in Europe in 1989, with the ECTS Pilot project, and continued, from 2000 on, with the European History Networks (for research and for curriculum development) working in parallel and in collaboration with Tuning, in…
Student Teachers' Team Teaching during Field Experiences: An Evaluation by Their Mentors
ERIC Educational Resources Information Center
Simons, Mathea; Baeten, Marlies
2016-01-01
Since collaboration within schools gains importance, teacher educators are looking for alternative models of field experience inspired by collaborative learning. Team teaching is such a model. This study explores two team teaching models (parallel and sequential teaching) by investigating the mentors' perspective. Semi-structured interviews were…
NASA Technical Reports Server (NTRS)
Sohn, Andrew; Biswas, Rupak
1996-01-01
Solving the hard Satisfiability Problem is time consuming even for modest-sized problem instances. Solving the Random L-SAT Problem is especially difficult due to the ratio of clauses to variables. This report presents a parallel synchronous simulated annealing method for solving the Random L-SAT Problem on a large-scale distributed-memory multiprocessor. In particular, we use a parallel synchronous simulated annealing procedure, called Generalized Speculative Computation, which guarantees the same decision sequence as sequential simulated annealing. To demonstrate the performance of the parallel method, we have selected problem instances varying in size from 100-variables/425-clauses to 5000-variables/21,250-clauses. Experimental results on the AP1000 multiprocessor indicate that our approach can satisfy 99.9 percent of the clauses while giving almost a 70-fold speedup on 500 processors.
Quakefinder: A scalable data mining system for detecting earthquakes from space
DOE Office of Scientific and Technical Information (OSTI.GOV)
Stolorz, P.; Dean, C.
1996-12-31
We present an application of novel massively parallel datamining techniques to highly precise inference of important physical processes from remote sensing imagery. Specifically, we have developed and applied a system, Quakefinder, that automatically detects and measures tectonic activity in the earth`s crust by examination of satellite data. We have used Quakefinder to automatically map the direction and magnitude of ground displacements due to the 1992 Landers earthquake in Southern California, over a spatial region of several hundred square kilometers, at a resolution of 10 meters, to a (sub-pixel) precision of 1 meter. This is the first calculation that has evermore » been able to extract area-mapped information about 2D tectonic processes at this level of detail. We outline the architecture of the Quakefinder system, based upon a combination of techniques drawn from the fields of statistical inference, massively parallel computing and global optimization. We confirm the overall correctness of the procedure by comparison of our results with known locations of targeted faults obtained by careful and time-consuming field measurements. The system also performs knowledge discovery by indicating novel unexplained tectonic activity away from the primary faults that has never before been observed. We conclude by discussing the future potential of this data mining system in the broad context of studying subtle spatio-temporal processes within massive image streams.« less
Lost in Interaction in IMS Learning Design Runtime Environments
ERIC Educational Resources Information Center
Derntl, Michael; Neumann, Susanne; Oberhuemer, Petra
2014-01-01
Educators are exploiting the advantages of advanced web-based collaboration technologies and massive online interactions. Interactions between learners and human or nonhuman resources therefore play an increasingly important pedagogical role, and the way these interactions are expressed in the user interface of virtual learning environments is…
Leveraging human oversight and intervention in large-scale parallel processing of open-source data
NASA Astrophysics Data System (ADS)
Casini, Enrico; Suri, Niranjan; Bradshaw, Jeffrey M.
2015-05-01
The popularity of cloud computing along with the increased availability of cheap storage have led to the necessity of elaboration and transformation of large volumes of open-source data, all in parallel. One way to handle such extensive volumes of information properly is to take advantage of distributed computing frameworks like Map-Reduce. Unfortunately, an entirely automated approach that excludes human intervention is often unpredictable and error prone. Highly accurate data processing and decision-making can be achieved by supporting an automatic process through human collaboration, in a variety of environments such as warfare, cyber security and threat monitoring. Although this mutual participation seems easily exploitable, human-machine collaboration in the field of data analysis presents several challenges. First, due to the asynchronous nature of human intervention, it is necessary to verify that once a correction is made, all the necessary reprocessing is done in chain. Second, it is often needed to minimize the amount of reprocessing in order to optimize the usage of resources due to limited availability. In order to improve on these strict requirements, this paper introduces improvements to an innovative approach for human-machine collaboration in the processing of large amounts of open-source data in parallel.
Tumor Genomic Profiling in Breast Cancer Patients Using Targeted Massively Parallel Sequencing
2015-04-30
recently, we identified several novel alterations in in ER+ breast tumors, including translocations in ESR1 , the gene that encodes the estrogen receptor...modified our bait design to include genomic coordinates across select introns in ESR1 . In addition, two recent papers from the Broad Institute published
Associative Networks on a Massively Parallel Computer.
1985-10-01
lgbt (as a group of numbers, in this case), but this only leads to sensible queries when a statistical function is applied: "What is the largest salary...34.*"* . •.,. 64 the siW~pe operations being used during ascend, each movement step costs the same as executing an operation
NASA Astrophysics Data System (ADS)
Gil-Jaurena, Inés; Domínguez, Daniel
2018-03-01
This article analyses the challenges teachers face when entering a digital and open online environment in higher education. Massive open online courses (MOOCs) have become a popular phenomenon, making online learning more visible in the educational agenda; therefore, it is appropriate to analyse their expansion and diversification to help inform the next generation of courses. In this article, MOOCs are contextualised in a historical and wider approach to online education, building upon lessons learned from open and distance education, and exploring the introduction of technologies in providing higher education to massive populations over the past 45 years. In particular, the research study presented in this article used the open scholarship approach to analyse many of the changes that can occur in teaching when an open context applies, as in the case of MOOCs. Taking into account that a collaborative online learning experience is influenced by the simultaneous presence and overlap of cognitive, social and teaching elements, the study also used the community of inquiry model as a theoretical framework. In the study, 24 teachers (from the Universidad Nacional de Educación a Distancia [UNED] in Madrid, Spain) were surveyed about their experiences of MOOCs in terms of their current tasks, and the main changes they have observed compared to teaching in a more traditional electronic learning (e-learning) environment (at both graduate and postgraduate levels). These changes in roles, as well as teachers' views about the impact of "massiveness" and "openness" on their understanding and teaching practice, are presented and analysed. Finally, the article also discusses how the evolution towards adapted learning, collaborative learning and assessment supported by technical tools, for example, was already in progress at UNED before MOOCs were initiated.
On the suitability of the connection machine for direct particle simulation
NASA Technical Reports Server (NTRS)
Dagum, Leonard
1990-01-01
The algorithmic structure was examined of the vectorizable Stanford particle simulation (SPS) method and the structure is reformulated in data parallel form. Some of the SPS algorithms can be directly translated to data parallel, but several of the vectorizable algorithms have no direct data parallel equivalent. This requires the development of new, strictly data parallel algorithms. In particular, a new sorting algorithm is developed to identify collision candidates in the simulation and a master/slave algorithm is developed to minimize communication cost in large table look up. Validation of the method is undertaken through test calculations for thermal relaxation of a gas, shock wave profiles, and shock reflection from a stationary wall. A qualitative measure is provided of the performance of the Connection Machine for direct particle simulation. The massively parallel architecture of the Connection Machine is found quite suitable for this type of calculation. However, there are difficulties in taking full advantage of this architecture because of lack of a broad based tradition of data parallel programming. An important outcome of this work has been new data parallel algorithms specifically of use for direct particle simulation but which also expand the data parallel diction.
Private Cloud Communities for Faculty and Students
ERIC Educational Resources Information Center
Tomal, Daniel R.; Grant, Cynthia
2015-01-01
Massive open online courses (MOOCs) and public and private cloud communities continue to flourish in the field of higher education. However, MOOCs have received criticism in recent years and offer little benefit to students already enrolled at an institution. This article advocates for the collaborative creation and use of institutional, program…
The Semiotic Ecology and Linguistic Complexity of an Online Game World
ERIC Educational Resources Information Center
Thorne, Steven L.; Fischer, Ingrid; Lu, Xiaofei
2012-01-01
Multiplayer online games form complex semiotic ecologies that include game-generated texts, player-to-player communication and collaboration, and associated websites that support in-game play. This article describes an exploratory study of the massively multiplayer online game (MMO) "World of Warcraft" ("WoW"), with specific…
Analysis and selection of optimal function implementations in massively parallel computer
Archer, Charles Jens [Rochester, MN; Peters, Amanda [Rochester, MN; Ratterman, Joseph D [Rochester, MN
2011-05-31
An apparatus, program product and method optimize the operation of a parallel computer system by, in part, collecting performance data for a set of implementations of a function capable of being executed on the parallel computer system based upon the execution of the set of implementations under varying input parameters in a plurality of input dimensions. The collected performance data may be used to generate selection program code that is configured to call selected implementations of the function in response to a call to the function under varying input parameters. The collected performance data may be used to perform more detailed analysis to ascertain the comparative performance of the set of implementations of the function under the varying input parameters.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Snyder, L.; Notkin, D.; Adams, L.
1990-03-31
This task relates to research on programming massively parallel computers. Previous work on the Ensamble concept of programming was extended and investigation into nonshared memory models of parallel computation was undertaken. Previous work on the Ensamble concept defined a set of programming abstractions and was used to organize the programming task into three distinct levels; Composition of machine instruction, composition of processes, and composition of phases. It was applied to shared memory models of computations. During the present research period, these concepts were extended to nonshared memory models. During the present research period, one Ph D. thesis was completed, onemore » book chapter, and six conference proceedings were published.« less
NASA Astrophysics Data System (ADS)
Liao, S.; Chen, L.; Li, J.; Xiong, W.; Wu, Q.
2015-07-01
Existing spatiotemporal database supports spatiotemporal aggregation query over massive moving objects datasets. Due to the large amounts of data and single-thread processing method, the query speed cannot meet the application requirements. On the other hand, the query efficiency is more sensitive to spatial variation then temporal variation. In this paper, we proposed a spatiotemporal aggregation query method using multi-thread parallel technique based on regional divison and implemented it on the server. Concretely, we divided the spatiotemporal domain into several spatiotemporal cubes, computed spatiotemporal aggregation on all cubes using the technique of multi-thread parallel processing, and then integrated the query results. By testing and analyzing on the real datasets, this method has improved the query speed significantly.
cellGPU: Massively parallel simulations of dynamic vertex models
NASA Astrophysics Data System (ADS)
Sussman, Daniel M.
2017-10-01
Vertex models represent confluent tissue by polygonal or polyhedral tilings of space, with the individual cells interacting via force laws that depend on both the geometry of the cells and the topology of the tessellation. This dependence on the connectivity of the cellular network introduces several complications to performing molecular-dynamics-like simulations of vertex models, and in particular makes parallelizing the simulations difficult. cellGPU addresses this difficulty and lays the foundation for massively parallelized, GPU-based simulations of these models. This article discusses its implementation for a pair of two-dimensional models, and compares the typical performance that can be expected between running cellGPU entirely on the CPU versus its performance when running on a range of commercial and server-grade graphics cards. By implementing the calculation of topological changes and forces on cells in a highly parallelizable fashion, cellGPU enables researchers to simulate time- and length-scales previously inaccessible via existing single-threaded CPU implementations. Program Files doi:http://dx.doi.org/10.17632/6j2cj29t3r.1 Licensing provisions: MIT Programming language: CUDA/C++ Nature of problem: Simulations of off-lattice "vertex models" of cells, in which the interaction forces depend on both the geometry and the topology of the cellular aggregate. Solution method: Highly parallelized GPU-accelerated dynamical simulations in which the force calculations and the topological features can be handled on either the CPU or GPU. Additional comments: The code is hosted at https://gitlab.com/dmsussman/cellGPU, with documentation additionally maintained at http://dmsussman.gitlab.io/cellGPUdocumentation
GPU COMPUTING FOR PARTICLE TRACKING
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nishimura, Hiroshi; Song, Kai; Muriki, Krishna
2011-03-25
This is a feasibility study of using a modern Graphics Processing Unit (GPU) to parallelize the accelerator particle tracking code. To demonstrate the massive parallelization features provided by GPU computing, a simplified TracyGPU program is developed for dynamic aperture calculation. Performances, issues, and challenges from introducing GPU are also discussed. General purpose Computation on Graphics Processing Units (GPGPU) bring massive parallel computing capabilities to numerical calculation. However, the unique architecture of GPU requires a comprehensive understanding of the hardware and programming model to be able to well optimize existing applications. In the field of accelerator physics, the dynamic aperture calculationmore » of a storage ring, which is often the most time consuming part of the accelerator modeling and simulation, can benefit from GPU due to its embarrassingly parallel feature, which fits well with the GPU programming model. In this paper, we use the Tesla C2050 GPU which consists of 14 multi-processois (MP) with 32 cores on each MP, therefore a total of 448 cores, to host thousands ot threads dynamically. Thread is a logical execution unit of the program on GPU. In the GPU programming model, threads are grouped into a collection of blocks Within each block, multiple threads share the same code, and up to 48 KB of shared memory. Multiple thread blocks form a grid, which is executed as a GPU kernel. A simplified code that is a subset of Tracy++ [2] is developed to demonstrate the possibility of using GPU to speed up the dynamic aperture calculation by having each thread track a particle.« less
Nagai, Moeto; Oohara, Kiyotaka; Kato, Keita; Kawashima, Takahiro; Shibata, Takayuki
2015-04-01
Parallel manipulation of single cells is important for reconstructing in vivo cellular microenvironments and studying cell functions. To manipulate single cells and reconstruct their environments, development of a versatile manipulation tool is necessary. In this study, we developed an array of hollow probes using microelectromechanical systems fabrication technology and demonstrated the manipulation of single cells. We conducted a cell aspiration experiment with a glass pipette and modeled a cell using a standard linear solid model, which provided information for designing hollow stepped probes for minimally invasive single-cell manipulation. We etched a silicon wafer on both sides and formed through holes with stepped structures. The inner diameters of the holes were reduced by SiO2 deposition of plasma-enhanced chemical vapor deposition to trap cells on the tips. This fabrication process makes it possible to control the wall thickness, inner diameter, and outer diameter of the probes. With the fabricated probes, single cells were manipulated and placed in microwells at a single-cell level in a parallel manner. We studied the capture, release, and survival rates of cells at different suction and release pressures and found that the cell trapping rate was directly proportional to the suction pressure, whereas the release rate and viability decreased with increasing the suction pressure. The proposed manipulation system makes it possible to place cells in a well array and observe the adherence, spreading, culture, and death of the cells. This system has potential as a tool for massively parallel manipulation and for three-dimensional hetero cellular assays.
GPU: the biggest key processor for AI and parallel processing
NASA Astrophysics Data System (ADS)
Baji, Toru
2017-07-01
Two types of processors exist in the market. One is the conventional CPU and the other is Graphic Processor Unit (GPU). Typical CPU is composed of 1 to 8 cores while GPU has thousands of cores. CPU is good for sequential processing, while GPU is good to accelerate software with heavy parallel executions. GPU was initially dedicated for 3D graphics. However from 2006, when GPU started to apply general-purpose cores, it was noticed that this architecture can be used as a general purpose massive-parallel processor. NVIDIA developed a software framework Compute Unified Device Architecture (CUDA) that make it possible to easily program the GPU for these application. With CUDA, GPU started to be used in workstations and supercomputers widely. Recently two key technologies are highlighted in the industry. The Artificial Intelligence (AI) and Autonomous Driving Cars. AI requires a massive parallel operation to train many-layers of neural networks. With CPU alone, it was impossible to finish the training in a practical time. The latest multi-GPU system with P100 makes it possible to finish the training in a few hours. For the autonomous driving cars, TOPS class of performance is required to implement perception, localization, path planning processing and again SoC with integrated GPU will play a key role there. In this paper, the evolution of the GPU which is one of the biggest commercial devices requiring state-of-the-art fabrication technology will be introduced. Also overview of the GPU demanding key application like the ones described above will be introduced.
Classroom management of situated group learning: A research study of two teaching strategies
NASA Astrophysics Data System (ADS)
Smeh, Kathy; Fawns, Rod
2000-06-01
Although peer-based work is encouraged by theories in developmental psychology and although classroom interventions suggest it is effective, there are grounds for recognising that young pupils find collaborative learning hard to sustain. Discontinuities in collaborative skill during development have been suggested as one interpretation. Theory and research have neglected situational continuities that the teacher may provide in management of formal and informal collaborations. This experimental study, with the collaboration of the science faculty in one urban secondary college, investigated the effect of two role attribution strategies on communication in peer groups of different gender composition in three parallel Year 8 science classes. The group were set a problem that required them to design an experiment to compare the thermal insulating properties of two different materials. This presents the data collected and key findings, and reviews the findings from previous parallel studies that have employed the same research design in different school settings. The results confirm the effectiveness of social role attribution strategies in teacher management of communication in peer-based work.
Massively parallel simulator of optical coherence tomography of inhomogeneous turbid media.
Malektaji, Siavash; Lima, Ivan T; Escobar I, Mauricio R; Sherif, Sherif S
2017-10-01
An accurate and practical simulator for Optical Coherence Tomography (OCT) could be an important tool to study the underlying physical phenomena in OCT such as multiple light scattering. Recently, many researchers have investigated simulation of OCT of turbid media, e.g., tissue, using Monte Carlo methods. The main drawback of these earlier simulators is the long computational time required to produce accurate results. We developed a massively parallel simulator of OCT of inhomogeneous turbid media that obtains both Class I diffusive reflectivity, due to ballistic and quasi-ballistic scattered photons, and Class II diffusive reflectivity due to multiply scattered photons. This Monte Carlo-based simulator is implemented on graphic processing units (GPUs), using the Compute Unified Device Architecture (CUDA) platform and programming model, to exploit the parallel nature of propagation of photons in tissue. It models an arbitrary shaped sample medium as a tetrahedron-based mesh and uses an advanced importance sampling scheme. This new simulator speeds up simulations of OCT of inhomogeneous turbid media by about two orders of magnitude. To demonstrate this result, we have compared the computation times of our new parallel simulator and its serial counterpart using two samples of inhomogeneous turbid media. We have shown that our parallel implementation reduced simulation time of OCT of the first sample medium from 407 min to 92 min by using a single GPU card, to 12 min by using 8 GPU cards and to 7 min by using 16 GPU cards. For the second sample medium, the OCT simulation time was reduced from 209 h to 35.6 h by using a single GPU card, and to 4.65 h by using 8 GPU cards, and to only 2 h by using 16 GPU cards. Therefore our new parallel simulator is considerably more practical to use than its central processing unit (CPU)-based counterpart. Our new parallel OCT simulator could be a practical tool to study the different physical phenomena underlying OCT, or to design OCT systems with improved performance. Copyright © 2017 Elsevier B.V. All rights reserved.
NASA Astrophysics Data System (ADS)
Benini, Luca
2017-06-01
The "internet of everything" envisions trillions of connected objects loaded with high-bandwidth sensors requiring massive amounts of local signal processing, fusion, pattern extraction and classification. From the computational viewpoint, the challenge is formidable and can be addressed only by pushing computing fabrics toward massive parallelism and brain-like energy efficiency levels. CMOS technology can still take us a long way toward this goal, but technology scaling is losing steam. Energy efficiency improvement will increasingly hinge on architecture, circuits, design techniques such as heterogeneous 3D integration, mixed-signal preprocessing, event-based approximate computing and non-Von-Neumann architectures for scalable acceleration.
Parallel VLSI architecture emulation and the organization of APSA/MPP
NASA Technical Reports Server (NTRS)
Odonnell, John T.
1987-01-01
The Applicative Programming System Architecture (APSA) combines an applicative language interpreter with a novel parallel computer architecture that is well suited for Very Large Scale Integration (VLSI) implementation. The Massively Parallel Processor (MPP) can simulate VLSI circuits by allocating one processing element in its square array to an area on a square VLSI chip. As long as there are not too many long data paths, the MPP can simulate a VLSI clock cycle very rapidly. The APSA circuit contains a binary tree with a few long paths and many short ones. A skewed H-tree layout allows every processing element to simulate a leaf cell and up to four tree nodes, with no loss in parallelism. Emulation of a key APSA algorithm on the MPP resulted in performance 16,000 times faster than a Vax. This speed will make it possible for the APSA language interpreter to run fast enough to support research in parallel list processing algorithms.
Dynamic file-access characteristics of a production parallel scientific workload
NASA Technical Reports Server (NTRS)
Kotz, David; Nieuwejaar, Nils
1994-01-01
Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Reed, D.A.; Grunwald, D.C.
The spectrum of parallel processor designs can be divided into three sections according to the number and complexity of the processors. At one end there are simple, bit-serial processors. Any one of thee processors is of little value, but when it is coupled with many others, the aggregate computing power can be large. This approach to parallel processing can be likened to a colony of termites devouring a log. The most notable examples of this approach are the NASA/Goodyear Massively Parallel Processor, which has 16K one-bit processors, and the Thinking Machines Connection Machine, which has 64K one-bit processors. At themore » other end of the spectrum, a small number of processors, each built using the fastest available technology and the most sophisticated architecture, are combined. An example of this approach is the Cray X-MP. This type of parallel processing is akin to four woodmen attacking the log with chainsaws.« less
3D Data Denoising via Nonlocal Means Filter by Using Parallel GPU Strategies
Cuomo, Salvatore; De Michele, Pasquale; Piccialli, Francesco
2014-01-01
Nonlocal Means (NLM) algorithm is widely considered as a state-of-the-art denoising filter in many research fields. Its high computational complexity leads researchers to the development of parallel programming approaches and the use of massively parallel architectures such as the GPUs. In the recent years, the GPU devices had led to achieving reasonable running times by filtering, slice-by-slice, and 3D datasets with a 2D NLM algorithm. In our approach we design and implement a fully 3D NonLocal Means parallel approach, adopting different algorithm mapping strategies on GPU architecture and multi-GPU framework, in order to demonstrate its high applicability and scalability. The experimental results we obtained encourage the usability of our approach in a large spectrum of applicative scenarios such as magnetic resonance imaging (MRI) or video sequence denoising. PMID:25045397
Three-Dimensional High-Lift Analysis Using a Parallel Unstructured Multigrid Solver
NASA Technical Reports Server (NTRS)
Mavriplis, Dimitri J.
1998-01-01
A directional implicit unstructured agglomeration multigrid solver is ported to shared and distributed memory massively parallel machines using the explicit domain-decomposition and message-passing approach. Because the algorithm operates on local implicit lines in the unstructured mesh, special care is required in partitioning the problem for parallel computing. A weighted partitioning strategy is described which avoids breaking the implicit lines across processor boundaries, while incurring minimal additional communication overhead. Good scalability is demonstrated on a 128 processor SGI Origin 2000 machine and on a 512 processor CRAY T3E machine for reasonably fine grids. The feasibility of performing large-scale unstructured grid calculations with the parallel multigrid algorithm is demonstrated by computing the flow over a partial-span flap wing high-lift geometry on a highly resolved grid of 13.5 million points in approximately 4 hours of wall clock time on the CRAY T3E.
Wen, X.; Datta, A.; Traverso, L. M.; Pan, L.; Xu, X.; Moon, E. E.
2015-01-01
Optical lithography, the enabling process for defining features, has been widely used in semiconductor industry and many other nanotechnology applications. Advances of nanotechnology require developments of high-throughput optical lithography capabilities to overcome the optical diffraction limit and meet the ever-decreasing device dimensions. We report our recent experimental advancements to scale up diffraction unlimited optical lithography in a massive scale using the near field nanolithography capabilities of bowtie apertures. A record number of near-field optical elements, an array of 1,024 bowtie antenna apertures, are simultaneously employed to generate a large number of patterns by carefully controlling their working distances over the entire array using an optical gap metrology system. Our experimental results reiterated the ability of using massively-parallel near-field devices to achieve high-throughput optical nanolithography, which can be promising for many important nanotechnology applications such as computation, data storage, communication, and energy. PMID:26525906
Massive problem reports mining and analysis based parallelism for similar search
NASA Astrophysics Data System (ADS)
Zhou, Ya; Hu, Cailin; Xiong, Han; Wei, Xiafei; Li, Ling
2017-05-01
Massive problem reports and solutions accumulated over time and continuously collected in XML Spreadsheet (XMLSS) format from enterprises and organizations, which record a series of comprehensive description about problems that can help technicians to trace problems and their solutions. It's a significant and challenging issue to effectively manage and analyze these massive semi-structured data to provide similar problem solutions, decisions of immediate problem and assisting product optimization for users during hardware and software maintenance. For this purpose, we build a data management system to manage, mine and analyze these data search results that can be categorized and organized into several categories for users to quickly find out where their interesting results locate. Experiment results demonstrate that this system is better than traditional centralized management system on the performance and the adaptive capability of heterogeneous data greatly. Besides, because of re-extracting topics, it enables each cluster to be described more precise and reasonable.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zhang, Keni; Yamamoto, Hajime; Pruess, Karsten
2008-02-15
TMVOC-MP is a massively parallel version of the TMVOC code (Pruess and Battistelli, 2002), a numerical simulator for three-phase non-isothermal flow of water, gas, and a multicomponent mixture of volatile organic chemicals (VOCs) in multidimensional heterogeneous porous/fractured media. TMVOC-MP was developed by introducing massively parallel computing techniques into TMVOC. It retains the physical process model of TMVOC, designed for applications to contamination problems that involve hydrocarbon fuels or organic solvents in saturated and unsaturated zones. TMVOC-MP can model contaminant behavior under 'natural' environmental conditions, as well as for engineered systems, such as soil vapor extraction, groundwater pumping, or steam-assisted sourcemore » remediation. With its sophisticated parallel computing techniques, TMVOC-MP can handle much larger problems than TMVOC, and can be much more computationally efficient. TMVOC-MP models multiphase fluid systems containing variable proportions of water, non-condensible gases (NCGs), and water-soluble volatile organic chemicals (VOCs). The user can specify the number and nature of NCGs and VOCs. There are no intrinsic limitations to the number of NCGs or VOCs, although the arrays for fluid components are currently dimensioned as 20, accommodating water plus 19 components that may be either NCGs or VOCs. Among them, NCG arrays are dimensioned as 10. The user may select NCGs from a data bank provided in the software. The currently available choices include O{sub 2}, N{sub 2}, CO{sub 2}, CH{sub 4}, ethane, ethylene, acetylene, and air (a pseudo-component treated with properties averaged from N{sub 2} and O{sub 2}). Thermophysical property data of VOCs can be selected from a chemical data bank, included with TMVOC-MP, that provides parameters for 26 commonly encountered chemicals. Users also can input their own data for other fluids. The fluid components may partition (volatilize and/or dissolve) among gas, aqueous, and NAPL phases. Any combination of the three phases may present, and phases may appear and disappear in the course of a simulation. In addition, VOCs may be adsorbed by the porous medium, and may biodegrade according to a simple half-life model. Detailed discussion of physical processes, assumptions, and fluid properties used in TMVOC-MP can be found in the TMVOC user's guide (Pruess and Battistelli, 2002). TMVOC-MP was developed based on the parallel framework of the TOUGH2-MP code (Zhang et al. 2001, Wu et al. 2002). It uses the MPI (Message Passing Forum, 1994) for parallel implementation. A domain decomposition approach is adopted for the parallelization. The code partitions a simulation domain, defined by an unstructured grid, using partitioning algorithm from the METIS software package (Karypsis and Kumar, 1998). In parallel simulation, each processor is in charge of one part of the simulation domain for assembling mass and energy balance equations, solving linear equation systems, updating thermophysical properties, and performing other local computations. The local linear-equation systems are solved in parallel by multiple processors with the Aztec linear solver package (Tuminaro et al., 1999). Although each processor solves the linearized equations of subdomains independently, the entire linear equation system is solved together by all processors collaboratively via communication between neighboring processors during each iteration. Detailed discussion of the prototype of the data-exchange scheme can be found in Elmroth et al. (2001). In addition, FORTRAN 90 features are introduced to TMVOC-MP, such as dynamic memory allocation, array operation, matrix manipulation, and replacing 'common blocks' (used in the original TMVOC) with modules. All new subroutines are written in FORTRAN 90. Program units imported from the original TMVOC remain in standard FORTRAN 77. This report provides a quick starting guide for using the TMVOC-MP program. We suppose that the users have basic knowledge of using the original TMVOC code. The users can find the detailed technical description of the physical processes modeled, and the mathematical and numerical methods in the user's guide for TMVOC (Pruess and Battistelli, 2002).« less
A Cyber-ITS Framework for Massive Traffic Data Analysis Using Cyber Infrastructure
Fontaine, Michael D.
2013-01-01
Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing. PMID:23766690
A massively parallel strategy for STR marker development, capture, and genotyping.
Kistler, Logan; Johnson, Stephen M; Irwin, Mitchell T; Louis, Edward E; Ratan, Aakrosh; Perry, George H
2017-09-06
Short tandem repeat (STR) variants are highly polymorphic markers that facilitate powerful population genetic analyses. STRs are especially valuable in conservation and ecological genetic research, yielding detailed information on population structure and short-term demographic fluctuations. Massively parallel sequencing has not previously been leveraged for scalable, efficient STR recovery. Here, we present a pipeline for developing STR markers directly from high-throughput shotgun sequencing data without a reference genome, and an approach for highly parallel target STR recovery. We employed our approach to capture a panel of 5000 STRs from a test group of diademed sifakas (Propithecus diadema, n = 3), endangered Malagasy rainforest lemurs, and we report extremely efficient recovery of targeted loci-97.3-99.6% of STRs characterized with ≥10x non-redundant sequence coverage. We then tested our STR capture strategy on P. diadema fecal DNA, and report robust initial results and suggestions for future implementations. In addition to STR targets, this approach also generates large, genome-wide single nucleotide polymorphism (SNP) panels from flanking regions. Our method provides a cost-effective and scalable solution for rapid recovery of large STR and SNP datasets in any species without needing a reference genome, and can be used even with suboptimal DNA more easily acquired in conservation and ecological studies. Published by Oxford University Press on behalf of Nucleic Acids Research 2017.
A Cyber-ITS framework for massive traffic data analysis using cyber infrastructure.
Xia, Yingjie; Hu, Jia; Fontaine, Michael D
2013-01-01
Traffic data is commonly collected from widely deployed sensors in urban areas. This brings up a new research topic, data-driven intelligent transportation systems (ITSs), which means to integrate heterogeneous traffic data from different kinds of sensors and apply it for ITS applications. This research, taking into consideration the significant increase in the amount of traffic data and the complexity of data analysis, focuses mainly on the challenge of solving data-intensive and computation-intensive problems. As a solution to the problems, this paper proposes a Cyber-ITS framework to perform data analysis on Cyber Infrastructure (CI), by nature parallel-computing hardware and software systems, in the context of ITS. The techniques of the framework include data representation, domain decomposition, resource allocation, and parallel processing. All these techniques are based on data-driven and application-oriented models and are organized as a component-and-workflow-based model in order to achieve technical interoperability and data reusability. A case study of the Cyber-ITS framework is presented later based on a traffic state estimation application that uses the fusion of massive Sydney Coordinated Adaptive Traffic System (SCATS) data and GPS data. The results prove that the Cyber-ITS-based implementation can achieve a high accuracy rate of traffic state estimation and provide a significant computational speedup for the data fusion by parallel computing.
The Cognitive Architecture for Chaining of Two Mental Operations
ERIC Educational Resources Information Center
Sackur, Jerome; Dehaene, Stanislas
2009-01-01
A simple view, which dates back to Turing, proposes that complex cognitive operations are composed of serially arranged elementary operations, each passing intermediate results to the next. However, whether and how such serial processing is achieved with a brain composed of massively parallel processors, remains an open question. Here, we study…
Meeting the challenges of non-referenced genome assembly from short-read sequence data
M. Parks; A. Liston; R. Cronn
2010-01-01
Massively parallel sequencing technologies (MPST) offer unprecedented opportunities for novel sequencing projects. MPST, while offering tremendous sequencing capacity, are typically most effective in resequencing projects (as opposed to the sequencing of novel genomes) due to the fact that sequence is returned in relatively short reads. Nonetheless, there is great...
Sleep EEG Changes during Adolescence: An Index of a Fundamental Brain Reorganization
ERIC Educational Resources Information Center
Feinberg, Irwin; Campbell, Ian G.
2010-01-01
Delta (1-4 Hz) EEG power in non-rapid eye movement (NREM) sleep declines massively during adolescence. This observation stimulated the hypothesis that during adolescence the human brain undergoes an extensive reorganization driven by synaptic elimination. The parallel declines in synaptic density, delta wave amplitude and cortical metabolic rate…
DOE Office of Scientific and Technical Information (OSTI.GOV)
2012-12-17
Grizzly is a simulation tool for assessing the effects of age-related degradation on systems, structures, and components of nuclear power plants. Grizzly is built on the MOOSE framework, and uses a Jacobian-free Newton Krylov method to obtain solutions to tightly coupled thermo-mechanical simulations. Grizzly runs on a wide range of hardware, from a single processor to massively parallel machines.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Steich, D J; Brugger, S T; Kallman, J S
2000-02-01
This final report describes our efforts on the Three-Dimensional Massively Parallel CEM Technologies LDRD project (97-ERD-009). Significant need exists for more advanced time domain computational electromagnetics modeling. Bookkeeping details and modifying inflexible software constitute a vast majority of the effort required to address such needs. The required effort escalates rapidly as problem complexity increases. For example, hybrid meshes requiring hybrid numerics on massively parallel platforms (MPPs). This project attempts to alleviate the above limitations by investigating flexible abstractions for these numerical algorithms on MPPs using object-oriented methods, providing a programming environment insulating physics from bookkeeping. The three major design iterationsmore » during the project, known as TIGER-I to TIGER-III, are discussed. Each version of TIGER is briefly discussed along with lessons learned during the development and implementation. An Application Programming Interface (API) of the object-oriented interface for Tiger-III is included in three appendices. The three appendices contain the Utilities, Entity-Attribute, and Mesh libraries developed during the project. The API libraries represent a snapshot of our latest attempt at insulated the physics from the bookkeeping.« less
Rodríguez-Martín, Carlos; Cidre, Florencia; Fernández-Teijeiro, Ana; Gómez-Mariano, Gema; de la Vega, Leticia; Ramos, Patricia; Zaballos, Ángel; Monzón, Sara; Alonso, Javier
2016-05-01
Retinoblastoma (RB, MIM 180200) is the paradigm of hereditary cancer. Individuals harboring a constitutional mutation in one allele of the RB1 gene have a high predisposition to develop RB. Here, we present the first case of familial RB caused by a de novo insertion of a full-length long interspersed element-1 (LINE-1) into intron 14 of the RB1 gene that caused a highly heterogeneous splicing pattern of RB1 mRNA. LINE-1 insertion was inferred by mRNA studies and full-length sequenced by massive parallel sequencing. Some of the aberrant mRNAs were produced by noncanonical acceptor splice sites, a new finding that up to date has not been described to occur upon LINE-1 retrotransposition. Our results clearly show that RNA-based strategies have the potential to detect disease-causing transposon insertions. It also confirms that the incorporation of new genetic approaches, such as massive parallel sequencing, contributes to characterize at the sequence level these unique and exceptional genetic alterations.
Neuromorphic Hardware Architecture Using the Neural Engineering Framework for Pattern Recognition.
Wang, Runchun; Thakur, Chetan Singh; Cohen, Gregory; Hamilton, Tara Julia; Tapson, Jonathan; van Schaik, Andre
2017-06-01
We present a hardware architecture that uses the neural engineering framework (NEF) to implement large-scale neural networks on field programmable gate arrays (FPGAs) for performing massively parallel real-time pattern recognition. NEF is a framework that is capable of synthesising large-scale cognitive systems from subnetworks and we have previously presented an FPGA implementation of the NEF that successfully performs nonlinear mathematical computations. That work was developed based on a compact digital neural core, which consists of 64 neurons that are instantiated by a single physical neuron using a time-multiplexing approach. We have now scaled this approach up to build a pattern recognition system by combining identical neural cores together. As a proof of concept, we have developed a handwritten digit recognition system using the MNIST database and achieved a recognition rate of 96.55%. The system is implemented on a state-of-the-art FPGA and can process 5.12 million digits per second. The architecture and hardware optimisations presented offer high-speed and resource-efficient means for performing high-speed, neuromorphic, and massively parallel pattern recognition and classification tasks.
Goossens, Dirk; Moens, Lotte N; Nelis, Eva; Lenaerts, An-Sofie; Glassee, Wim; Kalbe, Andreas; Frey, Bruno; Kopal, Guido; De Jonghe, Peter; De Rijk, Peter; Del-Favero, Jurgen
2009-03-01
We evaluated multiplex PCR amplification as a front-end for high-throughput sequencing, to widen the applicability of massive parallel sequencers for the detailed analysis of complex genomes. Using multiplex PCR reactions, we sequenced the complete coding regions of seven genes implicated in peripheral neuropathies in 40 individuals on a GS-FLX genome sequencer (Roche). The resulting dataset showed highly specific and uniform amplification. Comparison of the GS-FLX sequencing data with the dataset generated by Sanger sequencing confirmed the detection of all variants present and proved the sensitivity of the method for mutation detection. In addition, we showed that we could exploit the multiplexed PCR amplicons to determine individual copy number variation (CNV), increasing the spectrum of detected variations to both genetic and genomic variants. We conclude that our straightforward procedure substantially expands the applicability of the massive parallel sequencers for sequencing projects of a moderate number of amplicons (50-500) with typical applications in resequencing exons in positional or functional candidate regions and molecular genetic diagnostics. 2008 Wiley-Liss, Inc.
Spectral Calculation of ICRF Wave Propagation and Heating in 2-D Using Massively Parallel Computers
NASA Astrophysics Data System (ADS)
Jaeger, E. F.; D'Azevedo, E.; Berry, L. A.; Carter, M. D.; Batchelor, D. B.
2000-10-01
Spectral calculations of ICRF wave propagation in plasmas have the natural advantage that they require no assumption regarding the smallness of the ion Larmor radius ρ relative to wavelength λ. Results are therefore applicable to all orders in k_bot ρ where k_bot = 2π/λ. But because all modes in the spectral representation are coupled, the solution requires inversion of a large dense matrix. In contrast, finite difference algorithms involve only matrices that are sparse and banded. Thus, spectral calculations of wave propagation and heating in tokamak plasmas have so far been limited to 1-D. In this paper, we extend the spectral method to 2-D by taking advantage of new matrix inversion techniques that utilize massively parallel computers. By spreading the dense matrix over 576 processors on the ORNL IBM RS/6000 SP supercomputer, we are able to solve up to 120,000 coupled complex equations requiring 230 GBytes of memory and achieving over 500 Gflops/sec. Initial results for ASDEX and NSTX will be presented using up to 200 modes in both the radial and vertical dimensions.
IFRD1 Is a Candidate Gene for SMNA on Chromosome 7q22-q23
Brkanac, Zoran; Spencer, David; Shendure, Jay; Robertson, Peggy D.; Matsushita, Mark; Vu, Tiffany; Bird, Thomas D.; Olson, Maynard V.; Raskind, Wendy H.
2009-01-01
We have established strong linkage evidence that supports mapping autosomal-dominant sensory/motor neuropathy with ataxia (SMNA) to chromosome 7q22-q32. SMNA is a rare neurological disorder whose phenotype encompasses both the central and the peripheral nervous system. In order to identify a gene responsible for SMNA, we have undertaken a comprehensive genomic evaluation of the region of linkage, including evaluation for repeat expansion and small deletions or duplications, capillary sequencing of candidate genes, and massively parallel sequencing of all coding exons. We excluded repeat expansion and small deletions or duplications as causative, and through microarray-based hybrid capture and massively parallel short-read sequencing, we identified a nonsynonymous variant in the human interferon-related developmental regulator gene 1 (IFRD1) as a disease-causing candidate. Sequence conservation, animal models, and protein structure evaluation support the involvement of IFRD1 in SMNA. Mutation analysis of IFRD1 in additional patients with similar phenotypes is needed for demonstration of causality and further evaluation of its importance in neurological diseases. PMID:19409521
Performance Evaluation in Network-Based Parallel Computing
NASA Technical Reports Server (NTRS)
Dezhgosha, Kamyar
1996-01-01
Network-based parallel computing is emerging as a cost-effective alternative for solving many problems which require use of supercomputers or massively parallel computers. The primary objective of this project has been to conduct experimental research on performance evaluation for clustered parallel computing. First, a testbed was established by augmenting our existing SUNSPARCs' network with PVM (Parallel Virtual Machine) which is a software system for linking clusters of machines. Second, a set of three basic applications were selected. The applications consist of a parallel search, a parallel sort, a parallel matrix multiplication. These application programs were implemented in C programming language under PVM. Third, we conducted performance evaluation under various configurations and problem sizes. Alternative parallel computing models and workload allocations for application programs were explored. The performance metric was limited to elapsed time or response time which in the context of parallel computing can be expressed in terms of speedup. The results reveal that the overhead of communication latency between processes in many cases is the restricting factor to performance. That is, coarse-grain parallelism which requires less frequent communication between processes will result in higher performance in network-based computing. Finally, we are in the final stages of installing an Asynchronous Transfer Mode (ATM) switch and four ATM interfaces (each 155 Mbps) which will allow us to extend our study to newer applications, performance metrics, and configurations.
The Earth Data Analytic Services (EDAS) Framework
NASA Astrophysics Data System (ADS)
Maxwell, T. P.; Duffy, D.
2017-12-01
Faced with unprecedented growth in earth data volume and demand, NASA has developed the Earth Data Analytic Services (EDAS) framework, a high performance big data analytics framework built on Apache Spark. This framework enables scientists to execute data processing workflows combining common analysis operations close to the massive data stores at NASA. The data is accessed in standard (NetCDF, HDF, etc.) formats in a POSIX file system and processed using vetted earth data analysis tools (ESMF, CDAT, NCO, etc.). EDAS utilizes a dynamic caching architecture, a custom distributed array framework, and a streaming parallel in-memory workflow for efficiently processing huge datasets within limited memory spaces with interactive response times. EDAS services are accessed via a WPS API being developed in collaboration with the ESGF Compute Working Team to support server-side analytics for ESGF. The API can be accessed using direct web service calls, a Python script, a Unix-like shell client, or a JavaScript-based web application. New analytic operations can be developed in Python, Java, or Scala (with support for other languages planned). Client packages in Python, Java/Scala, or JavaScript contain everything needed to build and submit EDAS requests. The EDAS architecture brings together the tools, data storage, and high-performance computing required for timely analysis of large-scale data sets, where the data resides, to ultimately produce societal benefits. It is is currently deployed at NASA in support of the Collaborative REAnalysis Technical Environment (CREATE) project, which centralizes numerous global reanalysis datasets onto a single advanced data analytics platform. This service enables decision makers to compare multiple reanalysis datasets and investigate trends, variability, and anomalies in earth system dynamics around the globe.
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E.; Tkachenko, Valery; Torcivia-Rodriguez, John; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure. The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu PMID:26989153
Simonyan, Vahan; Chumakov, Konstantin; Dingerdissen, Hayley; Faison, William; Goldweber, Scott; Golikov, Anton; Gulzar, Naila; Karagiannis, Konstantinos; Vinh Nguyen Lam, Phuc; Maudru, Thomas; Muravitskaja, Olesja; Osipova, Ekaterina; Pan, Yang; Pschenichnov, Alexey; Rostovtsev, Alexandre; Santana-Quintero, Luis; Smith, Krista; Thompson, Elaine E; Tkachenko, Valery; Torcivia-Rodriguez, John; Voskanian, Alin; Wan, Quan; Wang, Jing; Wu, Tsung-Jung; Wilson, Carolyn; Mazumder, Raja
2016-01-01
The High-performance Integrated Virtual Environment (HIVE) is a distributed storage and compute environment designed primarily to handle next-generation sequencing (NGS) data. This multicomponent cloud infrastructure provides secure web access for authorized users to deposit, retrieve, annotate and compute on NGS data, and to analyse the outcomes using web interface visual environments appropriately built in collaboration with research and regulatory scientists and other end users. Unlike many massively parallel computing environments, HIVE uses a cloud control server which virtualizes services, not processes. It is both very robust and flexible due to the abstraction layer introduced between computational requests and operating system processes. The novel paradigm of moving computations to the data, instead of moving data to computational nodes, has proven to be significantly less taxing for both hardware and network infrastructure.The honeycomb data model developed for HIVE integrates metadata into an object-oriented model. Its distinction from other object-oriented databases is in the additional implementation of a unified application program interface to search, view and manipulate data of all types. This model simplifies the introduction of new data types, thereby minimizing the need for database restructuring and streamlining the development of new integrated information systems. The honeycomb model employs a highly secure hierarchical access control and permission system, allowing determination of data access privileges in a finely granular manner without flooding the security subsystem with a multiplicity of rules. HIVE infrastructure will allow engineers and scientists to perform NGS analysis in a manner that is both efficient and secure. HIVE is actively supported in public and private domains, and project collaborations are welcomed. Database URL: https://hive.biochemistry.gwu.edu. © The Author(s) 2016. Published by Oxford University Press.
NASA Astrophysics Data System (ADS)
Alves Júnior, A. A.; Sokoloff, M. D.
2017-10-01
MCBooster is a header-only, C++11-compliant library that provides routines to generate and perform calculations on large samples of phase space Monte Carlo events. To achieve superior performance, MCBooster is capable to perform most of its calculations in parallel using CUDA- and OpenMP-enabled devices. MCBooster is built on top of the Thrust library and runs on Linux systems. This contribution summarizes the main features of MCBooster. A basic description of the user interface and some examples of applications are provided, along with measurements of performance in a variety of environments
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett
Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general threedimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.
Smith, Cameron W.; Granzow, Brian; Diamond, Gerrett; ...
2017-01-01
Unstructured mesh methods, like finite elements and finite volumes, support the effective analysis of complex physical behaviors modeled by partial differential equations over general threedimensional domains. The most reliable and efficient methods apply adaptive procedures with a-posteriori error estimators that indicate where and how the mesh is to be modified. Although adaptive meshes can have two to three orders of magnitude fewer elements than a more uniform mesh for the same level of accuracy, there are many complex simulations where the meshes required are so large that they can only be solved on massively parallel systems.
Towards massively parallelized all-optical magnetic recording
NASA Astrophysics Data System (ADS)
Davies, C. S.; Janušonis, J.; Kimel, A. V.; Kirilyuk, A.; Tsukamoto, A.; Rasing, Th.; Tobey, R. I.
2018-06-01
We demonstrate an approach to parallel all-optical writing of magnetic domains using spatial and temporal interference of two ultrashort light pulses. We explore how the fluence and grating periodicity of the optical transient grating influence the size and uniformity of the written bits. Using a total incident optical energy of 3.5 μJ, we demonstrate the capability of simultaneously writing 102 spatially separated bits, each featuring a relevant lateral width of ˜1 μm. We discuss viable routes to extend this technique to write individually addressable, sub-diffraction-limited magnetic domains in a wide range of materials.
Parallelized seeded region growing using CUDA.
Park, Seongjin; Lee, Jeongjin; Lee, Hyunna; Shin, Juneseuk; Seo, Jinwook; Lee, Kyoung Ho; Shin, Yeong-Gil; Kim, Bohyoung
2014-01-01
This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with intention to overcome the theoretical weakness of SRG algorithm of its computation time being directly proportional to the size of a segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, advocating that it can substantially assist the segmentation during massive CT screening tests.
Optical Interconnection Via Computer-Generated Holograms
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang; Zhou, Shaomin
1995-01-01
Method of free-space optical interconnection developed for data-processing applications like parallel optical computing, neural-network computing, and switching in optical communication networks. In method, multiple optical connections between multiple sources of light in one array and multiple photodetectors in another array made via computer-generated holograms in electrically addressed spatial light modulators (ESLMs). Offers potential advantages of massive parallelism, high space-bandwidth product, high time-bandwidth product, low power consumption, low cross talk, and low time skew. Also offers advantage of programmability with flexibility of reconfiguration, including variation of strengths of optical connections in real time.
NASA Astrophysics Data System (ADS)
Ukawa, Akira
1998-05-01
The CP-PACS computer is a massively parallel computer consisting of 2048 processing units and having a peak speed of 614 GFLOPS and 128 GByte of main memory. It was developed over the four years from 1992 to 1996 at the Center for Computational Physics, University of Tsukuba, for large-scale numerical simulations in computational physics, especially those of lattice QCD. The CP-PACS computer has been in full operation for physics computations since October 1996. In this article we describe the chronology of the development, the hardware and software characteristics of the computer, and its performance for lattice QCD simulations.
Advanced miniature processing handware for ATR applications
NASA Technical Reports Server (NTRS)
Chao, Tien-Hsin (Inventor); Daud, Taher (Inventor); Thakoor, Anikumar (Inventor)
2003-01-01
A Hybrid Optoelectronic Neural Object Recognition System (HONORS), is disclosed, comprising two major building blocks: (1) an advanced grayscale optical correlator (OC) and (2) a massively parallel three-dimensional neural-processor. The optical correlator, with its inherent advantages in parallel processing and shift invariance, is used for target of interest (TOI) detection and segmentation. The three-dimensional neural-processor, with its robust neural learning capability, is used for target classification and identification. The hybrid optoelectronic neural object recognition system, with its powerful combination of optical processing and neural networks, enables real-time, large frame, automatic target recognition (ATR).
ERIC Educational Resources Information Center
Sundararajan, NarayanKripa; Adesope, Olusola; Cavagnetto, Andy
2017-01-01
To develop and nurture critical thinking, students must have opportunities to observe and practice critical thinking in the classroom. In this parallel mixed method classroom study, we investigate the role of collaborative concept mapping in the development of kindergarten learners' critical thinking skills of analysis and interpretation over a…
ERIC Educational Resources Information Center
Palinkas, Lawrence A.; Garcia, Antonio; Aarons, Gregory; Finno-Velasquez, Megan; Fuentes, Dahlia; Holloway, Ian; Chamberlain, Patricia
2018-01-01
The Cultural Exchange Inventory (CEI) is a 15-item instrument designed to measure the process (7 items) and outcomes (8 items) of exchanges of knowledge, attitudes and practices between members of different organisations collaborating in implementing evidence-based practice. We conducted principal axis factor analyses and parallel analyses of data…
ERIC Educational Resources Information Center
Santos, Elvira Santos; Garcia, Irma Cruz Gavilan; Gomez, Eva Florencia Lejarazo; Vilchis-Reyes, Miguel Angel
2010-01-01
A series of experiments based on problem-solving and collaborative-learning pedagogies are described that encourage students to interpret results and draw conclusions from data. Different approaches including parallel library synthesis, solvent variation, and leaving group variation are used to study a nucleophilic aromatic substitution of…
Scalable Parallel Density-based Clustering and Applications
NASA Astrophysics Data System (ADS)
Patwary, Mostofa Ali
2014-04-01
Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.
On the Accuracy and Parallelism of GPGPU-Powered Incremental Clustering Algorithms.
Chen, Chunlei; He, Li; Zhang, Huixiang; Zheng, Hao; Wang, Lei
2017-01-01
Incremental clustering algorithms play a vital role in various applications such as massive data analysis and real-time data processing. Typical application scenarios of incremental clustering raise high demand on computing power of the hardware platform. Parallel computing is a common solution to meet this demand. Moreover, General Purpose Graphic Processing Unit (GPGPU) is a promising parallel computing device. Nevertheless, the incremental clustering algorithm is facing a dilemma between clustering accuracy and parallelism when they are powered by GPGPU. We formally analyzed the cause of this dilemma. First, we formalized concepts relevant to incremental clustering like evolving granularity. Second, we formally proved two theorems. The first theorem proves the relation between clustering accuracy and evolving granularity. Additionally, this theorem analyzes the upper and lower bounds of different-to-same mis-affiliation. Fewer occurrences of such mis-affiliation mean higher accuracy. The second theorem reveals the relation between parallelism and evolving granularity. Smaller work-depth means superior parallelism. Through the proofs, we conclude that accuracy of an incremental clustering algorithm is negatively related to evolving granularity while parallelism is positively related to the granularity. Thus the contradictory relations cause the dilemma. Finally, we validated the relations through a demo algorithm. Experiment results verified theoretical conclusions.
NASA Astrophysics Data System (ADS)
Schultz, A.
2010-12-01
3D forward solvers lie at the core of inverse formulations used to image the variation of electrical conductivity within the Earth's interior. This property is associated with variations in temperature, composition, phase, presence of volatiles, and in specific settings, the presence of groundwater, geothermal resources, oil/gas or minerals. The high cost of 3D solutions has been a stumbling block to wider adoption of 3D methods. Parallel algorithms for modeling frequency domain 3D EM problems have not achieved wide scale adoption, with emphasis on fairly coarse grained parallelism using MPI and similar approaches. The communications bandwidth as well as the latency required to send and receive network communication packets is a limiting factor in implementing fine grained parallel strategies, inhibiting wide adoption of these algorithms. Leading Graphics Processor Unit (GPU) companies now produce GPUs with hundreds of GPU processor cores per die. The footprint, in silicon, of the GPU's restricted instruction set is much smaller than the general purpose instruction set required of a CPU. Consequently, the density of processor cores on a GPU can be much greater than on a CPU. GPUs also have local memory, registers and high speed communication with host CPUs, usually through PCIe type interconnects. The extremely low cost and high computational power of GPUs provides the EM geophysics community with an opportunity to achieve fine grained (i.e. massive) parallelization of codes on low cost hardware. The current generation of GPUs (e.g. NVidia Fermi) provides 3 billion transistors per chip die, with nearly 500 processor cores and up to 6 GB of fast (DDR5) GPU memory. This latest generation of GPU supports fast hardware double precision (64 bit) floating point operations of the type required for frequency domain EM forward solutions. Each Fermi GPU board can sustain nearly 1 TFLOP in double precision, and multiple boards can be installed in the host computer system. We describe our ongoing efforts to achieve massive parallelization on a novel hybrid GPU testbed machine currently configured with 12 Intel Westmere Xeon CPU cores (or 24 parallel computational threads) with 96 GB DDR3 system memory, 4 GPU subsystems which in aggregate contain 960 NVidia Tesla GPU cores with 16 GB dedicated DDR3 GPU memory, and a second interleved bank of 4 GPU subsystems containing in aggregate 1792 NVidia Fermi GPU cores with 12 GB dedicated DDR5 GPU memory. We are applying domain decomposition methods to a modified version of Weiss' (2001) 3D frequency domain full physics EM finite difference code, an open source GPL licensed f90 code available for download from www.OpenEM.org. This will be the core of a new hybrid 3D inversion that parallelizes frequencies across CPUs and individual forward solutions across GPUs. We describe progress made in modifying the code to use direct solvers in GPU cores dedicated to each small subdomain, iteratively improving the solution by matching adjacent subdomain boundary solutions, rather than iterative Krylov space sparse solvers as currently applied to the whole domain.
Microresonator-based solitons for massively parallel coherent optical communications
NASA Astrophysics Data System (ADS)
Marin-Palomo, Pablo; Kemal, Juned N.; Karpov, Maxim; Kordts, Arne; Pfeifle, Joerg; Pfeiffer, Martin H. P.; Trocha, Philipp; Wolf, Stefan; Brasch, Victor; Anderson, Miles H.; Rosenberger, Ralf; Vijayan, Kovendhan; Freude, Wolfgang; Kippenberg, Tobias J.; Koos, Christian
2017-06-01
Solitons are waveforms that preserve their shape while propagating, as a result of a balance of dispersion and nonlinearity. Soliton-based data transmission schemes were investigated in the 1980s and showed promise as a way of overcoming the limitations imposed by dispersion of optical fibres. However, these approaches were later abandoned in favour of wavelength-division multiplexing schemes, which are easier to implement and offer improved scalability to higher data rates. Here we show that solitons could make a comeback in optical communications, not as a competitor but as a key element of massively parallel wavelength-division multiplexing. Instead of encoding data on the soliton pulse train itself, we use continuous-wave tones of the associated frequency comb as carriers for communication. Dissipative Kerr solitons (DKSs) (solitons that rely on a double balance of parametric gain and cavity loss, as well as dispersion and nonlinearity) are generated as continuously circulating pulses in an integrated silicon nitride microresonator via four-photon interactions mediated by the Kerr nonlinearity, leading to low-noise, spectrally smooth, broadband optical frequency combs. We use two interleaved DKS frequency combs to transmit a data stream of more than 50 terabits per second on 179 individual optical carriers that span the entire telecommunication C and L bands (centred around infrared telecommunication wavelengths of 1.55 micrometres). We also demonstrate coherent detection of a wavelength-division multiplexing data stream by using a pair of DKS frequency combs—one as a multi-wavelength light source at the transmitter and the other as the corresponding local oscillator at the receiver. This approach exploits the scalability of microresonator-based DKS frequency comb sources for massively parallel optical communications at both the transmitter and the receiver. Our results demonstrate the potential of these sources to replace the arrays of continuous-wave lasers that are currently used in high-speed communications. In combination with advanced spatial multiplexing schemes and highly integrated silicon photonic circuits, DKS frequency combs could bring chip-scale petabit-per-second transceivers into reach.
Gupta, Tripti; Marlow, Florence L.; Ferriola, Deborah; Mackiewicz, Katarzyna; Dapprich, Johannes; Monos, Dimitri; Mullins, Mary C.
2010-01-01
Although of fundamental importance in developmental biology, the genetic basis for the symmetry breaking events that polarize the vertebrate oocyte and egg are largely unknown. In vertebrates, the first morphological asymmetry in the oocyte is the Balbiani body, a highly conserved, transient structure found in vertebrates and invertebrates including Drosophila, Xenopus, human, and mouse. We report the identification of the zebrafish magellan (mgn) mutant, which exhibits a novel enlarged Balbiani body phenotype and a disruption of oocyte polarity. To determine the molecular identity of the mgn gene, we positionally cloned the gene, employing a novel DNA capture method to target region-specific genomic DNA of 600 kb for massively parallel sequencing. Using this technique, we were able to enrich for the genomic region linked to our mutation within one week and then identify the mutation in mgn using massively parallel sequencing. This is one of the first successful uses of genomic DNA enrichment combined with massively parallel sequencing to determine the molecular identity of a gene associated with a mutant phenotype. We anticipate that the combination of these technologies will have wide applicability for the efficient identification of mutant genes in all organisms. We identified the mutation in mgn as a deletion in the coding sequence of the zebrafish microtubule actin crosslinking factor 1 (macf1) gene. macf1 is a member of the highly conserved spectraplakin family of cytoskeletal linker proteins, which play diverse roles in polarized cells such as neurons, muscle cells, and epithelial cells. In mgn mutants, the oocyte nucleus is mislocalized; and the Balbiani body, localized mRNAs, and organelles are absent from the periphery of the oocyte, consistent with a function for macf1 in nuclear anchoring and cortical localization. These data provide the first evidence for a role for spectraplakins in polarization of the vertebrate oocyte and egg. PMID:20808893
Gupta, Tripti; Marlow, Florence L; Ferriola, Deborah; Mackiewicz, Katarzyna; Dapprich, Johannes; Monos, Dimitri; Mullins, Mary C
2010-08-19
Although of fundamental importance in developmental biology, the genetic basis for the symmetry breaking events that polarize the vertebrate oocyte and egg are largely unknown. In vertebrates, the first morphological asymmetry in the oocyte is the Balbiani body, a highly conserved, transient structure found in vertebrates and invertebrates including Drosophila, Xenopus, human, and mouse. We report the identification of the zebrafish magellan (mgn) mutant, which exhibits a novel enlarged Balbiani body phenotype and a disruption of oocyte polarity. To determine the molecular identity of the mgn gene, we positionally cloned the gene, employing a novel DNA capture method to target region-specific genomic DNA of 600 kb for massively parallel sequencing. Using this technique, we were able to enrich for the genomic region linked to our mutation within one week and then identify the mutation in mgn using massively parallel sequencing. This is one of the first successful uses of genomic DNA enrichment combined with massively parallel sequencing to determine the molecular identity of a gene associated with a mutant phenotype. We anticipate that the combination of these technologies will have wide applicability for the efficient identification of mutant genes in all organisms. We identified the mutation in mgn as a deletion in the coding sequence of the zebrafish microtubule actin crosslinking factor 1 (macf1) gene. macf1 is a member of the highly conserved spectraplakin family of cytoskeletal linker proteins, which play diverse roles in polarized cells such as neurons, muscle cells, and epithelial cells. In mgn mutants, the oocyte nucleus is mislocalized; and the Balbiani body, localized mRNAs, and organelles are absent from the periphery of the oocyte, consistent with a function for macf1 in nuclear anchoring and cortical localization. These data provide the first evidence for a role for spectraplakins in polarization of the vertebrate oocyte and egg.
Lagardère, Louis; Jolly, Luc-Henri; Lipparini, Filippo; Aviat, Félix; Stamm, Benjamin; Jing, Zhifeng F; Harger, Matthew; Torabifard, Hedieh; Cisneros, G Andrés; Schnieders, Michael J; Gresh, Nohad; Maday, Yvon; Ren, Pengyu Y; Ponder, Jay W; Piquemal, Jean-Philip
2018-01-28
We present Tinker-HP, a massively MPI parallel package dedicated to classical molecular dynamics (MD) and to multiscale simulations, using advanced polarizable force fields (PFF) encompassing distributed multipoles electrostatics. Tinker-HP is an evolution of the popular Tinker package code that conserves its simplicity of use and its reference double precision implementation for CPUs. Grounded on interdisciplinary efforts with applied mathematics, Tinker-HP allows for long polarizable MD simulations on large systems up to millions of atoms. We detail in the paper the newly developed extension of massively parallel 3D spatial decomposition to point dipole polarizable models as well as their coupling to efficient Krylov iterative and non-iterative polarization solvers. The design of the code allows the use of various computer systems ranging from laboratory workstations to modern petascale supercomputers with thousands of cores. Tinker-HP proposes therefore the first high-performance scalable CPU computing environment for the development of next generation point dipole PFFs and for production simulations. Strategies linking Tinker-HP to Quantum Mechanics (QM) in the framework of multiscale polarizable self-consistent QM/MD simulations are also provided. The possibilities, performances and scalability of the software are demonstrated via benchmarks calculations using the polarizable AMOEBA force field on systems ranging from large water boxes of increasing size and ionic liquids to (very) large biosystems encompassing several proteins as well as the complete satellite tobacco mosaic virus and ribosome structures. For small systems, Tinker-HP appears to be competitive with the Tinker-OpenMM GPU implementation of Tinker. As the system size grows, Tinker-HP remains operational thanks to its access to distributed memory and takes advantage of its new algorithmic enabling for stable long timescale polarizable simulations. Overall, a several thousand-fold acceleration over a single-core computation is observed for the largest systems. The extension of the present CPU implementation of Tinker-HP to other computational platforms is discussed.
Schoeman, Elizna M; Lopez, Genghis H; McGowan, Eunike C; Millard, Glenda M; O'Brien, Helen; Roulis, Eileen V; Liew, Yew-Wah; Martin, Jacqueline R; McGrath, Kelli A; Powley, Tanya; Flower, Robert L; Hyland, Catherine A
2017-04-01
Blood group single nucleotide polymorphism genotyping probes for a limited range of polymorphisms. This study investigated whether massively parallel sequencing (also known as next-generation sequencing), with a targeted exome strategy, provides an extended blood group genotype and the extent to which massively parallel sequencing correctly genotypes in homologous gene systems, such as RH and MNS. Donor samples (n = 28) that were extensively phenotyped and genotyped using single nucleotide polymorphism typing, were analyzed using the TruSight One Sequencing Panel and MiSeq platform. Genes for 28 protein-based blood group systems, GATA1, and KLF1 were analyzed. Copy number variation analysis was used to characterize complex structural variants in the GYPC and RH systems. The average sequencing depth per target region was 66.2 ± 39.8. Each sample harbored on average 43 ± 9 variants, of which 10 ± 3 were used for genotyping. For the 28 samples, massively parallel sequencing variant sequences correctly matched expected sequences based on single nucleotide polymorphism genotyping data. Copy number variation analysis defined the Rh C/c alleles and complex RHD hybrids. Hybrid RHD*D-CE-D variants were correctly identified, but copy number variation analysis did not confidently distinguish between D and CE exon deletion versus rearrangement. The targeted exome sequencing strategy employed extended the range of blood group genotypes detected compared with single nucleotide polymorphism typing. This single-test format included detection of complex MNS hybrid cases and, with copy number variation analysis, defined RH hybrid genes along with the RHCE*C allele hitherto difficult to resolve by variant detection. The approach is economical compared with whole-genome sequencing and is suitable for a red blood cell reference laboratory setting. © 2017 AABB.
NASA Technical Reports Server (NTRS)
Benford, Steve; Bowers, John; Fahlen, Lennart E.; Greenhalgh, Chris; Snowdon, Dave
1994-01-01
This paper explores the issue of user embodiment within collaborative virtual environments. By user embodiment we mean the provision of users with appropriate body images so as to represent them to others and also to themselves. By collaborative virtual environments we mean multi-user virtual reality systems which support cooperative work (although we argue that the results of our exploration may also be applied to other kinds of collaborative systems). The main part of the paper identifies a list of embodiment design issues including: presence, location, identity, activity, availability, history of activity, viewpoint, action point, gesture, facial expression, voluntary versus involuntary expression, degree of presence, reflecting capabilities, manipulating the user's view of others, representation across multiple media, autonomous and distributed body parts, truthfulness and efficiency. Following this, we show how these issues are reflected in our own DIVE and MASSIVE prototype collaborative virtual environments.
Analysis of MOOCs Practices from the Perspective of Learner Experiences and Quality Culture
ERIC Educational Resources Information Center
Ossiannilsson, Ebba; Altinay, Fahriye; Altinay, Zehra
2015-01-01
This explanatory analysis of the relevant literature seeks to identify factors affecting quality in massive open online courses (MOOCs). The paper highlights sub-dimensions of quality in MOOCs using the ladder of analytical abstraction. Communication, trust, collaboration, inclusiveness, innovation, and commitment are identified as key elements in…
ERIC Educational Resources Information Center
West-Puckett, Stephanie; Smith, Anna; Cantrill, Christina; Zamora, Mia
2018-01-01
To better understand the impacts of participatory design in English language arts teacher education, this critical case study focuses on the National Writing Project's Connected Learning Massive, Open, Online Collaboration (CLMOOC) that engaged educators in playing with the connected learning framework. The authors draw from 5 years of interaction…
A Comparison of MOOC Development and Delivery Approaches
ERIC Educational Resources Information Center
Smith, Neil; Caldwell, Helen; Richards, Mike; Bandara, Arosha
2017-01-01
Purpose: The purpose of this paper is to present a comparison of two ways of developing and delivering massive open online courses (MOOCs). One was developed by The Open University in collaboration with FutureLearn; the other was developed independently by a small team at the Northampton University. Design/methodology/approach: The different…
Google Books Mutilates the Printed Past
ERIC Educational Resources Information Center
Musto, Ronald G.
2009-01-01
In this article, the author discusses a mutilation that he has encountered involving Google Book Search. That massive text-digitization project, working in collaboration with several of the world's most important library collections, has now made available, in both PDF and text view, tens of thousands of 19th-century titles while it awaits the…
Open Courses and MOOCs as Professional Development--Is the Openness a Hindrance?
ERIC Educational Resources Information Center
Olsson, Ulf
2016-01-01
Purpose: The purpose of this paper is to analyse if open courses, including massive open online courses (MOOCs), can be used as professional development despite their openness since the openness not only brings together individuals from different organizations but also may make the results of collaboration public. Design/methodology/approach: The…
Facilitating Attitudinal Learning in an Animal Behaviour and Welfare MOOC
ERIC Educational Resources Information Center
Watson, Sunnie Lee
2017-01-01
This case study examines the design and facilitation of a Massive Open Online Course (MOOC) that focused on attitudinal learning about the topic of animal behaviour and welfare. Findings showed that a team of instructors worked together collaboratively towards realising learning goals and found the experience rewarding. While learners had mixed…
Mapping Next Generation Learning Spaces as a Designed Quality Enhancement Process
ERIC Educational Resources Information Center
Leonard, Simon N.; Fitzgerald, Robert N.; Bacon, Matt; Munnerley, Danny
2017-01-01
The learning spaces of higher education are changing with collaborative, agile and technology-enabled spaces ever more popular. Despite the massive investment required to create these new spaces, current quality systems are poorly placed to account for the value they create. Such learning spaces are typically popular with students but the impact…
Collaborative Gaming: Teaching Children about Complex Systems and Collective Behavior
ERIC Educational Resources Information Center
Peppler, Kylie; Danish, Joshua A.; Phelps, David
2013-01-01
Although games--including board games, video games, and Massive Multiplayer Online Games--have garnered significant attention in recent years for their impact on educational outcomes, a primary focus of this interest is the transfer of knowledge from game to nongame settings. Building on this literature, our research explores how game designs that…
ERIC Educational Resources Information Center
Ngamassi Tchouakeu, Louis-Marie
2011-01-01
Massive international response to humanitarian crises such as the South Asian Tsunami in 2004, the Hurricane Katrina in 2005 and the Haiti earthquake in 2010 highlights the importance of humanitarian inter-organizational collaboration networks, especially in information management and exchange. Despite more than a decade old call for more research…
Neuroimaging, Genetics, and Clinical Data Sharing in Python Using the CubicWeb Framework
Grigis, Antoine; Goyard, David; Cherbonnier, Robin; Gareau, Thomas; Papadopoulos Orfanos, Dimitri; Chauvat, Nicolas; Di Mascio, Adrien; Schumann, Gunter; Spooren, Will; Murphy, Declan; Frouin, Vincent
2017-01-01
In neurosciences or psychiatry, the emergence of large multi-center population imaging studies raises numerous technological challenges. From distributed data collection, across different institutions and countries, to final data publication service, one must handle the massive, heterogeneous, and complex data from genetics, imaging, demographics, or clinical scores. These data must be both efficiently obtained and downloadable. We present a Python solution, based on the CubicWeb open-source semantic framework, aimed at building population imaging study repositories. In addition, we focus on the tools developed around this framework to overcome the challenges associated with data sharing and collaborative requirements. We describe a set of three highly adaptive web services that transform the CubicWeb framework into a (1) multi-center upload platform, (2) collaborative quality assessment platform, and (3) publication platform endowed with massive-download capabilities. Two major European projects, IMAGEN and EU-AIMS, are currently supported by the described framework. We also present a Python package that enables end users to remotely query neuroimaging, genetics, and clinical data from scripts. PMID:28360851
Neuroimaging, Genetics, and Clinical Data Sharing in Python Using the CubicWeb Framework.
Grigis, Antoine; Goyard, David; Cherbonnier, Robin; Gareau, Thomas; Papadopoulos Orfanos, Dimitri; Chauvat, Nicolas; Di Mascio, Adrien; Schumann, Gunter; Spooren, Will; Murphy, Declan; Frouin, Vincent
2017-01-01
In neurosciences or psychiatry, the emergence of large multi-center population imaging studies raises numerous technological challenges. From distributed data collection, across different institutions and countries, to final data publication service, one must handle the massive, heterogeneous, and complex data from genetics, imaging, demographics, or clinical scores. These data must be both efficiently obtained and downloadable. We present a Python solution, based on the CubicWeb open-source semantic framework, aimed at building population imaging study repositories. In addition, we focus on the tools developed around this framework to overcome the challenges associated with data sharing and collaborative requirements. We describe a set of three highly adaptive web services that transform the CubicWeb framework into a (1) multi-center upload platform, (2) collaborative quality assessment platform, and (3) publication platform endowed with massive-download capabilities. Two major European projects, IMAGEN and EU-AIMS, are currently supported by the described framework. We also present a Python package that enables end users to remotely query neuroimaging, genetics, and clinical data from scripts.
The emergence of spatial cyberinfrastructure.
Wright, Dawn J; Wang, Shaowen
2011-04-05
Cyberinfrastructure integrates advanced computer, information, and communication technologies to empower computation-based and data-driven scientific practice and improve the synthesis and analysis of scientific data in a collaborative and shared fashion. As such, it now represents a paradigm shift in scientific research that has facilitated easy access to computational utilities and streamlined collaboration across distance and disciplines, thereby enabling scientific breakthroughs to be reached more quickly and efficiently. Spatial cyberinfrastructure seeks to resolve longstanding complex problems of handling and analyzing massive and heterogeneous spatial datasets as well as the necessity and benefits of sharing spatial data flexibly and securely. This article provides an overview and potential future directions of spatial cyberinfrastructure. The remaining four articles of the special feature are introduced and situated in the context of providing empirical examples of how spatial cyberinfrastructure is extending and enhancing scientific practice for improved synthesis and analysis of both physical and social science data. The primary focus of the articles is spatial analyses using distributed and high-performance computing, sensor networks, and other advanced information technology capabilities to transform massive spatial datasets into insights and knowledge.
The emergence of spatial cyberinfrastructure
Wright, Dawn J.; Wang, Shaowen
2011-01-01
Cyberinfrastructure integrates advanced computer, information, and communication technologies to empower computation-based and data-driven scientific practice and improve the synthesis and analysis of scientific data in a collaborative and shared fashion. As such, it now represents a paradigm shift in scientific research that has facilitated easy access to computational utilities and streamlined collaboration across distance and disciplines, thereby enabling scientific breakthroughs to be reached more quickly and efficiently. Spatial cyberinfrastructure seeks to resolve longstanding complex problems of handling and analyzing massive and heterogeneous spatial datasets as well as the necessity and benefits of sharing spatial data flexibly and securely. This article provides an overview and potential future directions of spatial cyberinfrastructure. The remaining four articles of the special feature are introduced and situated in the context of providing empirical examples of how spatial cyberinfrastructure is extending and enhancing scientific practice for improved synthesis and analysis of both physical and social science data. The primary focus of the articles is spatial analyses using distributed and high-performance computing, sensor networks, and other advanced information technology capabilities to transform massive spatial datasets into insights and knowledge. PMID:21467227
EUPDF-II: An Eulerian Joint Scalar Monte Carlo PDF Module : User's Manual
NASA Technical Reports Server (NTRS)
Raju, M. S.; Liu, Nan-Suey (Technical Monitor)
2004-01-01
EUPDF-II provides the solution for the species and temperature fields based on an evolution equation for PDF (Probability Density Function) and it is developed mainly for application with sprays, combustion, parallel computing, and unstructured grids. It is designed to be massively parallel and could easily be coupled with any existing gas-phase CFD and spray solvers. The solver accommodates the use of an unstructured mesh with mixed elements of either triangular, quadrilateral, and/or tetrahedral type. The manual provides the user with an understanding of the various models involved in the PDF formulation, its code structure and solution algorithm, and various other issues related to parallelization and its coupling with other solvers. The source code of EUPDF-II will be available with National Combustion Code (NCC) as a complete package.
Im, Hyungsoon; Lesuffleur, Antoine; Lindquist, Nathan C.; Oh, Sang-Hyun
2009-01-01
We present nanohole arrays in a gold film integrated with a 6-channel microfluidic chip for parallel measurements of molecular binding kinetics. Surface plasmon resonance effects in the nanohole arrays enable real-time label-free measurements of molecular binding events in each channel, while adjacent negative reference channels can record measurement artifacts such as bulk solution index changes, temperature variations, or changing light absorption in the liquid. Using this platform, streptavidin-biotin specific binding kinetics are measured at various concentrations with negative controls. A high-density microarray of 252 biosensing pixels is also demonstrated with a packing density of 106 sensing elements/cm2, which can potentially be coupled with a massively parallel array of microfluidic channels for protein microarray applications. PMID:19284776
Monitoring and controlling ATLAS data management: The Rucio web user interface
NASA Astrophysics Data System (ADS)
Lassnig, M.; Beermann, T.; Vigne, R.; Barisits, M.; Garonne, V.; Serfon, C.
2015-12-01
The monitoring and controlling interfaces of the previous data management system DQ2 followed the evolutionary requirements and needs of the ATLAS collaboration. The new data management system, Rucio, has put in place a redesigned web-based interface based upon the lessons learnt from DQ2, and the increased volume of managed information. This interface encompasses both a monitoring and controlling component, and allows easy integration for usergenerated views. The interface follows three design principles. First, the collection and storage of data from internal and external systems is asynchronous to reduce latency. This includes the use of technologies like ActiveMQ or Nagios. Second, analysis of the data into information is done massively parallel due to its volume, using a combined approach with an Oracle database and Hadoop MapReduce. Third, sharing of the information does not distinguish between human or programmatic access, making it easy to access selective parts of the information both in constrained frontends like web-browsers as well as remote services. This contribution will detail the reasons for these principles and the design choices taken. Additionally, the implementation, the interactions with external systems, and an evaluation of the system in production, both from a technological and user perspective, conclude this contribution.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs.
Bhuiyan, Hasanuzzaman; Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-06-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors.
Parallel Algorithms for Switching Edges in Heterogeneous Graphs☆
Khan, Maleq; Chen, Jiangzhuo; Marathe, Madhav
2017-01-01
An edge switch is an operation on a graph (or network) where two edges are selected randomly and one of their end vertices are swapped with each other. Edge switch operations have important applications in graph theory and network analysis, such as in generating random networks with a given degree sequence, modeling and analyzing dynamic networks, and in studying various dynamic phenomena over a network. The recent growth of real-world networks motivates the need for efficient parallel algorithms. The dependencies among successive edge switch operations and the requirement to keep the graph simple (i.e., no self-loops or parallel edges) as the edges are switched lead to significant challenges in designing a parallel algorithm. Addressing these challenges requires complex synchronization and communication among the processors leading to difficulties in achieving a good speedup by parallelization. In this paper, we present distributed memory parallel algorithms for switching edges in massive networks. These algorithms provide good speedup and scale well to a large number of processors. A harmonic mean speedup of 73.25 is achieved on eight different networks with 1024 processors. One of the steps in our edge switch algorithms requires the computation of multinomial random variables in parallel. This paper presents the first non-trivial parallel algorithm for the problem, achieving a speedup of 925 using 1024 processors. PMID:28757680
Efficient Parallel Video Processing Techniques on GPU: From Framework to Implementation
Su, Huayou; Wen, Mei; Wu, Nan; Ren, Ju; Zhang, Chunyuan
2014-01-01
Through reorganizing the execution order and optimizing the data structure, we proposed an efficient parallel framework for H.264/AVC encoder based on massively parallel architecture. We implemented the proposed framework by CUDA on NVIDIA's GPU. Not only the compute intensive components of the H.264 encoder are parallelized but also the control intensive components are realized effectively, such as CAVLC and deblocking filter. In addition, we proposed serial optimization methods, including the multiresolution multiwindow for motion estimation, multilevel parallel strategy to enhance the parallelism of intracoding as much as possible, component-based parallel CAVLC, and direction-priority deblocking filter. More than 96% of workload of H.264 encoder is offloaded to GPU. Experimental results show that the parallel implementation outperforms the serial program by 20 times of speedup ratio and satisfies the requirement of the real-time HD encoding of 30 fps. The loss of PSNR is from 0.14 dB to 0.77 dB, when keeping the same bitrate. Through the analysis to the kernels, we found that speedup ratios of the compute intensive algorithms are proportional with the computation power of the GPU. However, the performance of the control intensive parts (CAVLC) is much related to the memory bandwidth, which gives an insight for new architecture design. PMID:24757432
Offering Global Collaboration Services beyond CERN and HEP
NASA Astrophysics Data System (ADS)
Fernandes, J.; Ferreira, P.; Baron, T.
2015-12-01
The CERN IT department has built over the years a performant and integrated ecosystem of collaboration tools, from videoconference and webcast services to event management software. These services have been designed and evolved in very close collaboration with the various communities surrounding the laboratory and have been massively adopted by CERN users. To cope with this very heavy usage, global infrastructures have been deployed which take full advantage of CERN's international and global nature. If these services and tools are instrumental in enabling the worldwide collaboration which generates major HEP breakthroughs, they would certainly also benefit other sectors of science in which globalization has already taken place. Some of these services are driven by commercial software (Vidyo or Wowza for example), some others have been developed internally and have already been made available to the world as Open Source Software in line with CERN's spirit and mission. Indico for example is now installed in 100+ institutes worldwide. But providing the software is often not enough and institutes, collaborations and project teams do not always possess the expertise, or human or material resources that are needed to set up and maintain such services. Regional and national institutions have to answer needs, which are growingly global and often contradict their operational capabilities or organizational mandate and so are looking at existing worldwide service offers such as CERN's. We believe that the accumulated experience obtained through the operation of a large scale worldwide collaboration service combined with CERN's global network and its recently- deployed Agile Infrastructure would allow the Organization to set up and operate collaborative services, such as Indico and Vidyo, at a much larger scale and on behalf of worldwide research and education institutions and thus answer these pressing demands while optimizing resources at a global level. Such services would be built over a robust and massively scalable Indico server to which the concept of communities would be added, and which would then serve as a hub for accessing other collaboration services such as Vidyo, on the same simple and successful model currently in place for CERN users. This talk will describe this vision, its benefits and the steps that have already been taken to make it come to life.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao
2013-12-01
TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporatedmore » into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.« less
Besozzi, Daniela; Pescini, Dario; Mauri, Giancarlo
2014-01-01
Tau-leaping is a stochastic simulation algorithm that efficiently reconstructs the temporal evolution of biological systems, modeled according to the stochastic formulation of chemical kinetics. The analysis of dynamical properties of these systems in physiological and perturbed conditions usually requires the execution of a large number of simulations, leading to high computational costs. Since each simulation can be executed independently from the others, a massive parallelization of tau-leaping can bring to relevant reductions of the overall running time. The emerging field of General Purpose Graphic Processing Units (GPGPU) provides power-efficient high-performance computing at a relatively low cost. In this work we introduce cuTauLeaping, a stochastic simulator of biological systems that makes use of GPGPU computing to execute multiple parallel tau-leaping simulations, by fully exploiting the Nvidia's Fermi GPU architecture. We show how a considerable computational speedup is achieved on GPU by partitioning the execution of tau-leaping into multiple separated phases, and we describe how to avoid some implementation pitfalls related to the scarcity of memory resources on the GPU streaming multiprocessors. Our results show that cuTauLeaping largely outperforms the CPU-based tau-leaping implementation when the number of parallel simulations increases, with a break-even directly depending on the size of the biological system and on the complexity of its emergent dynamics. In particular, cuTauLeaping is exploited to investigate the probability distribution of bistable states in the Schlögl model, and to carry out a bidimensional parameter sweep analysis to study the oscillatory regimes in the Ras/cAMP/PKA pathway in S. cerevisiae. PMID:24663957
NASA Astrophysics Data System (ADS)
Du, Zhidong; Chen, Chen; Pan, Liang
2017-04-01
Maskless lithography using parallel electron beamlets is a promising solution for next generation scalable maskless nanolithography. Researchers have focused on this goal but have been unable to find a robust technology to generate and control high-quality electron beamlets with satisfactory brightness and uniformity. In this work, we will aim to address this challenge by developing a revolutionary surface-plasmon-enhanced-photoemission (SPEP) technology to generate massively-parallel electron beamlets for maskless nanolithography. The new technology is built upon our recent breakthroughs in plasmonic lenses, which will be used to excite and focus surface plasmons to generate massively-parallel electron beamlets through photoemission. Specifically, the proposed SPEP device consists of an array of plasmonic lens and electrostatic micro-lens pairs, each pair independently producing an electron beamlet. During lithography, a spatial optical modulator will dynamically project light onto individual plasmonic lenses to control the switching and brightness of electron beamlets. The photons incident onto each plasmonic lens are concentrated into a diffraction-unlimited spot as localized surface plasmons to excite the local electrons to near their vacuum levels. Meanwhile, the electrostatic micro-lens extracts the excited electrons to form a focused beamlet, which can be rastered across a wafer to perform lithography. Studies showed that surface plasmons can enhance the photoemission by orders of magnitudes. This SPEP technology can scale up the maskless lithography process to write at wafers per hour. In this talk, we will report the mechanism of the strong electron-photon couplings and the locally enhanced photoexcitation, design of a SPEP device, overview of our proof-of-concept study, and demonstrated parallel lithography of 20-50 nm features.
NASA Astrophysics Data System (ADS)
Mapelli, Michela
2018-02-01
The first four LIGO detections have confirmed the existence of massive black holes (BHs), with mass 30-40 M⊙. Such BHs might originate from massive metal-poor stars (Z < 0:3 Z⊙) or from gravitational instabilities in the early Universe. The formation channels of merging BHs are still poorly constrained. The measure of mass, spin and redshift distribution of merging BHs will give us fundamental clues to distinguish between different models. In parallel, a better understanding of several astrophysical processes (e.g. common envelope, core-collapse SNe, and dynamical evolution of BHs) is decisive, to shed light on the formation channels of merging BHs.
Performance effects of irregular communications patterns on massively parallel multiprocessors
NASA Technical Reports Server (NTRS)
Saltz, Joel; Petiton, Serge; Berryman, Harry; Rifkin, Adam
1991-01-01
A detailed study of the performance effects of irregular communications patterns on the CM-2 was conducted. The communications capabilities of the CM-2 were characterized under a variety of controlled conditions. In the process of carrying out the performance evaluation, extensive use was made of a parameterized synthetic mesh. In addition, timings with unstructured meshes generated for aerodynamic codes and a set of sparse matrices with banded patterns on non-zeroes were performed. This benchmarking suite stresses the communications capabilities of the CM-2 in a range of different ways. Benchmark results demonstrate that it is possible to make effective use of much of the massive concurrency available in the communications network.
NASA Technical Reports Server (NTRS)
Hastings, Harold M.; Waner, Stefan
1987-01-01
The Massively Parallel Processor (MPP) is an ideal machine for computer experiments with simulated neural nets as well as more general cellular automata. Experiments using the MPP with a formal model neural network are described. The results on problem mapping and computational efficiency apply equally well to the neural nets of Hopfield, Hinton et al., and Geman and Geman.
Three-Dimensional Nanobiocomputing Architectures With Neuronal Hypercells
2007-06-01
Neumann architectures, and CMOS fabrication. Novel solutions of massive parallel distributed computing and processing (pipelined due to systolic... and processing platforms utilizing molecular hardware within an enabling organization and architecture. The design technology is based on utilizing a...Microsystems and Nanotechnologies investigated a novel 3D3 (Hardware Software Nanotechnology) technology to design super-high performance computing
Conformal anomaly of some 2-d Z (n) models
NASA Astrophysics Data System (ADS)
William, Peter
1991-01-01
We describe a numerical calculation of the conformal anomaly in the case of some two-dimensional statistical models undergoing a second-order phase transition, utilizing a recently developed method to compute the partition function exactly. This computation is carried out on a massively parallel CM2 machine, using the finite size scaling behaviour of the free energy.
Mangano, M.G.; Buatois, L.A.
1996-01-01
The Loma del Kilome??tro Member of the Lower Ordovician Suri Formation records arc-related shelf sedimentation in the Famatina Basin of northwest Argentina. Nine facies, grouped into three facies assemblages, are recognized. Facies assemblage 1 [massive and parallel-laminated mudstones (facies A) locally punctuated by normally graded or parallel-laminated silty sandstones (facies B] records deposition from suspension fall-out and episodic storm-induced turbidity currents in an outer shelf setting. Facies assemblage 2 [massive and parallel-laminated mudstones (facies A) interbedded with rippled-top very fine-grained sandstones (facies D)] is interpreted as the product of background sedimentation alternating with distal storm events in a middle shelf environment. Facies assemblage 3 [normally graded coarse to fine-grained sandstones (facies C); parallel-laminated to low angle cross-stratified sandstones (facies E); hummocky cross-stratified sandstones and siltstones (facies F); interstratified fine-grained sandstones and mudstones (facies G); massive muddy siltstones and sandstones (facies H); tuffaceous sandstones (facies I); and interbedded thin units of massive and parallel-laminated mudstones (facies A)] is thought to represent volcaniclastic mass flow and storm deposition coupled with subordinated suspension fall-out in an inner-shelf to lower-shoreface setting. The Loma del Kilo??metro Member records regressive-transgressive sedimentation in a storm- and mass flow-dominated high-gradient shelf. Volcano-tectonic activity was the important control on shelf morphology, while relative sea-level change influenced sedimentation. The lower part of the succession is attributed to mud blanketing during high stand and volcanic quiescence. Progradation of the inner shelf to lower shoreface facies assemblage in the middle part represents an abrupt basinward shoreline migration. An erosive-based, non-volcaniclastic, turbidite unit at the base of this package suggests a sea level fall. Pyroclastic detritus, andesites, and a non-volcanic terrain were eroded and their detritus was transported basinward and redeposited by sediment gravity flows during the low stand. The local coexistence of juvenile pyroclastic detritus and fossils suggests reworking of rare ash-falls. The upper part of the Loma del Kilo??metre Member records a transgression with no evidence of contemporaneous volcanism. Biostratinomic, paleoecologic, and ichnologic analyses support this paleoenvironmental interpretations and provide independent evidence for the dominance of episodic sedimentation in an arc-related shallow marine setting. Fossil concentrations were mainly formed by event processes, such as storms and volcaniclastic mass flows. High depositional rates inhibited formation of sediment-starved biogenic concentrations. Collectively, trace fossils belong to the Cruziana ichnofacies. Low diversity, scarcity, and presence of relatively simple forms indicate benthic activity under stressful conditions, most probably linked to high sedimentation rates. Contrasting sedimentary dynamics between 'normal shelves' and their volcaniclastic counterparts produce distinct and particular signatures in the stratigraphic record. Arc-related shelves are typified by event deposition with significant participation of sediment gravity flows, relatively high sedimentation rates, textural and mineralogical immaturity of sediments, scarcity and low diversity of trace fossils, and dominance of transported and reworked faunal assemblages genetically related to episodic processes.
NASA Astrophysics Data System (ADS)
Gassmöller, Rene; Bangerth, Wolfgang
2016-04-01
Particle-in-cell methods have a long history and many applications in geodynamic modelling of mantle convection, lithospheric deformation and crustal dynamics. They are primarily used to track material information, the strain a material has undergone, the pressure-temperature history a certain material region has experienced, or the amount of volatiles or partial melt present in a region. However, their efficient parallel implementation - in particular combined with adaptive finite-element meshes - is complicated due to the complex communication patterns and frequent reassignment of particles to cells. Consequently, many current scientific software packages accomplish this efficient implementation by specifically designing particle methods for a single purpose, like the advection of scalar material properties that do not evolve over time (e.g., for chemical heterogeneities). Design choices for particle integration, data storage, and parallel communication are then optimized for this single purpose, making the code relatively rigid to changing requirements. Here, we present the implementation of a flexible, scalable and efficient particle-in-cell method for massively parallel finite-element codes with adaptively changing meshes. Using a modular plugin structure, we allow maximum flexibility of the generation of particles, the carried tracer properties, the advection and output algorithms, and the projection of properties to the finite-element mesh. We present scaling tests ranging up to tens of thousands of cores and tens of billions of particles. Additionally, we discuss efficient load-balancing strategies for particles in adaptive meshes with their strengths and weaknesses, local particle-transfer between parallel subdomains utilizing existing communication patterns from the finite element mesh, and the use of established parallel output algorithms like the HDF5 library. Finally, we show some relevant particle application cases, compare our implementation to a modern advection-field approach, and demonstrate under which conditions which method is more efficient. We implemented the presented methods in ASPECT (aspect.dealii.org), a freely available open-source community code for geodynamic simulations. The structure of the particle code is highly modular, and segregated from the PDE solver, and can thus be easily transferred to other programs, or adapted for various application cases.
Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.
Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neigh boring inner loops may exhibit different concurrency patternsmore » (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.« less
World-wide amateur observations
NASA Astrophysics Data System (ADS)
Eversberg, T.; Aldoretta, E. J.; Knapen, J. H.; Moffat, A. F. J.; Morel, T.; Ramiaramanantsoa, T.; Rauw, G.; Richardson, N. D.; St-Louis, N.; Teodoro, M.
For some years now, spectroscopic measurements of massive stars in the amateur domain have been fulfilling professional requirements. Various groups in the northern and southern hemispheres have been established, running successful professional-amateur (ProAm) collaborative campaigns, e.g., on WR, O and B type stars. Today high quality data (echelle and long-slit) are regularly delivered and corresponding results published. Night-to-night long-term observations over months to years open a new opportunity for massive-star research. We introduce recent and ongoing sample campaigns (e.g. ɛ Aur, WR 134, ζ Pup), show respective results and highlight the vast amount of data collected in various data bases. Ultimately it is in the time-dependent domain where amateurs can shine most.
X-raying a galactic gravitational lense
NASA Astrophysics Data System (ADS)
Garmire, Gordon
2016-09-01
It can be very difficult to detect compact objects that are not accreting in binary systems. Therefore we know very little about such objects and likely would not notice them even if they are very nearby. Two most obvious examples are very old isolated neutron stars (those that may be beyond the death line and hence are lacking pulsed radio or gamma-ray emission) and isolated stellar-mass black holes (none have been identified so far). OGLE3-ULENSPAR-05 is one of the few massive objects identified in microlensing searches for dark massive objects by the OGLE collaboration. We propose to detect the source or set stringent limit on its X-ray luminosity.
USDA-ARS?s Scientific Manuscript database
Massive investments in climate change mitigation and adaptation are projected during coming decades. Many of these investments will seek to modify how land is managed. The return on both types of investments can be increased through an understanding of land potential: the potential of the land to s...
Announcing the First Results from Daya Bay: Discovery of a New Kind of
collaboration observed tens of thousands of interactions of electron antineutrinos, caught by six massive was the sizable disappearance, equal to about six percent. Although disappearance has been observed in . "Even with only the six detectors already operating, we have more target mass than any similar
Factors for Development of Learning Content and Task for MOOCs in an Asian Context
ERIC Educational Resources Information Center
Nordin, Norazah; Norman, Helmi; Embi, Mohamed Amin; Mansor, Ahmad Zamri; Idris, Fazilah
2016-01-01
The rapid advancement of emergent learning technologies has led to the introduction of massive open online courses (MOOCs) which offer open-based online learning courses to a large number of students. In line with the advancement, the Malaysia Ministry of Education has recently initiated Malaysia MOOCs via collaboration with four public…
Toward Equity and Diversity in Literacy Research, Policy, and Practice: A Critical, Global Approach
ERIC Educational Resources Information Center
Morrell, Ernest
2017-01-01
Can growing inequities between rich and poor and massive manifestations of hatred and intolerance amid rising tides of global populism inspire a focus on equity and diversity in literacy research, policy, and practice? Can such calls for change be collaborative rather than competitive? Can we envision self-love, wellness, and intercultural…
Visualization of Expert Chat Development in a World of Warcraft Player Group
ERIC Educational Resources Information Center
Chen, Mark
2009-01-01
This article describes expertise development in a player group in the massively multiplayer online game World of Warcraft using visualization of chat log data. Charts were created to get a general sense of chat trends in a specific player group engaged in "high-end raiding", a 40-person collaborative activity. These charts helped identify patterns…
A Cloud Based Real-Time Collaborative Platform for eHealth.
Ionescu, Bogdan; Gadea, Cristian; Solomon, Bogdan; Ionescu, Dan; Stoicu-Tivadar, Vasile; Trifan, Mircea
2015-01-01
For more than a decade, the eHealth initiative has been a government concern of many countries. In an Electronic Health Record (EHR) System, there is a need for sharing the data with a group of specialists simultaneously. Collaborative platforms alone are just a part of a solution, while a collaborative platform with parallel editing capabilities and with synchronized data streaming are stringently needed. In this paper, the design and implementation of a collaborative platform used in healthcare is introduced by describing the high level architecture and its implementation. A series of eHealth services are identified and usage examples in a healthcare environment are given.
Parallel computation and the basis system
DOE Office of Scientific and Technical Information (OSTI.GOV)
Smith, G.R.
1993-05-01
A software package has been written that can facilitate efforts to develop powerful, flexible, and easy-to use programs that can run in single-processor, massively parallel, and distributed computing environments. Particular attention has been given to the difficulties posed by a program consisting of many science packages that represent subsystems of a complicated, coupled system. Methods have been found to maintain independence of the packages by hiding data structures without increasing the communications costs in a parallel computing environment. Concepts developed in this work are demonstrated by a prototype program that uses library routines from two existing software systems, Basis andmore » Parallel Virtual Machine (PVM). Most of the details of these libraries have been encapsulated in routines and macros that could be rewritten for alternative libraries that possess certain minimum capabilities. The prototype software uses a flexible master-and-slaves paradigm for parallel computation and supports domain decomposition with message passing for partitioning work among slaves. Facilities are provided for accessing variables that are distributed among the memories of slaves assigned to subdomains. The software is named PROTOPAR.« less
Dust Dynamics in Protoplanetary Disks: Parallel Computing with PVM
NASA Astrophysics Data System (ADS)
de La Fuente Marcos, Carlos; Barge, Pierre; de La Fuente Marcos, Raúl
2002-03-01
We describe a parallel version of our high-order-accuracy particle-mesh code for the simulation of collisionless protoplanetary disks. We use this code to carry out a massively parallel, two-dimensional, time-dependent, numerical simulation, which includes dust particles, to study the potential role of large-scale, gaseous vortices in protoplanetary disks. This noncollisional problem is easy to parallelize on message-passing multicomputer architectures. We performed the simulations on a cache-coherent nonuniform memory access Origin 2000 machine, using both the parallel virtual machine (PVM) and message-passing interface (MPI) message-passing libraries. Our performance analysis suggests that, for our problem, PVM is about 25% faster than MPI. Using PVM and MPI made it possible to reduce CPU time and increase code performance. This allows for simulations with a large number of particles (N ~ 105-106) in reasonable CPU times. The performances of our implementation of the pa! rallel code on an Origin 2000 supercomputer are presented and discussed. They exhibit very good speedup behavior and low load unbalancing. Our results confirm that giant gaseous vortices can play a dominant role in giant planet formation.
Staging memory for massively parallel processor
NASA Technical Reports Server (NTRS)
Batcher, Kenneth E. (Inventor)
1988-01-01
The invention herein relates to a computer organization capable of rapidly processing extremely large volumes of data. A staging memory is provided having a main stager portion consisting of a large number of memory banks which are accessed in parallel to receive, store, and transfer data words simultaneous with each other. Substager portions interconnect with the main stager portion to match input and output data formats with the data format of the main stager portion. An address generator is coded for accessing the data banks for receiving or transferring the appropriate words. Input and output permutation networks arrange the lineal order of data into and out of the memory banks.
NASA Astrophysics Data System (ADS)
Wittek, Peter; Calderaro, Luca
2015-12-01
We extended a parallel and distributed implementation of the Trotter-Suzuki algorithm for simulating quantum systems to study a wider range of physical problems and to make the library easier to use. The new release allows periodic boundary conditions, many-body simulations of non-interacting particles, arbitrary stationary potential functions, and imaginary time evolution to approximate the ground state energy. The new release is more resilient to the computational environment: a wider range of compiler chains and more platforms are supported. To ease development, we provide a more extensive command-line interface, an application programming interface, and wrappers from high-level languages.
Parallelized Seeded Region Growing Using CUDA
Park, Seongjin; Lee, Hyunna; Seo, Jinwook; Lee, Kyoung Ho; Shin, Yeong-Gil; Kim, Bohyoung
2014-01-01
This paper presents a novel method for parallelizing the seeded region growing (SRG) algorithm using Compute Unified Device Architecture (CUDA) technology, with intention to overcome the theoretical weakness of SRG algorithm of its computation time being directly proportional to the size of a segmented region. The segmentation performance of the proposed CUDA-based SRG is compared with SRG implementations on single-core CPUs, quad-core CPUs, and shader language programming, using synthetic datasets and 20 body CT scans. Based on the experimental results, the CUDA-based SRG outperforms the other three implementations, advocating that it can substantially assist the segmentation during massive CT screening tests. PMID:25309619
Parallel processors and nonlinear structural dynamics algorithms and software
NASA Technical Reports Server (NTRS)
Belytschko, Ted; Gilbertsen, Noreen D.; Neal, Mark O.; Plaskacz, Edward J.
1989-01-01
The adaptation of a finite element program with explicit time integration to a massively parallel SIMD (single instruction multiple data) computer, the CONNECTION Machine is described. The adaptation required the development of a new algorithm, called the exchange algorithm, in which all nodal variables are allocated to the element with an exchange of nodal forces at each time step. The architectural and C* programming language features of the CONNECTION Machine are also summarized. Various alternate data structures and associated algorithms for nonlinear finite element analysis are discussed and compared. Results are presented which demonstrate that the CONNECTION Machine is capable of outperforming the CRAY XMP/14.
A survey of GPU-based acceleration techniques in MRI reconstructions
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou
2018-01-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community. PMID:29675361
A survey of GPU-based acceleration techniques in MRI reconstructions.
Wang, Haifeng; Peng, Hanchuan; Chang, Yuchou; Liang, Dong
2018-03-01
Image reconstruction in magnetic resonance imaging (MRI) clinical applications has become increasingly more complicated. However, diagnostic and treatment require very fast computational procedure. Modern competitive platforms of graphics processing unit (GPU) have been used to make high-performance parallel computations available, and attractive to common consumers for computing massively parallel reconstruction problems at commodity price. GPUs have also become more and more important for reconstruction computations, especially when deep learning starts to be applied into MRI reconstruction. The motivation of this survey is to review the image reconstruction schemes of GPU computing for MRI applications and provide a summary reference for researchers in MRI community.
NASA Astrophysics Data System (ADS)
Goedecker, Stefan; Boulet, Mireille; Deutsch, Thierry
2003-08-01
Three-dimensional Fast Fourier Transforms (FFTs) are the main computational task in plane wave electronic structure calculations. Obtaining a high performance on a large numbers of processors is non-trivial on the latest generation of parallel computers that consist of nodes made up of a shared memory multiprocessors. A non-dogmatic method for obtaining high performance for such 3-dim FFTs in a combined MPI/OpenMP programming paradigm will be presented. Exploiting the peculiarities of plane wave electronic structure calculations, speedups of up to 160 and speeds of up to 130 Gflops were obtained on 256 processors.
Parallel closure theory for toroidally confined plasmas
NASA Astrophysics Data System (ADS)
Ji, Jeong-Young; Held, Eric D.
2017-10-01
We solve a system of general moment equations to obtain parallel closures for electrons and ions in an axisymmetric toroidal magnetic field. Magnetic field gradient terms are kept and treated using the Fourier series method. Assuming lowest order density (pressure) and temperature to be flux labels, the parallel heat flow, friction, and viscosity are expressed in terms of radial gradients of the lowest-order temperature and pressure, parallel gradients of temperature and parallel flow, and the relative electron-ion parallel flow velocity. Convergence of closure quantities is demonstrated as the number of moments and Fourier modes are increased. Properties of the moment equations in the collisionless limit are also discussed. Combining closures with fluid equations parallel mass flow and electric current are also obtained. Work in collaboration with the PSI Center and supported by the U.S. DOE under Grant Nos. DE-SC0014033, DE-SC0016256, and DE-FG02-04ER54746.
Parallel digital forensics infrastructure.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liebrock, Lorie M.; Duggan, David Patrick
2009-10-01
This report documents the architecture and implementation of a Parallel Digital Forensics infrastructure. This infrastructure is necessary for supporting the design, implementation, and testing of new classes of parallel digital forensics tools. Digital Forensics has become extremely difficult with data sets of one terabyte and larger. The only way to overcome the processing time of these large sets is to identify and develop new parallel algorithms for performing the analysis. To support algorithm research, a flexible base infrastructure is required. A candidate architecture for this base infrastructure was designed, instantiated, and tested by this project, in collaboration with New Mexicomore » Tech. Previous infrastructures were not designed and built specifically for the development and testing of parallel algorithms. With the size of forensics data sets only expected to increase significantly, this type of infrastructure support is necessary for continued research in parallel digital forensics. This report documents the implementation of the parallel digital forensics (PDF) infrastructure architecture and implementation.« less
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
NASA Astrophysics Data System (ADS)
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification.
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-12-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value.
A Parallel Adaboost-Backpropagation Neural Network for Massive Image Dataset Classification
Cao, Jianfang; Chen, Lichao; Wang, Min; Shi, Hao; Tian, Yun
2016-01-01
Image classification uses computers to simulate human understanding and cognition of images by automatically categorizing images. This study proposes a faster image classification approach that parallelizes the traditional Adaboost-Backpropagation (BP) neural network using the MapReduce parallel programming model. First, we construct a strong classifier by assembling the outputs of 15 BP neural networks (which are individually regarded as weak classifiers) based on the Adaboost algorithm. Second, we design Map and Reduce tasks for both the parallel Adaboost-BP neural network and the feature extraction algorithm. Finally, we establish an automated classification model by building a Hadoop cluster. We use the Pascal VOC2007 and Caltech256 datasets to train and test the classification model. The results are superior to those obtained using traditional Adaboost-BP neural network or parallel BP neural network approaches. Our approach increased the average classification accuracy rate by approximately 14.5% and 26.0% compared to the traditional Adaboost-BP neural network and parallel BP neural network, respectively. Furthermore, the proposed approach requires less computation time and scales very well as evaluated by speedup, sizeup and scaleup. The proposed approach may provide a foundation for automated large-scale image classification and demonstrates practical value. PMID:27905520
Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.
Slażyński, Leszek; Bohte, Sander
2012-01-01
The arrival of graphics processing (GPU) cards suitable for massively parallel computing promises affordable large-scale neural network simulation previously only available at supercomputing facilities. While the raw numbers suggest that GPUs may outperform CPUs by at least an order of magnitude, the challenge is to develop fine-grained parallel algorithms to fully exploit the particulars of GPUs. Computation in a neural network is inherently parallel and thus a natural match for GPU architectures: given inputs, the internal state for each neuron can be updated in parallel. We show that for filter-based spiking neurons, like the Spike Response Model, the additive nature of membrane potential dynamics enables additional update parallelism. This also reduces the accumulation of numerical errors when using single precision computation, the native precision of GPUs. We further show that optimizing simulation algorithms and data structures to the GPU's architecture has a large pay-off: for example, matching iterative neural updating to the memory architecture of the GPU speeds up this simulation step by a factor of three to five. With such optimizations, we can simulate in better-than-realtime plausible spiking neural networks of up to 50 000 neurons, processing over 35 million spiking events per second.
Sankaran, Ramanan; Angel, Jordan; Brown, W. Michael
2015-04-08
The growth in size of networked high performance computers along with novel accelerator-based node architectures has further emphasized the importance of communication efficiency in high performance computing. The world's largest high performance computers are usually operated as shared user facilities due to the costs of acquisition and operation. Applications are scheduled for execution in a shared environment and are placed on nodes that are not necessarily contiguous on the interconnect. Furthermore, the placement of tasks on the nodes allocated by the scheduler is sub-optimal, leading to performance loss and variability. Here, we investigate the impact of task placement on themore » performance of two massively parallel application codes on the Titan supercomputer, a turbulent combustion flow solver (S3D) and a molecular dynamics code (LAMMPS). Benchmark studies show a significant deviation from ideal weak scaling and variability in performance. The inter-task communication distance was determined to be one of the significant contributors to the performance degradation and variability. A genetic algorithm-based parallel optimization technique was used to optimize the task ordering. This technique provides an improved placement of the tasks on the nodes, taking into account the application's communication topology and the system interconnect topology. As a result, application benchmarks after task reordering through genetic algorithm show a significant improvement in performance and reduction in variability, therefore enabling the applications to achieve better time to solution and scalability on Titan during production.« less
NASA Astrophysics Data System (ADS)
Stellmach, Stephan; Hansen, Ulrich
2008-05-01
Numerical simulations of the process of convection and magnetic field generation in planetary cores still fail to reach geophysically realistic control parameter values. Future progress in this field depends crucially on efficient numerical algorithms which are able to take advantage of the newest generation of parallel computers. Desirable features of simulation algorithms include (1) spectral accuracy, (2) an operation count per time step that is small and roughly proportional to the number of grid points, (3) memory requirements that scale linear with resolution, (4) an implicit treatment of all linear terms including the Coriolis force, (5) the ability to treat all kinds of common boundary conditions, and (6) reasonable efficiency on massively parallel machines with tens of thousands of processors. So far, algorithms for fully self-consistent dynamo simulations in spherical shells do not achieve all these criteria simultaneously, resulting in strong restrictions on the possible resolutions. In this paper, we demonstrate that local dynamo models in which the process of convection and magnetic field generation is only simulated for a small part of a planetary core in Cartesian geometry can achieve the above goal. We propose an algorithm that fulfills the first five of the above criteria and demonstrate that a model implementation of our method on an IBM Blue Gene/L system scales impressively well for up to O(104) processors. This allows for numerical simulations at rather extreme parameter values.
Large Scale Document Inversion using a Multi-threaded Computing System
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2018-01-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Novel Highly Parallel and Systolic Architectures Using Quantum Dot-Based Hardware
NASA Technical Reports Server (NTRS)
Fijany, Amir; Toomarian, Benny N.; Spotnitz, Matthew
1997-01-01
VLSI technology has made possible the integration of massive number of components (processors, memory, etc.) into a single chip. In VLSI design, memory and processing power are relatively cheap and the main emphasis of the design is on reducing the overall interconnection complexity since data routing costs dominate the power, time, and area required to implement a computation. Communication is costly because wires occupy the most space on a circuit and it can also degrade clock time. In fact, much of the complexity (and hence the cost) of VLSI design results from minimization of data routing. The main difficulty in VLSI routing is due to the fact that crossing of the lines carrying data, instruction, control, etc. is not possible in a plane. Thus, in order to meet this constraint, the VLSI design aims at keeping the architecture highly regular with local and short interconnection. As a result, while the high level of integration has opened the way for massively parallel computation, practical and full exploitation of such a capability in many applications of interest has been hindered by the constraints on interconnection pattern. More precisely. the use of only localized communication significantly simplifies the design of interconnection architecture but at the expense of somewhat restricted class of applications. For example, there are currently commercially available products integrating; hundreds of simple processor elements within a single chip. However, the lack of adequate interconnection pattern among these processing elements make them inefficient for exploiting a large degree of parallelism in many applications.
CHOLLA: A New Massively Parallel Hydrodynamics Code for Astrophysical Simulation
NASA Astrophysics Data System (ADS)
Schneider, Evan E.; Robertson, Brant E.
2015-04-01
We present Computational Hydrodynamics On ParaLLel Architectures (Cholla ), a new three-dimensional hydrodynamics code that harnesses the power of graphics processing units (GPUs) to accelerate astrophysical simulations. Cholla models the Euler equations on a static mesh using state-of-the-art techniques, including the unsplit Corner Transport Upwind algorithm, a variety of exact and approximate Riemann solvers, and multiple spatial reconstruction techniques including the piecewise parabolic method (PPM). Using GPUs, Cholla evolves the fluid properties of thousands of cells simultaneously and can update over 10 million cells per GPU-second while using an exact Riemann solver and PPM reconstruction. Owing to the massively parallel architecture of GPUs and the design of the Cholla code, astrophysical simulations with physically interesting grid resolutions (≳2563) can easily be computed on a single device. We use the Message Passing Interface library to extend calculations onto multiple devices and demonstrate nearly ideal scaling beyond 64 GPUs. A suite of test problems highlights the physical accuracy of our modeling and provides a useful comparison to other codes. We then use Cholla to simulate the interaction of a shock wave with a gas cloud in the interstellar medium, showing that the evolution of the cloud is highly dependent on its density structure. We reconcile the computed mixing time of a turbulent cloud with a realistic density distribution destroyed by a strong shock with the existing analytic theory for spherical cloud destruction by describing the system in terms of its median gas density.
NASA Astrophysics Data System (ADS)
Li, J.; Zhang, T.; Huang, Q.; Liu, Q.
2014-12-01
Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
Massively Parallel Processing for Fast and Accurate Stamping Simulations
NASA Astrophysics Data System (ADS)
Gress, Jeffrey J.; Xu, Siguang; Joshi, Ramesh; Wang, Chuan-tao; Paul, Sabu
2005-08-01
The competitive automotive market drives automotive manufacturers to speed up the vehicle development cycles and reduce the lead-time. Fast tooling development is one of the key areas to support fast and short vehicle development programs (VDP). In the past ten years, the stamping simulation has become the most effective validation tool in predicting and resolving all potential formability and quality problems before the dies are physically made. The stamping simulation and formability analysis has become an critical business segment in GM math-based die engineering process. As the simulation becomes as one of the major production tools in engineering factory, the simulation speed and accuracy are the two of the most important measures for stamping simulation technology. The speed and time-in-system of forming analysis becomes an even more critical to support the fast VDP and tooling readiness. Since 1997, General Motors Die Center has been working jointly with our software vendor to develop and implement a parallel version of simulation software for mass production analysis applications. By 2001, this technology was matured in the form of distributed memory processing (DMP) of draw die simulations in a networked distributed memory computing environment. In 2004, this technology was refined to massively parallel processing (MPP) and extended to line die forming analysis (draw, trim, flange, and associated spring-back) running on a dedicated computing environment. The evolution of this technology and the insight gained through the implementation of DM0P/MPP technology as well as performance benchmarks are discussed in this publication.
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Large Scale Document Inversion using a Multi-threaded Computing System.
Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won
2017-06-01
Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
Mobile collaborative medical display system.
Park, Sanghun; Kim, Wontae; Ihm, Insung
2008-03-01
Because of recent advances in wireless communication technologies, the world of mobile computing is flourishing with a variety of applications. In this study, we present an integrated architecture for a personal digital assistant (PDA)-based mobile medical display system that supports collaborative work between remote users. We aim to develop a system that enables users in different regions to share a working environment for collaborative visualization with the potential for exploring huge medical datasets. Our system consists of three major components: mobile client, gateway, and parallel rendering server. The mobile client serves as a front end and enables users to choose the visualization and control parameters interactively and cooperatively. The gateway handles requests and responses between mobile clients and the rendering server for efficient communication. Through the gateway, it is possible to share working environments between users, allowing them to work together in computer supported cooperative work (CSCW) mode. Finally, the parallel rendering server is responsible for performing heavy visualization tasks. Our experience indicates that some features currently available to our mobile clients for collaborative scientific visualization are limited due to the poor performance of mobile devices and the low bandwidth of wireless connections. However, as mobile devices and wireless network systems are experiencing considerable elevation in their capabilities, we believe that our methodology will be utilized effectively in building quite responsive, useful mobile collaborative medical systems in the very near future.