parallel vector access: Topics by Science.gov

Sample records for parallel vector access

Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems

NASA Technical Reports Server (NTRS)

Chen, Hsin-Chu; He, Ai-Fang

1993-01-01

The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
Three-dimensional magnetic bubble memory system

NASA Technical Reports Server (NTRS)

Stadler, Henry L. (Inventor); Katti, Romney R. (Inventor); Wu, Jiin-Chuan (Inventor)

1994-01-01

A compact memory uses magnetic bubble technology for providing data storage. A three-dimensional arrangement, in the form of stacks of magnetic bubble layers, is used to achieve high volumetric storage density. Output tracks are used within each layer to allow data to be accessed uniquely and unambiguously. Storage can be achieved using either current access or field access magnetic bubble technology. Optical sensing via the Faraday effect is used to detect data. Optical sensing facilitates the accessing of data from within the three-dimensional package and lends itself to parallel operation for supporting high data rates and vector and parallel processing.
Transferring ecosystem simulation codes to supercomputers

NASA Technical Reports Server (NTRS)

Skiles, J. W.; Schulbach, C. H.

1995-01-01

Many ecosystem simulation computer codes have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectoring and 'in-line' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it ineffectively uses the vector and parallel processing capabilities of the Cray. We expect that by restructuring the code, it could execute an additional six to ten times faster.
A Parallel Vector Machine for the PM Programming Language

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2016-04-01

PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
First experience of vectorizing electromagnetic physics models for detector simulation

NASA Astrophysics Data System (ADS)

Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.

2015-12-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
A Data Type for Efficient Representation of Other Data Types

NASA Technical Reports Server (NTRS)

James, Mark

2008-01-01

A self-organizing, monomorphic data type denoted a sequence has been conceived to address certain concerns that arise in programming parallel computers. A sequence in the present sense can be regarded abstractly as a vector, set, bag, queue, or other construct. Heretofore, in programming a parallel computer, it has been necessary for the programmer to state explicitly, at the outset, what parts of the program and the underlying data structures must be represented in parallel form. Not only is this requirement not optimal from the perspective of implementation; it entails an additional requirement that the programmer have intimate understanding of the underlying parallel structure. The present sequence data type overcomes both the implementation and parallel structure obstacles. In so doing, the sequence data type provides unified means by which the programmer can represent a data structure for natural and automatic decomposition to a parallel computing architecture. Sequences exhibit the behavioral and structural characteristics of vectors, but the underlying representations are automatically synthesized from combinations of programmers advice and execution use metrics. Sequences can vary bidirectionally between sparseness and density, making them excellent choices for many kinds of algorithms. The novelty and benefit of this behavior lies in the fact that it can relieve programmers of the details of implementations. The creation of a sequence enables decoupling of a conceptual representation from an implementation. The underlying representation of a sequence is a hybrid of representations composed of vectors, linked lists, connected blocks, and hash tables. The internal structure of a sequence can automatically change from time to time on the basis of how it is being used. Those portions of a sequence where elements have not been added or removed can be as efficient as vectors. As elements are inserted and removed in a given portion, then different methods are utilized to provide both an access and memory strategy that is optimized for that portion and the use to which it is put.
Exploiting Vector and Multicore Parallelsim for Recursive, Data- and Task-Parallel Programs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal

Modern hardware contains parallel execution resources that are well-suited for data-parallelism-vector units-and task parallelism-multicores. However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task- and data-parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently as multicores. Our framework allows us to define schedulers that can dynamically select between executing task- blocks on vector units or multicores. We show that thesemore » schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel pro- grams into task block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.« less
Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

1991-01-01

Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
High-performance computing — an overview

NASA Astrophysics Data System (ADS)

Marksteiner, Peter

1996-08-01

An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Hypergraph partitioning implementation for parallelizing matrix-vector multiplication using CUDA GPU-based parallel computing

NASA Astrophysics Data System (ADS)

Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.

2017-07-01

Calculation of the matrix-vector multiplication in the real-world problems often involves large matrix with arbitrary size. Therefore, parallelization is needed to speed up the calculation process that usually takes a long time. Graph partitioning techniques that have been discussed in the previous studies cannot be used to complete the parallelized calculation of matrix-vector multiplication with arbitrary size. This is due to the assumption of graph partitioning techniques that can only solve the square and symmetric matrix. Hypergraph partitioning techniques will overcome the shortcomings of the graph partitioning technique. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented by the GPU (graphics processing unit).
Automatic recognition of vector and parallel operations in a higher level language

NASA Technical Reports Server (NTRS)

Schneck, P. B.

1971-01-01

A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
A Parallel Framework with Block Matrices of a Discrete Fourier Transform for Vector-Valued Discrete-Time Signals.

PubMed

Soto-Quiros, Pablo

2015-01-01

This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to analysis, design, and implementation of parallel algorithms in multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is advantage to use multicore processors and a parallel computing environment to minimize the high execution time. Additionally, speedup increases when the number of logical processors and length of the signal increase.
Different Relative Orientation of Static and Alternative Magnetic Fields and Cress Roots Direction of Growth Changes Their Gravitropic Reaction

NASA Astrophysics Data System (ADS)

Sheykina, Nadiia; Bogatina, Nina

The following variants of roots location relatively to static and alternative components of magnetic field were studied. At first variant the static magnetic field was directed parallel to the gravitation vector, the alternative magnetic field was directed perpendicular to static one; roots were directed perpendicular to both two fields’ components and gravitation vector. At the variant the negative gravitropysm for cress roots was observed. At second variant the static magnetic field was directed parallel to the gravitation vector, the alternative magnetic field was directed perpendicular to static one; roots were directed parallel to alternative magnetic field. At third variant the alternative magnetic field was directed parallel to the gravitation vector, the static magnetic field was directed perpendicular to the gravitation vector, roots were directed perpendicular to both two fields components and gravitation vector; At forth variant the alternative magnetic field was directed parallel to the gravitation vector, the static magnetic field was directed perpendicular to the gravitation vector, roots were directed parallel to static magnetic field. In all cases studied the alternative magnetic field frequency was equal to Ca ions cyclotron frequency. In 2, 3 and 4 variants gravitropism was positive. But the gravitropic reaction speeds were different. In second and forth variants the gravitropic reaction speed in error limits coincided with the gravitropic reaction speed under Earth’s conditions. At third variant the gravitropic reaction speed was slowed essentially.
The nondeterministic divide

NASA Technical Reports Server (NTRS)

Charlesworth, Arthur

1990-01-01

The nondeterministic divide partitions a vector into two non-empty slices by allowing the point of division to be chosen nondeterministically. Support for high-level divide-and-conquer programming provided by the nondeterministic divide is investigated. A diva algorithm is a recursive divide-and-conquer sequential algorithm on one or more vectors of the same range, whose division point for a new pair of recursive calls is chosen nondeterministically before any computation is performed and whose recursive calls are made immediately after the choice of division point; also, access to vector components is only permitted during activations in which the vector parameters have unit length. The notion of diva algorithm is formulated precisely as a diva call, a restricted call on a sequential procedure. Diva calls are proven to be intimately related to associativity. Numerous applications of diva calls are given and strategies are described for translating a diva call into code for a variety of parallel computers. Thus diva algorithms separate logical correctness concerns from implementation concerns.
Vectorization for Molecular Dynamics on Intel Xeon Phi Corpocessors

NASA Astrophysics Data System (ADS)

Yi, Hongsuk

2014-03-01

Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512 bit vector registers for the high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Terfoff potentials for covalent bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multi-level parallelism computing. We combine thread-level parallelism using a tightly coupled thread-level and task-level parallelism with 512-bit vector register. The simulation results show that the parallel performance of SIMD implementations on Xeon Phi is apparently superior to their x86 CPU architecture.
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

NASA Technical Reports Server (NTRS)

Overman, Andrea L.; Poole, Eugene L.

1991-01-01

A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
Light scattering of rectangular slot antennas: parallel magnetic vector vs perpendicular electric vector

NASA Astrophysics Data System (ADS)

Lee, Dukhyung; Kim, Dai-Sik

2016-01-01

We study light scattering off rectangular slot nano antennas on a metal film varying incident polarization and incident angle, to examine which field vector of light is more important: electric vector perpendicular to, versus magnetic vector parallel to the long axis of the rectangle. While vector Babinet’s principle would prefer magnetic field along the long axis for optimizing slot antenna function, convention and intuition most often refer to the electric field perpendicular to it. Here, we demonstrate experimentally that in accordance with vector Babinet’s principle, the incident magnetic vector parallel to the long axis is the dominant component, with the perpendicular incident electric field making a small contribution of the factor of 1/|ε|, the reciprocal of the absolute value of the dielectric constant of the metal, owing to the non-perfectness of metals at optical frequencies.
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow

NASA Technical Reports Server (NTRS)

Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry

2005-01-01

Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as SGI system. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using openMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
MIST: An Open Source Environmental Modelling Programming Language Incorporating Easy to Use Data Parallelism.

NASA Astrophysics Data System (ADS)

Bellerby, Tim

2014-05-01

Model Integration System (MIST) is open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements to be directly translated into sequences of whole-array (or more generally whole data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations are supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g.: the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and separately to control storage requirements for intermediate expressions - enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter based implementations. A prototype open source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to FORTRAN implementation or to an interpreter-based approach. A MIST to FORTRAN compiler is under development and volunteers are sought to create an ANSI-C implementation. Parallel processing is currently implemented using OpenMP. However, parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.

Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers

NASA Astrophysics Data System (ADS)

Ogino, T.

High Performance Fortran (HPF) is one of modern and common techniques to achieve high performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5 VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Parallel-vector out-of-core equation solver for computational mechanics

NASA Technical Reports Server (NTRS)

Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.

1993-01-01

A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/ output (I/O) time is reduced by using the a synchronous BUFFER IN and BUFFER OUT, which can be executed simultaneously with the CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance the performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
GPU-accelerated adjoint algorithmic differentiation

NASA Astrophysics Data System (ADS)

Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe

2016-03-01

Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the ;tape;. Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
GPU-Accelerated Adjoint Algorithmic Differentiation.

PubMed

Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe

2016-03-01

Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
GPU-Accelerated Adjoint Algorithmic Differentiation

PubMed Central

Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe

2015-01-01

Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the “tape”. Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography. PMID:26941443
Parallel-vector solution of large-scale structural analysis problems on supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1989-01-01

A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
Global MHD simulation of magnetosphere using HPF

NASA Astrophysics Data System (ADS)

Ogino, T.

We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer and the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% using 56 PEs of Fujitsu VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Subatomic-scale force vector mapping above a Ge(001) dimer using bimodal atomic force microscopy

NASA Astrophysics Data System (ADS)

Naitoh, Yoshitaka; Turanský, Robert; Brndiar, Ján; Li, Yan Jun; Štich, Ivan; Sugawara, Yasuhiro

2017-07-01

Probing physical quantities on the nanoscale that have directionality, such as magnetic moments, electric dipoles, or the force response of a surface, is essential for characterizing functionalized materials for nanotechnological device applications. Currently, such physical quantities are usually experimentally obtained as scalars. To investigate the physical properties of a surface on the nanoscale in depth, these properties must be measured as vectors. Here we demonstrate a three-force-component detection method, based on multi-frequency atomic force microscopy on the subatomic scale and apply it to a Ge(001)-c(4 × 2) surface. We probed the surface-normal and surface-parallel force components above the surface and their direction-dependent anisotropy and expressed them as a three-dimensional force vector distribution. Access to the atomic-scale force distribution on the surface will enable better understanding of nanoscale surface morphologies, chemical composition and reactions, probing nanostructures via atomic or molecular manipulation, and provide insights into the behaviour of nano-machines on substrates.
Lanczos eigensolution method for high-performance computers

NASA Technical Reports Server (NTRS)

Bostic, Susan W.

1991-01-01

The theory, computational analysis, and applications are presented of a Lanczos algorithm on high performance computers. The computationally intensive steps of the algorithm are identified as: the matrix factorization, the forward/backward equation solution, and the matrix vector multiples. These computational steps are optimized to exploit the vector and parallel capabilities of high performance computers. The savings in computational time from applying optimization techniques such as: variable band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large scale structural analysis applications are described: the buckling of a composite blade stiffened panel with a cutout, and the vibration analysis of a high speed civil transport. The sequential computational time for the panel problem executed on a CONVEX computer of 181.6 seconds was decreased to 14.1 seconds with the optimized vector algorithm. The best computational time of 23 seconds for the transport problem with 17,000 degs of freedom was on the the Cray-YMP using an average of 3.63 processors.
VISPA2: a scalable pipeline for high-throughput identification and annotation of vector integration sites.

PubMed

Spinozzi, Giulio; Calabria, Andrea; Brasca, Stefano; Beretta, Stefano; Merelli, Ivan; Milanesi, Luciano; Montini, Eugenio

2017-11-25

Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials combined with the increasing amount of Next Generation Sequencing data, aimed at identifying integration sites, require both highly accurate and efficient computational software able to correctly process "big data" in a reasonable computational time. Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis with the following features: (1) the sequence analysis for the integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after the alignment on the target genome; (2) an heuristic algorithm to reduce false positive integration sites at nucleotide level to reduce the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user friendly web interface as researcher front-end to perform integration site analyses without computational skills; (5) the time speedup of all steps through parallelization (Hadoop free). We tested VISPA2 performances using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial and compared the results with other preexisting tools for integration site analysis. On the computational side, VISPA2 showed a > 6-fold speedup and improved precision and recall metrics (1 and 0.97 respectively) compared to previously developed computational pipelines. These performances indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost and time effective fashion. Moreover, the web access of VISPA2 ( http://openserver.itb.cnr.it/vispa/ ) ensures accessibility and ease of usage to researches of a complex analytical tool. We released the source code of VISPA2 in a public repository ( https://bitbucket.org/andreacalabria/vispa2 ).
Vectorized and multitasked solution of the few-group neutron diffusion equations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zee, S.K.; Turinsky, P.J.; Shayer, Z.

1989-03-01

A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method that allows parallelism to be introduced in both space and energy groups. Formore » the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7. 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of --61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.« less
An efficient parallel algorithm for matrix-vector multiplication

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hendrickson, B.; Leland, R.; Plimpton, S.

The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/[radical]p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in themore » well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.« less
Scaling Optimization of the SIESTA MHD Code

NASA Astrophysics Data System (ADS)

Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan

2013-10-01

SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
Efficient Machine Learning Approach for Optimizing Scientific Computing Applications on Emerging HPC Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arumugam, Kamesh

Efficient parallel implementations of scientific applications on multi-core CPUs with accelerators such as GPUs and Xeon Phis is challenging. This requires - exploiting the data parallel architecture of the accelerator along with the vector pipelines of modern x86 CPU architectures, load balancing, and efficient memory transfer between different devices. It is relatively easy to meet these requirements for highly structured scientific applications. In contrast, a number of scientific and engineering applications are unstructured. Getting performance on accelerators for these applications is extremely challenging because many of these applications employ irregular algorithms which exhibit data-dependent control-ow and irregular memory accesses. Furthermore,more » these applications are often iterative with dependency between steps, and thus making it hard to parallelize across steps. As a result, parallelism in these applications is often limited to a single step. Numerical simulation of charged particles beam dynamics is one such application where the distribution of work and memory access pattern at each time step is irregular. Applications with these properties tend to present significant branch and memory divergence, load imbalance between different processor cores, and poor compute and memory utilization. Prior research on parallelizing such irregular applications have been focused around optimizing the irregular, data-dependent memory accesses and control-ow during a single step of the application independent of the other steps, with the assumption that these patterns are completely unpredictable. We observed that the structure of computation leading to control-ow divergence and irregular memory accesses in one step is similar to that in the next step. It is possible to predict this structure in the current step by observing the computation structure of previous steps. In this dissertation, we present novel machine learning based optimization techniques to address the parallel implementation challenges of such irregular applications on different HPC architectures. In particular, we use supervised learning to predict the computation structure and use it to address the control-ow and memory access irregularities in the parallel implementation of such applications on GPUs, Xeon Phis, and heterogeneous architectures composed of multi-core CPUs with GPUs or Xeon Phis. We use numerical simulation of charged particles beam dynamics simulation as a motivating example throughout the dissertation to present our new approach, though they should be equally applicable to a wide range of irregular applications. The machine learning approach presented here use predictive analytics and forecasting techniques to adaptively model and track the irregular memory access pattern at each time step of the simulation to anticipate the future memory access pattern. Access pattern forecasts can then be used to formulate optimization decisions during application execution which improves the performance of the application at a future time step based on the observations from earlier time steps. In heterogeneous architectures, forecasts can also be used to improve the memory performance and resource utilization of all the processing units to deliver a good aggregate performance. We used these optimization techniques and anticipation strategy to design a cache-aware, memory efficient parallel algorithm to address the irregularities in the parallel implementation of charged particles beam dynamics simulation on different HPC architectures. Experimental result using a diverse mix of HPC architectures shows that our approach in using anticipation strategy is effective in maximizing data reuse, ensuring workload balance, minimizing branch and memory divergence, and in improving resource utilization.« less
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python

USGS Publications Warehouse

Laura, Jason R.; Rey, Sergio J.

2017-01-01

Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyber infrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Monte Carlo Markov chain simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.
The Utility of SAR to Monitor Ocean Processes.

DTIC Science & Technology

1981-11-01

echo received from ocean waves include the motion of the a horizontally polarized wave will have its E vector parallel to scattering surfaces, the so...radiation is defined by the direction of the electric field intensity, E, vector . For example, a horizontally polarized wave will have its E vector ...Oil Spill Off the East Coast of the United States ................ .... 55 19. L-band Parallel and Cross Polarized SAR Imagery of Ice in the Beaufort
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures

NASA Technical Reports Server (NTRS)

Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.

1998-01-01

In this paper a new algorithm, designated as Fast Invariant Imbedding algorithm, for solution of Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other Fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate size problems.
Sentence alignment using feed forward neural network.

PubMed

Fattah, Mohamed Abdel; Ren, Fuji; Kuroiwa, Shingo

2006-12-01

Parallel corpora have become an essential resource for work in multi lingual natural language processing. However, sentence aligned parallel corpora are more efficient than non-aligned parallel corpora for cross language information retrieval and machine translation applications. In this paper, we present a new approach to align sentences in bilingual parallel corpora based on feed forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuate score, and cognate score values. A set of manually prepared training data has been assigned to train the feed forward neural network. Another set of data was used for testing. Using this new approach, we could achieve an error reduction of 60% over length based approach when applied on English-Arabic parallel documents. Moreover this new approach is valid for any language pair and it is quite flexible approach since the feature parameter vector may contain more/less or different features than that we used in our system such as lexical match feature.

Vectoring of parallel synthetic jets: A parametric study

NASA Astrophysics Data System (ADS)

Berk, Tim; Gomit, Guillaume; Ganapathisubramani, Bharathram

2016-11-01

The vectoring of a pair of parallel synthetic jets can be described using five dimensionless parameters: the aspect ratio of the slots, the Strouhal number, the Reynolds number, the phase difference between the jets and the spacing between the slots. In the present study, the influence of the latter four on the vectoring behaviour of the jets is examined experimentally using particle image velocimetry. Time-averaged velocity maps are used to study the variations in vectoring behaviour for a parametric sweep of each of the four parameters independently. A topological map is constructed for the full four-dimensional parameter space. The vectoring behaviour is described both qualitatively and quantitatively. A vectoring mechanism is proposed, based on measured vortex positions. We acknowledge the financial support from the European Research Council (ERC Grant Agreement No. 277472).
Vector processing efficiency of plasma MHD codes by use of the FACOM 230-75 APU

NASA Astrophysics Data System (ADS)

Matsuura, T.; Tanaka, Y.; Naraoka, K.; Takizuka, T.; Tsunematsu, T.; Tokuda, S.; Azumi, M.; Kurita, G.; Takeda, T.

1982-06-01

In the framework of pipelined vector architecture, the efficiency of vector processing is assessed with respect to plasma MHD codes in nuclear fusion research. By using a vector processor, the FACOM 230-75 APU, the limit of the enhancement factor due to parallelism of current vector machines is examined for three numerical codes based on a fluid model. Reasonable speed-up factors of approximately 6,6 and 4 times faster than the highly optimized scalar version are obtained for ERATO (linear stability code), AEOLUS-R1 (nonlinear stability code) and APOLLO (1-1/2D transport code), respectively. Problems of the pipelined vector processors are discussed from the viewpoint of restructuring, optimization and choice of algorithms. In conclusion, the important concept of "concurrency within pipelined parallelism" is emphasized.
New Materials, Techniques and Device Concepts for Organic NLO Chromophore-based Electrooptic Devices. Part 1

DTIC Science & Technology

2006-08-23

polarization the electric field vector is parallel to the substrate, for TM polarization the magnetic field vector is parallel to the substrate. Figure...section can be obtained for the case of the two electromagnetic field polarization vectors λ and µ describing the two photons being absorbed (of the same or... polarization effects on two-photon absorption as investigated by the technique of thermal lensing detected absorption of a mode- locked laser beam. This
A GaAs vector processor based on parallel RISC microprocessors

NASA Astrophysics Data System (ADS)

Misko, Tim A.; Rasset, Terry L.

A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated completely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set CPU (RISC) with four floating-point coprocessor units and necessary memory interface functions. This architecture has been simulated for several benchmark programs including complex fast Fourier transform (FFT), complex inner product, trigonometric functions, and sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT at a speed of 112 microsec (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 ft-cubed.
Three-dimensional Hybrid Simulation Study of Anisotropic Turbulence in the Proton Kinetic Regime

NASA Astrophysics Data System (ADS)

Vasquez, Bernard J.; Markovskii, Sergei A.; Chandran, Benjamin D. G.

2014-06-01

Three-dimensional numerical hybrid simulations with particle protons and quasi-neutralizing fluid electrons are conducted for a freely decaying turbulence that is anisotropic with respect to the background magnetic field. The turbulence evolution is determined by both the combined root-mean-square (rms) amplitude for fluctuating proton bulk velocity and magnetic field and by the ratio of perpendicular to parallel wavenumbers. This kind of relationship had been considered in the past with regard to interplanetary turbulence. The fluctuations nonlinearly evolve to a turbulent phase whose net wave vector anisotropy is usually more perpendicular than the initial one, irrespective of the initial ratio of perpendicular to parallel wavenumbers. Self-similar anisotropy evolution is found as a function of the rms amplitude and parallel wavenumber. Proton heating rates in the turbulent phase vary strongly with the rms amplitude but only weakly with the initial wave vector anisotropy. Even in the limit where wave vectors are confined to the plane perpendicular to the background magnetic field, the heating rate remains close to the corresponding case with finite parallel wave vector components. Simulation results obtained as a function of proton plasma to background magnetic pressure ratio β p in the range 0.1-0.5 show that the wave vector anisotropy also weakly depends on β p .
Vectoring of parallel synthetic jets

NASA Astrophysics Data System (ADS)

Berk, Tim; Ganapathisubramani, Bharathram; Gomit, Guillaume

2015-11-01

A pair of parallel synthetic jets can be vectored by applying a phase difference between the two driving signals. The resulting jet can be merged or bifurcated and either vectored towards the actuator leading in phase or the actuator lagging in phase. In the present study, the influence of phase difference and Strouhal number on the vectoring behaviour is examined experimentally. Phase-locked vorticity fields, measured using Particle Image Velocimetry (PIV), are used to track vortex pairs. The physical mechanisms that explain the diversity in vectoring behaviour are observed based on the vortex trajectories. For a fixed phase difference, the vectoring behaviour is shown to be primarily influenced by pinch-off time of vortex rings generated by the synthetic jets. Beyond a certain formation number, the pinch-off timescale becomes invariant. In this region, the vectoring behaviour is determined by the distance between subsequent vortex rings. We acknowledge the financial support from the European Research Council (ERC grant agreement no. 277472).
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, T.

Progress of parallel/vector computers has driven us to develop suitable numerical integrators utilizing their computational power to the full extent while being independent on the size of system to be integrated. Unfortunately, the parallel version of Runge-Kutta type integrators are known to be not so efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems. While the vector-mode usage of Picard-Chebyshev method (Fukushima 1997a, 1997b) will lead the acceleration factor of order of 1000 for smooth problems such as planetary/satellites orbit integration. The success of multiple-correction PECE mode of time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to enlighten Milankar's so-called "pipelined predictor corrector method", which is expected to lead an acceleration factor of 3-4. We will review these directions and discuss future prospects.
Automated Vectorization of Decision-Based Algorithms

NASA Technical Reports Server (NTRS)

James, Mark

2006-01-01

Virtually all existing vectorization algorithms are designed to only analyze the numeric properties of an algorithm and distribute those elements across multiple processors. This advances the state of the practice because it is the only known system, at the time of this reporting, that takes high-level statements and analyzes them for their decision properties and converts them to a form that allows them to automatically be executed in parallel. The software takes a high-level source program that describes a complex decision- based condition and rewrites it as a disjunctive set of component Boolean relations that can then be executed in parallel. This is important because parallel architectures are becoming more commonplace in conventional systems and they have always been present in NASA flight systems. This technology allows one to take existing condition-based code and automatically vectorize it so it naturally decomposes across parallel architectures.
Multiscale asymmetric orthogonal wavelet kernel for linear programming support vector learning and nonlinear dynamic systems identification.

PubMed

Lu, Zhao; Sun, Jing; Butts, Kenneth

2014-05-01

Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
Associative Pattern Recognition In Analog VLSI Circuits

NASA Technical Reports Server (NTRS)

Tawel, Raoul

1995-01-01

Winner-take-all circuit selects best-match stored pattern. Prototype cascadable very-large-scale integrated (VLSI) circuit chips built and tested to demonstrate concept of electronic associative pattern recognition. Based on low-power, sub-threshold analog complementary oxide/semiconductor (CMOS) VLSI circuitry, each chip can store 128 sets (vectors) of 16 analog values (vector components), vectors representing known patterns as diverse as spectra, histograms, graphs, or brightnesses of pixels in images. Chips exploit parallel nature of vector quantization architecture to implement highly parallel processing in relatively simple computational cells. Through collective action, cells classify input pattern in fraction of microsecond while consuming power of few microwatts.
Compute Server Performance Results

NASA Technical Reports Server (NTRS)

Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)

1994-01-01

Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single processor rate of any vendor. However, if the price-performance ration (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPR's of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite,
Parallel/Vector Integration Methods for Dynamical Astronomy

NASA Astrophysics Data System (ADS)

Fukushima, Toshio

1999-01-01

This paper reviews three recent works on the numerical methods to integrate ordinary differential equations (ODE), which are specially designed for parallel, vector, and/or multi-processor-unit(PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of ODE in the form of Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by the fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to lead an acceleration factor of around 50 in parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel. Also the trial integrations are further accelerated by balancing computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 by using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
An analytic formula for the relativistic incoherent Thomson backscattering spectrum for a drifting bi-Maxwellian plasma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Naito, O.

2015-08-15

An analytic formula has been derived for the relativistic incoherent Thomson backscattering spectrum for a drifting anisotropic plasma when the scattering vector is parallel to the drifting direction. The shape of the scattering spectrum is insensitive to the electron temperature perpendicular to the scattering vector, but its amplitude may be modulated. As a result, while the measured temperature correctly represents the electron distribution parallel to the scattering vector, the electron density may be underestimated when the perpendicular temperature is higher than the parallel temperature. Since the scattering spectrum in shorter wavelengths is greatly enhanced by the existence of drift, themore » diagnostics might be used to measure local electron current density in fusion plasmas.« less
Reduced Order Model Basis Vector Generation: Generates Basis Vectors fro ROMs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Arrighi, Bill

2016-03-03

libROM is a library that implements order reduction via singular value decomposition (SVD) of sampled state vectors. It implements 2 parallel, incremental SVD algorithms and one serial, non-incremental algorithm. It also provides a mechanism for adaptive sampling of basis vectors.
AZTEC. Parallel Iterative method Software for Solving Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.; Shadid, J.; Tuminaro, R.

1995-07-01

AZTEC is an interactive library that greatly simplifies the parrallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparse unstructured matricesmore » for parallel solutions.« less
A parallel Jacobson-Oksman optimization algorithm. [parallel processing (computers)

NASA Technical Reports Server (NTRS)

Straeter, T. A.; Markos, A. T.

1975-01-01

A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions insuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
A performance comparison of the IBM RS/6000 and the Astronautics ZS-1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Smith, W.M.; Abraham, S.G.; Davidson, E.S.

1991-01-01

Concurrent uniprocessor architectures, of which vector and superscalar are two examples, are designed to capitalize on fine-grain parallelism. The authors have developed a performance evaluation method for comparing and improving these architectures, and in this article they present the methodology and a detailed case study of two machines. The runtime of many programs is dominated by time spent in loop constructs - for example, Fortran Do-loops. Loops generally comprise two logical processes: The access process generates addresses for memory operations while the execute process operates on floating-point data. Memory access patterns typically can be generated independently of the data inmore » the execute process. This independence allows the access process to slip ahead, thereby hiding memory latency. The IBM 360/91 was designed in 1967 to achieve slip dynamically, at runtime. One CPU unit executes integer operations while another handles floating-point operations. Other machines, including the VAX 9000 and the IBM RS/6000, use a similar approach.« less
A model for optimizing file access patterns using spatio-temporal parallelism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Boonthanome, Nouanesengsy; Patchett, John; Geveci, Berk

2013-01-01

For many years now, I/O read time has been recognized as the primary bottleneck for parallel visualization and analysis of large-scale data. In this paper, we introduce a model that can estimate the read time for a file stored in a parallel filesystem when given the file access pattern. Read times ultimately depend on how the file is stored and the access pattern used to read the file. The file access pattern will be dictated by the type of parallel decomposition used. We employ spatio-temporal parallelism, which combines both spatial and temporal parallelism, to provide greater flexibility to possible filemore » access patterns. Using our model, we were able to configure the spatio-temporal parallelism to design optimized read access patterns that resulted in a speedup factor of approximately 400 over traditional file access patterns.« less
The parallel-antiparallel signal difference in double-wave-vector diffusion-weighted MR at short mixing times: A phase evolution perspective

NASA Astrophysics Data System (ADS)

Finsterbusch, Jürgen

2011-01-01

Experiments with two diffusion weightings applied in direct succession in a single acquisition, so-called double- or two-wave-vector diffusion-weighting (DWV) experiments at short mixing times, have been shown to be a promising tool to estimate cell or compartment sizes, e.g. in living tissue. The basic theory for such experiments predicts that the signal decays for parallel and antiparallel wave vector orientations differ by a factor of three for small wave vectors. This seems to be surprising because in standard, single-wave-vector experiments the polarity of the diffusion weighting has no influence on the signal attenuation. Thus, the question how this difference can be understood more pictorially is often raised. In this rather educational manuscript, the phase evolution during a DWV experiment for simple geometries, e.g. diffusion between parallel, impermeable planes oriented perpendicular to the wave vectors, is considered step-by-step and demonstrates how the signal difference develops. Considering the populations of the phase distributions obtained, the factor of three between the signal decays which is predicted by the theory can be reproduced. Furthermore, the intermediate signal decay for orthogonal wave vector orientations can be derived when investigating diffusion in a box. Thus, the presented “phase gymnastics” approach may help to understand the signal modulation observed in DWV experiments at short mixing times.
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

Romano, Paul K.; Siegel, Andrew R.

The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup duemore » to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. Lastly, when the execution times for events are allowed to vary, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration.« less

Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport

DOE PAGES

Romano, Paul K.; Siegel, Andrew R.

2017-07-01

The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup duemore » to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. Lastly, when the execution times for events are allowed to vary, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration.« less
Parallel and fault-tolerant algorithms for hypercube multiprocessors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aykanat, C.

1988-01-01

Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
A hybrid algorithm for parallel molecular dynamics simulations

NASA Astrophysics Data System (ADS)

Mangiardi, Chris M.; Meyer, R.

2017-10-01

This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Reviving the shear-free perfect fluid conjecture in general relativity

NASA Astrophysics Data System (ADS)

Sikhonde, Muzikayise E.; Dunsby, Peter K. S.

2017-12-01

Employing a Mathematica symbolic computer algebra package called xTensor, we present (1+3) -covariant special case proofs of the shear-free perfect fluid conjecture in general relativity. We first present the case where the pressure is constant, and where the acceleration is parallel to the vorticity vector. These cases were first presented in their covariant form by Senovilla et al. We then provide a covariant proof for the case where the acceleration and vorticity vectors are orthogonal, which leads to the existence of a Killing vector along the vorticity. This Killing vector satisfies the new constraint equations resulting from the vanishing of the shear. Furthermore, it is shown that in order for the conjecture to be true, this Killing vector must have a vanishing spatially projected directional covariant derivative along the velocity vector field. This in turn implies the existence of another basic vector field along the direction of the vorticity for the conjecture to hold. Finally, we show that in general, there exists a basic vector field parallel to the acceleration for which the conjecture is true.
Hypercluster - Parallel processing for computational mechanics

NASA Technical Reports Server (NTRS)

Blech, Richard A.

1988-01-01

An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Reversible vector ratchets for skyrmion systems

NASA Astrophysics Data System (ADS)

Ma, X.; Reichhardt, C. J. Olson; Reichhardt, C.

2017-03-01

We show that ac driven skyrmions interacting with an asymmetric substrate provide a realization of a class of ratchet system which we call a vector ratchet that arises due to the effect of the Magnus term on the skyrmion dynamics. In a vector ratchet, the dc motion induced by the ac drive can be described as a vector that can be rotated clockwise or counterclockwise relative to the substrate asymmetry direction. Up to a full 360∘ rotation is possible for varied ac amplitudes or skyrmion densities. In contrast to overdamped systems, in which ratchet motion is always parallel to the substrate asymmetry direction, vector ratchets allow the ratchet motion to be in any direction relative to the substrate asymmetry. It is also possible to obtain a reversal in the direction of rotation of the vector ratchet, permitting the creation of a reversible vector ratchet. We examine vector ratchets for ac drives applied parallel or perpendicular to the substrate asymmetry direction, and show that reverse ratchet motion can be produced by collective effects. No reversals occur for an isolated skyrmion on an asymmetric substrate. Since a vector ratchet can produce motion in any direction, it could represent a method for controlling skyrmion motion for spintronic applications.
Effective Vectorization with OpenMP 4.5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham

This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor s potential performance that, if SIMDmore » is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.« less
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing

NASA Astrophysics Data System (ADS)

Johl, John T.; Baker, Nick C.

1988-10-01

The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

B. Hendrickson; T.G. Kolda

1998-09-01

A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix- transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
Automatic differentiation for design sensitivity analysis of structural systems using multiple processors

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi

1994-01-01

An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
Krylov subspace methods on supercomputers

NASA Technical Reports Server (NTRS)

Saad, Youcef

1988-01-01

A short survey of recent research on Krylov subspace methods with emphasis on implementation on vector and parallel computers is presented. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to derive effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in the incomplete factorization preconditionings is in the solution of the triangular systems at each step. A few approaches consisting of implementing efficient forward and backward triangular solutions are described in detail. Polynomial preconditioning as an alternative to standard incomplete factorization techniques is also discussed. Another efficient approach is to reorder the equations so as to improve the structure of the matrix to achieve better parallelism or vectorization. An overview of these and other ideas and their effectiveness or potential for different types of architectures is given.
Parallel-vector unsymmetric Eigen-Solver on high performance computers

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Jiangning, Qin

1993-01-01

The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components in the QR algorithm, it was concluded from this study, that the reduction of an unsymmetric matrix to a Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases have indicated that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to a Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed methods is increased as the problem size increased.
An object-oriented approach to nested data parallelism

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.; Chatterjee, Siddhartha

1994-01-01

This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport

DOE Office of Scientific and Technical Information (OSTI.GOV)

Romano, Paul K.; Siegel, Andrew R.

The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup duemore » to vectorization as a function of two parameters: the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size in order to achieve vector efficiency greater than 90%. When the execution times for events are allowed to vary, however, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration. For some problems, this implies that vector effciencies over 50% may not be attainable. While there are many factors impacting performance of an event-based algorithm that are not captured by our model, it nevertheless provides insights into factors that may be limiting in a real implementation.« less
Reversible vector ratchets for skyrmion systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ma, Xiu; Reichhardt, Cynthia Jane Olson; Reichhardt, Charles

In this paper, we show that ac driven skyrmions interacting with an asymmetric substrate provide a realization of a class of ratchet system which we call a vector ratchet that arises due to the effect of the Magnus term on the skyrmion dynamics. In a vector ratchet, the dc motion induced by the ac drive can be described as a vector that can be rotated clockwise or counterclockwise relative to the substrate asymmetry direction. Up to a full 360° rotation is possible for varied ac amplitudes or skyrmion densities. In contrast to overdamped systems, in which ratchet motion is alwaysmore » parallel to the substrate asymmetry direction, vector ratchets allow the ratchet motion to be in any direction relative to the substrate asymmetry. It is also possible to obtain a reversal in the direction of rotation of the vector ratchet, permitting the creation of a reversible vector ratchet. We examine vector ratchets for ac drives applied parallel or perpendicular to the substrate asymmetry direction, and show that reverse ratchet motion can be produced by collective effects. No reversals occur for an isolated skyrmion on an asymmetric substrate. Finally, since a vector ratchet can produce motion in any direction, it could represent a method for controlling skyrmion motion for spintronic applications.« less
Reversible vector ratchets for skyrmion systems

DOE PAGES

Ma, Xiu; Reichhardt, Cynthia Jane Olson; Reichhardt, Charles

2017-03-03

In this paper, we show that ac driven skyrmions interacting with an asymmetric substrate provide a realization of a class of ratchet system which we call a vector ratchet that arises due to the effect of the Magnus term on the skyrmion dynamics. In a vector ratchet, the dc motion induced by the ac drive can be described as a vector that can be rotated clockwise or counterclockwise relative to the substrate asymmetry direction. Up to a full 360° rotation is possible for varied ac amplitudes or skyrmion densities. In contrast to overdamped systems, in which ratchet motion is alwaysmore » parallel to the substrate asymmetry direction, vector ratchets allow the ratchet motion to be in any direction relative to the substrate asymmetry. It is also possible to obtain a reversal in the direction of rotation of the vector ratchet, permitting the creation of a reversible vector ratchet. We examine vector ratchets for ac drives applied parallel or perpendicular to the substrate asymmetry direction, and show that reverse ratchet motion can be produced by collective effects. No reversals occur for an isolated skyrmion on an asymmetric substrate. Finally, since a vector ratchet can produce motion in any direction, it could represent a method for controlling skyrmion motion for spintronic applications.« less
A Connectionist Simulation of Attention and Vector Comparison: The Need for Serial Processing in Parallel Hardware

DTIC Science & Technology

1991-01-01

visual and three-layer connectionist network, in that the input layer of memory processing is serial, and is likely to represent each module is... Selective attention gates visual University Press. processing in the extrastnate cortex. Science, 229:782-784. Treasman, A.M. (1985). Preartentive...AD-A242 225 A CONNECTIONIST SIMULATION OF ATTENTION AND VECTOR COMPARISON: THE NEED FOR SERIAL PROCESSING IN PARALLEL HARDWARE Technical Report AlP
A Parallel Processing Algorithm for Remote Sensing Classification

NASA Technical Reports Server (NTRS)

Gualtieri, J. Anthony

2005-01-01

A current thread in parallel computation is the use of cluster computers created by networking a few to thousands of commodity general-purpose workstation-level commuters using the Linux operating system. For example on the Medusa cluster at NASA/GSFC, this provides for super computing performance, 130 G(sub flops) (Linpack Benchmark) at moderate cost, $370K. However, to be useful for scientific computing in the area of Earth science, issues of ease of programming, access to existing scientific libraries, and portability of existing code need to be considered. In this paper, I address these issues in the context of tools for rendering earth science remote sensing data into useful products. In particular, I focus on a problem that can be decomposed into a set of independent tasks, which on a serial computer would be performed sequentially, but with a cluster computer can be performed in parallel, giving an obvious speedup. To make the ideas concrete, I consider the problem of classifying hyperspectral imagery where some ground truth is available to train the classifier. In particular I will use the Support Vector Machine (SVM) approach as applied to hyperspectral imagery. The approach will be to introduce notions about parallel computation and then to restrict the development to the SVM problem. Pseudocode (an outline of the computation) will be described and then details specific to the implementation will be given. Then timing results will be reported to show what speedups are possible using parallel computation. The paper will close with a discussion of the results.
Kinematic sensitivity of robot manipulators

NASA Technical Reports Server (NTRS)

Vuskovic, Marko I.

1989-01-01

Kinematic sensitivity vectors and matrices for open-loop, n degrees-of-freedom manipulators are derived. First-order sensitivity vectors are defined as partial derivatives of the manipulator's position and orientation with respect to its geometrical parameters. The four-parameter kinematic model is considered, as well as the five-parameter model in case of nominally parallel joint axes. Sensitivity vectors are expressed in terms of coordinate axes of manipulator frames. Second-order sensitivity vectors, the partial derivatives of first-order sensitivity vectors, are also considered. It is shown that second-order sensitivity vectors can be expressed as vector products of the first-order sensitivity vectors.
Parallel-Vector Algorithm For Rapid Structural Anlysis

NASA Technical Reports Server (NTRS)

Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.

1993-01-01

New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.

Electromagnetic Physics Models for Parallel Computing Architectures

NASA Astrophysics Data System (ADS)

Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

2016-10-01

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
Brian hears: online auditory processing using vectorization over channels.

PubMed

Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain

2011-01-01

The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
Signal processing applications of massively parallel charge domain computing devices

NASA Technical Reports Server (NTRS)

Fijany, Amir (Inventor); Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor)

1999-01-01

The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array.
Adaptive proxy map server for efficient vector spatial data rendering

NASA Astrophysics Data System (ADS)

Sayar, Ahmet

2013-01-01

The rapid transmission of vector map data over the Internet is becoming a bottleneck of spatial data delivery and visualization in web-based environment because of increasing data amount and limited network bandwidth. In order to improve both the transmission and rendering performances of vector spatial data over the Internet, we propose a proxy map server enabling parallel vector data fetching as well as caching to improve the performance of web-based map servers in a dynamic environment. Proxy map server is placed seamlessly anywhere between the client and the final services, intercepting users' requests. It employs an efficient parallelization technique based on spatial proximity and data density in case distributed replica exists for the same spatial data. The effectiveness of the proposed technique is proved at the end of the article by the application of creating map images enriched with earthquake seismic data records.
Wiimote Experiments: 3-D Inclined Plane Problem for Reinforcing the Vector Concept

ERIC Educational Resources Information Center

Kawam, Alae; Kouh, Minjoon

2011-01-01

In an introductory physics course where students first learn about vectors, they oftentimes struggle with the concept of vector addition and decomposition. For example, the classic physics problem involving a mass on an inclined plane requires the decomposition of the force of gravity into two directions that are parallel and perpendicular to the…
Super and parallel computers and their impact on civil engineering

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kamat, M.P.

1986-01-01

This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
Aztec user`s guide. Version 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1995-10-01

Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparsemore » unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.« less
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80

NASA Astrophysics Data System (ADS)

Kamat, Manohar P.; Watson, Brian C.

1992-02-01

The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern day computers with a single processing unit was estimated at 3 billions of floating point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization as on Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80

NASA Technical Reports Server (NTRS)

Kamat, Manohar P.; Watson, Brian C.

1992-01-01

The results of a research activity aimed at providing a finite element capability for analyzing turbo-machinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern day computers with a single processing unit was estimated at 3 billions of floating point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization as on Alliant FX/80. Accordingly, the efforts of this critically needed research were geared towards developing and evaluating parallel finite element methods for static and vibration analysis. A special purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements.
Electromagnetic physics models for parallel computing architectures

DOE PAGES

Amadio, G.; Ananya, A.; Apostolakis, J.; ...

2016-11-21

The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe implementation of electromagnetic physics models developed for parallel computing architectures as a part ofmore » the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.« less
Fast parallel tandem mass spectral library searching using GPU hardware acceleration.

PubMed

Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K; Martin, Daniel B

2011-06-03

Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate-limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper, we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching), is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA, which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment.
Progressive Vector Quantization on a massively parallel SIMD machine with application to multispectral image data

NASA Technical Reports Server (NTRS)

Manohar, Mareboyana; Tilton, James C.

1994-01-01

A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.
A compositional reservoir simulator on distributed memory parallel computers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Rame, M.; Delshad, M.

1995-12-31

This paper presents the application of distributed memory parallel computes to field scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general purpose highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/960 and Delta, Connection Machine 5, Kendall Square 1 and 2, and CRAY T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. Amore » portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes the porting to new parallel platforms straight forward. Results of the distributed memory computing performance of Parallel simulator are presented for field scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for same problems on a vector supercomputer is also presented.« less
Pairwise Sequence Alignment Library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jeff Daily, PNNL

2015-05-20

Vector extensions, such as SSE, have been part of the x86 CPU since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, amore » novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: Reference implementations of all known vectorized sequence alignment approaches. Implementations of Smith Waterman (SW), semi-global (SG), and Needleman Wunsch (NW) sequence alignment algorithms. Implementations across all modern CPU instruction sets including AVX2 and KNC. Language interfaces for C/C++ and Python.« less
A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gittens, Alex; Kottalam, Jey; Yang, Jiyan

We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with themore » fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.« less
Conceptual design of a hybrid parallel mechanism for mask exchanging of TMT

NASA Astrophysics Data System (ADS)

Wang, Jianping; Zhou, Hongfei; Li, Kexuan; Zhou, Zengxiang; Zhai, Chao

2015-10-01

Mask exchange system is an important part of the Multi-Object Broadband Imaging Echellette (MOBIE) on the Thirty Meter Telescope (TMT). To solve the problem of stiffness changing with the gravity vector of the mask exchange system in the MOBIE, the hybrid parallel mechanism design method was introduced into the whole research. By using the characteristics of high stiffness and precision of parallel structure, combined with large moving range of serial structure, a conceptual design of a hybrid parallel mask exchange system based on 3-RPS parallel mechanism was presented. According to the position requirements of the MOBIE, the SolidWorks structure model of the hybrid parallel mask exchange robot was established and the appropriate installation position without interfering with the related components and light path in the MOBIE of TMT was analyzed. Simulation results in SolidWorks suggested that 3-RPS parallel platform had good stiffness property in different gravity vector directions. Furthermore, through the research of the mechanism theory, the inverse kinematics solution of the 3-RPS parallel platform was calculated and the mathematical relationship between the attitude angle of moving platform and the angle of ball-hinges on the moving platform was established, in order to analyze the attitude adjustment ability of the hybrid parallel mask exchange robot. The proposed conceptual design has some guiding significance for the design of mask exchange system of the MOBIE on TMT.
Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.

PubMed

Ferreira, Miguel; Roma, Nuno; Russo, Luis M S

2014-05-30

HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar's striped processing pattern with Intel SSE2 instruction set extension. A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model's size.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Dupertuis, M.A.; Proctor, M.; Acklin, B.

Energy balance and reciprocity relations are studied for harmonic inhomogeneous plane waves that are incident upon a stack of continuous absorbing dielectric media that are macroscopically characterized by their electric and magnetic permittivities and their conductivities. New cross terms between parallel electric and parallel magnetic modes are identified in the fully generalized Poynting vector. The symmetry and the relations between the general Fresnel coefficients are investigated in the context of energy balance at the interface. The contributions of the so-called mixed Poynting vector are discussed in detail. In particular a new transfer matrix is introduced for energy fluxes in thin-filmmore » optics based on the Poynting and mixed Poynting vectors. Finally, the study of reciprocity relations leads to a generalization of a theorem of reversibility for conducting and dielectric media. 16 refs.« less
Text analysis devices, articles of manufacture, and text analysis methods

DOEpatents

Turner, Alan E; Hetzler, Elizabeth G; Nakamura, Grant C

2015-03-31

Text analysis devices, articles of manufacture, and text analysis methods are described according to some aspects. In one aspect, a text analysis device includes a display configured to depict visible images, and processing circuitry coupled with the display and wherein the processing circuitry is configured to access a first vector of a text item and which comprises a plurality of components, to access a second vector of the text item and which comprises a plurality of components, to weight the components of the first vector providing a plurality of weighted values, to weight the components of the second vector providing a plurality of weighted values, and to combine the weighted values of the first vector with the weighted values of the second vector to provide a third vector.
Brian Hears: Online Auditory Processing Using Vectorization Over Channels

PubMed Central

Fontaine, Bertrand; Goodman, Dan F. M.; Benichoux, Victor; Brette, Romain

2011-01-01

The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations. PMID:21811453

Efficient solution of parabolic equations by Krylov approximation methods

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Y.

1990-01-01

Numerical techniques for solving parabolic equations by the method of lines is addressed. The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace. Thus, the resulting approximation consists of applying an evolution operator of a very small dimension to a known vector which is, in turn, computed accurately by exploiting well-known rational approximations to the exponential. Because the rational approximation is only applied to a small matrix, the only operations required with the original large matrix are matrix-by-vector multiplications, and as a result the algorithm can easily be parallelized and vectorized. Some relevant approximation and stability issues are discussed. We present some numerical experiments with the method and compare its performance with a few explicit and implicit algorithms.
Verification of Electromagnetic Physics Models for Parallel Computing Architectures in the GeantV Project

DOE Office of Scientific and Technical Information (OSTI.GOV)

Amadio, G.; et al.

An intensive R&D and programming effort is required to accomplish new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of the existing HEP detector simulation software and the ideal performance achievable, exploiting latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting in parallel particles in complex geometries exploiting instruction level microparallelism (SIMD and SIMT), task-level parallelism (multithreading) and high-level parallelism (MPI), leveraging both the multi-core and the many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physicsmore » models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics model effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.« less
Jupiter Environment Tool

NASA Technical Reports Server (NTRS)

Sturm, Erick J.; Monahue, Kenneth M.; Biehl, James P.; Kokorowski, Michael; Ngalande, Cedrick,; Boedeker, Jordan

2012-01-01

The Jupiter Environment Tool (JET) is a custom UI plug-in for STK that provides an interface to Jupiter environment models for visualization and analysis. Users can visualize the different magnetic field models of Jupiter through various rendering methods, which are fully integrated within STK s 3D Window. This allows users to take snapshots and make animations of their scenarios with magnetic field visualizations. Analytical data can be accessed in the form of custom vectors. Given these custom vectors, users have access to magnetic field data in custom reports, graphs, access constraints, coverage analysis, and anywhere else vectors are used within STK.
Method and means for measuring the anisotropy of a plasma in a magnetic field

DOEpatents

Shohet, J.L.; Greene, D.G.S.

1973-10-23

Anisotropy is measured of a free-free-bremsstrahlungradiation-generating plasma in a magnetic field by collimating the free-free bremsstrahlung radiation in a direction normal to the magnetic field and scattering the collimated free- free bremsstrahlung radiation to resolve the radiation into its vector components in a plane parallel to the electric field of the bremsstrahlung radiation. The scattered vector components are counted at particular energy levels in a direction parallel to the magnetic field and also normal to the magnetic field of the plasma to provide a measure of anisotropy of the plasma. (Official Gazette)
VectorBase: a home for invertebrate vectors of human pathogens

PubMed Central

Lawson, Daniel; Arensburger, Peter; Atkinson, Peter; Besansky, Nora J.; Bruggner, Robert V.; Butler, Ryan; Campbell, Kathryn S.; Christophides, George K.; Christley, Scott; Dialynas, Emmanuel; Emmert, David; Hammond, Martin; Hill, Catherine A.; Kennedy, Ryan C.; Lobo, Neil F.; MacCallum, M. Robert; Madey, Greg; Megy, Karine; Redmond, Seth; Russo, Susan; Severson, David W.; Stinson, Eric O.; Topalis, Pantelis; Zdobnov, Evgeny M.; Birney, Ewan; Gelbart, William M.; Kafatos, Fotis C.; Louis, Christos; Collins, Frank H.

2007-01-01

VectorBase () is a web-accessible data repository for information about invertebrate vectors of human pathogens. VectorBase annotates and maintains vector genomes providing an integrated resource for the research community. Currently, VectorBase contains genome information for two organisms: Anopheles gambiae, a vector for the Plasmodium protozoan agent causing malaria, and Aedes aegypti, a vector for the flaviviral agents causing Yellow fever and Dengue fever. PMID:17145709
SIERRA - A 3-D device simulator for reliability modeling

NASA Astrophysics Data System (ADS)

Chern, Jue-Hsien; Arledge, Lawrence A., Jr.; Yang, Ping; Maeda, John T.

1989-05-01

SIERRA is a three-dimensional general-purpose semiconductor-device simulation program which serves as a foundation for investigating integrated-circuit (IC) device and reliability issues. This program solves the Poisson and continuity equations in silicon under dc, transient, and small-signal conditions. Executing on a vector/parallel minisupercomputer, SIERRA utilizes a matrix solver which uses an incomplete LU (ILU) preconditioned conjugate gradient square (CGS, BCG) method. The ILU-CGS method provides a good compromise between memory size and convergence rate. The authors have observed a 5x to 7x speedup over standard direct methods in simulations of transient problems containing highly coupled Poisson and continuity equations such as those found in reliability-oriented simulations. The application of SIERRA to parasitic CMOS latchup and dynamic random-access memory single-event-upset studies is described.
Coding for parallel execution of hardware-in-the-loop millimeter-wave scene generation models on multicore SIMD processor architectures

NASA Astrophysics Data System (ADS)

Olson, Richard F.

2013-05-01

Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
Parallel processors and nonlinear structural dynamics algorithms and software

NASA Technical Reports Server (NTRS)

Belytschko, Ted

1990-01-01

Techniques are discussed for the implementation and improvement of vectorization and concurrency in nonlinear explicit structural finite element codes. In explicit integration methods, the computation of the element internal force vector consumes the bulk of the computer time. The program can be efficiently vectorized by subdividing the elements into blocks and executing all computations in vector mode. The structuring of elements into blocks also provides a convenient way to implement concurrency by creating tasks which can be assigned to available processors for evaluation. The techniques were implemented in a 3-D nonlinear program with one-point quadrature shell elements. Concurrency and vectorization were first implemented in a single time step version of the program. Techniques were developed to minimize processor idle time and to select the optimal vector length. A comparison of run times between the program executed in scalar, serial mode and the fully vectorized code executed concurrently using eight processors shows speed-ups of over 25. Conjugate gradient methods for solving nonlinear algebraic equations are also readily adapted to a parallel environment. A new technique for improving convergence properties of conjugate gradients in nonlinear problems is developed in conjunction with other techniques such as diagonal scaling. A significant reduction in the number of iterations required for convergence is shown for a statically loaded rigid bar suspended by three equally spaced springs.
The International Conference on Vector and Parallel Computing (2nd)

DTIC Science & Technology

1989-01-17

Computation of the SVD of Bidiagonal Matrices" ...................................... 11 " Lattice QCD -As a Large Scale Scientific Computation...vectorizcd for the IBM 3090 Vector Facility. In addition, elapsed times " Lattice QCD -As a Large Scale Scientific have been reduced by using 3090...benchmarked Lattice QCD on a large number ofcompu- come from the wavefront solver routine. This was exten- ters: CrayX-MP and Cray 2 (vector
A third-generation density-functional-theory-based method for calculating canonical molecular orbitals of large molecules.

PubMed

Hirano, Toshiyuki; Sato, Fumitoshi

2014-07-28

We used grid-free modified Cholesky decomposition (CD) to develop a density-functional-theory (DFT)-based method for calculating the canonical molecular orbitals (CMOs) of large molecules. Our method can be used to calculate standard CMOs, analytically compute exchange-correlation terms, and maximise the capacity of next-generation supercomputers. Cholesky vectors were first analytically downscaled using low-rank pivoted CD and CD with adaptive metric (CDAM). The obtained Cholesky vectors were distributed and stored on each computer node in a parallel computer, and the Coulomb, Fock exchange, and pure exchange-correlation terms were calculated by multiplying the Cholesky vectors without evaluating molecular integrals in self-consistent field iterations. Our method enables DFT and massively distributed memory parallel computers to be used in order to very efficiently calculate the CMOs of large molecules.
Evaluation of a new parallel numerical parameter optimization algorithm for a dynamical system

NASA Astrophysics Data System (ADS)

Duran, Ahmet; Tuncel, Mehmet

2016-10-01

It is important to have a scalable parallel numerical parameter optimization algorithm for a dynamical system used in financial applications where time limitation is crucial. We use Message Passing Interface parallel programming and present such a new parallel algorithm for parameter estimation. For example, we apply the algorithm to the asset flow differential equations that have been developed and analyzed since 1989 (see [3-6] and references contained therein). We achieved speed-up for some time series to run up to 512 cores (see [10]). Unlike [10], we consider more extensive financial market situations, for example, in presence of low volatility, high volatility and stock market price at a discount/premium to its net asset value with varying magnitude, in this work. Moreover, we evaluated the convergence of the model parameter vector, the nonlinear least squares error and maximum improvement factor to quantify the success of the optimization process depending on the number of initial parameter vectors.
Research in computer science

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1986-01-01

Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian Elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
Spin wave filtering and guiding in Permalloy/iron nanowires

NASA Astrophysics Data System (ADS)

Silvani, R.; Kostylev, M.; Adeyeye, A. O.; Gubbiotti, G.

2018-03-01

We have investigated the spin wave filtering and guiding properties of periodic array of single (Permalloy and Fe) and bi-layer (Py/Fe) nanowires (NWs) by means of Brillouin light scattering measurements and micromagnetic simulations. For all the nanowire arrays, the thickness of the layers is 10 nm while all NWs have the same width of 340 nm and edge-to-edge separation of 100 nm. Spin wave dispersion has been measured in the Damon-Eshbach configuration for wave vector either parallel or perpendicular to the nanowire length. This study reveals the filtering property of the spin waves when the wave vector is perpendicular to the NW length, with frequency ranges where the spin wave propagation is permitted separated by frequency band gaps, and the guiding property of NW when the wave vector is oriented parallel to the NW, with spin wave modes propagating in parallel channels in the central and edge regions of the NW. The measured dispersions were well reproduced by micromagnetic simulations, which also deliver the spatial profiles for the modes at zero wave vector. To reproduce the dispersion of the modes localized close to the NW edges, uniaxial anisotropy has been introduced. In the case of Permalloy/iron NWs, the obtained results have been compared with those for a 20 nm thick effective NW having average magnetic properties of the two materials.
Reanalyzing the "far medial" (transcondylar-transtubercular) approach based on three anatomical vectors: the ventral posterolateral corridor.

PubMed

Chakravarthi, Srikant; Monroy-Sosa, Alejandro; Gonen, Lior; Fukui, Melanie; Rovin, Richard; Kojis, Nathaniel; Lindsay, Mark; Khalili, Sammy; Celix, Juanita; Corsten, Martin; Kassam, Amin B

2018-06-01

Endoscopic endonasal access to the jugular foramen and occipital condyle - the transcondylar-transtubercular approach - is anatomically complex and requires detailed knowledge of the relative position of critical neurovascular structures, in order to avoid inadvertent injury and resultant complications. However, access to this region can be confusing as the orientation and relationships of osseous, vascular, and neural structures are very much different from traditional dorsal approaches. This review aims at providing an organizational construct for a more understandable framework in accessing the transcondylar-transtubercular window. The region can be conceptualized using a three-vector coordinate system: vector 1 represents a dorsal or ventral corridor, vector 2 represents the outer and inner circumferential anatomical limits; in an "onion-skin" fashion, key osseous, vascular, and neural landmarks are organized based on a 360-degree skull base model, and vector 3 represents the final core or target of the surgical corridor. The creation of an organized "global-positioning system" may better guide the surgeon in accessing the far-medial transcondylar-transtubercular region, and related pathologies, and help understand the surgical limits to the occipital condyle and jugular foramen - the ventral posterolateral corridor - via the endoscopic endonasal approach.
Extendability of parallel sections in vector bundles

NASA Astrophysics Data System (ADS)

Kirschner, Tim

2016-01-01

I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of Rn, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.
A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers

NASA Technical Reports Server (NTRS)

Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl

2010-01-01

Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.
Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER

PubMed Central

2014-01-01

Background HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar’s striped processing pattern with Intel SSE2 instruction set extension. Results A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. Conclusions The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model’s size. PMID:24884826
Performance of the Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the input/output (I/O) needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. This interface conceals the parallism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. Initial experiments, reported in this paper, indicate that Galley is capable of providing high-performance 1/O to applications the applications that rely on them. In Section 3 we describe that access data in patterns that have been observed to be common.
Optical computing and image processing using photorefractive gallium arsenide

NASA Technical Reports Server (NTRS)

Cheng, Li-Jen; Liu, Duncan T. H.

1990-01-01

Recent experimental results on matrix-vector multiplication and multiple four-wave mixing using GaAs are presented. Attention is given to a simple concept of using two overlapping holograms in GaAs to do two matrix-vector multiplication processes operating in parallel with a common input vector. This concept can be used to construct high-speed, high-capacity, reconfigurable interconnection and multiplexing modules, important for optical computing and neural-network applications.
Fast parallel tandem mass spectral library searching using GPU hardware acceleration

PubMed Central

Baumgardner, Lydia Ashleigh; Shanmugam, Avinash Kumar; Lam, Henry; Eng, Jimmy K.; Martin, Daniel B.

2011-01-01

Mass spectrometry-based proteomics is a maturing discipline of biologic research that is experiencing substantial growth. Instrumentation has steadily improved over time with the advent of faster and more sensitive instruments collecting ever larger data files. Consequently, the computational process of matching a peptide fragmentation pattern to its sequence, traditionally accomplished by sequence database searching and more recently also by spectral library searching, has become a bottleneck in many mass spectrometry experiments. In both of these methods, the main rate limiting step is the comparison of an acquired spectrum with all potential matches from a spectral library or sequence database. This is a highly parallelizable process because the core computational element can be represented as a simple but arithmetically intense multiplication of two vectors. In this paper we present a proof of concept project taking advantage of the massively parallel computing available on graphics processing units (GPUs) to distribute and accelerate the process of spectral assignment using spectral library searching. This program, which we have named FastPaSS (for Fast Parallelized Spectral Searching) is implemented in CUDA (Compute Unified Device Architecture) from NVIDIA which allows direct access to the processors in an NVIDIA GPU. Our efforts demonstrate the feasibility of GPU computing for spectral assignment, through implementation of the validated spectral searching algorithm SpectraST in the CUDA environment. PMID:21545112

Implementation of a parallel unstructured Euler solver on the CM-5

NASA Technical Reports Server (NTRS)

Morano, Eric; Mavriplis, D. J.

1995-01-01

An efficient unstructured 3D Euler solver is parallelized on a Thinking Machine Corporation Connection Machine 5, distributed memory computer with vectoring capability. In this paper, the single instruction multiple data (SIMD) strategy is employed through the use of the CM Fortran language and the CMSSL scientific library. The performance of the CMSSL mesh partitioner is evaluated and the overall efficiency of the parallel flow solver is discussed.
Solution of partial differential equations on vector and parallel computers

NASA Technical Reports Server (NTRS)

Ortega, J. M.; Voigt, R. G.

1985-01-01

The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wang, J.C.H.; Lung, H.; Katsumata, Y.

1995-12-01

In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the new product line of Fujitsu`s vector parallel computer, VPP500. This computer has Fujitsu`s well-known vector processor as the PE each rated at 1.6 C FLOPS. The high speed crossbar network rated at 800 MB/s provides the inter-PE communication. The results show that the physical domain decomposition is the best way to solve MG problems on VPP500.
Sparse matrix-vector multiplication on network-on-chip

NASA Astrophysics Data System (ADS)

Sun, C.-C.; Götze, J.; Jheng, H.-Y.; Ruan, S.-J.

2010-12-01

In this paper, we present an idea for performing matrix-vector multiplication by using Network-on-Chip (NoC) architecture. In traditional IC design on-chip communications have been designed with dedicated point-to-point interconnections. Therefore, regular local data transfer is the major concept of many parallel implementations. However, when dealing with the parallel implementation of sparse matrix-vector multiplication (SMVM), which is the main step of all iterative algorithms for solving systems of linear equation, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. Using the NoC architecture makes it possible to deal with arbitrary structure of the data transfers; i.e. with the irregular structure of the sparse matrices. So far, we have already implemented the proposed SMVM-NoC architecture with the size 4×4 and 5×5 in IEEE 754 single float point precision using FPGA.
Satellite Angular Rate Estimation From Vector Measurements

NASA Technical Reports Server (NTRS)

Azor, Ruth; Bar-Itzhack, Itzhack Y.; Harman, Richard R.

1996-01-01

This paper presents an algorithm for estimating the angular rate vector of a satellite which is based on the time derivatives of vector measurements expressed in a reference and body coordinate. The computed derivatives are fed into a spacial Kalman filter which yields an estimate of the spacecraft angular velocity. The filter, named Extended Interlaced Kalman Filter (EIKF), is an extension of the Kalman filter which, although being linear, estimates the state of a nonlinear dynamic system. It consists of two or three parallel Kalman filters whose individual estimates are fed to one another and are considered as known inputs by the other parallel filter(s). The nonlinear dynamics stem from the nonlinear differential equation that describes the rotation of a three dimensional body. Initial results, using simulated data, and real Rossi X ray Timing Explorer (RXTE) data indicate that the algorithm is efficient and robust.
Full-field drift Hamiltonian particle orbits in 3D geometry

NASA Astrophysics Data System (ADS)

Cooper, W. A.; Graves, J. P.; Brunner, S.; Isaev, M. Yu

2011-02-01

A Hamiltonian/Lagrangian theory to describe guiding centre orbit drift motion which is canonical in the Boozer coordinate frame has been extended to include full electromagnetic perturbed fields in anisotropic pressure 3D equilibria with nested magnetic flux surfaces. A redefinition of the guiding centre velocity to eliminate the motion due to finite equilibrium radial magnetic fields and the choice of a gauge condition that sets the radial component of the electromagnetic vector potential to zero are invoked to guarantee that the Boozer angular coordinates retain the canonical structure. The canonical momenta are identified and the guiding centre particle radial drift motion and parallel gyroradius evolution are derived. The particle coordinate position is linearly modified by wave-particle interactions. All the nonlinear wave-wave interactions appear explicitly only in the evolution of the parallel gyroradius. The radial variation of the electrostatic potential is related to the binormal component of the displacement vector for MHD-type perturbations. The electromagnetic vector potential projections can then be determined from the electrostatic potential and the radial component of the MHD displacement vector.
Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob

2003-01-01

The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the perfor- mance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.
Comparison of serological and molecular panels for diagnosis of vector-borne diseases in dogs

PubMed Central

2014-01-01

Background Canine vector-borne diseases (CVBD) are caused by a diverse array of pathogens with varying biological behaviors that result in a wide spectrum of clinical presentations and laboratory abnormalities. For many reasons, the diagnosis of canine vector-borne infectious diseases can be challenging for clinicians. The aim of the present study was to compare CVBD serological and molecular testing as the two most common methodologies used for screening healthy dogs or diagnosing sick dogs in which a vector-borne disease is suspected. Methods We used serological (Anaplasma species, Babesia canis, Bartonella henselae, Bartonella vinsonii subspecies berkhoffii, Borrelia burgdorferi, Ehrlichia canis, and SFG Rickettsia) and molecular assays to assess for exposure to, or infection with, 10 genera of organisms that cause CVBDs (Anaplasma, Babesia, Bartonella, Borrelia, Ehrlichia, Francisella, hemotropic Mycoplasma, Neorickettsia, Rickettsia, and Dirofilaria). Paired serum and EDTA blood samples from 30 clinically healthy dogs (Group I) and from 69 sick dogs suspected of having one or more canine vector-borne diseases (Groups II-IV), were tested in parallel to establish exposure to or infection with the specific CVBDs targeted in this study. Results Among all dogs tested (Groups I-IV), the molecular prevalences for individual CVBD pathogens ranged between 23.3 and 39.1%. Similarly, pathogen-specific seroprevalences ranged from 43.3% to 59.4% among healthy and sick dogs (Groups I-IV). Among these representative sample groupings, a panel combining serological and molecular assays run in parallel resulted in a 4-58% increase in the recognition of exposure to or infection with CVBD. Conclusions We conclude that serological and PCR assays should be used in parallel to maximize CVBD diagnosis. PMID:24670154
Comparison of serological and molecular panels for diagnosis of vector-borne diseases in dogs.

PubMed

Maggi, Ricardo G; Birkenheuer, Adam J; Hegarty, Barbara C; Bradley, Julie M; Levy, Michael G; Breitschwerdt, Edward B

2014-03-26

Canine vector-borne diseases (CVBD) are caused by a diverse array of pathogens with varying biological behaviors that result in a wide spectrum of clinical presentations and laboratory abnormalities. For many reasons, the diagnosis of canine vector-borne infectious diseases can be challenging for clinicians. The aim of the present study was to compare CVBD serological and molecular testing as the two most common methodologies used for screening healthy dogs or diagnosing sick dogs in which a vector-borne disease is suspected. We used serological (Anaplasma species, Babesia canis, Bartonella henselae, Bartonella vinsonii subspecies berkhoffii, Borrelia burgdorferi, Ehrlichia canis, and SFG Rickettsia) and molecular assays to assess for exposure to, or infection with, 10 genera of organisms that cause CVBDs (Anaplasma, Babesia, Bartonella, Borrelia, Ehrlichia, Francisella, hemotropic Mycoplasma, Neorickettsia, Rickettsia, and Dirofilaria). Paired serum and EDTA blood samples from 30 clinically healthy dogs (Group I) and from 69 sick dogs suspected of having one or more canine vector-borne diseases (Groups II-IV), were tested in parallel to establish exposure to or infection with the specific CVBDs targeted in this study. Among all dogs tested (Groups I-IV), the molecular prevalences for individual CVBD pathogens ranged between 23.3 and 39.1%. Similarly, pathogen-specific seroprevalences ranged from 43.3% to 59.4% among healthy and sick dogs (Groups I-IV). Among these representative sample groupings, a panel combining serological and molecular assays run in parallel resulted in a 4-58% increase in the recognition of exposure to or infection with CVBD. We conclude that serological and PCR assays should be used in parallel to maximize CVBD diagnosis.
Parallel image compression

NASA Technical Reports Server (NTRS)

Reif, John H.

1987-01-01

A parallel compression algorithm for the 16,384 processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless test compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
Fast adaptive composite grid methods on distributed parallel architectures

NASA Technical Reports Server (NTRS)

Lemke, Max; Quinlan, Daniel

1992-01-01

The fast adaptive composite (FAC) grid method is compared with the adaptive composite method (AFAC) under variety of conditions including vectorization and parallelization. Results are given for distributed memory multiprocessor architectures (SUPRENUM, Intel iPSC/2 and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment is a property of the algorithm and not dependent on peculiarities of any machine.
Using Intel's Knight Landing Processor to Accelerate Global Nested Air Quality Prediction Modeling System (GNAQPMS) Model

NASA Astrophysics Data System (ADS)

Wang, H.; Chen, H.; Chen, X.; Wu, Q.; Wang, Z.

2016-12-01

The Global Nested Air Quality Prediction Modeling System for Hg (GNAQPMS-Hg) is a global chemical transport model coupled Hg transport module to investigate the mercury pollution. In this study, we present our work of transplanting the GNAQPMS model on Intel Xeon Phi processor, Knights Landing (KNL) to accelerate the model. KNL is the second-generation product adopting Many Integrated Core Architecture (MIC) architecture. Compared with the first generation Knight Corner (KNC), KNL has more new hardware features, that it can be used as unique processor as well as coprocessor with other CPU. According to the Vtune tool, the high overhead modules in GNAQPMS model have been addressed, including CBMZ gas chemistry, advection and convection module, and wet deposition module. These high overhead modules were accelerated by optimizing code and using new techniques of KNL. The following optimized measures was done: 1) Changing the pure MPI parallel mode to hybrid parallel mode with MPI and OpenMP; 2.Vectorizing the code to using the 512-bit wide vector computation unit. 3. Reducing unnecessary memory access and calculation. 4. Reducing Thread Local Storage (TLS) for common variables with each OpenMP thread in CBMZ. 5. Changing the way of global communication from files writing and reading to MPI functions. After optimization, the performance of GNAQPMS is greatly increased both on CPU and KNL platform, the single-node test showed that optimized version has 2.6x speedup on two sockets CPU platform and 3.3x speedup on one socket KNL platform compared with the baseline version code, which means the KNL has 1.29x speedup when compared with 2 sockets CPU platform.
Accessing and visualizing scientific spatiotemporal data

NASA Technical Reports Server (NTRS)

Katz, Daniel S.; Bergou, Attila; Berriman, G. Bruce; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia;

2004-01-01

This paper discusses work done by JPL's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids.

Algorithms for parallel and vector computations

NASA Technical Reports Server (NTRS)

Ortega, James M.

1995-01-01

This is a final report on work performed under NASA grant NAG-1-1112-FOP during the period March, 1990 through February 1995. Four major topics are covered: (1) solution of nonlinear poisson-type equations; (2) parallel reduced system conjugate gradient method; (3) orderings for conjugate gradient preconditioners, and (4) SOR as a preconditioner.
Application of high-performance computing to numerical simulation of human movement

NASA Technical Reports Server (NTRS)

Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.

1995-01-01

We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
General formulation for magnetohydrodynamic wave propagation, fire-hose, and mirror instabilities in Harris-type current sheets

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hau, L.-N.; Department of Physics, National Central University, Jhongli, Taiwan; Lai, Y.-T.

Harris-type current sheets with the magnetic field model of B-vector=B{sub x}(z)x-caret+B{sub y}(z)y-caret have many important applications to space, astrophysical, and laboratory plasmas for which the temperature or pressure usually exhibits the gyrotropic form of p{r_reversible}=p{sub Parallel-To }b-caretb-caret+p{sub Up-Tack }(I{r_reversible}-b-caretb-caret). Here, p{sub Parallel-To} and p{sub Up-Tack} are, respectively, to be the pressure component along and perpendicular to the local magnetic field, b-caret=B-vector/B. This study presents the general formulation for magnetohydrodynamic (MHD) wave propagation, fire-hose, and mirror instabilities in general Harris-type current sheets. The wave equations are expressed in terms of the four MHD characteristic speeds of fast, intermediate, slow, and cuspmore » waves, and in the local (k{sub Parallel-To },k{sub Up-Tack },z) coordinates. Here, k{sub Parallel-To} and k{sub Up-Tack} are, respectively, to be the wave vector along and perpendicular to the local magnetic field. The parameter regimes for the existence of discrete and resonant modes are identified, which may become unstable at the local fire-hose and mirror instability thresholds. Numerical solutions for discrete eigenmodes are shown for stable and unstable cases. The results have important implications for the anomalous heating and stability of thin current sheets.« less
Parallel algorithm for determining motion vectors in ice floe images by matching edge features

NASA Technical Reports Server (NTRS)

Manohar, M.; Ramapriyan, H. K.; Strong, J. P.

1988-01-01

A parallel algorithm is described to determine motion vectors of ice floes using time sequences of images of the Arctic ocean obtained from the Synthetic Aperture Radar (SAR) instrument flown on-board the SEASAT spacecraft. Researchers describe a parallel algorithm which is implemented on the MPP for locating corresponding objects based on their translationally and rotationally invariant features. The algorithm first approximates the edges in the images by polygons or sets of connected straight-line segments. Each such edge structure is then reduced to a seed point. Associated with each seed point are the descriptions (lengths, orientations and sequence numbers) of the lines constituting the corresponding edge structure. A parallel matching algorithm is used to match packed arrays of such descriptions to identify corresponding seed points in the two images. The matching algorithm is designed such that fragmentation and merging of ice floes are taken into account by accepting partial matches. The technique has been demonstrated to work on synthetic test patterns and real image pairs from SEASAT in times ranging from .5 to 0.7 seconds for 128 x 128 images.
High-performance Chinese multiclass traffic sign detection via coarse-to-fine cascade and parallel support vector machine detectors

NASA Astrophysics Data System (ADS)

Chang, Faliang; Liu, Chunsheng

2017-09-01

The high variability of sign colors and shapes in uncontrolled environments has made the detection of traffic signs a challenging problem in computer vision. We propose a traffic sign detection (TSD) method based on coarse-to-fine cascade and parallel support vector machine (SVM) detectors to detect Chinese warning and danger traffic signs. First, a region of interest (ROI) extraction method is proposed to extract ROIs using color contrast features in local regions. The ROI extraction can reduce scanning regions and save detection time. For multiclass TSD, we propose a structure that combines a coarse-to-fine cascaded tree with a parallel structure of histogram of oriented gradients (HOG) + SVM detectors. The cascaded tree is designed to detect different types of traffic signs in a coarse-to-fine process. The parallel HOG + SVM detectors are designed to do fine detection of different types of traffic signs. The experiments demonstrate the proposed TSD method can rapidly detect multiclass traffic signs with different colors and shapes in high accuracy.
Waveform inversion for 3-D earth structure using the Direct Solution Method implemented on vector-parallel supercomputer

NASA Astrophysics Data System (ADS)

Hara, Tatsuhiko

2004-08-01

We implement the Direct Solution Method (DSM) on a vector-parallel supercomputer and show that it is possible to significantly improve its computational efficiency through parallel computing. We apply the parallel DSM calculation to waveform inversion of long period (250-500 s) surface wave data for three-dimensional (3-D) S-wave velocity structure in the upper and uppermost lower mantle. We use a spherical harmonic expansion to represent lateral variation with the maximum angular degree 16. We find significant low velocities under south Pacific hot spots in the transition zone. This is consistent with other seismological studies conducted in the Superplume project, which suggests deep roots of these hot spots. We also perform simultaneous waveform inversion for 3-D S-wave velocity and Q structure. Since resolution for Q is not good, we develop a new technique in which power spectra are used as data for inversion. We find good correlation between long wavelength patterns of Vs and Q in the transition zone such as high Vs and high Q under the western Pacific.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1988-01-01

The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.

Vector-Based Data Services for NASA Earth Science

NASA Astrophysics Data System (ADS)

Rodriguez, J.; Roberts, J. T.; Ruvane, K.; Cechini, M. F.; Thompson, C. K.; Boller, R. A.; Baynes, K.

2016-12-01

Vector data sources offer opportunities for mapping and visualizing science data in a way that allows for more customizable rendering and deeper data analysis than traditional raster images, and popular formats like GeoJSON and Mapbox Vector Tiles allow diverse types of geospatial data to be served in a high-performance and easily consumed-package. Vector data is especially suited to highly dynamic mapping applications and visualization of complex datasets, while growing levels of support for vector formats and features in open-source mapping clients has made utilizing them easier and more powerful than ever. NASA's Global Imagery Browse Services (GIBS) is working to make NASA data more easily and conveniently accessible than ever by serving vector datasets via GeoJSON, Mapbox Vector Tiles, and raster images. This presentation will review these output formats, the services, including WFS, WMS, and WMTS, that can be used to access the data, and some ways in which vector sources can be utilized in popular open-source mapping clients like OpenLayers. Lessons learned from GIBS' recent move towards serving vector will be discussed, as well as how to use GIBS open source software to create, configure, and serve vector data sources using Mapserver and the GIBS OnEarth Apache module.
Rainfall and Coconut Accession Explain the Composition and Abundance of the Community of Potential Auchenorrhyncha Phytoplasma Vectors in Brazil.

PubMed

Silva, Flaviana G; Passos, Eliana M; Diniz, Leandro E C; Farias, Adriano P; Teodoro, Adenir V; Fernandes, Marcelo F; Dollet, Michel

2018-04-05

Coconut plantations are attacked by the lethal yellowing (LY), which is spreading rapidly with extremely destructive effects in several countries. The disease is caused by phytoplasmas that occur in the plant phloem and are transmitted by Haplaxius crudus (Van Duzee) (Auchenorrhyncha: Cixiidae). Owing to their phloem-sap feeding habit, other planthopper species possibly act as vectors. Here, we aimed at assessing the seasonal variation in the Auchenorrhyncha community in six dwarf coconut accessions. Also, we assessed the relative contribution of biotic (coconut accession) and abiotic (rainfall, temperature) in explaining Auchenorrhyncha composition and abundance. The Auchenorrhyncha community was monthly evaluated for 1 yr using yellow sticky traps. Among the most abundant species, Oecleus sp., Balclutha sp., Deltocephalinae sp.2, Deltocephalinae sp.3, Cenchreini sp., Omolicna nigripennis Caldwell (Derbidae), and Cedusa sp. are potential phytoplasma vectors. The composition of the Auchenorrhyncha community differed between dwarf coconut accessions and periods, namely, in March and April (transition from dry to rainy season) and August (transition from rainy to dry season). In these months, Oecleus sp. was predominantly found in the accessions Cameroon Red Dwarf, Malayan Red Dwarf, and Brazilian Red Dwarf Gramame, while Cenchreini sp. and Bolbonota sp. were dominant in the accessions Brazilian Yellow Dwarf Gramame, Malayan Yellow Dwarf, and Brazilian Green Dwarf Jequi. We conclude that dwarf coconut host several Auchenorrhyncha species potential phytoplasma vectors. Furthermore, coconut accessions could be exploited in breeding programs aiming at prevention of LY. However, rainfall followed by accessions mostly explained the composition and abundance of the Auchenorrhyncha community.
Parallelization of the Physical-Space Statistical Analysis System (PSAS)

NASA Technical Reports Server (NTRS)

Larson, J. W.; Guo, J.; Lyster, P. M.

1999-01-01

Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. The problem of computational reproducibility is well known in the parallel computing community. It is a requirement that the parallel code perform calculations in a fashion that will yield identical results on different configurations of processing elements on the same platform. In some cases this problem can be solved by sacrificing performance. Meeting this requirement and still achieving high performance is very difficult. Topics to be discussed include: current PSAS design and parallelization strategy; reproducibility issues; load balance vs. database memory demands, possible solutions to these problems.
Parallel integer sorting with medium and fine-scale parallelism

NASA Technical Reports Server (NTRS)

Dagum, Leonardo

1993-01-01

Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message passing overhead. The performance results from the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128 processor iPSC/860 are given. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray YMP.
Three-Dimensional Route Planning for a Cruise Missile for Minimal Detection by Observer

DTIC Science & Technology

1989-06-01

detect the enemy’s weakest avenues of approach are needed. Such systems could also be used to identify our own deficiencies and allow for...vector-k (oval (line-segment-direction-vector (oval line-i))))) ( Tk2 (vector-k (eval (line-segment-direction-voctor (oval line-2))))) (Tval ’nil...zerop Tkl)) (not (zerop Tk2 ))) (setf Tval (/ Tkl Tk2 ))) (t (return-from parallel-lines ’nil))) (cond ((and (equal Til (* Tval Ti2)) (equal Tjl (* Tval
Interaction of upgoing auroral H(+) and O(+) beams

NASA Technical Reports Server (NTRS)

Kaufmann, R. L.; Ludlow, G. R.; Collin, H. L.; Peterson, W. K.; Burch, J. L.

1986-01-01

Data from the S3-3 and DE 1 satellites are analyzed to study the interaction between H(+) and O(+) ions in upgoing auroral beams. Every data set analyzed showed some evidence of an interaction. The measured plasma was found to be unstable to a low-frequency electrostatic wave that propagates at an oblique angle to vector-B(0). A second wave, which can propagate parallel to vector-B(0), is weakly damped in the plasma studied in most detail. It is likely that the upgoing ion beams generate this parallel wave at lower altitudes. The resulting wave-particle interactions qualitatively can explain most of the features observed in ion distribution functions.
Real time display Fourier-domain OCT using multi-thread parallel computing with data vectorization

NASA Astrophysics Data System (ADS)

Eom, Tae Joong; Kim, Hoon Seop; Kim, Chul Min; Lee, Yeung Lak; Choi, Eun-Seo

2011-03-01

We demonstrate a real-time display of processed OCT images using multi-thread parallel computing with a quad-core CPU of a personal computer. The data of each A-line are treated as one vector to maximize the data translation rate between the cores of the CPU and RAM stored image data. A display rate of 29.9 frames/sec for processed OCT data (4096 FFT-size x 500 A-scans) is achieved in our system using a wavelength swept source with 52-kHz swept frequency. The data processing times of the OCT image and a Doppler OCT image with a 4-time average are 23.8 msec and 91.4 msec.
The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
Some fast elliptic solvers on parallel architectures and their complexities

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Y.

1989-01-01

The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational compelxity lower than that of parallel BCR.
Some fast elliptic solvers on parallel architectures and their complexities

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Youcef

1989-01-01

The discretization of separable elliptic partial differential equations leads to linear systems with special block triangular matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconsistant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria

DOE Office of Scientific and Technical Information (OSTI.GOV)

Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan

Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach for discovering the functions of bacterial genes. However, the development of a suitable TnSeq strategy for a given bacterium can be costly and time-consuming. To meet this challenge, we describe a part-based strategy for constructing libraries of hundreds of transposon delivery vectors, which we term “magic pools.” Within a magic pool, each transposon vector has a different combination of upstream sequences (promoters and ribosome binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows the tracking of each vector during mutagenesis experiments. Tomore » identify an efficient vector for a given bacterium, we mutagenize it with a magic pool and sequence the resulting insertions; we then use this efficient vector to generate a large mutant library. We used the magic pool strategy to construct transposon mutant libraries in five genera of bacteria, including three genera of the phylumBacteroidetes. IMPORTANCEMolecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.« less
Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria

DOE PAGES

Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan; ...

2018-01-16

Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach for discovering the functions of bacterial genes. However, the development of a suitable TnSeq strategy for a given bacterium can be costly and time-consuming. To meet this challenge, we describe a part-based strategy for constructing libraries of hundreds of transposon delivery vectors, which we term “magic pools.” Within a magic pool, each transposon vector has a different combination of upstream sequences (promoters and ribosome binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows the tracking of each vector during mutagenesis experiments. Tomore » identify an efficient vector for a given bacterium, we mutagenize it with a magic pool and sequence the resulting insertions; we then use this efficient vector to generate a large mutant library. We used the magic pool strategy to construct transposon mutant libraries in five genera of bacteria, including three genera of the phylumBacteroidetes. IMPORTANCEMolecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.« less
A conservative scheme for electromagnetic simulation of magnetized plasmas with kinetic electrons

NASA Astrophysics Data System (ADS)

Bao, J.; Lin, Z.; Lu, Z. X.

2018-02-01

A conservative scheme has been formulated and verified for gyrokinetic particle simulations of electromagnetic waves and instabilities in magnetized plasmas. An electron continuity equation derived from the drift kinetic equation is used to time advance the electron density perturbation by using the perturbed mechanical flow calculated from the parallel vector potential, and the parallel vector potential is solved by using the perturbed canonical flow from the perturbed distribution function. In gyrokinetic particle simulations using this new scheme, the shear Alfvén wave dispersion relation in the shearless slab and continuum damping in the sheared cylinder have been recovered. The new scheme overcomes the stringent requirement in the conventional perturbative simulation method that perpendicular grid size needs to be as small as electron collisionless skin depth even for the long wavelength Alfvén waves. The new scheme also avoids the problem in the conventional method that an unphysically large parallel electric field arises due to the inconsistency between electrostatic potential calculated from the perturbed density and vector potential calculated from the perturbed canonical flow. Finally, the gyrokinetic particle simulations of the Alfvén waves in sheared cylinder have superior numerical properties compared with the fluid simulations, which suffer from numerical difficulties associated with singular mode structures.
An M-step preconditioned conjugate gradient method for parallel computation

NASA Technical Reports Server (NTRS)

Adams, L.

1983-01-01

This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
Wavelet Transforms in Parallel Image Processing

DTIC Science & Technology

1994-01-27

NUMBER OF PAGES Object Segmentation, Texture Segmentation, Image Compression, Image 137 Halftoning , Neural Network, Parallel Algorithms, 2D and 3D...Vector Quantization of Wavelet Transform Coefficients ........ ............................. 57 B.1.f Adaptive Image Halftoning based on Wavelet...application has been directed to the adaptive image halftoning . The gray information at a pixel, including its gray value and gradient, is represented by
Some Problems and Solutions in Transferring Ecosystem Simulation Codes to Supercomputers

NASA Technical Reports Server (NTRS)

Skiles, J. W.; Schulbach, C. H.

1994-01-01

Many computer codes for the simulation of ecological systems have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Recent recognition of ecosystem science as a High Performance Computing and Communications Program Grand Challenge area emphasizes supercomputers (both parallel and distributed systems) as the next set of tools for ecological simulation. Transferring ecosystem simulation codes to such systems is not a matter of simply compiling and executing existing code on the supercomputer since there are significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers. To more appropriately match the application to the architecture (necessary to achieve reasonable performance), the parallelism (if it exists) of the original application must be exploited. We discuss our work in transferring a general grassland simulation model (developed on a VAX in the FORTRAN computer programming language) to a Cray Y-MP. We show the Cray shared-memory vector-architecture, and discuss our rationale for selecting the Cray. We describe porting the model to the Cray and executing and verifying a baseline version, and we discuss the changes we made to exploit the parallelism in the application and to improve code execution. As a result, the Cray executed the model 30 times faster than the VAX 11/785 and 10 times faster than a Sun 4 workstation. We achieved an additional speed-up of approximately 30 percent over the original Cray run by using the compiler's vectorizing capabilities and the machine's ability to put subroutines and functions "in-line" in the code. With the modifications, the code still runs at only about 5% of the Cray's peak speed because it makes ineffective use of the vector processing capabilities of the Cray. We conclude with a discussion and future plans.
On the parallel solution of parabolic equations

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Youcef

1989-01-01

Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S

2015-01-01

This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Hypercluster Parallel Processor

NASA Technical Reports Server (NTRS)

Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela

1992-01-01

Hypercluster computer system includes multiple digital processors, operation of which coordinated through specialized software. Configurable according to various parallel-computing architectures of shared-memory or distributed-memory class, including scalar computer, vector computer, reduced-instruction-set computer, and complex-instruction-set computer. Designed as flexible, relatively inexpensive system that provides single programming and operating environment within which one can investigate effects of various parallel-computing architectures and combinations on performance in solution of complicated problems like those of three-dimensional flows in turbomachines. Hypercluster software and architectural concepts are in public domain.
Research on Parallel Three Phase PWM Converters base on RTDS

NASA Astrophysics Data System (ADS)

Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun

2018-01-01

Converters parallel operation can increase capacity of the system, but it may lead to potential zero-sequence circulating current, so the control of circulating current was an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the converters parallel system in real time and study the circulating current restraining. The equivalent model of two parallel converters and zero-sequence circulating current(ZSCC) were established and analyzed, then a strategy using variable zero vector control was proposed to suppress the circulating current. For two parallel modular converters, hardware-in-the-loop(HIL) study based on RTDS and practical experiment were implemented, results prove that the proposed control strategy is feasible and effective.

Policy-based secure communication with automatic key management for industrial control and automation systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Chernoguzov, Alexander; Markham, Thomas R.; Haridas, Harshal S.

A method includes generating at least one access vector associated with a specified device in an industrial process control and automation system. The specified device has one of multiple device roles. The at least one access vector is generated based on one or more communication policies defining communications between one or more pairs of devices roles in the industrial process control and automation system, where each pair of device roles includes the device role of the specified device. The method also includes providing the at least one access vector to at least one of the specified device and one ormore » more other devices in the industrial process control and automation system in order to control communications to or from the specified device.« less
Parallel processing in finite element structural analysis

NASA Technical Reports Server (NTRS)

Noor, Ahmed K.

1987-01-01

A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Anisotropic Surface State Mediated RKKY Interaction Between Adatoms on a Hexagonal Lattice

NASA Astrophysics Data System (ADS)

Einstein, Theodore; Patrone, Paul

2012-02-01

Motivated by recent numerical studies of Ag on Pt(111), we derive a far-field expression for the RKKY interaction mediated by surface states on a (111) FCC surface, considering the effect of anisotropy in the Fermi edge. The main contribution to the interaction comes from electrons whose Fermi velocity vF is parallel to the vector R connecting the interacting adatoms; we show that in general, the corresponding Fermi wave-vector kF is not parallel to R. The interaction is oscillatory; the amplitude and wavelength of oscillations have angular dependence arising from the anisotropy of the surface state band structure. The wavelength, in particular, is determined by the component of the aforementioned kF that is parallel to R. Our analysis is easily generalized to other systems. For Ag on Pt(111), our results indicate that the RKKY interaction between pairs of adatoms should be nearly isotropic and so cannot account for the anisotropy found in the studies motivating our work.
An accessible four-dimensional treatment of Maxwell's equations in terms of differential forms

NASA Astrophysics Data System (ADS)

Sá, Lucas

2017-03-01

Maxwell’s equations are derived in terms of differential forms in the four-dimensional Minkowski representation, starting from the three-dimensional vector calculus differential version of these equations. Introducing all the mathematical and physical concepts needed (including the tool of differential forms), using only knowledge of elementary vector calculus and the local vector version of Maxwell’s equations, the equations are reduced to a simple and elegant set of two equations for a unified quantity, the electromagnetic field. The treatment should be accessible for students taking a first course on electromagnetism.
Equation solvers for distributed-memory computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.

1994-01-01

A large number of scientific and engineering problems require the rapid solution of large systems of simultaneous equations. The performance of parallel computers in this area now dwarfs traditional vector computers by nearly an order of magnitude. This talk describes the major issues involved in parallel equation solvers with particular emphasis on the Intel Paragon, IBM SP-1 and SP-2 processors.
Studying an Eulerian Computer Model on Different High-performance Computer Platforms and Some Applications

NASA Astrophysics Data System (ADS)

Georgiev, K.; Zlatev, Z.

2010-11-01

The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model computational domain covers Europe and some neighbour parts belong to the Atlantic Ocean, Asia and Africa. If DEM model is to be applied by using fine grids, then its discretization leads to a huge computational problem. This implies that such a model as DEM must be run only on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, some comparison results of running of this model on different kind of vector (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.) will be presented. The main idea in the parallel version of DEM is domain partitioning approach. Discussions according to the effective use of the cache and hierarchical memories of the modern computers as well as the performance, speed-ups and efficiency achieved will be done. The parallel code of DEM, created by using MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kind of vector and parallel computers. Some important applications of the computer model output are presented in short.
Parallel implementation of an adaptive and parameter-free N-body integrator

NASA Astrophysics Data System (ADS)

Pruett, C. David; Ingham, William H.; Herman, Ralph D.

2011-05-01

Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summaryProgram title: PNB.f90 Catalogue identifier: AEIK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3052 No. of bytes in distributed program, including test data, etc.: 68 600 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors Operating system: Unix/Linux Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N Classification: 4.3, 4.12, 6.5 Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.
Transmission of Xylella fastidiosa to Grapevine by the Meadow Spittlebug.

PubMed

Cornara, D; Sicard, A; Zeilinger, A R; Porcelli, F; Purcell, A H; Almeida, R P P

2016-11-01

There is little information available on Xylella fastidiosa transmission by spittlebugs (Hemiptera, Cercopoidea). This group of insect vectors may be of epidemiological relevance in certain diseases, so it is important to better understand the basic parameters of X. fastidiosa transmission by spittlebugs. We used grapevines as a host plant and the aphrophorid Philaenus spumarius as a vector to estimate the effect of plant access time on X. fastidiosa transmission to plants; in addition, bacterial population estimates in the heads of vectors were determined and correlated with plant infection status. Results show that transmission efficiency of X. fastidiosa by P. spumarius increased with plant access time, similarly to insect vectors in another family (Hemiptera, Cicadellidae). Furthermore, a positive correlation between pathogen populations in P. spumarius and transmission to plants was observed. Bacterial populations in insects were one to two orders of magnitude lower than those observed in leafhopper vectors, and population size peaked within 3 days of plant access period. These results suggest that P. spumarius has either a limited number of sites in the foregut that may be colonized, or that fluid dynamics in the mouthparts of these insects is different from that in leafhoppers. Altogether our results indicate that X. fastidiosa transmission by spittlebugs is similar to that by leafhoppers. In addition, the relationship between cell numbers in vectors and plant infection may have under-appreciated consequences to pathogen spread.
Diderot: a Domain-Specific Language for Portable Parallel Scientific Visualization and Image Analysis.

PubMed

Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John

2016-01-01

Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
Kalman Filter Tracking on Parallel Architectures

NASA Astrophysics Data System (ADS)

Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

2016-11-01

Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.
Construction and characterization of VL-VH tail-parallel genetically engineered antibodies against staphylococcal enterotoxins.

PubMed

He, Xianzhi; Zhang, Lei; Liu, Pengchong; Liu, Li; Deng, Hui; Huang, Jinhai

2015-03-01

Staphylococcal enterotoxins (SEs) produced by Staphylococcus aureus have increasingly given rise to human health and food safety. Genetically engineered small molecular antibody is a useful tool in immuno-detection and treatment for clinical illness caused by SEs. In this study, we constructed the V(L)-V(H) tail-parallel genetically engineered antibody against SEs by using the repertoire of rearranged germ-line immunoglobulin variable region genes. Total RNA were extracted from six hybridoma cell lines that stably express anti-SEs antibodies. The variable region genes of light chain (V(L)) and heavy chain (V(H)) were cloned by reverse transcription PCR, and their classical murine antibody structure and functional V(D)J gene rearrangement were analyzed. To construct the eukaryotic V(H)-V(L) tail-parallel co-expression vectors based on the "5'-V(H)-ivs-IRES-V(L)-3'" mode, the ivs-IRES fragment and V(L) genes were spliced by two-step overlap extension PCR, and then, the recombined gene fragment and V(H) genes were inserted into the pcDNA3.1(+) expression vector sequentially. And then the constructed eukaryotic expression clones termed as p2C2HILO and p5C12HILO were transfected into baby hamster kidney 21 cell line, respectively. Two clonal cell lines stably expressing V(L)-V(H) tail-parallel antibodies against SEs were obtained, and the antibodies that expressed intracytoplasma were evaluated by enzyme-linked immunosorbent assay, immunofluorescence assay, and flow cytometry. SEs can stimulate the expression of some chemokines and chemokine receptors in porcine IPEC-J2 cells; mRNA transcription level of four chemokines and chemokine receptors can be blocked by the recombinant SE antibody prepared in this study. Our results showed that it is possible to get functional V(L)-V(H) tail-parallel genetically engineered antibodies in same vector using eukaryotic expression system.
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.

PubMed

Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut

2018-05-03

Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable for a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors, and use cases. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4. under the BSD license. We support SSE4, AVX2, AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. rene.rahn@fu-berlin.de.
VectorBase: a data resource for invertebrate vector genomics

PubMed Central

Lawson, Daniel; Arensburger, Peter; Atkinson, Peter; Besansky, Nora J.; Bruggner, Robert V.; Butler, Ryan; Campbell, Kathryn S.; Christophides, George K.; Christley, Scott; Dialynas, Emmanuel; Hammond, Martin; Hill, Catherine A.; Konopinski, Nathan; Lobo, Neil F.; MacCallum, Robert M.; Madey, Greg; Megy, Karine; Meyer, Jason; Redmond, Seth; Severson, David W.; Stinson, Eric O.; Topalis, Pantelis; Birney, Ewan; Gelbart, William M.; Kafatos, Fotis C.; Louis, Christos; Collins, Frank H.

2009-01-01

VectorBase (http://www.vectorbase.org) is an NIAID-funded Bioinformatic Resource Center focused on invertebrate vectors of human pathogens. VectorBase annotates and curates vector genomes providing a web accessible integrated resource for the research community. Currently, VectorBase contains genome information for three mosquito species: Aedes aegypti, Anopheles gambiae and Culex quinquefasciatus, a body louse Pediculus humanus and a tick species Ixodes scapularis. Since our last report VectorBase has initiated a community annotation system, a microarray and gene expression repository and controlled vocabularies for anatomy and insecticide resistance. We have continued to develop both the software infrastructure and tools for interrogating the stored data. PMID:19028744
Scaling Support Vector Machines On Modern HPC Platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

You, Yang; Fu, Haohuan; Song, Shuaiwen

2015-02-01

We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multicore and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools.
Using a constraint on the parallel velocity when determining electric fields with EISCAT

NASA Technical Reports Server (NTRS)

Caudal, G.; Blanc, M.

1988-01-01

A method is proposed to determine the perpendicular components of the ion velocity vector (and hence the perpendicular electric field) from EISCAT tristatic measurements, in which one introduces an additional constraint on the parallel velocity, in order to take account of our knowledge that the parallel velocity of ions is small. This procedure removes some artificial features introduced when the tristatic geometry becomes too unfavorable. It is particularly well suited for the southernmost or northernmost positions of the tristatic measurements performed by meridian scan experiments (CP3 mode).
Survey of new vector computers: The CRAY 1S from CRAY research; the CYBER 205 from CDC and the parallel computer from ICL - architecture and programming

NASA Technical Reports Server (NTRS)

Gentzsch, W.

1982-01-01

Problems which can arise with vector and parallel computers are discussed in a user oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.
Velocity statistics for interacting edge dislocations in one dimension from Dyson's Coulomb gas model.

PubMed

Jafarpour, Farshid; Angheluta, Luiza; Goldenfeld, Nigel

2013-10-01

The dynamics of edge dislocations with parallel Burgers vectors, moving in the same slip plane, is mapped onto Dyson's model of a two-dimensional Coulomb gas confined in one dimension. We show that the tail distribution of the velocity of dislocations is power law in form, as a consequence of the pair interaction of nearest neighbors in one dimension. In two dimensions, we show the presence of a pairing phase transition in a system of interacting dislocations with parallel Burgers vectors. The scaling exponent of the velocity distribution at effective temperatures well below this pairing transition temperature can be derived from the nearest-neighbor interaction, while near the transition temperature, the distribution deviates from the form predicted by the nearest-neighbor interaction, suggesting the presence of collective effects.
CFD Research, Parallel Computation and Aerodynamic Optimization

NASA Technical Reports Server (NTRS)

Ryan, James S.

1995-01-01

During the last five years, CFD has matured substantially. Pure CFD research remains to be done, but much of the focus has shifted to integration of CFD into the design process. The work under these cooperative agreements reflects this trend. The recent work, and work which is planned, is designed to enhance the competitiveness of the US aerospace industry. CFD and optimization approaches are being developed and tested, so that the industry can better choose which methods to adopt in their design processes. The range of computer architectures has been dramatically broadened, as the assumption that only huge vector supercomputers could be useful has faded. Today, researchers and industry can trade off time, cost, and availability, choosing vector supercomputers, scalable parallel architectures, networked workstations, or heterogenous combinations of these to complete required computations efficiently.
Solving the Cauchy-Riemann equations on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1987-01-01

Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations; a set of coupled first order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, and SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
Parallel-vector computation for structural analysis and nonlinear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1990-01-01

Practical engineering application can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, a finite element structure analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN, can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for either static, or eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE into the widely popular finite-element production code, such as the SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.

International workshop on insecticide resistance in vectors of arboviruses, December 2016, Rio de Janeiro, Brazil.

PubMed

Corbel, Vincent; Fonseca, Dina M; Weetman, David; Pinto, João; Achee, Nicole L; Chandre, Fabrice; Coulibaly, Mamadou B; Dusfour, Isabelle; Grieco, John; Juntarajumnong, Waraporn; Lenhart, Audrey; Martins, Ademir J; Moyes, Catherine; Ng, Lee Ching; Raghavendra, Kamaraju; Vatandoost, Hassan; Vontas, John; Muller, Pie; Kasai, Shinji; Fouque, Florence; Velayudhan, Raman; Durot, Claire; David, Jean-Philippe

2017-06-02

Vector-borne diseases transmitted by insect vectors such as mosquitoes occur in over 100 countries and affect almost half of the world's population. Dengue is currently the most prevalent arboviral disease but chikungunya, Zika and yellow fever show increasing prevalence and severity. Vector control, mainly by the use of insecticides, play a key role in disease prevention but the use of the same chemicals for more than 40 years, together with the dissemination of mosquitoes by trade and environmental changes, resulted in the global spread of insecticide resistance. In this context, innovative tools and strategies for vector control, including the management of resistance, are urgently needed. This report summarizes the main outputs of the first international workshop on Insecticide resistance in vectors of arboviruses held in Rio de Janeiro, Brazil, 5-8 December 2016. The primary aims of this workshop were to identify strategies for the development and implementation of standardized insecticide resistance management, also to allow comparisons across nations and across time, and to define research priorities for control of vectors of arboviruses. The workshop brought together 163 participants from 28 nationalities and was accessible, live, through the web (> 70,000 web-accesses over 3 days).
Evaluation of the SPAR thermal analyzer on the CYBER-203 computer

NASA Technical Reports Server (NTRS)

Robinson, J. C.; Riley, K. M.; Haftka, R. T.

1982-01-01

The use of the CYBER 203 vector computer for thermal analysis is investigated. Strengths of the CYBER 203 include the ability to perform, in vector mode using a 64 bit word, 50 million floating point operations per second (MFLOPS) for addition and subtraction, 25 MFLOPS for multiplication and 12.5 MFLOPS for division. The speed of scalar operation is comparable to that of a CDC 7600 and is some 2 to 3 times faster than Langley's CYBER 175s. The CYBER 203 has 1,048,576 64-bit words of real memory with an 80 nanosecond (nsec) access time. Memory is bit addressable and provides single error correction, double error detection (SECDED) capability. The virtual memory capability handles data in either 512 or 65,536 word pages. The machine has 256 registers with a 40 nsec access time. The weaknesses of the CYBER 203 include the amount of vector operation overhead and some data storage limitations. In vector operations there is a considerable amount of time before a single result is produced so that vector calculation speed is slower than scalar operation for short vectors.
In Vitro Inhibition of Leishmania Attachment to Sandfly Midguts and LL-5 Cells by Divalent Metal Chelators, Anti-gp63 and Phosphoglycans.

PubMed

Soares, Rodrigo Pedro; Altoé, Ellen Cristina Félix; Ennes-Vidal, Vítor; da Costa, Simone M; Rangel, Elizabeth Ferreira; de Souza, Nataly Araújo; da Silva, Vanderlei Campos; Volf, Petr; d'Avila-Levy, Claudia Masini

2017-07-01

Leishmania braziliensis and Leishmania infantum are the causative agents of cutaneous and visceral leishmaniasis, respectively. Several aspects of the vector-parasite interaction involving gp63 and phosphoglycans have been individually assayed in different studies. However, their role under the same experimental conditions was not studied yet. Here, the roles of divalent metal chelators, anti-gp63 antibodies and purified type I phosphoglycans (PGs) were evaluated during in vitro parasite attachment to the midgut of the vector. Parasites were treated with divalent metal chelators or anti-gp63 antibodies prior to the interaction with Lutzomyia longipalpis/Lutzomyia intermedia midguts or sand fly LL-5 cells. In vitro binding system was used to examine the role of PG and gp63 in parallel. Treatment with divalent metal chelators reduced Le. infantum adhesion to the Lu. longipalpis midguts. The most effective compound (Phen) inhibited the binding in both vectors. Similar results were observed in the interaction between both Leishmania species and the cell line LL-5. Finally, parallel experiments using anti-gp63-treated parasites and PG-incubated midguts demonstrated that both approaches substantially inhibited attachment in the natural parasite-vector pairs Le. infantum/Lu. longipalpis and Le. braziliensis/Lu. intermedia. Our results suggest that gp63 and/or PG are involved in parasite attachment to the midgut of these important vectors. Copyright © 2017 Elsevier GmbH. All rights reserved.
A Domain Decomposition Parallelization of the Fast Marching Method

NASA Technical Reports Server (NTRS)

Herrmann, M.

2003-01-01

In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on load balancing separately the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G(sub 0)-based parallelization will be investigated.
A conservative scheme of drift kinetic electrons for gyrokinetic simulation of kinetic-MHD processes in toroidal plasmas

NASA Astrophysics Data System (ADS)

Bao, J.; Liu, D.; Lin, Z.

2017-10-01

A conservative scheme of drift kinetic electrons for gyrokinetic simulations of kinetic-magnetohydrodynamic processes in toroidal plasmas has been formulated and verified. Both vector potential and electron perturbed distribution function are decomposed into adiabatic part with analytic solution and non-adiabatic part solved numerically. The adiabatic parallel electric field is solved directly from the electron adiabatic response, resulting in a high degree of accuracy. The consistency between electrostatic potential and parallel vector potential is enforced by using the electron continuity equation. Since particles are only used to calculate the non-adiabatic response, which is used to calculate the non-adiabatic vector potential through Ohm's law, the conservative scheme minimizes the electron particle noise and mitigates the cancellation problem. Linear dispersion relations of the kinetic Alfvén wave and the collisionless tearing mode in cylindrical geometry have been verified in gyrokinetic toroidal code simulations, which show that the perpendicular grid size can be larger than the electron collisionless skin depth when the mode wavelength is longer than the electron skin depth.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A; Mamidala, Amith R

2014-02-11

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segment of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Role-based access control permissions

DOEpatents

Staggs, Kevin P.; Markham, Thomas R.; Hull Roskos, Julie J.; Chernoguzov, Alexander

2017-04-25

Devices, systems, and methods for role-based access control permissions are disclosed. One method includes a policy decision point that receives up-to-date security context information from one or more outside sources to determine whether to grant access for a data client to a portion of the system and creates an access vector including the determination; receiving, via a policy agent, a request by the data client for access to the portion of the computing system by the data client, wherein the policy agent checks to ensure there is a session established with communications and user/application enforcement points; receiving, via communications policy enforcement point, the request from the policy agent, wherein the communications policy enforcement point determines whether the data client is an authorized node, based upon the access vector received from the policy decision point; and receiving, via the user/application policy enforcement point, the request from the communications policy enforcement point.
Access and visualization using clusters and other parallel computers

NASA Technical Reports Server (NTRS)

Katz, Daniel S.; Bergou, Attila; Berriman, Bruce; Block, Gary; Collier, Jim; Curkendall, Dave; Good, John; Husman, Laura; Jacob, Joe; Laity, Anastasia;

2003-01-01

JPL's Parallel Applications Technologies Group has been exploring the issues of data access and visualization of very large data sets over the past 10 or so years. this work has used a number of types of parallel computers, and today includes the use of commodity clusters. This talk will highlight some of the applications and tools we have developed, including how they use parallel computing resources, and specifically how we are using modern clusters. Our applications focus on NASA's needs; thus our data sets are usually related to Earth and Space Science, including data delivered from instruments in space, and data produced by telescopes on the ground.

Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Research in Parallel Algorithms and Software for Computational Aerosciences

NASA Technical Reports Server (NTRS)

Domel, Neal D.

1996-01-01

Phase 1 is complete for the development of a computational fluid dynamics CFD) parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Performance of GeantV EM Physics Models

NASA Astrophysics Data System (ADS)

Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Cosmo, G.; Duhem, L.; Elvira, D.; Folger, G.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.

2017-10-01

The recent progress in parallel hardware architectures with deeper vector pipelines or many-cores technologies brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains in propagating multiple particles in parallel, improving instruction throughput and data locality in HEP event simulation on modern parallel hardware architecture. Due to the complexity of geometry description and physics algorithms of a typical HEP application, performance analysis is indispensable in identifying factors limiting parallel execution. In this report, we will present design considerations and preliminary computing performance of GeantV physics models on coprocessors (Intel Xeon Phi and NVidia GPUs) as well as on mainstream CPUs.
Experimental studies of susceptibility of Italian Aedes albopictus to Zika virus.

PubMed

Di Luca, Marco; Severini, Francesco; Toma, Luciano; Boccolini, Daniela; Romi, Roberto; Remoli, Maria Elena; Sabbatucci, Michela; Rizzo, Caterina; Venturi, Giulietta; Rezza, Giovanni; Fortuna, Claudia

2016-05-05

We report a study on vector competence of an Italian population of Aedes albopictus for Zika virus (ZIKV). Ae. albopictus was susceptible to ZIKV infection (infection rate: 10%), and the virus could disseminate and was secreted in the mosquito's saliva (dissemination rate: 29%; transmission rate: 29%) after an extrinsic incubation period of 11 days. The observed vector competence was lower than that of an Ae. aegypti colony tested in parallel.
Rational calculation accuracy in acousto-optical matrix-vector processor

NASA Astrophysics Data System (ADS)

Oparin, V. V.; Tigin, Dmitry V.

1994-01-01

The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption makes the usage of acousto-optic matrix-vector multiplier (AOMVM) attractive for processing of large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. The reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and the reduction of the demands on its components.
Arbitration in crossbar interconnect for low latency

DOEpatents

Ohmacht, Martin; Sugavanam, Krishnan

2013-02-05

A system and method and computer program product for reducing the latency of signals communicated through a crossbar switch, the method including using at slave arbitration logic devices associated with Slave devices for which access is requested from one or more Master devices, two or more priority vector signals cycled among their use every clock cycle for selecting one of the requesting Master devices and updates the respective priority vector signal used every clock cycle. Similarly, each Master for which access is requested from one or more Slave devices, can have two or more priority vectors and can cycle among their use every clock cycle to further reduce latency and increase throughput performance via the crossbar.
Distributed Storage Algorithm for Geospatial Image Data Based on Data Access Patterns.

PubMed

Pan, Shaoming; Li, Yongkai; Xu, Zhengquan; Chong, Yanwen

2015-01-01

Declustering techniques are widely used in distributed environments to reduce query response time through parallel I/O by splitting large files into several small blocks and then distributing those blocks among multiple storage nodes. Unfortunately, however, many small geospatial image data files cannot be further split for distributed storage. In this paper, we propose a complete theoretical system for the distributed storage of small geospatial image data files based on mining the access patterns of geospatial image data using their historical access log information. First, an algorithm is developed to construct an access correlation matrix based on the analysis of the log information, which reveals the patterns of access to the geospatial image data. Then, a practical heuristic algorithm is developed to determine a reasonable solution based on the access correlation matrix. Finally, a number of comparative experiments are presented, demonstrating that our algorithm displays a higher total parallel access probability than those of other algorithms by approximately 10-15% and that the performance can be further improved by more than 20% by simultaneously applying a copy storage strategy. These experiments show that the algorithm can be applied in distributed environments to help realize parallel I/O and thereby improve system performance.
Using AberOWL for fast and scalable reasoning over BioPortal ontologies.

PubMed

Slater, Luke; Gkoutos, Georgios V; Schofield, Paul N; Hoehndorf, Robert

2016-08-08

Reasoning over biomedical ontologies using their OWL semantics has traditionally been a challenging task due to the high theoretical complexity of OWL-based automated reasoning. As a consequence, ontology repositories, as well as most other tools utilizing ontologies, either provide access to ontologies without use of automated reasoning, or limit the number of ontologies for which automated reasoning-based access is provided. We apply the AberOWL infrastructure to provide automated reasoning-based access to all accessible and consistent ontologies in BioPortal (368 ontologies). We perform an extensive performance evaluation to determine query times, both for queries of different complexity and for queries that are performed in parallel over the ontologies. We demonstrate that, with the exception of a few ontologies, even complex and parallel queries can now be answered in milliseconds, therefore allowing automated reasoning to be used on a large scale, to run in parallel, and with rapid response times.
Project APhiD: A Lorenz-gauged A-Φ decomposition for parallelized computation of ultra-broadband electromagnetic induction in a fully heterogeneous Earth

NASA Astrophysics Data System (ADS)

Weiss, Chester J.

2013-08-01

An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10-2-1010 Hz and range 10-3-105 m in length scale. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimum-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration. High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the loop unrolling technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.

Improved parallel data partitioning by nested dissection with applications to information retrieval.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolf, Michael M.; Chevalier, Cedric; Boman, Erik Gunnar

The computational work in many information retrieval and analysis algorithms is based on sparse linear algebra. Sparse matrix-vector multiplication is a common kernel in many of these computations. Thus, an important related combinatorial problem in parallel computing is how to distribute the matrix and the vectors among processors so as to minimize the communication cost. We focus on minimizing the total communication volume while keeping the computation balanced across processes. In [1], the first two authors presented a new 2D partitioning method, the nested dissection partitioning algorithm. In this paper, we improve on that algorithm and show that it ismore » a good option for data partitioning in information retrieval. We also show partitioning time can be substantially reduced by using the SCOTCH software, and quality improves in some cases, too.« less
Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction

PubMed Central

Kim, Jae-Hoon; Kwon, Hong-Seok; Seo, Hyeong-Won

2015-01-01

A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. In this paper, in order to show validity and usability of the pivot-based approach, we evaluate the approach in company with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word association between source words (resp., target words) and pivot words and the other estimates them from two parallel corpora based on word alignment tools for statistical machine translation. Empirical results on two language pairs (e.g., Korean-Spanish and Korean-French) have shown that the pivot-based approach is very promising for resource-poor languages and this approach observes its validity and usability. Furthermore, for words with low frequency, our method is also well performed. PMID:25983745
Scale dependence of the alignment between strain rate and rotation in turbulent shear flow

NASA Astrophysics Data System (ADS)

Fiscaletti, D.; Elsinga, G. E.; Attili, A.; Bisetti, F.; Buxton, O. R. H.

2016-10-01

The scale dependence of the statistical alignment tendencies of the eigenvectors of the strain-rate tensor ei, with the vorticity vector ω , is examined in the self-preserving region of a planar turbulent mixing layer. Data from a direct numerical simulation are filtered at various length scales and the probability density functions of the magnitude of the alignment cosines between the two unit vectors | ei.ω ̂| are examined. It is observed that the alignment tendencies are insensitive to the concurrent large-scale velocity fluctuations, but are quantitatively affected by the nature of the concurrent large-scale velocity-gradient fluctuations. It is confirmed that the small-scale (local) vorticity vector is preferentially aligned in parallel with the large-scale (background) extensive strain-rate eigenvector e1, in contrast to the global tendency for ω to be aligned in parallel with the intermediate strain-rate eigenvector [Hamlington et al., Phys. Fluids 20, 111703 (2008), 10.1063/1.3021055]. When only data from regions of the flow that exhibit strong swirling are included, the so-called high-enstrophy worms, the alignment tendencies are exaggerated with respect to the global picture. These findings support the notion that the production of enstrophy, responsible for a net cascade of turbulent kinetic energy from large scales to small scales, is driven by vorticity stretching due to the preferential parallel alignment between ω and nonlocal e1 and that the strongly swirling worms are kinematically significant to this process.
On the impact of communication complexity in the design of parallel numerical algorithms

NASA Technical Reports Server (NTRS)

Gannon, D.; Vanrosendale, J.

1984-01-01

This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
On the impact of communication complexity on the design of parallel numerical algorithms

NASA Technical Reports Server (NTRS)

Gannon, D. B.; Van Rosendale, J.

1984-01-01

This paper describes two models of the cost of data movement in parallel numerical alorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In this second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm-independent upper bounds on system performance are derived for several problems that are important to scientific computation.
A Neural Network Architecture For Rapid Model Indexing In Computer Vision Systems

NASA Astrophysics Data System (ADS)

Pawlicki, Ted

1988-03-01

Models of objects stored in memory have been shown to be useful for guiding the processing of computer vision systems. A major consideration in such systems, however, is how stored models are initially accessed and indexed by the system. As the number of stored models increases, the time required to search memory for the correct model becomes high. Parallel distributed, connectionist, neural networks' have been shown to have appealing content addressable memory properties. This paper discusses an architecture for efficient storage and reference of model memories stored as stable patterns of activity in a parallel, distributed, connectionist, neural network. The emergent properties of content addressability and resistance to noise are exploited to perform indexing of the appropriate object centered model from image centered primitives. The system consists of three network modules each of which represent information relative to a different frame of reference. The model memory network is a large state space vector where fields in the vector correspond to ordered component objects and relative, object based spatial relationships between the component objects. The component assertion network represents evidence about the existence of object primitives in the input image. It establishes local frames of reference for object primitives relative to the image based frame of reference. The spatial relationship constraint network is an intermediate representation which enables the association between the object based and the image based frames of reference. This intermediate level represents information about possible object orderings and establishes relative spatial relationships from the image based information in the component assertion network below. It is also constrained by the lawful object orderings in the model memory network above. The system design is consistent with current psychological theories of recognition by component. It also seems to support Marr's notions of hierarchical indexing. (i.e. the specificity, adjunct, and parent indices) It supports the notion that multiple canonical views of an object may have to be stored in memory to enable its efficient identification. The use of variable fields in the state space vectors appears to keep the number of required nodes in the network down to a tractable number while imposing a semantic value on different areas of the state space. This semantic imposition supports an interface between the analogical aspects of neural networks and the propositional paradigms of symbolic processing.
Tiled vector data model for the geographical features of symbolized maps.

PubMed

Li, Lin; Hu, Wei; Zhu, Haihong; Li, You; Zhang, Hang

2017-01-01

Electronic maps (E-maps) provide people with convenience in real-world space. Although web map services can display maps on screens, a more important function is their ability to access geographical features. An E-map that is based on raster tiles is inferior to vector tiles in terms of interactive ability because vector maps provide a convenient and effective method to access and manipulate web map features. However, the critical issue regarding rendering tiled vector maps is that geographical features that are rendered in the form of map symbols via vector tiles may cause visual discontinuities, such as graphic conflicts and losses of data around the borders of tiles, which likely represent the main obstacles to exploring vector map tiles on the web. This paper proposes a tiled vector data model for geographical features in symbolized maps that considers the relationships among geographical features, symbol representations and map renderings. This model presents a method to tailor geographical features in terms of map symbols and 'addition' (join) operations on the following two levels: geographical features and map features. Thus, these maps can resolve the visual discontinuity problem based on the proposed model without weakening the interactivity of vector maps. The proposed model is validated by two map data sets, and the results demonstrate that the rendered (symbolized) web maps present smooth visual continuity.
Getting genetic access to natural adenovirus genomes to explore vector diversity.

PubMed

Zhang, Wenli; Ehrhardt, Anja

2017-10-01

Recombinant vectors based on the human adenovirus type 5 (HAdV5) have been developed and extensively used in preclinical and clinical studies for over 30 years. However, certain restrictions of HAdV5-based vectors have limited their clinical applications because they are rather inefficient in specifically transducing cells of therapeutic interest that lack the coxsackievirus and adenovirus receptor (CAR). Moreover, enhanced vector-associated toxicity and widespread preexisting immunity have been shown to significantly hamper the effectiveness of HAdV-5-mediated gene transfer. However, evolution of adenoviruses in the natural host is driving the generation of novel types with altered virulence, enhanced transmission, and altered tissue tropism. As a consequence, an increasing number of alternative adenovirus types were identified, which may represent a valuable resource for the development of novel vector types. Thus, researchers are focusing on the other naturally occurring adenovirus types, which are structurally similar but functionally different from HAdV5. To this end, several strategies have been devised for getting genetic access to adenovirus genomes, resulting in a new panel of adenoviral vectors. Importantly, these vectors were shown to have a host range different from HAdV5 and to escape the anti-HAdV5 immune response, thus underlining the great potential of this approach. In summary, this review provides a state-of-the-art overview of one essential step in adenoviral vector development.
Fast Eigensolver for Computing 3D Earth's Normal Modes

NASA Astrophysics Data System (ADS)

Shi, J.; De Hoop, M. V.; Li, R.; Xi, Y.; Saad, Y.

2017-12-01

We present a novel parallel computational approach to compute Earth's normal modes. We discretize Earth via an unstructured tetrahedral mesh and apply the continuous Galerkin finite element method to the elasto-gravitational system. To resolve the eigenvalue pollution issue, following the analysis separating the seismic point spectrum, we utilize explicitly a representation of the displacement for describing the oscillations of the non-seismic modes in the fluid outer core. Effectively, we separate out the essential spectrum which is naturally related to the Brunt-Väisälä frequency. We introduce two Lanczos approaches with polynomial and rational filtering for solving this generalized eigenvalue problem in prescribed intervals. The polynomial filtering technique only accesses the matrix pair through matrix-vector products and is an ideal candidate for solving three-dimensional large-scale eigenvalue problems. The matrix-free scheme allows us to deal with fluid separation and self-gravitation in an efficient way, while the standard shift-and-invert method typically needs an explicit shifted matrix and its factorization. The rational filtering method converges much faster than the standard shift-and-invert procedure when computing all the eigenvalues inside an interval. Both two Lanczos approaches solve for the internal eigenvalues extremely accurately, comparing with the standard eigensolver. In our computational experiments, we compare our results with the radial earth model benchmark, and visualize the normal modes using vector plots to illustrate the properties of the displacements in different modes.
A parallel variable metric optimization algorithm

NASA Technical Reports Server (NTRS)

Straeter, T. A.

1973-01-01

An algorithm, designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, then one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariant minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence is in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
Parallel Algorithms for Groebner-Basis Reduction

DTIC Science & Technology

1987-09-25

22209 ELEMENT NO. NO. NO. ACCESSION NO. 11. TITLE (Include Security Classification) * PARALLEL ALGORITHMS FOR GROEBNER -BASIS REDUCTION 12. PERSONAL...All other editions are obsolete. Productivity Engineering in the UNIXt Environment p Parallel Algorithms for Groebner -Basis Reduction Technical Report
Progress report on Nuclear Density project with Lawrence Livermore National Lab Year 2010

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnson, C W; Krastev, P; Ormand, W E

2011-03-11

The main goal for year 2010 was to improve parallelization of the configuration interaction code BIGSTICK, co-written by W. Erich Ormand (LLNL) and Calvin W. Johnson (SDSU), with the parallelization carried out primarily by Plamen Krastev, a postdoc at SDSU and funded in part by this grant. The central computational algorithm is the Lanczos algorithm, which consists of a matrix-vector multiplication (matvec), followed by a Gram-Schmidt reorthogonalization.
Performance characteristics of a variable-area vane nozzle for vectoring an ASTOVL exhaust jet up to 45 deg

NASA Technical Reports Server (NTRS)

Mcardle, Jack G.; Esker, Barbara S.

1993-01-01

Many conceptual designs for advanced short-takeoff, vertical landing (ASTOVL) aircraft need exhaust nozzles that can vector the jet to provide forces and moments for controlling the aircraft's movement or attitude in flight near the ground. A type of nozzle that can both vector the jet and vary the jet flow area is called a vane nozzle. Basically, the nozzle consists of parallel, spaced-apart flow passages formed by pairs of vanes (vanesets) that can be rotated on axes perpendicular to the flow. Two important features of this type of nozzle are the abilities to vector the jet rearward up to 45 degrees and to produce less harsh pressure and velocity footprints during vertical landing than does an equivalent single jet. A one-third-scale model of a generic vane nozzle was tested with unheated air at the NASA Lewis Research Center's Powered Lift Facility. The model had three parallel flow passages. Each passage was formed by a vaneset consisting of a long and a short vane. The longer vanes controlled the jet vector angle, and the shorter controlled the flow area. Nozzle performance for three nominal flow areas (basic and plus or minus 21 percent of basic area), each at nominal jet vector angles from -20 deg (forward of vertical) to +45 deg (rearward of vertical) are presented. The tests were made with the nozzle mounted on a model tailpipe with a blind flange on the end to simulate a closed cruise nozzle, at tailpipe-to-ambient pressure ratios from 1.8 to 4.0. Also included are jet wake data, single-vaneset vector performance for long/short and equal-length vane designs, and pumping capability. The pumping capability arises from the subambient pressure developed in the cavities between the vanesets, which could be used to aspirate flow from a source such as the engine compartment. Some of the performance characteristics are compared with characteristics of a single-jet nozzle previously reported.
Optical systolic array processor using residue arithmetic

NASA Technical Reports Server (NTRS)

Jackson, J.; Casasent, D.

1983-01-01

The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
Classification and Compression of Multi-Resolution Vectors: A Tree Structured Vector Quantizer Approach

DTIC Science & Technology

2002-01-01

their expression profile and for classification of cells into tumerous and non- tumerous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree structured classifiers with multi-resolution analysis for classifying cancerous from non- cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non- cancerous . We are interested
Analysis of Particle Content of Recombinant Adeno-Associated Virus Serotype 8 Vectors by Ion-Exchange Chromatography

PubMed Central

Lock, Martin; Alvira, Mauricio R.

2012-01-01

Abstract Advances in adeno-associated virus (AAV)-mediated gene therapy have brought the possibility of commercial manufacturing of AAV vectors one step closer. To realize this prospect, a parallel effort with the goal of ever-increasing sophistication for AAV vector production technology and supporting assays will be required. Among the important release assays for a clinical gene therapy product, those monitoring potentially hazardous contaminants are most critical for patient safety. A prominent contaminant in many AAV vector preparations is vector particles lacking a genome, which can substantially increase the dose of AAV capsid proteins and lead to possible unwanted immunological consequences. Current methods to determine empty particle content suffer from inconsistency, are adversely affected by contaminants, or are not applicable to all serotypes. Here we describe the development of an ion-exchange chromatography-based assay that permits the rapid separation and relative quantification of AAV8 empty and full vector particles through the application of shallow gradients and a strong anion-exchange monolith chromatography medium. PMID:22428980
Discontinuous finite element method for vector radiative transfer

NASA Astrophysics Data System (ADS)

Wang, Cun-Hai; Yi, Hong-Liang; Tan, He-Ping

2017-03-01

The discontinuous finite element method (DFEM) is applied to solve the vector radiative transfer in participating media. The derivation in a discrete form of the vector radiation governing equations is presented, in which the angular space is discretized by the discrete-ordinates approach with a local refined modification, and the spatial domain is discretized into finite non-overlapped discontinuous elements. The elements in the whole solution domain are connected by modelling the boundary numerical flux between adjacent elements, which makes the DFEM numerically stable for solving radiative transfer equations. Several various problems of vector radiative transfer are tested to verify the performance of the developed DFEM, including vector radiative transfer in a one-dimensional parallel slab containing a Mie/Rayleigh/strong forward scattering medium and a two-dimensional square medium. The fact that DFEM results agree very well with the benchmark solutions in published references shows that the developed DFEM in this paper is accurate and effective for solving vector radiative transfer problems.
Feature extraction through parallel Probabilistic Principal Component Analysis for heart disease diagnosis

NASA Astrophysics Data System (ADS)

Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan

2017-09-01

Automatic diagnosis of human diseases are mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for the different features. Probabilistic Principal Component Analysis (PPCA) has reputation to deal with the problem of missing values of attributes. This research presents a methodology which uses the results of medical tests as input, extracts a reduced dimensional feature subset and provides diagnosis of heart disease. The proposed methodology extracts high impact features in new projection by using Probabilistic Principal Component Analysis (PPCA). PPCA extracts projection vectors which contribute in highest covariance and these projection vectors are used to reduce feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The feature subset with the reduced dimension is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM). The RBF based SVM serves the purpose of classification into two categories i.e., Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over the three datasets of UCI i.e., Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison to the existing research showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for Cleveland, Hungarian and Switzerland dataset respectively.
Automotive applications of superconductors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ginsberg, M.

1987-01-01

These proceedings compile papers on supercomputers in the automobile industry. Titles include: An automotive engineer's guide to the effective use of scalar, vector, and parallel computers; fluid mechanics, finite elements, and supercomputers; and Automotive crashworthiness performance on a supercomputer.
Synthesizing parallel imaging applications using the CAP (computer-aided parallelization) tool

NASA Astrophysics Data System (ADS)

Gennart, Benoit A.; Mazzariol, Marc; Messerli, Vincent; Hersch, Roger D.

1997-12-01

Imaging applications such as filtering, image transforms and compression/decompression require vast amounts of computing power when applied to large data sets. These applications would potentially benefit from the use of parallel processing. However, dedicated parallel computers are expensive and their processing power per node lags behind that of the most recent commodity components. Furthermore, developing parallel applications remains a difficult task: writing and debugging the application is difficult (deadlocks), programs may not be portable from one parallel architecture to the other, and performance often comes short of expectations. In order to facilitate the development of parallel applications, we propose the CAP computer-aided parallelization tool which enables application programmers to specify at a high-level of abstraction the flow of data between pipelined-parallel operations. In addition, the CAP tool supports the programmer in developing parallel imaging and storage operations. CAP enables combining efficiently parallel storage access routines and image processing sequential operations. This paper shows how processing and I/O intensive imaging applications must be implemented to take advantage of parallelism and pipelining between data access and processing. This paper's contribution is (1) to show how such implementations can be compactly specified in CAP, and (2) to demonstrate that CAP specified applications achieve the performance of custom parallel code. The paper analyzes theoretically the performance of CAP specified applications and demonstrates the accuracy of the theoretical analysis through experimental measurements.

Adaptive track scheduling to optimize concurrency and vectorization in GeantV

DOE PAGES

Apostolakis, J.; Bandieramonte, M.; Bitzes, G.; ...

2015-05-22

The GeantV project is focused on the R&D of new particle transport techniques to maximize parallelism on multiple levels, profiting from the use of both SIMD instructions and co-processors for the CPU-intensive calculations specific to this type of applications. In our approach, vectors of tracks belonging to multiple events and matching different locality criteria must be gathered and dispatched to algorithms having vector signatures. While the transport propagates tracks and changes their individual states, data locality becomes harder to maintain. The scheduling policy has to be changed to maintain efficient vectors while keeping an optimal level of concurrency. The modelmore » has complex dynamics requiring tuning the thresholds to switch between the normal regime and special modes, i.e. prioritizing events to allow flushing memory, adding new events in the transport pipeline to boost locality, dynamically adjusting the particle vector size or switching between vector to single track mode when vectorization causes only overhead. Lastly, this work requires a comprehensive study for optimizing these parameters to make the behaviour of the scheduler self-adapting, presenting here its initial results.« less
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions

NASA Astrophysics Data System (ADS)

Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin

2017-01-01

We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
Fencing direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Blocksome, Michael A.; Mamidala, Amith R.

2013-09-03

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to segments of shared random access memory through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and a segmentmore » of shared memory; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.« less
Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications

NASA Technical Reports Server (NTRS)

OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)

1998-01-01

This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Multitasking a three-dimensional Navier-Stokes algorithm on the Cray-2

NASA Technical Reports Server (NTRS)

Swisshelm, Julie M.

1989-01-01

A three-dimensional computational aerodynamics algorithm has been multitasked for efficient parallel execution on the Cray-2. It provides a means for examining the multitasking performance of a complete CFD application code. An embedded zonal multigrid scheme is used to solve the Reynolds-averaged Navier-Stokes equations for an internal flow model problem. The explicit nature of each component of the method allows a spatial partitioning of the computational domain to achieve a well-balanced task load for MIMD computers with vector-processing capability. Experiments have been conducted with both two- and three-dimensional multitasked cases. The best speedup attained by an individual task group was 3.54 on four processors of the Cray-2, while the entire solver yielded a speedup of 2.67 on four processors for the three-dimensional case. The multiprocessing efficiency of various types of computational tasks is examined, performance on two Cray-2s with different memory access speeds is compared, and extrapolation to larger problems is discussed.
Capabilities of Fully Parallelized MHD Stability Code MARS

NASA Astrophysics Data System (ADS)

Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang

2016-10-01

Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
GPU-based Branchless Distance-Driven Projection and Backprojection

PubMed Central

Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong

2017-01-01

Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to be implemented on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. GPU based branchless DD method was evaluated by iterative reconstruction algorithms with both simulation and real datasets. It obtained visually identical images as the CPU reference algorithm. PMID:29333480
GPU-based Branchless Distance-Driven Projection and Backprojection.

PubMed

Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong

2017-12-01

Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to be implemented on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. GPU based branchless DD method was evaluated by iterative reconstruction algorithms with both simulation and real datasets. It obtained visually identical images as the CPU reference algorithm.
Pattern classification using an olfactory model with PCA feature selection in electronic noses: study and application.

PubMed

Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao

2012-01-01

Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with the increase of the dimensions of input feature vector (outer factor) as well as its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets of three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China were used for experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put in to feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6~8 channels of the model with principal component feature vector values of at least 90% cumulative variance is adequate for a classification task of 3~5 pattern classes considering the trade-off between time consumption and classification rate.
Acoustic 3D modeling by the method of integral equations

NASA Astrophysics Data System (ADS)

Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.

2018-02-01

This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. A tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an effective matrix-vector multiplication operation, based on the fast Fourier transform (FFT). We demonstrate that, the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for the free-space and layered host models. The developed algorithm and computer code were benchmarked against the FD time-domain solution. It was demonstrated that, the method could accurately calculate the seismic field for the models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system equations, and also for parallelizing across multiple sources. The practical examples and efficiency tests are presented as well.
(Re)engineering Earth System Models to Expose Greater Concurrency for Ultrascale Computing: Practice, Experience, and Musings

NASA Astrophysics Data System (ADS)

Mills, R. T.

2014-12-01

As the high performance computing (HPC) community pushes towards the exascale horizon, the importance and prevalence of fine-grained parallelism in new computer architectures is increasing. This is perhaps most apparent in the proliferation of so-called "accelerators" such as the Intel Xeon Phi or NVIDIA GPGPUs, but the trend also holds for CPUs, where serial performance has grown slowly and effective use of hardware threads and vector units are becoming increasingly important to realizing high performance. This has significant implications for weather, climate, and Earth system modeling codes, many of which display impressive scalability across MPI ranks but take relatively little advantage of threading and vector processing. In addition to increasing parallelism, next generation codes will also need to address increasingly deep hierarchies for data movement: NUMA/cache levels, on node vs. off node, local vs. wide neighborhoods on the interconnect, and even in the I/O system. We will discuss some approaches (grounded in experiences with the Intel Xeon Phi architecture) for restructuring Earth science codes to maximize concurrency across multiple levels (vectors, threads, MPI ranks), and also discuss some novel approaches for minimizing expensive data movement/communication.
A complete set of two-dimensional harmonic vortices on a spherical surface

NASA Astrophysics Data System (ADS)

Esparza, Christian; Rendón, Pablo Luis; Ley Koo, Eugenio

2018-03-01

The solutions of the Euler equations on a spherical surface are constructed, starting from a vector velocity potential A in the radial direction and with a two-dimensional spherical harmonic variation of order m and well-defined parity under \\varphi \\mapsto -\\varphi . The solutions are well-behaved on the entire surface and continuous at the position of a parallel circle θ ={θ }0, where the vorticity is shown to be harmonically distributed. The velocity field is evaluated as the curl of the vector potential: it is shown that the velocity is divergenceless and distributed on the spherical surface. Its polar components at the parallel circle are shown to be continuous, confirming its divergenceless nature, while its azimuthal components are discontinuous at the circle, and their discontinuity is a measure of the vorticity in the radial direction. A closed form for the velocity field lines is also obtained in terms of fixed values of the scalar harmonic function associated with the vector potential. Additionally, the connections of the solutions on a spherical surface with their circular, elliptic and bipolar counterparts on the equatorial plane are implemented via stereographic projections.
Morphological evidence for parallel processing of information in rat macula.

PubMed

Ross, M D

1988-01-01

Study of montages, tracings and reconstructions prepared from a series of 570 consecutive ultrathin sections shows that rat maculas are morphologically organized for parallel processing of linear acceleratory information. Type II cells of one terminal field distribute information to neighboring terminals as well. The findings are examined in light of physiological data which indicate that macular receptor fields have a preferred directional vector, and are interpreted by analogy to a computer technology known as an information network.
Determining the Index of Refraction of an Unknown Object using Passive Polarimetric Imagery Degraded by Atmospheric Turbulence

DTIC Science & Technology

2010-08-09

44 9 A photograph of a goniophotometer used by Bell and a schematic of a goniophotometer used by Mian et al...plane is called the parallel field component because it lies parallel to the specular plane. The incident electric field vector component which...resides in the plane or- thogonal to the specular plane is called the perpendicular field component because it lies perpendicular to the specular plane. If
Large-Scale Parallel Simulations of Turbulent Combustion using Combined Dimension Reduction and Tabulation of Chemistry

DTIC Science & Technology

2012-05-22

tabulation of the reduced space is performed using the In Situ Adaptive Tabulation ( ISAT ) algorithm. In addition, we use x2f mpi – a Fortran library...for parallel vector-valued function evaluation (used with ISAT in this context) – to efficiently redistribute the chemistry workload among the...Constrained-Equilibrium (RCCE) method, and tabulation of the reduced space is performed using the In Situ Adaptive Tabulation ( ISAT ) algorithm. In addition
Parallel Visualization Co-Processing of Overnight CFD Propulsion Applications

NASA Technical Reports Server (NTRS)

Edwards, David E.; Haimes, Robert

1999-01-01

An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.
Real-time dynamic simulation of the Cassini spacecraft using DARTS. Part 2: Parallel/vectorized real-time implementation

NASA Technical Reports Server (NTRS)

Fijany, A.; Roberts, J. A.; Jain, A.; Man, G. K.

1993-01-01

Part 1 of this paper presented the requirements for the real-time simulation of Cassini spacecraft along with some discussion of the DARTS algorithm. Here, in Part 2 we discuss the development and implementation of parallel/vectorized DARTS algorithm and architecture for real-time simulation. Development of the fast algorithms and architecture for real-time hardware-in-the-loop simulation of spacecraft dynamics is motivated by the fact that it represents a hard real-time problem, in the sense that the correctness of the simulation depends on both the numerical accuracy and the exact timing of the computation. For a given model fidelity, the computation should be computed within a predefined time period. Further reduction in computation time allows increasing the fidelity of the model (i.e., inclusion of more flexible modes) and the integration routine.
A hybrid dynamic harmony search algorithm for identical parallel machines scheduling

NASA Astrophysics Data System (ADS)

Chen, Jing; Pan, Quan-Ke; Wang, Ling; Li, Jun-Qing

2012-02-01

In this article, a dynamic harmony search (DHS) algorithm is proposed for the identical parallel machines scheduling problem with the objective to minimize makespan. First, an encoding scheme based on a list scheduling rule is developed to convert the continuous harmony vectors to discrete job assignments. Second, the whole harmony memory (HM) is divided into multiple small-sized sub-HMs, and each sub-HM performs evolution independently and exchanges information with others periodically by using a regrouping schedule. Third, a novel improvisation process is applied to generate a new harmony by making use of the information of harmony vectors in each sub-HM. Moreover, a local search strategy is presented and incorporated into the DHS algorithm to find promising solutions. Simulation results show that the hybrid DHS (DHS_LS) is very competitive in comparison to its competitors in terms of mean performance and average computational time.
Implementation of an ADI method on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1987-01-01

The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Implementation of an ADI method on parallel computers

NASA Technical Reports Server (NTRS)

Fatoohi, Raad A.; Grosch, Chester E.

1987-01-01

In this paper the implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are the MPP, an SIMD machine with 16-Kbit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the Flex/32 and Cray/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally conclusions are presented.

Multi-Modulator for Bandwidth-Efficient Communication

NASA Technical Reports Server (NTRS)

Gray, Andrew; Lee, Dennis; Lay, Norman; Cheetham, Craig; Fong, Wai; Yeh, Pen-Shu; King, Robin; Ghuman, Parminder; Hoy, Scott; Fisher, Dave

2009-01-01

A modulator circuit board has recently been developed to be used in conjunction with a vector modulator to generate any of a large number of modulations for bandwidth-efficient radio transmission of digital data signals at rates than can exceed 100 Mb/s. The modulations include quadrature phaseshift keying (QPSK), offset quadrature phase-shift keying (OQPSK), Gaussian minimum-shift keying (GMSK), and octonary phase-shift keying (8PSK) with square-root raised-cosine pulse shaping. The figure is a greatly simplified block diagram showing the relationship between the modulator board and the rest of the transmitter. The role of the modulator board is to encode the incoming data stream and to shape the resulting pulses, which are fed as inputs to the vector modulator. The combination of encoding and pulse shaping in a given application is chosen to maximize the bandwidth efficiency. The modulator board includes gallium arsenide serial-to-parallel converters at its input end. A complementary metal oxide/semiconductor (CMOS) field-programmable gate array (FPGA) performs the coding and modulation computations and utilizes parallel processing in doing so. The results of the parallel computation are combined and converted to pulse waveforms by use of gallium arsenide parallel-to-serial converters integrated with digital-to-analog converters. Without changing the hardware, one can configure the modulator to produce any of the designed combinations of coding and modulation by loading the appropriate bit configuration file into the FPGA.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoemmen, Mark

2010-11-01

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less
Aerodynamic Characterizations of Asymmetric and Maneuvering 105-, 120-, and 155-mm Fin-Stabilized Projectiles Derived from Telemetry Experiments

DTIC Science & Technology

2011-04-01

roll rates are estimates of projectile roll rates with respect to the sun and the local geomagnetic field respectively. The solar aspect angle is the...vector and a vector originating at the CG and parallel to the local geomagnetic field. Methodologies employed to obtain these and other airframe states...and an independent approach (POINTER) and relative magnitude information about the side moments was obtained. VAPP-24 underwent a reversal in coning
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zuo, Wangda; McNeil, Andrew; Wetter, Michael

2011-09-06

We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
A high performance linear equation solver on the VPP500 parallel supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi

1994-12-31

This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.
Successes and failures of sixty years of vector control in French Guiana: what is the next step?

PubMed

Epelboin, Yanouk; Chaney, Sarah C; Guidez, Amandine; Habchi-Hanriot, Nausicaa; Talaga, Stanislas; Wang, Lanjiao; Dusfour, Isabelle

2018-03-12

Since the 1940s, French Guiana has implemented vector control to contain or eliminate malaria, yellow fever, and, recently, dengue, chikungunya, and Zika. Over time, strategies have evolved depending on the location, efficacy of the methods, development of insecticide resistance, and advances in vector control techniques. This review summarises the history of vector control in French Guiana by reporting the records found in the private archives of the Institute Pasteur in French Guiana and those accessible in libraries worldwide. This publication highlights successes and failures in vector control and identifies the constraints and expectations for vector control in this French overseas territory in the Americas.
Proceedings: Sisal `93

DOE Office of Scientific and Technical Information (OSTI.GOV)

Feo, J.T.

1993-10-01

This report contain papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernal of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal Implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architechtures; Compiling technique based on dataflow analysis for funtional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal Vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor;more » A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architechture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.« less
A software architecture for multidisciplinary applications: Integrating task and data parallelism

NASA Technical Reports Server (NTRS)

Chapman, Barbara; Mehrotra, Piyush; Vanrosendale, John; Zima, Hans

1994-01-01

Data parallel languages such as Vienna Fortran and HPF can be successfully applied to a wide range of numerical applications. However, many advanced scientific and engineering applications are of a multidisciplinary and heterogeneous nature and thus do not fit well into the data parallel paradigm. In this paper we present new Fortran 90 language extensions to fill this gap. Tasks can be spawned as asynchronous activities in a homogeneous or heterogeneous computing environment; they interact by sharing access to Shared Data Abstractions (SDA's). SDA's are an extension of Fortran 90 modules, representing a pool of common data, together with a set of Methods for controlled access to these data and a mechanism for providing persistent storage. Our language supports the integration of data and task parallelism as well as nested task parallelism and thus can be used to express multidisciplinary applications in a natural and efficient way.
Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

NASA Astrophysics Data System (ADS)

Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

2016-07-01

Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
Enhanced secure 4-D modulation space optical multi-carrier system based on joint constellation and Stokes vector scrambling.

PubMed

Liu, Bo; Zhang, Lijia; Xin, Xiangjun

2018-03-19

This paper proposes and demonstrates an enhanced secure 4-D modulation optical generalized filter bank multi-carrier (GFBMC) system based on joint constellation and Stokes vector scrambling. The constellation and Stokes vectors are scrambled by using different scrambling parameters. A multi-scroll Chua's circuit map is adopted as the chaotic model. Large secure key space can be obtained due to the multi-scroll attractors and independent operability of subcarriers. A 40.32Gb/s encrypted optical GFBMC signal with 128 parallel subcarriers is successfully demonstrated in the experiment. The results show good resistance against the illegal receiver and indicate a potential way for the future optical multi-carrier system.
High-performance ultra-low power VLSI analog processor for data compression

NASA Technical Reports Server (NTRS)

Tawel, Raoul (Inventor)

1996-01-01

An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
A remote sensing and geographic information system approach to sampling malaria vector habitats in Chiapas, Mexico

NASA Astrophysics Data System (ADS)

Beck, L.; Wood, B.; Whitney, S.; Rossi, R.; Spanner, M.; Rodriguez, M.; Rodriguez-Ramirez, A.; Salute, J.; Legters, L.; Roberts, D.; Rejmankova, E.; Washino, R.

1993-08-01

This paper describes a procedure whereby remote sensing and geographic information system (GIS) technologies are used in a sample design to study the habitat of Anopheles albimanus, one of the principle vectors of malaria in Central America. This procedure incorporates Landsat-derived land cover maps with digital elevation and road network data to identify a random selection of larval habitats accessible for field sampling. At the conclusion of the sampling season, the larval counts will be used to determine habitat productivity, and then integrated with information on human settlement to assess where people are at high risk of malaria. This aproach would be appropriate in areas where land cover information is lacking and problems of access constrain field sampling. The use of a GIS also permits other data (such as insecticide spraying data) to the incorporated in the sample design as they arise. This approach would also be pertinent for other tropical vector-borne diseases, particularly where human activities impact disease vector habitat.
Simultaneous generation of 40, 80 and 120 GHz optical millimeter-wave from one Mach-Zehnder modulator and demonstration of millimeter-wave transmission and down-conversion

NASA Astrophysics Data System (ADS)

Zhou, Wen; Qin, Chaoyi

2017-09-01

We demonstrate multi-frequency QPSK millimeter-wave (mm-wave) vector signal generation enabled by MZM-based optical carrier suppression (OCS) modulation and in-phase/quadrature (I/Q) modulation. We numerically simulate the generation of 40-, 80- and 120-GHz vector signal. Here, the three different signals carry the same QPSK modulation information. We also experimentally realize 11Gbaud/s QPSK vector signal transmission over 20 km fiber, and the generation of the vector signals at 40-GHz, 80-GHz and 120-GHz. The experimental results show that the bit-error-rate (BER) for all the three different signals can reach the forward-error-correction (FEC) threshold of 3.8×10-3. The advantage of the proposed system is that provide high-speed, high-bandwidth and high-capacity seamless access of TDM and wireless network. These features indicate the important application prospect in wireless access networks for WiMax, Wi-Fi and 5G/LTE.
File concepts for parallel I/O

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1989-01-01

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.
Computational Performance of Intel MIC, Sandy Bridge, and GPU Architectures: Implementation of a 1D c++/OpenMP Electrostatic Particle-In-Cell Code

DTIC Science & Technology

2014-05-01

fusion, space and astrophysical plasmas, but still the general picture can be presented quite well with the fluid approach [6, 7]. The microscopic...purpose computing CPU for algorithms where processing of large blocks of data is done in parallel. The reason for that is the GPU’s highly effective...parallel structure. Most of the image and video processing computations involve heavy matrix and vector op- erations over large amounts of data and
Gyroscope precession along bound equatorial plane orbits around a Kerr black hole

NASA Astrophysics Data System (ADS)

Bini, Donato; Geralico, Andrea; Jantzen, Robert T.

2016-09-01

The precession of a test gyroscope along stable bound equatorial plane orbits around a Kerr black hole is analyzed, and the precession angular velocity of the gyro's parallel transported spin vector and the increment in the precession angle after one orbital period is evaluated. The parallel transported Marck frame which enters this discussion is shown to have an elegant geometrical explanation in terms of the electric and magnetic parts of the Killing-Yano 2-form and a Wigner rotation effect.
Vectorial magnetometry with the magneto-optic Kerr effect applied to Co/Cu/Co trilayer structures

NASA Astrophysics Data System (ADS)

Daboo, C.; Bland, J. A. C.; Hicken, R. J.; Ives, A. J. R.; Baird, M. J.; Walker, M. J.

1993-05-01

We describe an arrangement in which the magnetization components parallel and perpendicular to the applied field are both determined from longitudinal magneto-optic Kerr effect measurements. This arrangement differs from the usual procedures in that the same optical geometry is used but the magnet geometry altered. This leads to two magneto-optic signals which are directly comparable in magnitude thereby giving the in-plane magnetization vector directly. We show that it is of great value to study both in-plane magnetization vector components when studying coupled structures where significant anisotropies are also present. We discuss simulations which show that it is possible to accurately determine the coupling strength in such structures by examining the behavior of the component of magnetization perpendicular to the applied field in the vicinity of the hard in-plane anisotropy axis. We illustrate this technique by examining the magnetization and magnetic anisotropy behavior of ultrathin Co/Cu(111)/Co (dCu=20 Å and 27 Å) trilayer structures prepared by molecular beam epitaxy, in which coherent rotation of the magnetization vector is observed when the magnetic field B is applied along the hard in-plane anisotropy axis, with the magnitude of the magnetization vector constant and close to its bulk value. Results of micromagnetic calculations closely reproduce the observed parallel and perpendicular magnetization loops, and yield strong uniaxial magnetic anisotropies in both layers, while the interlayer coupling appears to be absent or negligible in comparison with the anisotropy strengths.
TRAC Searchable Research Library

DTIC Science & Technology

2016-05-01

network accessible document repository for technical documents and similar document artifacts. We used a model-based approach using the Vector...demonstration and model refinement. 14. SUBJECT TERMS Knowledge Management, Document Repository , Digital Library, Vector Directional Data Model...27 Figure D1. Administrator Repository Upload Page. ................................................................... D-2 Figure D2
On the Impact of Widening Vector Registers on Sequence Alignment

DOE Office of Scientific and Technical Information (OSTI.GOV)

Daily, Jeffrey A.; Kalyanaraman, Anantharaman; Krishnamoorthy, Sriram

2016-09-22

Vector extensions, such as SSE, have been part of the x86 since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. In this paper, we demonstrate that the trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based onmore » striped data layouts. We present a practically efficient SIMD implementation of a parallel scan based sequence alignment algorithm that can better exploit wider SIMD units. We conduct comprehensive workload and use case analyses to characterize the relative behavior of the striped and scan approaches and identify the best choice of algorithm based on input length and SIMD width.« less
Electric fields and vector potentials of thin cylindrical antennas

NASA Astrophysics Data System (ADS)

King, Ronold W. P.

1990-09-01

The vector potential and electric field generated by the current in a center-driven or parasitic dipole antenna that extends from z = -h to z = h are investigated for each of the several components of the current. These include sin k(h - absolute value of z), sin k (absolute value of z) - sin kh, cos kz - cos kh, and cos kz/2 - cos kh/2. Of special interest are the interactions among the variously spaced elements in parallel nonstaggered arrays. These depend on the mutual vector potentials. It is shown that at a radial distance rho approximately = h and in the range z = -h to h, the vector potentials due to all four components become alike and have an approximately plane-wave form. Simple approximate formulas for the electric fields and vector potentials generated by each of the four distributions are derived and compared with the exact results. The application of the new formulas to large arrays is discussed.

CRITICAL EVALUATION OF THE EFFECTIVENESS OF SEWAGE SLUDGE DISINFECTION AND VECTOR ATTRACTION REDUCTION PROCESSES

EPA Science Inventory

What is the current state of management practices for biosolids production and application, and how can those be made more effective? How effective are Class B disinfection and vector attraction processes, and public access and harvesting restrictions at reducing the public's exp...
A Vector Representation for Thermodynamic Relationships

ERIC Educational Resources Information Center

Pogliani, Lionello

2006-01-01

The existing vector formalism method for thermodynamic relationship maintains tractability and uses accessible mathematics, which can be seen as a diverting and entertaining step into the mathematical formalism of thermodynamics and as an elementary application of matrix algebra. The method is based on ideas and operations apt to improve the…
Access to CAMAC from VxWorks and UNIX in DART

DOE Office of Scientific and Technical Information (OSTI.GOV)

Streets, J.; Meadows, J.; Moore, C.

1995-05-01

As part of the DART Project the authors have developed a package of software for CAMAC access from UNIX and VxWorks platforms, with support for several hardware interfaces. They report on developments for the CES CBD8210 VME to parallel CAMAC, the Hytec VSD2992 VME to serial CAMAC and Jorway 411S SCSI to parallel and serial CAMAC branch drivers, and give a summary of the timings obtained.
[Use of a novel baculovirus vector to express nucleoprotein gene of Crimean-Congo hemorrhagic fever virus in both insect and mammalian cells].

PubMed

Ma, Benjiang; Hang, Changshou; Zhao, Yun; Wang, Shiwen; Xie, Yanxiang

2002-09-01

To construct a novel baculovirus vector which is capable of promoting the high-yield expression of foreign gene in mammalian cells and to express by this vector the nucleoprotein (NP) gene of Crimean-Congo hemorrhagic fever virus (CCHFV) Chinese isolate (Xinjiang hemorrhagic fever virus, XHFV) BA88166 in insect and Vero cells. Human cytomegalovirus (CMV) immediate early (IE) promoter was ligated to the baculovirus vector pFastBac1 downstream of the polyhedrin promoter to give rise to the novel vector pCB1. XHFV NP gene was cloned to this vector and was well expressed in COS-7 cells and Vero cells by means of recombinant plasmid transfection and baculovirus infection. The XHFV NP gene in vector pCB1 could be well expressed in mammalian cells. Vero cells infected with recombinant baculovirus harboring NP gene could be employed as antigens to detect XHF serum specimens whose results were in good correlation with those of ELISA and in parallel with clinical diagnoses. This novel baculovirus vector is able to express the foreign gene efficiently in both insect and mammalian cells, which provides not only the convenient diagnostic antigens but also the potential for developing recombinant virus vaccines and gene therapies.
Efficient ICCG on a shared memory multiprocessor

NASA Technical Reports Server (NTRS)

Hammond, Steven W.; Schreiber, Robert

1989-01-01

Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
A parallel finite-difference method for computational aerodynamics

NASA Technical Reports Server (NTRS)

Swisshelm, Julie M.

1989-01-01

A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed.
A VLSI chip set for real time vector quantization of image sequences

NASA Technical Reports Server (NTRS)

Baker, Richard L.

1989-01-01

The architecture and implementation of a VLSI chip set that vector quantizes (VQ) image sequences in real time is described. The chip set forms a programmable Single-Instruction, Multiple-Data (SIMD) machine which can implement various vector quantization encoding structures. Its VQ codebook may contain unlimited number of codevectors, N, having dimension up to K = 64. Under a weighted least squared error criterion, the engine locates at video rates the best code vector in full-searched or large tree searched VQ codebooks. The ability to manipulate tree structured codebooks, coupled with parallelism and pipelining, permits searches in as short as O (log N) cycles. A full codebook search results in O(N) performance, compared to O(KN) for a Single-Instruction, Single-Data (SISD) machine. With this VLSI chip set, an entire video code can be built on a single board that permits realtime experimentation with very large codebooks.
Kalman Filter Tracking on Parallel Architectures

NASA Astrophysics Data System (ADS)

Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi

2015-12-01

Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques including Cellular Automata or returning to Hough Transform. The most common track finding techniques in use today are however those based on the Kalman Filter [2]. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust and are exactly those being used today for the design of the tracking system for HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with Kalman Filter can achieve large speedup both with Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.
Lattice QCD calculation using VPP500

DOE Office of Scientific and Technical Information (OSTI.GOV)

Kim, Seyong; Ohta, Shigemi

1995-02-01

A new vector parallel supercomputer, Fujitsu VPP500, was installed at RIKEN earlier this year. It consists of 30 vector computers, each with 1.6 GFLOPS peak speed and 256 MB memory, connected by a crossbar switch with 400 MB/s peak data transfer rate each way between any pair of nodes. The authors developed a Fortran lattice QCD simulation code for it. It runs at about 1.1 GFLOPS sustained per node for Metropolis pure-gauge update, and about 0.8 GFLOPS sustained per node for conjugate gradient inversion of staggered fermion matrix.
Dual-scale topology optoelectronic processor.

PubMed

Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H

1991-12-15

The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
Full Wave Analysis of RF Signal Attenuation in a Lossy Cave using a High Order Time Domain Vector Finite Element Method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pingenot, J; Rieben, R; White, D

2004-12-06

We present a computational study of signal propagation and attenuation of a 200 MHz dipole antenna in a cave environment. The cave is modeled as a straight and lossy random rough wall. To simulate a broad frequency band, the full wave Maxwell equations are solved directly in the time domain via a high order vector finite element discretization using the massively parallel CEM code EMSolve. The simulation is performed for a series of random meshes in order to generate statistical data for the propagation and attenuation properties of the cave environment. Results for the power spectral density and phase ofmore » the electric field vector components are presented and discussed.« less
Optimized scalable network switch

DOEpatents

Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY

2007-12-04

In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
Electromagnetic fluctuation spectra of collective oscillations in magnetized Maxwellian equal mass plasmas for low-frequency waves

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vafin, S.; Schlickeiser, R.; Yoon, P. H.

Recently, the general electromagnetic fluctuation theory for magnetized plasmas has been used to study the steady-state fluctuation spectra and the total intensity of low-frequency collective weakly damped modes for parallel wave vectors in Maxwellian plasmas. Now, we address the same question with respect to an arbitrary direction of the wave-vector. Here, we analyze this problem for equal mass plasmas. These plasmas are a very good tool to study various plasma phenomena, as they considerably facilitate the theoretical consideration and at the same time provide with their clear physical picture. Finally, we compare our results in the limiting case of parallelmore » wave vectors with the previous study.« less
Method and apparatus for second-rank tensor generation

NASA Technical Reports Server (NTRS)

Liu, Hua-Kuang (Inventor)

1991-01-01

A method and apparatus are disclosed for generation of second-rank tensors using a photorefractive crystal to perform the outer-product between two vectors via four-wave mixing, thereby taking 2n input data to a control n squared output data points. Two orthogonal amplitude modulated coherent vector beams x and y are expanded and then parallel sides of the photorefractive crystal in exact opposition. A beamsplitter is used to direct a coherent pumping beam onto the crystal at an appropriate angle so as to produce a conjugate beam that is the matrix product of the vector beam that propagates in the exact opposite direction from the pumping beam. The conjugate beam thus separated is the tensor output xy (sup T).
Optimized scalable network switch

DOEpatents

Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.

2010-02-23

In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
A Performance Evaluation of the Cray X1 for Scientific Applications

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

2004-01-01

The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
A bibliography on parallel and vector numerical algorithms

NASA Technical Reports Server (NTRS)

Ortega, James M.; Voigt, Robert G.; Romine, Charles H.

1988-01-01

This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
A bibliography on parallel and vector numerical algorithms

NASA Technical Reports Server (NTRS)

Ortega, J. M.; Voigt, R. G.

1987-01-01

This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also.
A bibliography on parallel and vector numerical algorithms

NASA Technical Reports Server (NTRS)

Ortega, James M.; Voigt, Robert G.; Romine, Charles H.

1990-01-01

This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
Scalable Parallel Density-based Clustering and Applications

NASA Astrophysics Data System (ADS)

Patwary, Mostofa Ali

2014-04-01

Recently, density-based clustering algorithms (DBSCAN and OPTICS) have gotten significant attention of the scientific community due to their unique capability of discovering arbitrary shaped clusters and eliminating noise data. These algorithms have several applications, which require high performance computing, including finding halos and subhalos (clusters) from massive cosmology data in astrophysics, analyzing satellite images, X-ray crystallography, and anomaly detection. However, parallelization of these algorithms are extremely challenging as they exhibit inherent sequential data access order, unbalanced workload resulting in low parallel efficiency. To break the data access sequentiality and to achieve high parallelism, we develop new parallel algorithms, both for DBSCAN and OPTICS, designed using graph algorithmic techniques. For example, our parallel DBSCAN algorithm exploits the similarities between DBSCAN and computing connected components. Using datasets containing up to a billion floating point numbers, we show that our parallel density-based clustering algorithms significantly outperform the existing algorithms, achieving speedups up to 27.5 on 40 cores on shared memory architecture and speedups up to 5,765 using 8,192 cores on distributed memory architecture. In our experiments, we found that while achieving the scalability, our algorithms produce clustering results with comparable quality to the classical algorithms.

A parallel solver for huge dense linear systems

NASA Astrophysics Data System (ADS)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending of the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summaryProgram title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system: Linux/Unix Has the code been vectorized or parallelized?: Yes, includes MPI primitives. RAM: Tested for up to 190 GB Classification: 6.5 External routines: MPI ( http://www.mpi-forum.org/), BLAS ( http://www.netlib.org/blas/), PLAPACK ( http://www.cs.utexas.edu/~plapack/), POOCLAPACK ( ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution). Catalogue identifier of previous version: AEHU_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533 Does the new version supersede the previous version?: Yes Nature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities. Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient. Reasons for new version: In many applications we need to guarantee a high accuracy in the solution of very large linear systems and we can do it by using double-precision arithmetic. Summary of revisions: Version 1.1 Can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment. Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
Enhancing membrane protein subcellular localization prediction by parallel fusion of multi-view features.

PubMed

Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu

2012-12-01

Membrane proteins are encoded by ~ 30% in the genome and function importantly in the living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desired to predict membrane protein's subcellular location from the primary sequence considering the extreme difficulties of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features will simultaneously increase the information redundancy that could, in turn, deteriorate the final prediction accuracy. That's why it was often found that prediction success rates in the serial super space were even lower than those in a single-view space. The purpose of this paper is investigation of a proper method for fusing multiple multi-view protein sequential features for subcellular location predictions. Instead of serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that will represent protein samples in complex spaces. We also proposed generalized principle component analysis (GPCA) for feature reduction purpose in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset indicating the parallel technique is flexible to suite for other computational biology problems. The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
Hybrid simulations of radial transport driven by the Rayleigh-Taylor instability

NASA Astrophysics Data System (ADS)

Delamere, P. A.; Stauffer, B. H.; Ma, X.

2017-12-01

Plasma transport in the rapidly rotating giant magnetospheres is thought to involve a centrifugally-driven flux tube interchange instability, similar to the Rayleigh-Taylor (RT) instability. In three dimensions, the convective flow patterns associated with the RT instability can produce strong guide field reconnection, allowing plasma mass to move radially outward while conserving magnetic flux (Ma et al., 2016). We present a set of hybrid (kinetic ion / fluid electron) plasma simulations of the RT instability using high plasma beta conditions appropriate for Jupiter's inner and middle magnetosphere. A density gradient, combined with a centrifugal force, provide appropriate RT onset conditions. Pressure balance is achieved by initializing two ion populations: one with fixed temperature, but varying density, and the other with fixed density, but a temperature gradient that offsets the density gradient from the first population and the centrifugal force (effective gravity). We first analyze two-dimensional results for the plane perpendicular to the magnetic field by comparing growth rates as a function of wave vector following Huba et al. (1998). Prescribed perpendicular wave modes are seeded with an initial velocity perturbation. We then extend the model to three dimensions, introducing a stabilizing parallel wave vector. Boundary conditions in the parallel direction prohibit motion of the magnetic field line footprints to model the eigenmodes of the magnetodisc's resonant cavity. We again compare growth rates based on perpendicular wave number, but also on the parallel extent of the resonant cavity, which fixes the size of the largest parallel wavelength. Finally, we search for evidence of strong guide field magnetic reconnection within the domain by identifying areas with large parallel electric fields or changes in magnetic field topology.
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, Apratim; Ellis, Carla Schlatter; Kotz, David; Nieuwejaar, Nils; Best, Michael

1994-01-01

Rapid increases in the computational speeds of multiprocessors have not been matched by corresponding performance enhancements in the I/O subsystem. To satisfy the large and growing I/O requirements of some parallel scientific applications, we need parallel file systems that can provide high-bandwidth and high-volume data transfer between the I/O subsystem and thousands of processors. Design of such high-performance parallel file systems depends on a thorough grasp of the expected workload. So far there have been no comprehensive usage studies of multiprocessor file systems. Our CHARISMA project intends to fill this void. The first results from our study involve an iPSC/860 at NASA Ames. This paper presents results from a different platform, the CM-5 at the National Center for Supercomputing Applications. The CHARISMA studies are unique because we collect information about every individual read and write request and about the entire mix of applications running on the machines. The results of our trace analysis lead to recommendations for parallel file system design. First the file system should support efficient concurrent access to many files, and I/O requests from many jobs under varying load conditions. Second, it must efficiently manage large files kept open for long periods. Third, it should expect to see small requests predominantly sequential access patterns, application-wide synchronous access, no concurrent file-sharing between jobs appreciable byte and block sharing between processes within jobs, and strong interprocess locality. Finally, the trace data suggest that node-level write caches and collective I/O request interfaces may be useful in certain environments.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bemporad, G.A.; Rubin, H.

This manuscript concerns the onset of thermohaline convection in a solar pond subject to field conditions as well as a small scale laboratory test section simulating the solar pond performance. The onset of thermohaline convection is analyzed in this study by means of a linear stability analysis in which the flow field perturbations are expended in sets of complete orthonormal functions satisfying the boundary conditions of the flow field. The linear stability analysis is first performed with regard to an advanced solar pond (ASP) subject to field conditions in which thermohaline convection develops in planes perpendicular to the unperturbed flowmore » velocity vector. In the laboratory simulator of the ASP the width and depth are of the same order of magnitude. In this case it is found that the side walls delay the onset of convection in planes perpendicular to the unperturbed flow velocity vector. The presence of the side walls may cause the planes parallel to the flow velocity to be the most susceptible to the development on all three spatial variables, are predicted. They may develop in planes parallel or perpendicular to the unperturbed velocity vector according to the value of the Reynolds number of the unperturbed flow and the ratio between the width and depth of the ASP simulator.« less
Pattern Classification Using an Olfactory Model with PCA Feature Selection in Electronic Noses: Study and Application

PubMed Central

Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao

2012-01-01

Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with the increase of the dimensions of input feature vector (outer factor) as well as its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets of three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China were used for experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put in to feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6∼8 channels of the model with principal component feature vector values of at least 90% cumulative variance is adequate for a classification task of 3∼5 pattern classes considering the trade-off between time consumption and classification rate. PMID:22736979
A new orientation relationship between cementite and austenite and coexistence of pseudo-primary and secondary dislocations in the habit plane

NASA Astrophysics Data System (ADS)

Xu, Wen-Sheng; Zhang, Wen-Zheng

2018-01-01

A new orientation relationship (OR) is found between Widmanstätten cementite precipitates and the austenite matrix in a 1.3C-14Mn steel. The associated habit plane (HP) and the dislocations in the HP have been investigated with transmission electron microscopy. The HP is parallel to ? in cementite, and it is parallel to ? in austenite. Three groups of interfacial dislocations are observed in the HP, with limited quantitative experimental data. The line directions, the spacing and the Burgers vectors of two sets of dislocations have been calculated based on a misfit analysis, which combines the CSL/DSC/O-lattice theories, row matching and good matching site (GMS) mappings. The calculated results are in reasonable agreement with the experimental results. The dislocations 'Coarse 1' and 'Fine 1' are in the same direction as the matching rows, i.e. ?. 'Coarse 1' dislocations are secondary dislocations with a Burgers vector of ?, and 'Fine 1' dislocations are pseudo-primary dislocations with a plausible Burgers vector of ?. The reason why the fraction of the new OR is much less than that of the dominant Pitsch OR has been discussed in terms of the degree of matching in the HPs.
Kokkos: Enabling manycore performance portability through polymorphic memory access patterns

DOE PAGES

Carter Edwards, H.; Trott, Christian R.; Sunderland, Daniel

2014-07-22

The manycore revolution can be characterized by increasing thread counts, decreasing memory per thread, and diversity of continually evolving manycore architectures. High performance computing (HPC) applications and libraries must exploit increasingly finer levels of parallelism within their codes to sustain scalability on these devices. We found that a major obstacle to performance portability is the diverse and conflicting set of constraints on memory access patterns across devices. Contemporary portable programming models address manycore parallelism (e.g., OpenMP, OpenACC, OpenCL) but fail to address memory access patterns. The Kokkos C++ library enables applications and domain libraries to achieve performance portability on diversemore » manycore architectures by unifying abstractions for both fine-grain data parallelism and memory access patterns. In this paper we describe Kokkos’ abstractions, summarize its application programmer interface (API), present performance results for unit-test kernels and mini-applications, and outline an incremental strategy for migrating legacy C++ codes to Kokkos. Furthermore, the Kokkos library is under active research and development to incorporate capabilities from new generations of manycore architectures, and to address a growing list of applications and domain libraries.« less
Measurement of large parallel and perpendicular electric fields on electron spatial scales in the terrestrial bow shock.

PubMed

Bale, S D; Mozer, F S

2007-05-18

Large parallel (
The generalized accessibility and spectral gap of lower hybrid waves in tokamaks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Takahashi, Hironori

1994-03-01

The generalized accessibility of lower hybrid waves, primarily in the current drive regime of tokamak plasmas, which may include shifting, either upward or downward, of the parallel refractive index (n{sub {parallel}}), is investigated, based upon a cold plasma dispersion relation and various geometrical constraint (G.C.) relations imposed on the behavior of n{sub {parallel}}. It is shown that n{sub {parallel}} upshifting can be bounded and insufficient to bridge a large spectral gap to cause wave damping, depending upon whether the G.C. relation allows the oblique resonance to occur. The traditional n{sub {parallel}} upshifting mechanism caused by the pitch angle of magneticmore » field lines is shown to lead to contradictions with experimental observations. An upshifting mechanism brought about by the density gradient along field lines is proposed, which is not inconsistent with experimental observations, and provides plausible explanations to some unresolved issues of lower hybrid wave theory, including generation of {open_quote}seed electrons.{close_quote}« less
Bayer image parallel decoding based on GPU

NASA Astrophysics Data System (ADS)

Hu, Rihui; Xu, Zhiyong; Wei, Yuxing; Sun, Shaohua

2012-11-01

In the photoelectrical tracking system, Bayer image is decompressed in traditional method, which is CPU-based. However, it is too slow when the images become large, for example, 2K×2K×16bit. In order to accelerate the Bayer image decoding, this paper introduces a parallel speedup method for NVIDA's Graphics Processor Unit (GPU) which supports CUDA architecture. The decoding procedure can be divided into three parts: the first is serial part, the second is task-parallelism part, and the last is data-parallelism part including inverse quantization, inverse discrete wavelet transform (IDWT) as well as image post-processing part. For reducing the execution time, the task-parallelism part is optimized by OpenMP techniques. The data-parallelism part could advance its efficiency through executing on the GPU as CUDA parallel program. The optimization techniques include instruction optimization, shared memory access optimization, the access memory coalesced optimization and texture memory optimization. In particular, it can significantly speed up the IDWT by rewriting the 2D (Tow-dimensional) serial IDWT into 1D parallel IDWT. Through experimenting with 1K×1K×16bit Bayer image, data-parallelism part is 10 more times faster than CPU-based implementation. Finally, a CPU+GPU heterogeneous decompression system was designed. The experimental result shows that it could achieve 3 to 5 times speed increase compared to the CPU serial method.
The Development of WARP - A Framework for Continuous Energy Monte Carlo Neutron Transport in General 3D Geometries on GPUs

NASA Astrophysics Data System (ADS)

Bergmann, Ryan

Graphics processing units, or GPUs, have gradually increased in computational power from the small, job-specific boards of the early 1990s to the programmable powerhouses of today. Compared to more common central processing units, or CPUs, GPUs have a higher aggregate memory bandwidth, much higher floating-point operations per second (FLOPS), and lower energy consumption per FLOP. Because one of the main obstacles in exascale computing is power consumption, many new supercomputing platforms are gaining much of their computational capacity by incorporating GPUs into their compute nodes. Since CPU-optimized parallel algorithms are not directly portable to GPU architectures (or at least not without losing substantial performance), transport codes need to be rewritten to execute efficiently on GPUs. Unless this is done, reactor simulations cannot take full advantage of these new supercomputers. WARP, which can stand for ``Weaving All the Random Particles,'' is a three-dimensional (3D) continuous energy Monte Carlo neutron transport code developed in this work as to efficiently implement a continuous energy Monte Carlo neutron transport algorithm on a GPU. WARP accelerates Monte Carlo simulations while preserving the benefits of using the Monte Carlo Method, namely, very few physical and geometrical simplifications. WARP is able to calculate multiplication factors, flux tallies, and fission source distributions for time-independent problems, and can run in both criticality or fixed source modes. WARP can transport neutrons in unrestricted arrangements of parallelepipeds, hexagonal prisms, cylinders, and spheres. WARP uses an event-based algorithm, but with some important differences. Moving data is expensive, so WARP uses a remapping vector of pointer/index pairs to direct GPU threads to the data they need to access. The remapping vector is sorted by reaction type after every transport iteration using a high-efficiency parallel radix sort, which serves to keep the reaction types as contiguous as possible and removes completed histories from the transport cycle. The sort reduces the amount of divergence in GPU ``thread blocks,'' keeps the SIMD units as full as possible, and eliminates using memory bandwidth to check if a neutron in the batch has been terminated or not. Using a remapping vector means the data access pattern is irregular, but this is mitigated by using large batch sizes where the GPU can effectively eliminate the high cost of irregular global memory access. WARP modifies the standard unionized energy grid implementation to reduce memory traffic. Instead of storing a matrix of pointers indexed by reaction type and energy, WARP stores three matrices. The first contains cross section values, the second contains pointers to angular distributions, and a third contains pointers to energy distributions. This linked list type of layout increases memory usage, but lowers the number of data loads that are needed to determine a reaction by eliminating a pointer load to find a cross section value. Optimized, high-performance GPU code libraries are also used by WARP wherever possible. The CUDA performance primitives (CUDPP) library is used to perform the parallel reductions, sorts and sums, the CURAND library is used to seed the linear congruential random number generators, and the OptiX ray tracing framework is used for geometry representation. OptiX is a highly-optimized library developed by NVIDIA that automatically builds hierarchical acceleration structures around user-input geometry so only surfaces along a ray line need to be queried in ray tracing. WARP also performs material and cell number queries with OptiX by using a point-in-polygon like algorithm. WARP has shown that GPUs are an effective platform for performing Monte Carlo neutron transport with continuous energy cross sections. Currently, WARP is the most detailed and feature-rich program in existence for performing continuous energy Monte Carlo neutron transport in general 3D geometries on GPUs, but compared to production codes like Serpent and MCNP, WARP has limited capabilities. Despite WARP's lack of features, its novel algorithm implementations show that high performance can be achieved on a GPU despite the inherently divergent program flow and sparse data access patterns. WARP is not ready for everyday nuclear reactor calculations, but is a good platform for further development of GPU-accelerated Monte Carlo neutron transport. In it's current state, it may be a useful tool for multiplication factor searches, i.e. determining reactivity coefficients by perturbing material densities or temperatures, since these types of calculations typically do not require many flux tallies. (Abstract shortened by UMI.)
High Performance Distributed Computing in a Supercomputer Environment: Computational Services and Applications Issues

NASA Technical Reports Server (NTRS)

Kramer, Williams T. C.; Simon, Horst D.

1994-01-01

This tutorial proposes to be a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis to distributed computing. The intent is first to provide some guidance and directions in the rapidly increasing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors, as well as high massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we will utilize the unique experience made at the NAS facility at NASA Ames Research Center. Over the last five years at NAS massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon Machines from Intel were used in a production supercomputer center alongside with traditional vector supercomputers such as the Cray Y-MP and C90.
Developing an Application to Increase the Accessibility of Planetary Geologic Maps

NASA Astrophysics Data System (ADS)

Jacobsen, R. E.; Fay, C.

2018-06-01

USGS planetary geologic maps are widely used digital products with text, raster, vector, and temporal data, within a highly standardized design. This tool will augment the user experience by improving accessibility among the various forms of data.
Maximizing Accessibility to Spatially Referenced Digital Data.

ERIC Educational Resources Information Center

Hunt, Li; Joselyn, Mark

1995-01-01

Discusses some widely available spatially referenced datasets, including raster and vector datasets. Strategies for improving accessibility include: acquisition of data in a software-dependent format; reorganization of data into logical geographic units; acquisition of intelligent retrieval software; improving computer hardware; and intelligent…
Performance Evaluation of Remote Memory Access (RMA) Programming on Shared Memory Parallel Computers

NASA Technical Reports Server (NTRS)

Jin, Hao-Qiang; Jost, Gabriele; Biegel, Bryan A. (Technical Monitor)

2002-01-01

The purpose of this study is to evaluate the feasibility of remote memory access (RMA) programming on shared memory parallel computers. We discuss different RMA based implementations of selected CFD application benchmark kernels and compare them to corresponding message passing based codes. For the message-passing implementation we use MPI point-to-point and global communication routines. For the RMA based approach we consider two different libraries supporting this programming model. One is a shared memory parallelization library (SMPlib) developed at NASA Ames, the other is the MPI-2 extensions to the MPI Standard. We give timing comparisons for the different implementation strategies and discuss the performance.
Optimization of sparse matrix-vector multiplication on emerging multicore platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel; Oliker, Leonid; Vuduc, Richard

2007-01-01

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientificmore » study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.« less
Rigorous vector wave propagation for arbitrary flat media

NASA Astrophysics Data System (ADS)

Bos, Steven P.; Haffert, Sebastiaan Y.; Keller, Christoph U.

2017-08-01

Precise modelling of the (off-axis) point spread function (PSF) to identify geometrical and polarization aberrations is important for many optical systems. In order to characterise the PSF of the system in all Stokes parameters, an end-to-end simulation of the system has to be performed in which Maxwell's equations are rigorously solved. We present the first results of a python code that we are developing to perform multiscale end-to-end wave propagation simulations that include all relevant physics. Currently we can handle plane-parallel near- and far-field vector diffraction effects of propagating waves in homogeneous isotropic and anisotropic materials, refraction and reflection of flat parallel surfaces, interference effects in thin films and unpolarized light. We show that the code has a numerical precision on the order of 10-16 for non-absorbing isotropic and anisotropic materials. For absorbing materials the precision is on the order of 10-8. The capabilities of the code are demonstrated by simulating a converging beam reflecting from a flat aluminium mirror at normal incidence.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction

DOE PAGES

Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...

1995-01-01

In this article, we present a program generation strategy of Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier Transforms and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storagemore » of size O(7 n ) for multiplying 2 n × 2 n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4 n ). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.« less
The ASC Sequoia Programming Model

DOE Office of Scientific and Technical Information (OSTI.GOV)

Seager, M

2008-08-06

In the late 1980's and early 1990's, Lawrence Livermore National Laboratory was deeply engrossed in determining the next generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in mid 1970's first for the CDC 7600 and later extended from stack based vector operation to memory to memory operations for the Cray 1s, lasted approximately 20 years (See Slide 5). The Cray vector era was deemed an extremely long lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalarmore » mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the Operating System (LTSS and later NLTSS), communications protocols (LINCS), Compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine optimized libraries (e.g., SLATEC, and STACKLIB). Although LTSS was adopted by Cray for early system generations, they later developed COS and UNICOS operating systems and environment on their own. In the late 1970s and early 1980s two trends appeared that made the Cray vector programming model (described above including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low cost CMOS microprocessors and their attendant, departmental and mini-computers and later workstations and personal computers. With the wide spread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE Labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments. The other interesting advance in the period is that systems were being developed with multiple 'cores' in them and called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coming up with the 'Killer Micro' term. After several generations of SMP platforms (e.g., Sequent Balance 8000 with 8 33MHz MC32032s, Allian FX8 with 8 MC68020 and FPGA based Vector Units and finally the BB&N Butterfly with 128 cores), it became apparent to us that the killer micro revolution would indeed take over Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple--and hopefully--many generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach. In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and one would have to bite the bullet and implement MPI parallelism within the Integrated Design Code (IDC). We also had a major emphasis on doing everything in a completely standards based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature rich and, in particular, offer more cores. Thus, we focused on OpenMP, and POSIX PThreads for programming SMP parallelism. These code porting efforts were lead by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, IO libraries and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error prone because the programmers used MPI directly and sprinkled the calls throughout the code.« less

Electromagnetic fluctuation spectra of collective oscillations in magnetized Maxwellian plasmas for parallel wave vectors

NASA Astrophysics Data System (ADS)

Vafin, S.; Schlickeiser, R.; Yoon, P. H.

2016-05-01

The general electromagnetic fluctuation theory for magnetized plasmas is used to calculate the steady-state wave number spectra and total electromagnetic field strength of low-frequency collective weakly damped eigenmodes with parallel wavevectors in a Maxwellian electron-proton plasma. These result from the equilibrium of spontaneous emission and collisionless damping, and they represent the minimum electromagnetic fluctuations guaranteed in quiet thermal space plasmas, including the interstellar and interplanetary medium. Depending on the plasma beta, the ratio of |δB |/B0 can be as high as 10-12 .
Understanding the Cray X1 System

NASA Technical Reports Server (NTRS)

Cheung, Samson

2004-01-01

This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
Memory Retrieval Given Two Independent Cues: Cue Selection or Parallel Access?

ERIC Educational Resources Information Center

Rickard, Timothy C.; Bajic, Daniel

2004-01-01

A basic but unresolved issue in the study of memory retrieval is whether multiple independent cues can be used concurrently (i.e., in parallel) to recall a single, common response. A number of empirical results, as well as potentially applicable theories, suggest that retrieval can proceed in parallel, though Rickard (1997) set forth a model that…
A portable approach for PIC on emerging architectures

NASA Astrophysics Data System (ADS)

Decyk, Viktor

2016-03-01

A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers, is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programing, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
Gravitropism in Arabidopsis thaliana: violation of the sine- and resultant-law

NASA Astrophysics Data System (ADS)

Galland, Paul

We investigated the gravitropic bending of hypocotyls and roots of seedlings of Arabidopsis tha-liana in response to long-term centrifugal accelerations in a range of 5 x 10-3 to 4 x g. The so-cal-led resultant law of gravitropism, a corollary of the so called sine law, claims that during centri-fugation a gravitropic organ aligns itself parallel to the resultant stimulus vector. We show here that neither of the two empirical “laws” is apt to describe the complex gravitropic behaviour of seedlings of Arabidopsis. Hypocotyls obey reasonably well the resultant law while roots display a complex behaviour that is clearly at variance with it. Horizontally centrifuged seedlings sense minute accelerations acting parallel to the longitudinal axis. If the centrifugal vector points to-ward the cotyledons, then the bending of hypocotyls and roots is greatly enhanced. If the centri-fugal vector points, however, toward the root tip, then only the bending of roots is enhanced by accelerations as low as 5 x 10-3 x g (positive tonic effect). The absolute gravitropic thresholds were determined for hypocotyls and roots in a clinostat-centrifuge and found to be near 1.5 x 10-2 x g. A behavioural mutant, ehb1-2 (Knauer et al. 2011), displays a lower gravitropic threshold for roots, not however, for hypocotyls. The complex gravitropic behaviour of seedlings of Arabi-dopsis is at odds with the classical sine- as well as the resultant law and can indicates the eminent role that is played by the acceleration vector operating longitudinally to the seedling axis.
Development and Evaluation of Vectorised and Multi-Core Event Reconstruction Algorithms within the CMS Software Framework

NASA Astrophysics Data System (ADS)

Hauth, T.; Innocente and, V.; Piparo, D.

2012-12-01

The processing of data acquired by the CMS detector at LHC is carried out with an object-oriented C++ software framework: CMSSW. With the increasing luminosity delivered by the LHC, the treatment of recorded data requires extraordinary large computing resources, also in terms of CPU usage. A possible solution to cope with this task is the exploitation of the features offered by the latest microprocessor architectures. Modern CPUs present several vector units, the capacity of which is growing steadily with the introduction of new processor generations. Moreover, an increasing number of cores per die is offered by the main vendors, even on consumer hardware. Most recent C++ compilers provide facilities to take advantage of such innovations, either by explicit statements in the programs sources or automatically adapting the generated machine instructions to the available hardware, without the need of modifying the existing code base. Programming techniques to implement reconstruction algorithms and optimised data structures are presented, that aim to scalable vectorization and parallelization of the calculations. One of their features is the usage of new language features of the C++11 standard. Portions of the CMSSW framework are illustrated which have been found to be especially profitable for the application of vectorization and multi-threading techniques. Specific utility components have been developed to help vectorization and parallelization. They can easily become part of a larger common library. To conclude, careful measurements are described, which show the execution speedups achieved via vectorised and multi-threaded code in the context of CMSSW.
VectorBase: an updated bioinformatics resource for invertebrate vectors and other organisms related with human diseases

PubMed Central

Giraldo-Calderón, Gloria I.; Emrich, Scott J.; MacCallum, Robert M.; Maslen, Gareth; Dialynas, Emmanuel; Topalis, Pantelis; Ho, Nicholas; Gesing, Sandra; Madey, Gregory; Collins, Frank H.; Lawson, Daniel

2015-01-01

VectorBase is a National Institute of Allergy and Infectious Diseases supported Bioinformatics Resource Center (BRC) for invertebrate vectors of human pathogens. Now in its 11th year, VectorBase currently hosts the genomes of 35 organisms including a number of non-vectors for comparative analysis. Hosted data range from genome assemblies with annotated gene features, transcript and protein expression data to population genetics including variation and insecticide-resistance phenotypes. Here we describe improvements to our resource and the set of tools available for interrogating and accessing BRC data including the integration of Web Apollo to facilitate community annotation and providing Galaxy to support user-based workflows. VectorBase also actively supports our community through hands-on workshops and online tutorials. All information and data are freely available from our website at https://www.vectorbase.org/. PMID:25510499
UPC++ Programmer’s Guide (v1.0 2017.9)

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bachan, J.; Baden, S.; Bonachea, D.

UPC++ is a C++11 library that provides Asynchronous Partitioned Global Address Space (APGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The APGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, APGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, allmore » operations that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.« less
UPC++ Programmer’s Guide, v1.0-2018.3.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bachan, J.; Baden, S.; Bonachea, Dan

UPC++ is a C++11 library that provides Partitioned Global Address Space (PGAS) programming. It is designed for writing parallel programs that run efficiently and scale well on distributed-memory parallel computers. The PGAS model is single program, multiple-data (SPMD), with each separate thread of execution (referred to as a rank, a term borrowed from MPI) having access to local memory as it would in C++. However, PGAS also provides access to a global address space, which is allocated in shared segments that are distributed over the ranks. UPC++ provides numerous methods for accessing and using global memory. In UPC++, all operationsmore » that access remote memory are explicit, which encourages programmers to be aware of the cost of communication and data movement. Moreover, all remote-memory access operations are by default asynchronous, to enable programmers to write code that scales well even on hundreds of thousands of cores.« less
Dynamic file-access characteristics of a production parallel scientific workload

NASA Technical Reports Server (NTRS)

Kotz, David; Nieuwejaar, Nils

1994-01-01

Multiprocessors have permitted astounding increases in computational performance, but many cannot meet the intense I/O requirements of some scientific applications. An important component of any solution to this I/O bottleneck is a parallel file system that can provide high-bandwidth access to tremendous amounts of data in parallel to hundreds or thousands of processors. Most successful systems are based on a solid understanding of the expected workload, but thus far there have been no comprehensive workload characterizations of multiprocessor file systems. This paper presents the results of a three week tracing study in which all file-related activity on a massively parallel computer was recorded. Our instrumentation differs from previous efforts in that it collects information about every I/O request and about the mix of jobs running in a production environment. We also present the results of a trace-driven caching simulation and recommendations for designers of multiprocessor file systems.
Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2015-07-07

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
Fencing network direct memory access data transfers in a parallel active messaging interface of a parallel computer

DOEpatents

Blocksome, Michael A.; Mamidala, Amith R.

2015-07-14

Fencing direct memory access (`DMA`) data transfers in a parallel active messaging interface (`PAMI`) of a parallel computer, the PAMI including data communications endpoints, each endpoint including specifications of a client, a context, and a task, the endpoints coupled for data communications through the PAMI and through DMA controllers operatively coupled to a deterministic data communications network through which the DMA controllers deliver data communications deterministically, including initiating execution through the PAMI of an ordered sequence of active DMA instructions for DMA data transfers between two endpoints, effecting deterministic DMA data transfers through a DMA controller and the deterministic data communications network; and executing through the PAMI, with no FENCE accounting for DMA data transfers, an active FENCE instruction, the FENCE instruction completing execution only after completion of all DMA instructions initiated prior to execution of the FENCE instruction for DMA data transfers between the two endpoints.
The Skeletal Muscle Environment and Its Role in Immunity and Tolerance to AAV Vector-Mediated Gene Transfer

PubMed Central

Boisgérault, Florence; Mingozzi, Federico

2015-01-01

Since the early days of gene therapy, muscle has been one the most studied tissue targets for the correction of enzyme deficiencies and myopathies. Several preclinical and clinical studies have been conducted using adeno-associated virus (AAV) vectors. Exciting progress has been made in the gene delivery technologies, from the identification of novel AAV serotypes to the development of novel vector delivery techniques. In parallel, significant knowledge has been generated on the host immune system and its interaction with both the vector and the transgene at the muscle level. In particular, the role of underlying muscle inflammation, characteristic of several diseases affecting the muscle, has been defined in terms of its potential detrimental impact on gene transfer with AAV vectors. At the same time, feedback immunomodulatory mechanisms peculiar of skeletal muscle involving resident regulatory T cells have been identified, which seem to play an important role in maintaining, at least to some extent, muscle homeostasis during inflammation and regenerative processes. Devising strategies to tip this balance towards unresponsiveness may represent an avenue to improve the safety and efficacy of muscle gene transfer with AAV vectors. PMID:26122097
Access to CAMAC from VxWorks and UNIX in DART

NASA Astrophysics Data System (ADS)

Streets, J.; Meadows, J.; Moore, C.; Pordes, R.; Slimmer, D.; Vittone, M.; Stern, E.

1996-02-01

As part of the DART Project [Data acquisition for the next Generation Fermilab Fixed Target Experiments] we have developed a package of software for CAMAC access from UNIX and VxWorks platforms, with support for several hardware interfaces. We report on developments for the CES CBD8210 VME to parallel CAMAC, the Hytec VSD2992 VME to serial CAMAC and Jorway 411s SCSI to parallel and serial CAMAC branch drivers, and give a summary of the timings obtained.
Inhalation of Nebulized Perfluorochemical Enhances Recombinant Adenovirus and Adeno-Associated Virus-Mediated Gene Expression in Lung Epithelium

PubMed Central

Beckett, Travis; Bonneau, Laura; Howard, Alan; Blanchard, James; Borda, Juan; Weiner, Daniel J.; Wang, Lili; Gao, Guang Ping; Kolls, Jay K.; Bohm, Rudolf; Liggitt, Denny

2012-01-01

Abstract Use of perfluorochemical liquids during intratracheal vector administration enhances recombinant adenovirus and adeno-associated virus (AAV)-mediated lung epithelial gene expression. We hypothesized that inhalation of nebulized perfluorochemical vapor would also enhance epithelial gene expression after subsequent intratracheal vector administration. Freely breathing adult C57BL/6 mice were exposed for selected times to nebulized perflubron or sterile saline in a sealed Plexiglas chamber. Recombinant adenoviral vector was administered by transtracheal puncture at selected times afterward and mice were killed 3 days after vector administration to assess transgene expression. Mice tolerated the nebulized perflubron without obvious ill effects. Vector administration 6 hr after nebulized perflubron exposure resulted in an average 540% increase in gene expression in airway and alveolar epithelium, compared with that with vector alone or saline plus vector control (p<0.05). However, vector administration 1 hr, 1 day, or 3 days after perflubron exposure was not different from either nebulized saline with vector or vector alone and a 60-min exposure to nebulized perflubron is required. In parallel pilot studies in macaques, inhalation of nebulized perflubron enhanced recombinant AAV2/5 vector expression throughout the lung. Serial chest radiographs, bronchoalveolar lavages, and results of complete blood counts and serum biochemistries demonstrated no obvious adverse effects of nebulized perflubron. Further, one macaque receiving nebulized perflubron only was monitored for 1 year with no obvious adverse effects of exposure. These results demonstrate that inhalation of nebulized perflubron, a simple, clinically more feasible technique than intratracheal administration of liquid perflubron, safely enhances lung gene expression. PMID:22568624
FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

NASA Astrophysics Data System (ADS)

Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
A novel double fine guide sensor design on space telescope

NASA Astrophysics Data System (ADS)

Zhang, Xu-xu; Yin, Da-yi

2018-02-01

To get high precision attitude for space telescope, a double marginal FOV (field of view) FGS (Fine Guide Sensor) is proposed. It is composed of two large area APS CMOS sensors and both share the same lens in main light of sight. More star vectors can be get by two FGS and be used for high precision attitude determination. To improve star identification speed, the vector cross product in inter-star angles for small marginal FOV different from traditional way is elaborated and parallel processing method is applied to pyramid algorithm. The star vectors from two sensors are then used to attitude fusion with traditional QUEST algorithm. The simulation results show that the system can get high accuracy three axis attitudes and the scheme is feasibility.
A Performance Evaluation of the Cray X1 for Scientific Applications

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David

2003-01-01

The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
Plant Virus–Insect Vector Interactions: Current and Potential Future Research Directions

PubMed Central

Dietzgen, Ralf G.; Mann, Krin S.; Johnson, Karyn N.

2016-01-01

Acquisition and transmission by an insect vector is central to the infection cycle of the majority of plant pathogenic viruses. Plant viruses can interact with their insect host in a variety of ways including both non-persistent and circulative transmission; in some cases, the latter involves virus replication in cells of the insect host. Replicating viruses can also elicit both innate and specific defense responses in the insect host. A consistent feature is that the interaction of the virus with its insect host/vector requires specific molecular interactions between virus and host, commonly via proteins. Understanding the interactions between plant viruses and their insect host can underpin approaches to protect plants from infection by interfering with virus uptake and transmission. Here, we provide a perspective focused on identifying novel approaches and research directions to facilitate control of plant viruses by better understanding and targeting virus–insect molecular interactions. We also draw parallels with molecular interactions in insect vectors of animal viruses, and consider technical advances for their control that may be more broadly applicable to plant virus vectors. PMID:27834855
Plant Virus-Insect Vector Interactions: Current and Potential Future Research Directions.

PubMed

Dietzgen, Ralf G; Mann, Krin S; Johnson, Karyn N

2016-11-09

Acquisition and transmission by an insect vector is central to the infection cycle of the majority of plant pathogenic viruses. Plant viruses can interact with their insect host in a variety of ways including both non-persistent and circulative transmission; in some cases, the latter involves virus replication in cells of the insect host. Replicating viruses can also elicit both innate and specific defense responses in the insect host. A consistent feature is that the interaction of the virus with its insect host/vector requires specific molecular interactions between virus and host, commonly via proteins. Understanding the interactions between plant viruses and their insect host can underpin approaches to protect plants from infection by interfering with virus uptake and transmission. Here, we provide a perspective focused on identifying novel approaches and research directions to facilitate control of plant viruses by better understanding and targeting virus-insect molecular interactions. We also draw parallels with molecular interactions in insect vectors of animal viruses, and consider technical advances for their control that may be more broadly applicable to plant virus vectors.

An investigation for the development of an integrated optical data preprocessor. [preprocessing remote sensor outputs

NASA Technical Reports Server (NTRS)

Verber, C. M.; Kenan, R. P.; Hartman, N. F.; Chapman, C. M.

1980-01-01

A laboratory model of a 16 channel integrated optical data preprocessor was fabricated and tested in response to a need for a device to evaluate the outputs of a set of remote sensors. It does this by accepting the outputs of these sensors, in parallel, as the components of a multidimensional vector descriptive of the data and comparing this vector to one or more reference vectors which are used to classify the data set. The comparison is performed by taking the difference between the signal and reference vectors. The preprocessor is wholly integrated upon the surface of a LiNbO3 single crystal with the exceptions of the source and the detector. He-Ne laser light is coupled in and out of the waveguide by prism couplers. The integrated optical circuit consists of a titanium infused waveguide pattern, electrode structures and grating beam splitters. The waveguide and electrode patterns, by virtue of their complexity, make the vector subtraction device the most complex integrated optical structure fabricated to date.
Vectorcardiographic diagnostic & prognostic information derived from the 12-lead electrocardiogram: Historical review and clinical perspective.

PubMed

Man, Sumche; Maan, Arie C; Schalij, Martin J; Swenne, Cees A

2015-01-01

In the course of time, electrocardiography has assumed several modalities with varying electrode numbers, electrode positions and lead systems. 12-lead electrocardiography and 3-lead vectorcardiography have become particularly popular. These modalities developed in parallel through the mid-twentieth century. In the same time interval, the physical concepts underlying electrocardiography were defined and worked out. In particular, the vector concept (heart vector, lead vector, volume conductor) appeared to be essential to understanding the manifestations of electrical heart activity, both in the 12-lead electrocardiogram (ECG) and in the 3-lead vectorcardiogram (VCG). Not universally appreciated in the clinic, the vectorcardiogram, and with it the vector concept, went out of use. A revival of vectorcardiography started in the 90's, when VCGs were mathematically synthesized from standard 12-lead ECGs. This facilitated combined electrocardiography and vectorcardiography without the need for a special recording system. This paper gives an overview of these historical developments, elaborates on the vector concept and seeks to define where VCG analysis/interpretation can add diagnostic/prognostic value to conventional 12-lead ECG analysis. Copyright © 2015 Elsevier Inc. All rights reserved.
A synthetic system for expression of components of a bacterial microcompartment.

PubMed

Sargent, Frank; Davidson, Fordyce A; Kelly, Ciarán L; Binny, Rachelle; Christodoulides, Natasha; Gibson, David; Johansson, Emelie; Kozyrska, Katarzyna; Lado, Lucia Licandro; Maccallum, Jane; Montague, Rachel; Ortmann, Brian; Owen, Richard; Coulthurst, Sarah J; Dupuy, Lionel; Prescott, Alan R; Palmer, Tracy

2013-11-01

In general, prokaryotes are considered to be single-celled organisms that lack internal membrane-bound organelles. However, many bacteria produce proteinaceous microcompartments that serve a similar purpose, i.e. to concentrate specific enzymic reactions together or to shield the wider cytoplasm from toxic metabolic intermediates. In this paper, a synthetic operon encoding the key structural components of a microcompartment was designed based on the genes for the Salmonella propanediol utilization (Pdu) microcompartment. The genes chosen included pduA, -B, -J, -K, -N, -T and -U, and each was shown to produce protein in an Escherichia coli chassis. In parallel, a set of compatible vectors designed to express non-native cargo proteins was also designed and tested. Engineered hexa-His tags allowed isolation of the components of the microcompartments together with co-expressed, untagged, cargo proteins. Finally, an in vivo protease accessibility assay suggested that a PduD-GFP fusion could be protected from proteolysis when co-expressed with the synthetic microcompartment operon. This work gives encouragement that it may be possible to harness the genes encoding a non-native microcompartment for future biotechnological applications.
Nuclear Energy Gradients for Internally Contracted Complete Active Space Second-Order Perturbation Theory: Multistate Extensions.

PubMed

Vlaisavljevich, Bess; Shiozaki, Toru

2016-08-09

We report the development of the theory and computer program for analytical nuclear energy gradients for (extended) multistate complete active space perturbation theory (CASPT2) with full internal contraction. The vertical shifts are also considered in this work. This is an extension of the fully internally contracted CASPT2 nuclear gradient program recently developed for a state-specific variant by us [MacLeod and Shiozaki, J. Chem. Phys. 2015, 142, 051103]; in this extension, the so-called λ equation is solved to account for the variation of the multistate CASPT2 energies with respect to the change in the amplitudes obtained in the preceding state-specific CASPT2 calculations, and the Z vector equations are modified accordingly. The program is parallelized using the MPI3 remote memory access protocol that allows us to perform efficient one-sided communication. The optimized geometries of the ground and excited states of a copper corrole and benzophenone are presented as numerical examples. The code is publicly available under the GNU General Public License.
On nonlinear finite element analysis in single-, multi- and parallel-processors

NASA Technical Reports Server (NTRS)

Utku, S.; Melosh, R.; Islam, M.; Salama, M.

1982-01-01

Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem, therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without sub-structuring (i.e. partitioning) are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable size substructures to parallel processors is exploited. Under Cholesky type factorization schemes, the efficiency of parallel processing is shown to decrease due to the occasional shared data, just as that due to the shared facilities.
Domain decomposition methods in aerodynamics

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.; Saltz, Joel

1990-01-01

Compressible Euler equations are solved for two-dimensional problems by a preconditioned conjugate gradient-like technique. An approximate Riemann solver is used to compute the numerical fluxes to second order accuracy in space. Two ways to achieve parallelism are tested, one which makes use of parallelism inherent in triangular solves and the other which employs domain decomposition techniques. The vectorization/parallelism in triangular solves is realized by the use of a recording technique called wavefront ordering. This process involves the interpretation of the triangular matrix as a directed graph and the analysis of the data dependencies. It is noted that the factorization can also be done in parallel with the wave front ordering. The performances of two ways of partitioning the domain, strips and slabs, are compared. Results on Cray YMP are reported for an inviscid transonic test case. The performances of linear algebra kernels are also reported.
Essential issues in multiprocessor systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gajski, D.D.; Peir, J.K.

1985-06-01

During the past several years, a great number of proposals have been made with the objective to increase supercomputer performance by an order of magnitude on the basis of a utilization of new computer architectures. The present paper is concerned with a suitable classification scheme for comparing these architectures. It is pointed out that there are basically four schools of thought as to the most important factor for an enhancement of computer performance. According to one school, the development of faster circuits will make it possible to retain present architectures, except, possibly, for a mechanism providing synchronization of parallel processes.more » A second school assigns priority to the optimization and vectorization of compilers, which will detect parallelism and help users to write better parallel programs. A third school believes in the predominant importance of new parallel algorithms, while the fourth school supports new models of computation. The merits of the four approaches are critically evaluated. 50 references.« less
Parallel algorithms for modeling flow in permeable media. Annual report, February 15, 1995 - February 14, 1996

DOE Office of Scientific and Technical Information (OSTI.GOV)

G.A. Pope; K. Sephernoori; D.C. McKinney

1996-03-15

This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time that is required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computersmore » using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that the portability of the original code across different computer architectures was mad possible.« less
An implementation of a tree code on a SIMD, parallel computer

NASA Technical Reports Server (NTRS)

Olson, Kevin M.; Dorband, John E.

1994-01-01

We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally, interacting, disk galaxies using 65,636 particles. We also simulate the formation of structure in an expanding, model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) type computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 make them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) type parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
User's and test case manual for FEMATS

NASA Technical Reports Server (NTRS)

Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John

1995-01-01

The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.
Smisc - A collection of miscellaneous functions

DOE Office of Scientific and Technical Information (OSTI.GOV)

Landon Sego, PNNL

2015-08-31

A collection of functions for statistical computing and data manipulation. These include routines for rapidly aggregating heterogeneous matrices, manipulating file names, loading R objects, sourcing multiple R files, formatting datetimes, multi-core parallel computing, stream editing, specialized plotting, etc. Smisc-package A collection of miscellaneous functions allMissing Identifies missing rows or columns in a data frame or matrix as.numericSilent Silent wrapper for coercing a vector to numeric comboList Produces all possible combinations of a set of linear model predictors cumMax Computes the maximum of the vector up to the current index cumsumNA Computes the cummulative sum of a vector without propogating NAsmore » d2binom Probability functions for the sum of two independent binomials dataIn A flexible way to import data into R. dbb The Beta-Binomial Distribution df2list Row-wise conversion of a data frame to a list dfplapply Parallelized single row processing of a data frame dframeEquiv Examines the equivalence of two dataframes or matrices dkbinom Probability functions for the sum of k independent binomials factor2character Converts all factor variables in a dataframe to character variables findDepMat Identify linearly dependent rows or columns in a matrix formatDT Converts date or datetime strings into alternate formats getExtension Filename manipulations: remove the extension or path, extract the extension or path getPath Filename manipulations: remove the extension or path, extract the extension or path grabLast Filename manipulations: remove the extension or path, extract the extension or path ifelse1 Non-vectorized version of ifelse integ Simple numerical integration routine interactionPlot Two-way Interaction Plot with Error Bar linearMap Linear mapping of a numerical vector or scalar list2df Convert a list to a data frame loadObject Loads and returns the object(s) in an ".Rdata" file more Display the contents of a file to the R terminal movAvg2 Calculate the moving average using a 2-sided window openDevice Opens a graphics device based on the filename extension p2binom Probability functions for the sum of two independent binomials padZero Pad a vector of numbers with zeros parseJob Parses a collection of elements into (almost) equal sized groups pbb The Beta-Binomial Distribution pcbinom A continuous version of the binomial cdf pkbinom Probability functions for the sum of k independent binomials plapply Simple parallelization of lapply plotFun Plot one or more functions on a single plot PowerData An example of power data pvar Prints the name and value of one or more objects qbb The Beta-Binomial Distribution rbb And numerous others (space limits reporting).« less
IQ imbalance tolerable parallel-channel DMT transmission for coherent optical OFDMA access network

NASA Astrophysics Data System (ADS)

Jung, Sang-Min; Mun, Kyoung-Hak; Jung, Sun-Young; Han, Sang-Kook

2016-12-01

Phase diversity of coherent optical communication provides spectrally efficient higher-order modulation for optical communications. However, in-phase/quadrature (IQ) imbalance in coherent optical communication degrades transmission performance by introducing unwanted signal distortions. In a coherent optical orthogonal frequency division multiple access (OFDMA) passive optical network (PON), IQ imbalance-induced signal distortions degrade transmission performance by interferences of mirror subcarriers, inter-symbol interference (ISI), and inter-channel interference (ICI). We propose parallel-channel discrete multitone (DMT) transmission to mitigate transceiver IQ imbalance-induced signal distortions in coherent orthogonal frequency division multiplexing (OFDM) transmissions. We experimentally demonstrate the effectiveness of parallel-channel DMT transmission compared with that of OFDM transmission in the presence of IQ imbalance.
Accessing and Visualizing scientific spatiotemporal data

NASA Technical Reports Server (NTRS)

Katz, Daniel S.; Bergou, Attila; Berriman, Bruce G.; Block, Gary L.; Collier, Jim; Curkendall, David W.; Good, John; Husman, Laura; Jacob, Joseph C.; Laity, Anastasia;

2004-01-01

This paper discusses work done by JPL 's Parallel Applications Technologies Group in helping scientists access and visualize very large data sets through the use of multiple computing resources, such as parallel supercomputers, clusters, and grids These tools do one or more of the following tasks visualize local data sets for local users, visualize local data sets for remote users, and access and visualize remote data sets The tools are used for various types of data, including remotely sensed image data, digital elevation models, astronomical surveys, etc The paper attempts to pull some common elements out of these tools that may be useful for others who have to work with similarly large data sets.

Access to CAMAC from VxWorks and UNIX in DART

DOE Office of Scientific and Technical Information (OSTI.GOV)

Streets, J.; Meadows, J.; Moore, C.

1996-02-01

All High Energy Physics experiments at Fermilab include CAMAC modules which need to be read out for each triggered event. There is also a need to access CAMAC modules for control and monitoring of the experiment. As part of the DART Project the authors have developed a package of software for CAMAC access from UNIX and VxWorks platforms, with support for several hardware interfaces. The authors report on developments for the CES CBD8210 VME to parallel CAMAC, the Hytec VSD2992 VME to serial CAMAC and Jorway 411S SCSI to parallel and serial CAMAC branch drivers, and give a summary ofmore » the timings obtained.« less
Integrated molecular and morphological studies of Daucus

USDA-ARS?s Scientific Manuscript database

Ninety-four nuclear orthologs were used to analyze phylogenetic structure in 89 accessions of 13 species and two subspecies of Daucus, and an additional ten accessions of related genera. A near parallel set of accessions were used for morphological analyses of germplasm planted in a common garden in...
Cost Analysis of Public Rights-of-Way Accessibility Guidelines

DOT National Transportation Integrated Search

2010-11-29

Accessible Pedestrian Signals (APS) provide auditory and tactile information about the : pedestrian signal phases (walk and dont walk) at signalized pedestrian crossings. : This information parallels the visual information provided by ...
Cylindrical Vector Beams for Rapid Polarization-Dependent Measurements in Atomic Systems

DTIC Science & Technology

2011-12-05

www.opticsinfobase.org/abstract.cfm?URI=oe-18-24-25035. 16. S. Tripathi and K. C. Toussaint, Jr., “Rapid Mueller matrix polarimetry based on parallelized...optical trapping [11], atom guiding [12], laser machining [13], charged particle acceleration [14,15], and polarimetry [16]. Yet despite numerous
Male motion coordination in swarming Anopheles gambiae and Anopheles coluzzii

USDA-ARS?s Scientific Manuscript database

The Anopheles gambiae species complex comprises the primary vectors of malaria in much of sub-Saharan Africa; most of the mating in these species occurs in swarms composed almost entirely of males. Intermittent, parallel flight patterns in such swarms have been observed, but a detailed description o...
Computational and experimental analysis of TMS-induced electric field vectors critical to neuronal activation

NASA Astrophysics Data System (ADS)

Krieg, Todd D.; Salinas, Felipe S.; Narayana, Shalini; Fox, Peter T.; Mogul, David J.

2015-08-01

Objective. Transcranial magnetic stimulation (TMS) represents a powerful technique to noninvasively modulate cortical neurophysiology in the brain. However, the relationship between the magnetic fields created by TMS coils and neuronal activation in the cortex is still not well-understood, making predictable cortical activation by TMS difficult to achieve. Our goal in this study was to investigate the relationship between induced electric fields and cortical activation measured by blood flow response. Particularly, we sought to discover the E-field characteristics that lead to cortical activation. Approach. Subject-specific finite element models (FEMs) of the head and brain were constructed for each of six subjects using magnetic resonance image scans. Positron emission tomography (PET) measured each subject’s cortical response to image-guided robotically-positioned TMS to the primary motor cortex. FEM models that employed the given coil position, orientation, and stimulus intensity in experimental applications of TMS were used to calculate the electric field (E-field) vectors within a region of interest for each subject. TMS-induced E-fields were analyzed to better understand what vector components led to regional cerebral blood flow (CBF) responses recorded by PET. Main results. This study found that decomposing the E-field into orthogonal vector components based on the cortical surface geometry (and hence, cortical neuron directions) led to significant differences between the regions of cortex that were active and nonactive. Specifically, active regions had significantly higher E-field components in the normal inward direction (i.e., parallel to pyramidal neurons in the dendrite-to-axon orientation) and in the tangential direction (i.e., parallel to interneurons) at high gradient. In contrast, nonactive regions had higher E-field vectors in the outward normal direction suggesting inhibitory responses. Significance. These results provide critical new understanding of the factors by which TMS induces cortical activation necessary for predictive and repeatable use of this noninvasive stimulation modality.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Bailey, David H.

The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aeronautical Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, althoughmore » the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, LeoDagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems. As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.« less

Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-01-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
Memory access in shared virtual memory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Berrendorf, R.

1992-09-01

Shared virtual memory (SVM) is a virtual memory layer with a single address space on top of a distributed real memory on parallel computers. We examine the behavior and performance of SVM running a parallel program with medium-grained, loop-level parallelism on top of it. A simulator for the underlying parallel architecture can be used to examine the behavior of SVM more deeply. The influence of several parameters, such as the number of processors, page size, cold or warm start, and restricted page replication, is studied.
GNAQPMS v1.1: accelerating the Global Nested Air Quality Prediction Modeling System (GNAQPMS) on Intel Xeon Phi processors

NASA Astrophysics Data System (ADS)

Wang, Hui; Chen, Huansheng; Wu, Qizhong; Lin, Junmin; Chen, Xueshun; Xie, Xinwei; Wang, Rongrong; Tang, Xiao; Wang, Zifa

2017-08-01

The Global Nested Air Quality Prediction Modeling System (GNAQPMS) is the global version of the Nested Air Quality Prediction Modeling System (NAQPMS), which is a multi-scale chemical transport model used for air quality forecast and atmospheric environmental research. In this study, we present the porting and optimisation of GNAQPMS on a second-generation Intel Xeon Phi processor, codenamed Knights Landing (KNL). Compared with the first-generation Xeon Phi coprocessor (codenamed Knights Corner, KNC), KNL has many new hardware features such as a bootable processor, high-performance in-package memory and ISA compatibility with Intel Xeon processors. In particular, we describe the five optimisations we applied to the key modules of GNAQPMS, including the CBM-Z gas-phase chemistry, advection, convection and wet deposition modules. These optimisations work well on both the KNL 7250 processor and the Intel Xeon E5-2697 V4 processor. They include (1) updating the pure Message Passing Interface (MPI) parallel mode to the hybrid parallel mode with MPI and OpenMP in the emission, advection, convection and gas-phase chemistry modules; (2) fully employing the 512 bit wide vector processing units (VPUs) on the KNL platform; (3) reducing unnecessary memory access to improve cache efficiency; (4) reducing the thread local storage (TLS) in the CBM-Z gas-phase chemistry module to improve its OpenMP performance; and (5) changing the global communication from writing/reading interface files to MPI functions to improve the performance and the parallel scalability. These optimisations greatly improved the GNAQPMS performance. The same optimisations also work well for the Intel Xeon Broadwell processor, specifically E5-2697 v4. Compared with the baseline version of GNAQPMS, the optimised version was 3.51 × faster on KNL and 2.77 × faster on the CPU. Moreover, the optimised version ran at 26 % lower average power on KNL than on the CPU. With the combined performance and energy improvement, the KNL platform was 37.5 % more efficient on power consumption compared with the CPU platform. The optimisations also enabled much further parallel scalability on both the CPU cluster and the KNL cluster scaled to 40 CPU nodes and 30 KNL nodes, with a parallel efficiency of 70.4 and 42.2 %, respectively.
Vector Beam Polarization State Spectrum Analyzer.

PubMed

Moreno, Ignacio; Davis, Jeffrey A; Badham, Katherine; Sánchez-López, María M; Holland, Joseph E; Cottrell, Don M

2017-05-22

We present a proof of concept for a vector beam polarization state spectrum analyzer based on the combination of a polarization diffraction grating (PDG) and an encoded harmonic q-plate grating (QPG). As a result, a two-dimensional polarization diffraction grating is formed that generates six different q-plate channels with topological charges from -3 to +3 in the horizontal direction, and each is split in the vertical direction into the six polarization channels at the cardinal points of the corresponding higher-order Poincaré sphere. Consequently, 36 different channels are generated in parallel. This special polarization diffractive element is experimentally demonstrated using a single phase-only spatial light modulator in a reflective optical architecture. Finally, we show that this system can be used as a vector beam polarization state spectrum analyzer, where both the topological charge and the state of polarization of an input vector beam can be simultaneously determined in a single experiment. We expect that these results would be useful for applications in optical communications.
Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.

Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neigh boring inner loops may exhibit different concurrency patternsmore » (e.g. Reduction vs. Forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.« less
Selection rules for Cooper pairing in two-dimensional interfaces and sheets

NASA Astrophysics Data System (ADS)

Scheurer, Mathias S.; Agterberg, Daniel F.; Schmalian, Jörg

2017-12-01

Thin sheets deposited on a substrate and interfaces of correlated materials offer a plethora of routes towards the realization of exotic phases of matter. In these systems, inversion symmetry is broken which strongly affects the properties of possible instabilities—in particular in the superconducting channel. By combining symmetry and energetic arguments, we derive general and experimentally accessible selection rules for Cooper instabilities in noncentrosymmetric systems, which yield necessary and sufficient conditions for spontaneous time-reversal-symmetry breaking at the superconducting transition and constrain the orientation of the triplet vector. We discuss in detail the implications for various different materials. For instance, we conclude that the pairing state in thin layers of Sr2RuO4 must, as opposed to its bulk superconducting state, preserve time-reversal symmetry with its triplet vector being parallel to the plane of the system. All triplet states of this system allowed by the selection rules are predicted to display topological Majorana modes at dislocations or at the edge of the system. Applying our results to the LaAlO3/SrTiO3 heterostructures, we find that while the condensates of the (001) and (110) oriented interfaces must be time-reversal symmetric, spontaneous time-reversal-symmetry breaking can only occur for the less studied (111) interface. We also discuss the consequences for thin layers of URu2Si2 and UPt3 as well as for single-layer FeSe. On a more general level, our considerations might serve as a design principle in the search for time-reversal-symmetry-breaking superconductivity in the absence of external magnetic fields.
Deterministic binary vectors for efficient automated indexing of MEDLINE/PubMed abstracts.

PubMed

Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R; Bernstam, Elmer V; Cohen, Trevor

2012-01-01

The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI.
Deterministic Binary Vectors for Efficient Automated Indexing of MEDLINE/PubMed Abstracts

PubMed Central

Wahle, Manuel; Widdows, Dominic; Herskovic, Jorge R.; Bernstam, Elmer V.; Cohen, Trevor

2012-01-01

The need to maintain accessibility of the biomedical literature has led to development of methods to assist human indexers by recommending index terms for newly encountered articles. Given the rapid expansion of this literature, it is essential that these methods be scalable. Document vector representations are commonly used for automated indexing, and Random Indexing (RI) provides the means to generate them efficiently. However, RI is difficult to implement in real-world indexing systems, as (1) efficient nearest-neighbor search requires retaining all document vectors in RAM, and (2) it is necessary to maintain a store of randomly generated term vectors to index future documents. Motivated by these concerns, this paper documents the development and evaluation of a deterministic binary variant of RI. The increased capacity demonstrated by binary vectors has implications for information retrieval, and the elimination of the need to retain term vectors facilitates distributed implementations, enhancing the scalability of RI. PMID:23304369
DNA compaction into new DNA vectors based on cyclodextrin polymer: surface enhanced Raman spectroscopy characterization.

PubMed

Burckbuchler, V; Wintgens, V; Lecomte, S; Percot, A; Leborgne, C; Danos, O; Kichler, A; Amiel, C

2006-04-05

The ability of DNA to bind polycation yielding polyplexes is widely used in nonviral gene delivery. The aim of the present study was to evaluate the DNA compaction with a new DNA vector using Raman spectroscopy. The polyplexes result from an association of a beta-cyclodextrin polymer (polybeta-CD), an amphiphilic cationic connector (DC-Chol or adamantane derivative Ada2), and DNA. The charge of the polymeric vector is effectively controlled by simple addition of cationic connector in the medium. We used surface enhanced Raman spectroscopy (SERS) to characterize this ternary complex, monitoring the accessibility of adenyl residues to silver colloids. The first experiments were performed using model systems based on polyA (polyadenosine monophosphate) well characterized by SERS. This model was then extended to plasmid DNA to study polybeta-CD/Ada2/DNA and polybeta-CD/DC-Chol/DNA polyplexes. The SERS spectra show a decrease of signal intensity when the vector/DNA charge ratio (Z+/-) increases. At the highest ratio (Z+/- = 10) the signal is 6-fold and 3-fold less intense than the DNA reference signal for Ada2 and DC-Chol polyplexes, respectively. Thus adenyl residues have a reduced accessibility as DNA is bound to the vector. Moreover, the SERS intensity variations are in agreement with gel electrophoresis and zeta potential experiments on the same systems. The overall study clearly demonstrates that the cationic charges neutralizing the negative charges of DNA result in the formation of stable polyplexes. In vitro transfection efficiency of those DNA vectors are also presented and compared to the classical DC-Chol lipoplexes (DC-Chol/DNA). The results show an increase of the transfection efficiency 2-fold higher with our vector based on polybeta-CD. Copyright 2005 Wiley Periodicals, Inc.
A fast pulse design for parallel excitation with gridding conjugate gradient.

PubMed

Feng, Shuo; Ji, Jim

2013-01-01

Parallel excitation (pTx) is recognized as a crucial technique in high field MRI to address the transmit field inhomogeneity problem. However, it can be time consuming to design pTx pulses which is not desirable. In this work, we propose a pulse design with gridding conjugate gradient (CG) based on the small-tip-angle approximation. The two major time consuming matrix-vector multiplications are substituted by two operators which involves with FFT and gridding only. Simulation results have shown that the proposed method is 3 times faster than conventional method and the memory cost is reduced by 1000 times.
Magnetosheath for almost-aligned solar wind magnetic field and flow vectors: Wind observations across the dawnside magnetosheath at X = -12 Re

NASA Astrophysics Data System (ADS)

Farrugia, C. J.; Erkaev, N. V.; Torbert, R. B.; Biernat, H. K.; Gratton, F. T.; Szabo, A.; Kucharek, H.; Matsui, H.; Lin, R. P.; Ogilvie, K. W.; Lepping, R. P.; Smith, C. W.

2010-08-01

While there are many approximations describing the flow of the solar wind past the magnetosphere in the magnetosheath, the case of perfectly aligned (parallel or anti-parallel) interplanetary magnetic field (IMF) and solar wind flow vectors can be treated exactly in a magnetohydrodynamic (MHD) approach. In this work we examine a case of nearly-opposed (to within 15°) interplanetary field and flow vectors, which occurred on October 24-25, 2001 during passage of the last interplanetary coronal mass ejection in an ejecta merger. Interplanetary data are from the ACE spacecraft. Simultaneously Wind was crossing the near-Earth (X ˜ -13 Re) geomagnetic tail and subsequently made an approximately 5-hour-long magnetosheath crossing close to the ecliptic plane (Z = -0.7 Re). Geomagnetic activity was returning steadily to quiet, “ground” conditions. We first compare the predictions of the Spreiter and Rizzi theory with the Wind magnetosheath observations and find fair agreement, in particular as regards the proportionality of the magnetic field strength and the product of the plasma density and bulk speed. We then carry out a small-perturbation analysis of the Spreiter and Rizzi solution to account for the small IMF components perpendicular to the flow vector. The resulting expression is compared to the time series of the observations and satisfactory agreement is obtained. We also present and discuss observations in the dawnside boundary layer of pulsed, high-speed (v ˜ 600 km/s) flows exceeding the solar wind flow speeds. We examine various generating mechanisms and suggest that the most likely cause is a wave of frequency 3.2 mHz excited at the inner edge of the boundary layer by the Kelvin-Helmholtz instability.
Magnetosheath for almost-aligned solar wind magnetic field and flow vectors: Windobservations across the dawnside magnetosheath at X = -12 Re

NASA Astrophysics Data System (ADS)

Farrugia, Charles

While there are many approximations describing the flow of the solar wind past the mag-netosphere in the magnetosheath, the case of perfectly aligned (parallel or anti-parallel) in-terplanetary magnetic field (IMF) and solar wind flow vectors can be treated exactly in an magnetohydrodynamic (MHD) approach (Spreiter and Rizzi, 1974). In this work we examine a case of nearly-opposed (to within 15 deg) interplanetary field and flow vectors, which occurred on October 24-25, 2001 during passage of the last interplanetary coronal mass ejection in an ejecta merger. Interplanetary data are from the ACE spacecraft. Simultaneously Wind was crossing the near-Earth (X -13 Re) geomagnetic tail and subsequently made a 5-hour-long magnetosheath crossing close to the ecliptic plane (Z = -0.7 Re). Geomagnetic activity was returning steadily to quiet, "ground" conditions. We first compare the predictions of the Spre-iter and Rizzi theory with the Wind magnetosheath observations and find fair agreement, in particular as regards the proportionality of the magnetic field strength and the product of the plasma density and bulk speed. We then carry out a small-perturbation analysis of the Spreiter and Rizzi solution to account for the small IMF components perpendicular to the flow vector. The resulting expression is compared to the time series of the observations and satisfactory agreement is obtained. We also present and discuss observations in the dawnside boundary layer of pulsed, high-speed (v 600 km/s) flows exceeding the solar wind flow speeds. We examine various generating mechanisms and suggest that the most likely causeis a wave of frequency 3.2 mHz excited at the inner edge of the boundary layer.
Efficient Iterative Methods Applied to the Solution of Transonic Flows

NASA Astrophysics Data System (ADS)

Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.

1996-02-01

We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
Evaluation of the power consumption of a high-speed parallel robot

NASA Astrophysics Data System (ADS)

Han, Gang; Xie, Fugui; Liu, Xin-Jun

2018-06-01

An inverse dynamic model of a high-speed parallel robot is established based on the virtual work principle. With this dynamic model, a new evaluation method is proposed to measure the power consumption of the robot during pick-and-place tasks. The power vector is extended in this method and used to represent the collinear velocity and acceleration of the moving platform. Afterward, several dynamic performance indices, which are homogenous and possess obvious physical meanings, are proposed. These indices can evaluate the power input and output transmissibility of the robot in a workspace. The distributions of the power input and output transmissibility of the high-speed parallel robot are derived with these indices and clearly illustrated in atlases. Furtherly, a low-power-consumption workspace is selected for the robot.
The optical analogy for vector fields

NASA Technical Reports Server (NTRS)

Parker, E. N. (Editor)

1991-01-01

This paper develops the optical analogy for a general vector field. The optical analogy allows the examination of certain aspects of a vector field that are not otherwise readily accessible. In particular, in the cases of a stationary Eulerian flow v of an ideal fluid and a magnetostatic field B, the vectors v and B have surface loci in common with their curls. The intrinsic discontinuities around local maxima in absolute values of v and B take the form of vortex sheets and current sheets, respectively, the former playing a fundamental role in the development of hydrodyamic turbulence and the latter playing a major role in heating the X-ray coronas of stars and galaxies.
A parallel approximate string matching under Levenshtein distance on graphics processing units using warp-shuffle operations

PubMed Central

Ho, ThienLuan; Oh, Seung-Rohk

2017-01-01

Approximate string matching with k-differences has a number of practical applications, ranging from pattern recognition to computational biology. This paper proposes an efficient memory-access algorithm for parallel approximate string matching with k-differences on Graphics Processing Units (GPUs). In the proposed algorithm, all threads in the same GPUs warp share data using warp-shuffle operation instead of accessing the shared memory. Moreover, we implement the proposed algorithm by exploiting the memory structure of GPUs to optimize its performance. Experiment results for real DNA packages revealed that the performance of the proposed algorithm and its implementation archived up to 122.64 and 1.53 times compared to that of sequential algorithm on CPU and previous parallel approximate string matching algorithm on GPUs, respectively. PMID:29016700
Design of Unstructured Adaptive (UA) NAS Parallel Benchmark Featuring Irregular, Dynamic Memory Accesses

NASA Technical Reports Server (NTRS)

Feng, Hui-Yu; VanderWijngaart, Rob; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2001-01-01

We describe the design of a new method for the measurement of the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. The method involves the solution of a stylized heat transfer problem on an unstructured, adaptive grid. A Spectral Element Method (SEM) with an adaptive, nonconforming mesh is selected to discretize the transport equation. The relatively high order of the SEM lowers the fraction of wall clock time spent on inter-processor communication, which eases the load balancing task and allows us to concentrate on the memory accesses. The benchmark is designed to be three-dimensional. Parallelization and load balance issues of a reference implementation will be described in detail in future reports.
Design Sketches For Optical Crossbar Switches Intended For Large-Scale Parallel Processing Applications

NASA Astrophysics Data System (ADS)

Hartmann, Alfred; Redfield, Steve

1989-04-01

This paper discusses design of large-scale (1000x 1000) optical crossbar switching networks for use in parallel processing supercom-puters. Alternative design sketches for an optical crossbar switching network are presented using free-space optical transmission with either a beam spreading/masking model or a beam steering model for internodal communications. The performances of alternative multiple access channel communications protocol-unslotted and slotted ALOHA and carrier sense multiple access (CSMA)-are compared with the performance of the classic arbitrated bus crossbar of conventional electronic parallel computing. These comparisons indicate an almost inverse relationship between ease of implementation and speed of operation. Practical issues of optical system design are addressed, and an optically addressed, composite spatial light modulator design is presented for fabrication to arbitrarily large scale. The wide range of switch architecture, communications protocol, optical systems design, device fabrication, and system performance problems presented by these design sketches poses a serious challenge to practical exploitation of highly parallel optical interconnects in advanced computer designs.
A Systolic Architecture for Singular Value Decomposition,

DTIC Science & Technology

1983-01-01

Presented at the 1 st International Colloquium on Vector and Parallel Computing in Scientific Applications, Paris, March 191J Contract N00014-82-K.0703...Gene Golub. Private comunication . given inputs x and n 2 , compute 2 2 2 2 /6/ G. H. Golub and F. T. Luk : "Singular Value I + X1 Decomposition
NAS Applications and Advanced Algorithms

NASA Technical Reports Server (NTRS)

Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)

1997-01-01

This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.

Molecular Symmetry in Ab Initio Calculations

NASA Astrophysics Data System (ADS)

Madhavan, P. V.; Written, J. L.

1987-05-01

A scheme is presented for the construction of the Fock matrix in LCAO-SCF calculations and for the transformation of basis integrals to LCAO-MO integrals that can utilize several symmetry unique lists of integrals corresponding to different symmetry groups. The algorithm is fully compatible with vector processing machines and is especially suited for parallel processing machines.
Electromagnetic banana kinetic equation and its applications in tokamaks

NASA Astrophysics Data System (ADS)

Shaing, K. C.; Chu, M. S.; Sabbagh, S. A.; Seol, J.

2018-03-01

A banana kinetic equation in tokamaks that includes effects of the finite banana width is derived for the electromagnetic waves with frequencies lower than the gyro-frequency and the bounce frequency of the trapped particles. The radial wavelengths are assumed to be either comparable to or shorter than the banana width, but much wider than the gyro-radius. One of the consequences of the banana kinetics is that the parallel component of the vector potential is not annihilated by the orbit averaging process and appears in the banana kinetic equation. The equation is solved to calculate the neoclassical quasilinear transport fluxes in the superbanana plateau regime caused by electromagnetic waves. The transport fluxes can be used to model electromagnetic wave and the chaotic magnetic field induced thermal particle or energetic alpha particle losses in tokamaks. It is shown that the parallel component of the vector potential enhances losses when it is the sole transport mechanism. In particular, the fact that the drift resonance can cause significant transport losses in the chaotic magnetic field in the hitherto unknown low collisionality regimes is emphasized.
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms

DOE Office of Scientific and Technical Information (OSTI.GOV)

Williams, Samuel; Oliker, Leonid; Vuduc, Richard

2008-10-16

We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific-optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one ofmore » the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.« less
Optoelectronic associative recall using motionless-head parallel readout optical disk

NASA Astrophysics Data System (ADS)

Marchand, P. J.; Krishnamoorthy, A. V.; Ambs, P.; Esener, S. C.

1990-12-01

High data rates, low retrieval times, and simple implementation are presently shown to be obtainable by means of a motionless-head 2D parallel-readout system for optical disks. Since the optical disk obviates mechanical head motions for access, focusing, and tracking, addressing is performed exclusively through the disk's rotation. Attention is given to a high-performance associative memory system configuration which employs a parallel readout disk.
A Parallel Ghosting Algorithm for The Flexible Distributed Mesh Database

DOE PAGES

Mubarak, Misbah; Seol, Seegyoung; Lu, Qiukai; ...

2013-01-01

Critical to the scalability of parallel adaptive simulations are parallel control functions including load balancing, reduced inter-process communication and optimal data decomposition. In distributed meshes, many mesh-based applications frequently access neighborhood information for computational purposes which must be transmitted efficiently to avoid parallel performance degradation when the neighbors are on different processors. This article presents a parallel algorithm of creating and deleting data copies, referred to as ghost copies, which localize neighborhood data for computation purposes while minimizing inter-process communication. The key characteristics of the algorithm are: (1) It can create ghost copies of any permissible topological order in amore » 1D, 2D or 3D mesh based on selected adjacencies. (2) It exploits neighborhood communication patterns during the ghost creation process thus eliminating all-to-all communication. (3) For applications that need neighbors of neighbors, the algorithm can create n number of ghost layers up to a point where the whole partitioned mesh can be ghosted. Strong and weak scaling results are presented for the IBM BG/P and Cray XE6 architectures up to a core count of 32,768 processors. The algorithm also leads to scalable results when used in a parallel super-convergent patch recovery error estimator, an application that frequently accesses neighborhood data to carry out computation.« less
Secure access control and large scale robust representation for online multimedia event detection.

PubMed

Liu, Changyu; Lu, Bin; Li, Huiling

2014-01-01

We developed an online multimedia event detection (MED) system. However, there are a secure access control issue and a large scale robust representation issue when we want to integrate traditional event detection algorithms into the online environment. For the first issue, we proposed a tree proxy-based and service-oriented access control (TPSAC) model based on the traditional role based access control model. Verification experiments were conducted on the CloudSim simulation platform, and the results showed that the TPSAC model is suitable for the access control of dynamic online environments. For the second issue, inspired by the object-bank scene descriptor, we proposed a 1000-object-bank (1000OBK) event descriptor. Feature vectors of the 1000OBK were extracted from response pyramids of 1000 generic object detectors which were trained on standard annotated image datasets, such as the ImageNet dataset. A spatial bag of words tiling approach was then adopted to encode these feature vectors for bridging the gap between the objects and events. Furthermore, we performed experiments in the context of event classification on the challenging TRECVID MED 2012 dataset, and the results showed that the robust 1000OBK event descriptor outperforms the state-of-the-art approaches.
Massively parallel data processing for quantitative total flow imaging with optical coherence microscopy and tomography

NASA Astrophysics Data System (ADS)

Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo

2017-08-01

We present an application of massively parallel processing of quantitative flow measurements data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150 fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on users parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18. Nature of problem: Speed up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1,8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
[Orthogonal Vector Projection Algorithm for Spectral Unmixing].

PubMed

Song, Mei-ping; Xu, Xing-wei; Chang, Chein-I; An, Ju-bai; Yao, Li

2015-12-01

Spectrum unmixing is an important part of hyperspectral technologies, which is essential for material quantity analysis in hyperspectral imagery. Most linear unmixing algorithms require computations of matrix multiplication and matrix inversion or matrix determination. These are difficult for programming, especially hard for realization on hardware. At the same time, the computation costs of the algorithms increase significantly as the number of endmembers grows. Here, based on the traditional algorithm Orthogonal Subspace Projection, a new method called. Orthogonal Vector Projection is prompted using orthogonal principle. It simplifies this process by avoiding matrix multiplication and inversion. It firstly computes the final orthogonal vector via Gram-Schmidt process for each endmember spectrum. And then, these orthogonal vectors are used as projection vector for the pixel signature. The unconstrained abundance can be obtained directly by projecting the signature to the projection vectors, and computing the ratio of projected vector length and orthogonal vector length. Compared to the Orthogonal Subspace Projection and Least Squares Error algorithms, this method does not need matrix inversion, which is much computation costing and hard to implement on hardware. It just completes the orthogonalization process by repeated vector operations, easy for application on both parallel computation and hardware. The reasonability of the algorithm is proved by its relationship with Orthogonal Sub-space Projection and Least Squares Error algorithms. And its computational complexity is also compared with the other two algorithms', which is the lowest one. At last, the experimental results on synthetic image and real image are also provided, giving another evidence for effectiveness of the method.
Adeno-Associated Virus Type 2 Wild-Type and Vector-Mediated Genomic Integration Profiles of Human Diploid Fibroblasts Analyzed by Third-Generation PacBio DNA Sequencing

PubMed Central

Hüser, Daniela; Gogol-Döring, Andreas; Chen, Wei

2014-01-01

ABSTRACT Genome-wide analysis of adeno-associated virus (AAV) type 2 integration in HeLa cells has shown that wild-type AAV integrates at numerous genomic sites, including AAVS1 on chromosome 19q13.42. Multiple GAGY/C repeats, resembling consensus AAV Rep-binding sites are preferred, whereas rep-deficient AAV vectors (rAAV) regularly show a random integration profile. This study is the first study to analyze wild-type AAV integration in diploid human fibroblasts. Applying high-throughput third-generation PacBio-based DNA sequencing, integration profiles of wild-type AAV and rAAV are compared side by side. Bioinformatic analysis reveals that both wild-type AAV and rAAV prefer open chromatin regions. Although genomic features of AAV integration largely reproduce previous findings, the pattern of integration hot spots differs from that described in HeLa cells before. DNase-Seq data for human fibroblasts and for HeLa cells reveal variant chromatin accessibility at preferred AAV integration hot spots that correlates with variant hot spot preferences. DNase-Seq patterns of these sites in human tissues, including liver, muscle, heart, brain, skin, and embryonic stem cells further underline variant chromatin accessibility. In summary, AAV integration is dependent on cell-type-specific, variant chromatin accessibility leading to random integration profiles for rAAV, whereas wild-type AAV integration sites cluster near GAGY/C repeats. IMPORTANCE Adeno-associated virus type 2 (AAV) is assumed to establish latency by chromosomal integration of its DNA. This is the first genome-wide analysis of wild-type AAV2 integration in diploid human cells and the first to compare wild-type to recombinant AAV vector integration side by side under identical experimental conditions. Major determinants of wild-type AAV integration represent open chromatin regions with accessible consensus AAV Rep-binding sites. The variant chromatin accessibility of different human tissues or cell types will have impact on vector targeting to be considered during gene therapy. PMID:25031342
A portable MPI-based parallel vector template library

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.

1995-01-01

This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
A Portable MPI-Based Parallel Vector Template Library

NASA Technical Reports Server (NTRS)

Sheffler, Thomas J.

1995-01-01

This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C + + by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of c or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems

NASA Technical Reports Server (NTRS)

Taft, James R.

1999-01-01

Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
National malaria vector control policy: an analysis of the decision to scale-up larviciding in Nigeria.

PubMed

Tesfazghi, Kemi; Hill, Jenny; Jones, Caroline; Ranson, Hilary; Worrall, Eve

2016-02-01

New vector control tools are needed to combat insecticide resistance and reduce malaria transmission. The World Health Organization (WHO) endorses larviciding as a supplementary vector control intervention using larvicides recommended by the WHO Pesticides Evaluation Scheme (WHOPES). The decision to scale-up larviciding in Nigeria provided an opportunity to investigate the factors influencing policy adoption and assess the role that actors and evidence play in the policymaking process, in order to draw lessons that help accelerate the uptake of new methods for vector control. A retrospective policy analysis was carried out using in-depth interviews with national level policy stakeholders to establish normative national vector control policy or strategy decision-making processes and compare these with the process that led to the decision to scale-up larviciding. The interviews were transcribed, then coded and analyzed using NVivo10. Data were coded according to pre-defined themes from an analytical policy framework developed a priori. Stakeholders reported that the larviciding decision-making process deviated from the normative vector control decision-making process. National malaria policy is normally strongly influenced by WHO recommendations, but the potential of larviciding to contribute to national economic development objectives through larvicide production in Nigeria was cited as a key factor shaping the decision. The larviciding decision involved a restricted range of policy actors, and notably excluded actors that usually play advisory, consultative and evidence generation roles. Powerful actors limited the access of some actors to the policy processes and content. This may have limited the influence of scientific evidence in this policy decision. This study demonstrates that national vector control policy change can be facilitated by linking malaria control objectives to wider socioeconomic considerations and through engaging powerful policy champions to drive policy change and thereby accelerate access to new vector control tools. © The Author 2015. Published by Oxford University Press in association with The London School of Hygiene and Tropical Medicine.
Dynamic current-current susceptibility in three-dimensional Dirac and Weyl semimetals

NASA Astrophysics Data System (ADS)

Thakur, Anmol; Sadhukhan, Krishanu; Agarwal, Amit

2018-01-01

We study the linear response of doped three-dimensional Dirac and Weyl semimetals to vector potentials, by calculating the wave-vector- and frequency-dependent current-current response function analytically. The longitudinal part of the dynamic current-current response function is then used to study the plasmon dispersion and the optical conductivity. The transverse response in the static limit yields the orbital magnetic susceptibility. In a Weyl semimetal, along with the current-current response function, all these quantities are significantly impacted by the presence of parallel electric and magnetic fields (a finite E .B term) and can be used to experimentally explore the chiral anomaly.
A limited innate immune response is induced by a replication-defective herpes simplex virus vector following delivery to the murine central nervous system

PubMed Central

Zeier, Zane; Aguilar, J Santiago; Lopez, Cecilia M; Devi-Rao, G B; Watson, Zachary L; Baker, Henry V; Wagner, Edward K; Bloom, David C

2010-01-01

Herpes simplex virus type 1 (HSV-1)–based vectors readily transduce neurons and have a large payload capacity, making them particularly amenable to gene therapy applications within the central nervous system (CNS). Because aspects of the host responses to HSV-1 vectors in the CNS are largely unknown, we compared the host response of a nonreplicating HSV-1 vector to that of a replication-competent HSV-1 virus using microarray analysis. In parallel, HSV-1 gene expression was tracked using HSV-specific oligonucleotide-based arrays in order to correlate viral gene expression with observed changes in host response. Microarray analysis was performed following stereotactic injection into the right hippocampal formation of mice with either a replication-competent HSV-1 or a nonreplicating recombinant of HSV-1, lacking the ICP4 gene (ICP4−). Genes that demonstrated a significant change (P < .001) in expression in response to the replicating HSV-1 outnumbered those that changed in response to mock or nonreplicating vector by approximately 3-fold. Pathway analysis revealed that both the replicating and nonreplicating vectors induced robust antigen presentation but only mild interferon, chemokine, and cytokine signaling responses. The ICP4− vector was restricted in several of the Toll-like receptor-signaling pathways, indicating reduced stimulation of the innate immune response. These array analyses suggest that although the nonreplicating vector induces detectable activation of immune response pathways, the number and magnitude of the induced response is dramatically restricted compared to the replicating vector, and with the exception of antigen presentation, host gene expression induced by the non-replicating vector largely resembles mock infection. PMID:20095947
Efficacy of Code Optimization on Cache-based Processors

NASA Technical Reports Server (NTRS)

VanderWijngaart, Rob F.; Chancellor, Marisa K. (Technical Monitor)

1997-01-01

The current common wisdom in the U.S. is that the powerful, cost-effective supercomputers of tomorrow will be based on commodity (RISC) micro-processors with cache memories. Already, most distributed systems in the world use such hardware as building blocks. This shift away from vector supercomputers and towards cache-based systems has brought about a change in programming paradigm, even when ignoring issues of parallelism. Vector machines require inner-loop independence and regular, non-pathological memory strides (usually this means: non-power-of-two strides) to allow efficient vectorization of array operations. Cache-based systems require spatial and temporal locality of data, so that data once read from main memory and stored in high-speed cache memory is used optimally before being written back to main memory. This means that the most cache-friendly array operations are those that feature zero or unit stride, so that each unit of data read from main memory (a cache line) contains information for the next iteration in the loop. Moreover, loops ought to be 'fat', meaning that as many operations as possible are performed on cache data-provided instruction caches do not overflow and enough registers are available. If unit stride is not possible, for example because of some data dependency, then care must be taken to avoid pathological strides, just ads on vector computers. For cache-based systems the issues are more complex, due to the effects of associativity and of non-unit block (cache line) size. But there is more to the story. Most modern micro-processors are superscalar, which means that they can issue several (arithmetic) instructions per clock cycle, provided that there are enough independent instructions in the loop body. This is another argument for providing fat loop bodies. With these restrictions, it appears fairly straightforward to produce code that will run efficiently on any cache-based system. It can be argued that although some of the important computational algorithms employed at NASA Ames require different programming styles on vector machines and cache-based machines, respectively, neither architecture class appeared to be favored by particular algorithms in principle. Practice tells us that the situation is more complicated. This report presents observations and some analysis of performance tuning for cache-based systems. We point out several counterintuitive results that serve as a cautionary reminder that memory accesses are not the only factors that determine performance, and that within the class of cache-based systems, significant differences exist.
Enabling Object Storage via shims for Grid Middleware

NASA Astrophysics Data System (ADS)

Cadellin Skipsey, Samuel; De Witt, Shaun; Dewhurst, Alastair; Britton, David; Roy, Gareth; Crooks, David

2015-12-01

The Object Store model has quickly become the basis of most commercially successful mass storage infrastructure, backing so-called ”Cloud” storage such as Amazon S3, but also underlying the implementation of most parallel distributed storage systems. Many of the assumptions in Object Store design are similar, but not identical, to concepts in the design of Grid Storage Elements, although the requirement for ”POSIX-like” filesystem structures on top of SEs makes the disjunction seem larger. As modern Object Stores provide many features that most Grid SEs do not (block level striping, parallel access, automatic file repair, etc.), it is of interest to see how easily we can provide interfaces to typical Object Stores via plugins and shims for Grid tools, and how well experiments can adapt their data models to them. We present evaluation of, and first-deployment experiences with, (for example) Xrootd-Ceph interfaces for direct object-store access, as part of an initiative within GridPP[1] hosted at RAL. Additionally, we discuss the tradeoffs and experience of developing plugins for the currently-popular Ceph parallel distributed filesystem for the GFAL2 access layer, at Glasgow.
Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.

PubMed

Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele

2015-01-01

Technological advances of Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings, lead to an ever increasing amount of raw data being generated. Arrays with hundreds up to a few thousands of electrodes are slowly seeing widespread use and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
The design and implementation of a parallel unstructured Euler solver using software primitives

NASA Technical Reports Server (NTRS)

Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.

1992-01-01

This paper is concerned with the implementation of a three-dimensional unstructured grid Euler-solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effect of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY-YMP vector supercomputer.

Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

DOE PAGES

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; ...

2017-11-14

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals formore » the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. In conclusion, the chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.« less
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations

NASA Astrophysics Data System (ADS)

Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe; Gagliardi, Laura; de Jong, Wibe A.

2017-11-01

A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted up to date.
LOADING AND UNLOADING DEVICE

DOEpatents

Treshow, M.

1960-08-16

A device for loading and unloading fuel rods into and from a reactor tank through an access hole includes parallel links carrying a gripper. These links enable the gripper to go through the access hole and then to be moved laterally from the axis of the access hole to the various locations of the fuel rods in the reactor tank.
Vector correlation between the alignment of reactant N2 (A 3Σu+) and the alignment of product NO (A 2Σ+) rotation in the energy transfer reaction of aligned N2 (A 3Σu+) + NO (X 2Π) → NO (A 2Σ+) + N2 (X 1Σg+).

PubMed

Ohoyama, H

2013-12-21

The vector correlation between the alignment of reactant N2 (A (3)Σu(+)) and the alignment of product NO (A (2)Σ(+)) rotation has been studied in the energy transfer reaction of aligned N2 (A (3)Σu(+)) + NO (X (2)Π) → NO (A (2)Σ(+)) + N2 (X (1)Σg(+)) under the crossed beam condition at a collision energy of ~0.07 eV. NO (A (2)Σ(+)) emission in the two linear polarization directions (i.e., parallel and perpendicular with respect to the relative velocity vector v(R)) has been measured as a function of the alignment of N2 (A (3)Σu(+)) along its molecular axis in the collision frame. The degree of polarization of NO (A (2)Σ(+)) emission is found to depend on the alignment angle (θ(v(R))) of N2 (A (3)Σu(+)) in the collision frame. The shape of the steric opacity function at the two polarization conditions turns out to be extremely different from each other: The steric opacity function at the parallel polarization condition is more favorable for the oblique configuration of N2 (A (3)Σu(+)) at an alignment angle of θ(v(R)) ~ 45° as compared with that at the perpendicular polarization condition. The alignment of N2 (A (3)Σu(+)) is found to give a significant effect on the alignment of NO (A (2)Σ(+)) rotation in the collision frame: The N2 (A (3)Σu(+)) configuration at an oblique alignment angle θ(v(R)) ~ 45° leads to a parallel alignment of NO (A (2)Σ(+)) rotation (J-vector) with respect to v(R), while the axial and sideways configurations of N2 (A (3)Σu(+)) lead to a perpendicular alignment of NO (A (2)Σ(+)) rotation with respect to vR. These stereocorrelated alignments of the product rotation have a good correlation with the stereocorrelated reactivity observed in the multi-dimensional steric opacity function [H. Ohoyama and S. Maruyama, J. Chem. Phys. 137, 064311 (2012)].
DOE Office of Scientific and Technical Information (OSTI.GOV)

Kargupta, H.; Stafford, B.; Hamzaoglu, I.

This paper describes an experimental parallel/distributed data mining system PADMA (PArallel Data Mining Agents) that uses software agents for local data accessing and analysis and a web based interface for interactive data visualization. It also presents the results of applying PADMA for detecting patterns in unstructured texts of postmortem reports and laboratory test data for Hepatitis C patients.
Hardware packet pacing using a DMA in a parallel computer

DOEpatents

Chen, Dong; Heidelberger, Phillip; Vranas, Pavlos

2013-08-13

Method and system for hardware packet pacing using a direct memory access controller in a parallel computer which, in one aspect, keeps track of a total number of bytes put on the network as a result of a remote get operation, using a hardware token counter.
A Lightweight Remote Parallel Visualization Platform for Interactive Massive Time-varying Climate Data Analysis

NASA Astrophysics Data System (ADS)

Li, J.; Zhang, T.; Huang, Q.; Liu, Q.

2014-12-01

Today's climate datasets are featured with large volume, high degree of spatiotemporal complexity and evolving fast overtime. As visualizing large volume distributed climate datasets is computationally intensive, traditional desktop based visualization applications fail to handle the computational intensity. Recently, scientists have developed remote visualization techniques to address the computational issue. Remote visualization techniques usually leverage server-side parallel computing capabilities to perform visualization tasks and deliver visualization results to clients through network. In this research, we aim to build a remote parallel visualization platform for visualizing and analyzing massive climate data. Our visualization platform was built based on Paraview, which is one of the most popular open source remote visualization and analysis applications. To further enhance the scalability and stability of the platform, we have employed cloud computing techniques to support the deployment of the platform. In this platform, all climate datasets are regular grid data which are stored in NetCDF format. Three types of data access methods are supported in the platform: accessing remote datasets provided by OpenDAP servers, accessing datasets hosted on the web visualization server and accessing local datasets. Despite different data access methods, all visualization tasks are completed at the server side to reduce the workload of clients. As a proof of concept, we have implemented a set of scientific visualization methods to show the feasibility of the platform. Preliminary results indicate that the framework can address the computation limitation of desktop based visualization applications.
Paging memory from random access memory to backing storage in a parallel computer

DOEpatents

Archer, Charles J; Blocksome, Michael A; Inglett, Todd A; Ratterman, Joseph D; Smith, Brian E

2013-05-21

Paging memory from random access memory (`RAM`) to backing storage in a parallel computer that includes a plurality of compute nodes, including: executing a data processing application on a virtual machine operating system in a virtual machine on a first compute node; providing, by a second compute node, backing storage for the contents of RAM on the first compute node; and swapping, by the virtual machine operating system in the virtual machine on the first compute node, a page of memory from RAM on the first compute node to the backing storage on the second compute node.
Dimensional Crossover and Its Interplay with In-Plane Anisotropy of Upper Critical Field in β-(BDA-TTP)2SbF6

NASA Astrophysics Data System (ADS)

Yasuzuka, Syuma; Koga, Hiroaki; Yamamura, Yasuhisa; Saito, Kazuya; Uji, Shinya; Terashima, Taichi; Akutsu, Hiroki; Yamada, Jun-ichi

2017-08-01

Resistance measurements have been performed to investigate the dimensionality and the in-plane anisotropy of the upper critical field (Hc2) for β-(BDA-TTP)2SbF6 in fields H up to 15 T and at temperatures T from 1.5 to 7.5 K, where BDA-TTP stands for 2,5-bis(1,3-dithian-2-ylidene)-1,3,4,6-tetrathiapentalene. The upper critical fields parallel and perpendicular to the conduction layer are determined and dimensional crossover from anisotropic three-dimensional behavior to two-dimensional behavior is found at around 6 K. When the direction of H is varied within the conducting layer at 6.0 K, Hc2 shows twofold symmetry: Hc2 along the minimum Fermi wave vector (maximum Fermi velocity) is larger than that along the maximum Fermi wave vector (minimum Fermi velocity). The normal-state magnetoresistance has twofold symmetry similar to Hc2 and shows a maximum when the magnetic field is nearly parallel to the maximum Fermi wave vector. This tendency is consistent with the Fermi surface anisotropy. At 3.5 K, we found clear fourfold symmetry of Hc2 despite the fact that the normal-state magnetoresistance shows twofold symmetry arising from the Fermi surface anisotropy. The origin of the fourfold symmetry of Hc2 is discussed in terms of the superconducting gap structure in β-(BDA-TTP)2SbF6.
Cholesterol derived cationic lipids as potential non-viral gene delivery vectors and their serum compatibility.

PubMed

Ju, Jia; Huan, Meng-Lei; Wan, Ning; Hou, Yi-Lin; Ma, Xi-Xi; Jia, Yi-Yang; Li, Chen; Zhou, Si-Yuan; Zhang, Bang-Le

2016-05-15

Cholesterol derivatives M1-M6 as synthetic cationic lipids were designed and the biological evaluation of the cationic liposomes based on them as non-viral gene delivery vectors were described. Plasmid pEGFP-N1, used as model gene, was transferred into 293T cells by cationic liposomes formed with M1-M6 and transfection efficiency and GFP expression were tested. Cationic liposomes prepared with cationic lipids M1-M6 exhibited good transfection activity, and the transfection activity was parallel (M2 and M4) or superior (M1 and M6) to that of DC-Chol derived from the same backbone. Among them, the transfection efficiency of cationic lipid M6 was parallel to that of the commercially available Lipofectamine2000. The optimal formulation of M1 and M6 were found to be at a mol ratio of 1:0.5 for cationic lipid/DOPE, and at a N/P charge mol ratio of 3:1 for liposome/DNA. Under optimized conditions, the efficiency of M1 and M6 is greater than that of all the tested commercial liposomes DC-Chol and Lipofectamine2000, even in the presence of serum. The results indicated that M1 and M6 exhibited low cytotoxicity, good serum compatibility and efficient transfection performance, having the potential of being excellent non-viral vectors for gene delivery. Copyright © 2016 Elsevier Ltd. All rights reserved.
Automated extraction and analysis of rock discontinuity characteristics from 3D point clouds

NASA Astrophysics Data System (ADS)

Bianchetti, Matteo; Villa, Alberto; Agliardi, Federico; Crosta, Giovanni B.

2016-04-01

A reliable characterization of fractured rock masses requires an exhaustive geometrical description of discontinuities, including orientation, spacing, and size. These are required to describe discontinuum rock mass structure, perform Discrete Fracture Network and DEM modelling, or provide input for rock mass classification or equivalent continuum estimate of rock mass properties. Although several advanced methodologies have been developed in the last decades, a complete characterization of discontinuity geometry in practice is still challenging, due to scale-dependent variability of fracture patterns and difficult accessibility to large outcrops. Recent advances in remote survey techniques, such as terrestrial laser scanning and digital photogrammetry, allow a fast and accurate acquisition of dense 3D point clouds, which promoted the development of several semi-automatic approaches to extract discontinuity features. Nevertheless, these often need user supervision on algorithm parameters which can be difficult to assess. To overcome this problem, we developed an original Matlab tool, allowing fast, fully automatic extraction and analysis of discontinuity features with no requirements on point cloud accuracy, density and homogeneity. The tool consists of a set of algorithms which: (i) process raw 3D point clouds, (ii) automatically characterize discontinuity sets, (iii) identify individual discontinuity surfaces, and (iv) analyse their spacing and persistence. The tool operates in either a supervised or unsupervised mode, starting from an automatic preliminary exploration data analysis. The identification and geometrical characterization of discontinuity features is divided in steps. First, coplanar surfaces are identified in the whole point cloud using K-Nearest Neighbor and Principal Component Analysis algorithms optimized on point cloud accuracy and specified typical facet size. Then, discontinuity set orientation is calculated using Kernel Density Estimation and principal vector similarity criteria. Poles to points are assigned to individual discontinuity objects using easy custom vector clustering and Jaccard distance approaches, and each object is segmented into planar clusters using an improved version of the DBSCAN algorithm. Modal set orientations are then recomputed by cluster-based orientation statistics to avoid the effects of biases related to cluster size and density heterogeneity of the point cloud. Finally, spacing values are measured between individual discontinuity clusters along scanlines parallel to modal pole vectors, whereas individual feature size (persistence) is measured using 3D convex hull bounding boxes. Spacing and size are provided both as raw population data and as summary statistics. The tool is optimized for parallel computing on 64bit systems, and a Graphic User Interface (GUI) has been developed to manage data processing, provide several outputs, including reclassified point clouds, tables, plots, derived fracture intensity parameters, and export to modelling software tools. We present test applications performed both on synthetic 3D data (simple 3D solids) and real case studies, validating the results with existing geomechanical datasets.
Job Management Requirements for NAS Parallel Systems and Clusters

NASA Technical Reports Server (NTRS)

Saphir, William; Tanner, Leigh Ann; Traversat, Bernard

1995-01-01

A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
Efficient diagonalization of the sparse matrices produced within the framework of the UK R-matrix molecular codes

NASA Astrophysics Data System (ADS)

Galiatsatos, P. G.; Tennyson, J.

2012-11-01

The most time consuming step within the framework of the UK R-matrix molecular codes is that of the diagonalization of the inner region Hamiltonian matrix (IRHM). Here we present the method that we follow to speed up this step. We use shared memory machines (SMM), distributed memory machines (DMM), the OpenMP directive based parallel language, the MPI function based parallel language, the sparse matrix diagonalizers ARPACK and PARPACK, a variation for real symmetric matrices of the official coordinate sparse matrix format and finally a parallel sparse matrix-vector product (PSMV). The efficient application of the previous techniques rely on two important facts: the sparsity of the matrix is large enough (more than 98%) and in order to get back converged results we need a small only part of the matrix spectrum.
Full speed ahead for software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wolfe, A.

1986-03-10

Supercomputing software is moving into high gear, spurred by the rapid spread of supercomputers into new applications. The critical challenge is how to develop tools that will make it easier for programmers to write applications that take advantage of vectorizing in the classical supercomputer and the parallelism that is emerging in supercomputers and minisupercomputers. Writing parallel software is a challenge that every programmer must face because parallel architectures are springing up across the range of computing. Cray is developing a host of tools for programmers. Tools to support multitasking (in supercomputer parlance, multitasking means dividing up a single program tomore » run on multiple processors) are high on Cray's agenda. On tap for multitasking is Premult, dubbed a microtasking tool. As a preprocessor for Cray's CFT77 FORTRAN compiler, Premult will provide fine-grain multitasking.« less
GPU-accelerated algorithms for compressed signals recovery with application to astronomical imagery deblurring

NASA Astrophysics Data System (ADS)

Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico

2018-04-01

Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting GPUs parallel computation capabilities to speedup the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signals recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
Dynamics modeling for parallel haptic interfaces with force sensing and control.

PubMed

Bernstein, Nicholas; Lawrence, Dale; Pao, Lucy

2013-01-01

Closed-loop force control can be used on haptic interfaces (HIs) to mitigate the effects of mechanism dynamics. A single multidimensional force-torque sensor is often employed to measure the interaction force between the haptic device and the user's hand. The parallel haptic interface at the University of Colorado (CU) instead employs smaller 1D force sensors oriented along each of the five actuating rods to build up a 5D force vector. This paper shows that a particular manipulandum/hand partition in the system dynamics is induced by the placement and type of force sensing, and discusses the implications on force and impedance control for parallel haptic interfaces. The details of a "squaring down" process are also discussed, showing how to obtain reduced degree-of-freedom models from the general six degree-of-freedom dynamics formulation.
High Performance Compression of Science Data

NASA Technical Reports Server (NTRS)

Storer, James A.; Carpentieri, Bruno; Cohn, Martin

1994-01-01

Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that with no training or prior knowledge of the data, for a given fidelity, the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.
Legal aspects of public health: difficulties in controlling vector-borne and zoonotic diseases in Brazil.

PubMed

Mendes, Marcílio S; de Moraes, Josué

2014-11-01

In recent years, vector-borne and zoonotic diseases have become a major challenge for public health. Dengue fever and leptospirosis are the most important communicable diseases in Brazil based on their prevalence and the healthy life years lost from disability. The primary strategy for preventing human exposure to these diseases is effective insect and rodent control in and around the home. However, health authorities have difficulties in controlling vector-borne and zoonotic diseases because residents often refuse access to their homes. This study discusses aspects related to the activities performed by Brazilian health authorities to combat vector-borne and zoonotic diseases, particularly difficulties in relation to the legal aspect, which often impede the quick and effective actions of these professionals. How might it be possible to reconcile the need to preserve public health and the rule on the inviolability of the home, especially in the case of abandoned properties or illegal residents and the refusal of residents to allow the health authority access? Do residents have the right to hinder the performance of health workers even in the face of a significant and visible focus of disease transmission? This paper argues that a comprehensive legal plan aimed at the control of invasive vector-borne and zoonotic diseases including synanthropic animals of public health importance should be considered. In addition, this paper aims to bridge the gap between lawyers and public health professionals and to facilitate communication between them. Copyright © 2014 Elsevier B.V. All rights reserved.
Rapid Parallel Screening for Strain Optimization

DTIC Science & Technology

2013-08-16

fermentation yields of industrially relevant biological compounds. Screening of the desired chemicals was completed previously. Microbes that can...reporter, and, 2) a yeast TAR cloning shuttle vector for transferring catabolic clusters to E. coli. 15. SUBJECT TERMS NA 16. SECURITY CLASSIFICATION OF... fermentation yields of industrially relevant biological compounds. Screening of the desired chemicals was completed previously. Microbes that can utilize
Rapid Parallel Screening for Strain Optimization

DTIC Science & Technology

2013-05-16

fermentation yields of industrially relevant biological compounds. Screening of the desired chemicals was completed previously. Microbes that can...reporter, and, 2) a yeast TAR cloning shuttle vector for transferring catabolic clusters to E. coli. 15. SUBJECT TERMS NA 16. SECURITY CLASSIFICATION OF... fermentation yields of industrially relevant biological compounds. Screening of the desired chemicals was completed previously. Microbes that can utilize

Multiprocessing MCNP on an IBM RS/6000 cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKinney, G.W.; West, J.T.

1993-01-01

The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major benefactors of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization.more » Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl's Law S ((f,P) = 1 f + f/P). However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl's Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.« less
Gene therapy decreases seizures in a model of Incontinentia pigmenti.

PubMed

Dogbevia, Godwin K; Töllner, Kathrin; Körbelin, Jakob; Bröer, Sonja; Ridder, Dirk A; Grasshoff, Hanna; Brandt, Claudia; Wenzel, Jan; Straub, Beate K; Trepel, Martin; Löscher, Wolfgang; Schwaninger, Markus

2017-07-01

Incontinentia pigmenti (IP) is a genetic disease leading to severe neurological symptoms, such as epileptic seizures, but no specific treatment is available. IP is caused by pathogenic variants that inactivate the Nemo gene. Replacing Nemo through gene therapy might provide therapeutic benefits. In a mouse model of IP, we administered a single intravenous dose of the adeno-associated virus (AAV) vector, AAV-BR1-CAG-NEMO, delivering the Nemo gene to the brain endothelium. Spontaneous epileptic seizures and the integrity of the blood-brain barrier (BBB) were monitored. The endothelium-targeted gene therapy improved the integrity of the BBB. In parallel, it reduced the incidence of seizures and delayed their occurrence. Neonate mice intravenously injected with the AAV-BR1-CAG-NEMO vector developed no hepatocellular carcinoma or other major adverse effects 11 months after vector injection, demonstrating that the vector has a favorable safety profile. The data show that the BBB is a target of antiepileptic treatment and, more specifically, provide evidence for the therapeutic benefit of a brain endothelial-targeted gene therapy in IP. Ann Neurol 2017;82:93-104. © 2017 American Neurological Association.
VLSI realization of learning vector quantization with hardware/software co-design for different applications

NASA Astrophysics Data System (ADS)

An, Fengwei; Akazawa, Toshinobu; Yamasaki, Shogo; Chen, Lei; Jürgen Mattausch, Hans

2015-04-01

This paper reports a VLSI realization of learning vector quantization (LVQ) with high flexibility for different applications. It is based on a hardware/software (HW/SW) co-design concept for on-chip learning and recognition and designed as a SoC in 180 nm CMOS. The time consuming nearest Euclidean distance search in the LVQ algorithm’s competition layer is efficiently implemented as a pipeline with parallel p-word input. Since neuron number in the competition layer, weight values, input and output number are scalable, the requirements of many different applications can be satisfied without hardware changes. Classification of a d-dimensional input vector is completed in n × \\lceil d/p \\rceil + R clock cycles, where R is the pipeline depth, and n is the number of reference feature vectors (FVs). Adjustment of stored reference FVs during learning is done by the embedded 32-bit RISC CPU, because this operation is not time critical. The high flexibility is verified by the application of human detection with different numbers for the dimensionality of the FVs.
T-cell receptor transfer into human T cells with ecotropic retroviral vectors.

PubMed

Koste, L; Beissert, T; Hoff, H; Pretsch, L; Türeci, Ö; Sahin, U

2014-05-01

Adoptive T-cell transfer for cancer immunotherapy requires genetic modification of T cells with recombinant T-cell receptors (TCRs). Amphotropic retroviral vectors (RVs) used for TCR transduction for this purpose are considered safe in principle. Despite this, TCR-coding and packaging vectors could theoretically recombine to produce replication competent vectors (RCVs), and transduced T-cell preparations must be proven free of RCV. To eliminate the need for RCV testing, we transduced human T cells with ecotropic RVs so potential RCV would be non-infectious for human cells. We show that transfection of synthetic messenger RNA encoding murine cationic amino-acid transporter 1 (mCAT-1), the receptor for murine retroviruses, enables efficient transient ecotropic transduction of human T cells. mCAT-1-dependent transduction was more efficient than amphotropic transduction performed in parallel, and preferentially targeted naive T cells. Moreover, we demonstrate that ecotropic TCR transduction results in antigen-specific restimulation of primary human T cells. Thus, ecotropic RVs represent a versatile, safe and potent tool to prepare T cells for the adoptive transfer.
Local NMR relaxation rates T1-1 and T2-1 depending on the d -vector symmetry in the vortex state of chiral and helical p -wave superconductors

NASA Astrophysics Data System (ADS)

Tanaka, Kenta K.; Ichioka, Masanori; Onari, Seiichiro

2018-04-01

Local NMR relaxation rates in the vortex state of chiral and helical p -wave superconductors are investigated by the quasiclassical Eilenberger theory. We calculate the spatial and resonance frequency dependences of the local NMR spin-lattice relaxation rate T1-1 and spin-spin relaxation rate T2-1. Depending on the relation between the NMR relaxation direction and the d -vector symmetry, the local T1-1 and T2-1 in the vortex core region show different behaviors. When the NMR relaxation direction is parallel to the d -vector component, the local NMR relaxation rate is anomalously suppressed by the negative coherence effect due to the spin dependence of the odd-frequency s -wave spin-triplet Cooper pairs. The difference between the local T1-1 and T2-1 in the site-selective NMR measurement is expected to be a method to examine the d -vector symmetry of candidate materials for spin-triplet superconductors.
Exact simulation of polarized light reflectance by particle deposits

NASA Astrophysics Data System (ADS)

Ramezan Pour, B.; Mackowski, D. W.

2015-12-01

The use of polarimetric light reflection measurements as a means of identifying the physical and chemical characteristics of particulate materials obviously relies on an accurate model of predicting the effects of particle size, shape, concentration, and refractive index on polarized reflection. The research examines two methods for prediction of reflection from plane parallel layers of wavelength—sized particles. The first method is based on an exact superposition solution to Maxwell's time harmonic wave equations for a deposit of spherical particles that are exposed to a plane incident wave. We use a FORTRAN-90 implementation of this solution (the Multiple Sphere T Matrix (MSTM) code), coupled with parallel computational platforms, to directly simulate the reflection from particle layers. The second method examined is based upon the vector radiative transport equation (RTE). Mie theory is used in our RTE model to predict the extinction coefficient, albedo, and scattering phase function of the particles, and the solution of the RTE is obtained from adding—doubling method applied to a plane—parallel configuration. Our results show that the MSTM and RTE predictions of the Mueller matrix elements converge when particle volume fraction in the particle layer decreases below around five percent. At higher volume fractions the RTE can yield results that, depending on the particle size and refractive index, significantly depart from the exact predictions. The particle regimes which lead to dependent scattering effects, and the application of methods to correct the vector RTE for particle interaction, will be discussed.
V/STOL Tandem Fan transition section model test. [in the Lewis Research Center 10-by-10 foot wind tunnel

NASA Technical Reports Server (NTRS)

Simpkin, W. E.

1982-01-01

An approximately 0.25 scale model of the transition section of a tandem fan variable cycle engine nacelle was tested in the NASA Lewis Research Center 10-by-10 foot wind tunnel. Two 12-inch, tip-turbine driven fans were used to simulate a tandem fan engine. Three testing modes simulated a V/STOL tandem fan airplane. Parallel mode has two separate propulsion streams for maximum low speed performance. A front inlet, fan, and downward vectorable nozzle forms one stream. An auxilliary top inlet provides air to the aft fan - supplying the core engine and aft vectorable nozzle. Front nozzle and top inlet closure, and removal of a blocker door separating the two streams configures the tandem fan for series mode operations as a typical aircraft propulsion system. Transition mode operation is formed by intermediate settings of the front nozzle, blocker door, and top inlet. Emphasis was on the total pressure recovery and flow distortion at the aft fan face. A range of fan flow rates were tested at tunnel airspeeds from 0 to 240 knots, and angles-of-attack from -10 to 40 deg for all three modes. In addition to the model variables for the three modes, model variants of the top inlet were tested in the parallel mode only. These lip variables were: aft lip boundary layer bleed holes, and Three position turning vane. Also a bellmouth extension of the top inlet side lips was tested in parallel mode.
Efficient iterative methods applied to the solution of transonic flows

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wissink, A.M.; Lyrintzis, A.S.; Chronopoulos, A.T.

1996-02-01

We investigate the use of an inexact Newton`s method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton`s method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested; a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GIVIRES method. The preconditionermore » is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-Iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton- GIVIRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel thinking machine CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems. 38 refs., 14 figs., 7 tabs.« less
Efficiently modeling neural networks on massively parallel computers

NASA Technical Reports Server (NTRS)

Farber, Robert M.

1993-01-01

Neural networks are a very useful tool for analyzing and modeling complex real world systems. Applying neural network simulations to real world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine grain SIMD computers such as the CM-2 connection machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000 processor CM-2 connection machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 connection machine. Our mapping has virtually no communications overhead with the exception of the communications required for a global summation across the processors (which has a sub-linear runtime growth on the order of O(log(number of processors)). We can efficiently model very large neural networks which have many neurons and interconnects and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent processor interprocessor communications. This paper will consider the simulation of only feed forward neural network although this method is extendable to recurrent networks.
Simplified production and concentration of HIV-1-based lentiviral vectors using HYPERFlask vessels and anion exchange membrane chromatography

PubMed Central

Kutner, Robert H; Puthli, Sharon; Marino, Michael P; Reiser, Jakob

2009-01-01

Background During the past twelve years, lentiviral (LV) vectors have emerged as valuable tools for transgene delivery because of their ability to transduce nondividing cells and their capacity to sustain long-term transgene expression in target cells in vitro and in vivo. However, despite significant progress, the production and concentration of high-titer, high-quality LV vector stocks is still cumbersome and costly. Methods Here we present a simplified protocol for LV vector production on a laboratory scale using HYPERFlask vessels. HYPERFlask vessels are high-yield, high-performance flasks that utilize a multilayered gas permeable growth surface for efficient gas exchange, allowing convenient production of high-titer LV vectors. For subsequent concentration of LV vector stocks produced in this way, we describe a facile protocol involving Mustang Q anion exchange membrane chromatography. Results Our results show that unconcentrated LV vector stocks with titers in excess of 108 transduction units (TU) per ml were obtained using HYPERFlasks and that these titers were higher than those produced in parallel using regular 150-cm2 tissue culture dishes. We also show that up to 500 ml of an unconcentrated LV vector stock prepared using a HYPERFlask vessel could be concentrated using a single Mustang Q Acrodisc with a membrane volume of 0.18 ml. Up to 5.3 × 1010 TU were recovered from a single HYPERFlask vessel. Conclusion The protocol described here is easy to implement and should facilitate high-titer LV vector production for preclinical studies in animal models without the need for multiple tissue culture dishes and ultracentrifugation-based concentration protocols. PMID:19220915
A concise history of central venous access.

PubMed

Beheshti, Michael V

2011-12-01

Central venous access has become a mainstay of modern interventional radiology practice. Its history has paralleled and enabled many current medical therapies. This short overview provides an interesting historical perspective of these increasingly common interventional procedures. Copyright © 2011 Elsevier Inc. All rights reserved.
Concurrent Breakpoints

DTIC Science & Technology

2011-12-18

Proceedings of the SIGMET- RICS Symposium on Parallel and Distributed Tools, pages 48–59, 1998. [8] A. Dinning and E. Schonberg . Detecting access...multi- threaded programs. ACM Trans. Comput. Syst., 15(4):391– 411, 1997. [38] E. Schonberg . On-the-fly detection of access anomalies. In Proceedings
A distributed parallel storage architecture and its potential application within EOSDIS

NASA Technical Reports Server (NTRS)

Johnston, William E.; Tierney, Brian; Feuquay, Jay; Butzer, Tony

1994-01-01

We describe the architecture, implementation, use of a scalable, high performance, distributed-parallel data storage system developed in the ARPA funded MAGIC gigabit testbed. A collection of wide area distributed disk servers operate in parallel to provide logical block level access to large data sets. Operated primarily as a network-based cache, the architecture supports cooperation among independently owned resources to provide fast, large-scale, on-demand storage to support data handling, simulation, and computation.
Constant time worker thread allocation via configuration caching

DOE Office of Scientific and Technical Information (OSTI.GOV)

Eichenberger, Alexandre E; O'Brien, John K. P.

Mechanisms are provided for allocating threads for execution of a parallel region of code. A request for allocation of worker threads to execute the parallel region of code is received from a master thread. Cached thread allocation information identifying prior thread allocations that have been performed for the master thread are accessed. Worker threads are allocated to the master thread based on the cached thread allocation information. The parallel region of code is executed using the allocated worker threads.
Flow velocity vector fields by ultrasound particle imaging velocimetry: in vitro comparison with optical flow velocimetry.

PubMed

Westerdale, John; Belohlavek, Marek; McMahon, Eileen M; Jiamsripong, Panupong; Heys, Jeffrey J; Milano, Michele

2011-02-01

We performed an in vitro study to assess the precision and accuracy of particle imaging velocimetry (PIV) data acquired using a clinically available portable ultrasound system via comparison with stereo optical PIV. The performance of ultrasound PIV was compared with optical PIV on a benchmark problem involving vortical flow with a substantial out-of-plane velocity component. Optical PIV is capable of stereo image acquisition, thus measuring out-of-plane velocity components. This allowed us to quantify the accuracy of ultrasound PIV, which is limited to in-plane acquisition. The system performance was assessed by considering the instantaneous velocity fields without extracting velocity profiles by spatial averaging. Within the 2-dimensional correlation window, using 7 time-averaged frames, the vector fields were found to have correlations of 0.867 in the direction along the ultrasound beam and 0.738 in the perpendicular direction. Out-of-plane motion of greater than 20% of the in-plane vector magnitude was found to increase the SD by 11% for the vectors parallel to the ultrasound beam direction and 8.6% for the vectors perpendicular to the beam. The results show a close correlation and agreement of individual velocity vectors generated by ultrasound PIV compared with optical PIV. Most of the measurement distortions were caused by out-of-plane velocity components.
Methods of parallel computation applied on granular simulations

NASA Astrophysics Data System (ADS)

Martins, Gustavo H. B.; Atman, Allbens P. F.

2017-06-01

Every year, parallel computing has becoming cheaper and more accessible. As consequence, applications were spreading over all research areas. Granular materials is a promising area for parallel computing. To prove this statement we study the impact of parallel computing in simulations of the BNE (Brazil Nut Effect). This property is due the remarkable arising of an intruder confined to a granular media when vertically shaken against gravity. By means of DEM (Discrete Element Methods) simulations, we study the code performance testing different methods to improve clock time. A comparison between serial and parallel algorithms, using OpenMP® is also shown. The best improvement was obtained by optimizing the function that find contacts using Verlet's cells.
Secure Access Control and Large Scale Robust Representation for Online Multimedia Event Detection

PubMed Central

Liu, Changyu; Li, Huiling

2014-01-01

We developed an online multimedia event detection (MED) system. However, there are a secure access control issue and a large scale robust representation issue when we want to integrate traditional event detection algorithms into the online environment. For the first issue, we proposed a tree proxy-based and service-oriented access control (TPSAC) model based on the traditional role based access control model. Verification experiments were conducted on the CloudSim simulation platform, and the results showed that the TPSAC model is suitable for the access control of dynamic online environments. For the second issue, inspired by the object-bank scene descriptor, we proposed a 1000-object-bank (1000OBK) event descriptor. Feature vectors of the 1000OBK were extracted from response pyramids of 1000 generic object detectors which were trained on standard annotated image datasets, such as the ImageNet dataset. A spatial bag of words tiling approach was then adopted to encode these feature vectors for bridging the gap between the objects and events. Furthermore, we performed experiments in the context of event classification on the challenging TRECVID MED 2012 dataset, and the results showed that the robust 1000OBK event descriptor outperforms the state-of-the-art approaches. PMID:25147840
Actin cytoskeleton rearrangements in Arabidopsis roots under stress and during gravitropic response

NASA Astrophysics Data System (ADS)

Pozhvanov, Gregory; Medvedev, Sergei; Suslov, Dmitry; Demidchik, Vadim

Among environmental factors, gravity vector is the only one which is constant in direction and accompanies the whole plant ontogenesis. That said, gravity vector can be considered as an essential factor for correct development of plants. Gravitropism is a plant growth response against changing its position relative to the gravity vector. It is well estableshed that gravitropism is directed by auxin redistribution across the gravistimulated organ. In addition to auxin, actin cytoskeleton was shown to be involved in gravitropism at different stages: gravity perception, signal transduction and gravitropic bending formation. However, the relationship between IAA and actin is still under discussion. In this work we studied rearrangements of actin cytoskeleton during root gravitropic response. Actin microfilaments were visualized in vivo in GFP-fABD2 transgenic Arabidopsis plants, and their angle distribution was acquired from MicroFilament Analyzer software. The curvature of actin microfilaments in root elongation zone was shown to be increased within 30-60 min of gravistimulation, the fraction of axially oriented microfilaments decreased with a concomitant increase in the fraction of oblique and transversally oriented microfilaments. In particular, the fraction of transversally oriented microfilaments (i.e. parallel to the gravity vector) increased 3-5 times. Under 10 min of sub-lethal salt stress impact, actin microfilament orientations widened from an initial axial orientation to a set of peaks at 15(°) , 45(°) and 90(°) . We conclude that the actin cytoskeleton rearrangements observed are associated with the regulation of basic mechanisms of cell extension growth by which the gravitropic bending is formed. Having common stress-related features, gravity-induced actin cytoskeleton rearrangement is slower but results in higher number of g-vector-parallel microfilaments when compared to salt stress-induced rearrangement. Also, differences in gravistimulated root growth between wild type and GFP-fABD2 plants are discussed. Project was supported by the OPTEC / Carl Zeiss Personal grant to G.P. (2012), grants of Russian Foundation for Basic Research (11-04-00701a, 14-04-01624a) and by the grant of St.-Petersburg State University (1.38.233.2014).
Spinning particles in vacuum spacetimes of different curvature types

NASA Astrophysics Data System (ADS)

Semerák, O.; Šrámek, M.

2015-09-01

We consider the motion of spinning test particles with nonzero rest mass in the "pole-dipole" approximation, as described by the Mathisson-Papapetrou-Dixon (MPD) equations, and examine its properties in dependence on the spin supplementary condition added to close the system. In order to better understand the spin-curvature interaction, the MPD equation of motion is decomposed in the orthonormal tetrad whose time vector is given by the four-velocity Vμ chosen to fix the spin condition (the "reference observer") and the first spatial vector by the corresponding spin sμ; such projections do not contain the Weyl scalars Ψ0 and Ψ4 obtained in the associated Newman-Penrose (NP) null tetrad. One natural option of how to choose the remaining two spatial basis vectors is shown to follow "intrinsically" whenever Vμ has been chosen; it is realizable if the particle's four-velocity and four-momentum are not parallel. In order to see how the problem depends on the algebraic type of curvature, one first identifies the first vector of the NP tetrad kμ with the highest-multiplicity principal null direction of the Weyl tensor, and then sets Vμ so that kμ belong to the spin-bivector eigenplane. In spacetimes of any algebraic type but III, it is known to be possible to rotate the tetrads so as to become "transverse," namely so that Ψ1 and Ψ3 vanish. If the spin-bivector eigenplane could be made to coincide with the real-vector plane of any of such transverse frames, the spinning particle motion would consequently be fully determined by Ψ2 and the cosmological constant; however, this can be managed in exceptional cases only. Besides focusing on specific Petrov types, we derive several sets of useful relations that are valid generally and check whether/how the exercise simplifies for some specific types of motion. The particular option of having four-velocity parallel to four-momentum is advocated, and a natural resolution of nonuniqueness of the corresponding reference observer Vμ is suggested.
Analysis of ground-motion simulation big data

NASA Astrophysics Data System (ADS)

Maeda, T.; Fujiwara, H.

2016-12-01

We developed a parallel distributed processing system which applies a big data analysis to the large-scale ground motion simulation data. The system uses ground-motion index values and earthquake scenario parameters as input. We used peak ground velocity value and velocity response spectra as the ground-motion index. The ground-motion index values are calculated from our simulation data. We used simulated long-period ground motion waveforms at about 80,000 meshes calculated by a three dimensional finite difference method based on 369 earthquake scenarios of a great earthquake in the Nankai Trough. These scenarios were constructed by considering the uncertainty of source model parameters such as source area, rupture starting point, asperity location, rupture velocity, fmax and slip function. We used these parameters as the earthquake scenario parameter. The system firstly carries out the clustering of the earthquake scenario in each mesh by the k-means method. The number of clusters is determined in advance using a hierarchical clustering by the Ward's method. The scenario clustering results are converted to the 1-D feature vector. The dimension of the feature vector is the number of scenario combination. If two scenarios belong to the same cluster the component of the feature vector is 1, and otherwise the component is 0. The feature vector shows a `response' of mesh to the assumed earthquake scenario group. Next, the system performs the clustering of the mesh by k-means method using the feature vector of each mesh previously obtained. Here the number of clusters is arbitrarily given. The clustering of scenarios and meshes are performed by parallel distributed processing with Hadoop and Spark, respectively. In this study, we divided the meshes into 20 clusters. The meshes in each cluster are geometrically concentrated. Thus this system can extract regions, in which the meshes have similar `response', as clusters. For each cluster, it is possible to determine particular scenario parameters which characterize the cluster. In other word, by utilizing this system, we can obtain critical scenario parameters of the ground-motion simulation for each evaluation point objectively. This research was supported by CREST, JST.

A series of vectors to construct lacZ fusions for the study of gene expression in Schizosaccharomyces pombe.

PubMed

Lafuente, M J; Petit, T; Gancedo, C

1997-12-22

We have constructed a series of plasmids to facilitate the fusion of promoters with or without coding regions of genes of Schizosaccharomyces pombe to the lacZ gene of Escherichia coli. These vectors carry a multiple cloning region in which fission yeast DNA may be inserted in three different reading frames with respect to the coding region of lacZ. The plasmids were constructed with the ura4+ or the his3+ marker of S. pombe. Functionality of the plasmids was tested measuring in parallel the expression of fructose 1,6-bisphosphatase and beta-galactosidase under the control of the fbp1+ promoter in different conditions.
Beyond core count: a look at new mainstream computing platforms for HEP workloads

NASA Astrophysics Data System (ADS)

Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.

2014-06-01

As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, memory, cache and interconnect technologies. We examine these moving trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of new systems.
Digitally programmable signal generator and method

DOEpatents

Priatko, G.J.; Kaskey, J.A.

1989-11-14

Disclosed is a digitally programmable waveform generator for generating completely arbitrary digital or analog waveforms from very low frequencies to frequencies in the gigasample per second range. A memory array with multiple parallel outputs is addressed; then the parallel output data is latched into buffer storage from which it is serially multiplexed out at a data rate many times faster than the access time of the memory array itself. While data is being multiplexed out serially, the memory array is accessed with the next required address and presents its data to the buffer storage before the serial multiplexing of the last group of data is completed, allowing this new data to then be latched into the buffer storage for smooth continuous serial data output. In a preferred implementation, a plurality of these serial data outputs are paralleled to form the input to a digital to analog converter, providing a programmable analog output. 6 figs.
Digitally programmable signal generator and method

DOEpatents

Priatko, Gordon J.; Kaskey, Jeffrey A.

1989-01-01

A digitally programmable waveform generator for generating completely arbitrary digital or analog waveforms from very low frequencies to frequencies in the gigasample per second range. A memory array with multiple parallel outputs is addressed; then the parallel output data is latched into buffer storage from which it is serially multiplexed out at a data rate many times faster than the access time of the memory array itself. While data is being multiplexed out serially, the memory array is accessed with the next required address and presents its data to the buffer storage before the serial multiplexing of the last group of data is completed, allowing this new data to then be latched into the buffer storage for smooth continuous serial data output. In a preferred implementation, a plurality of these serial data outputs are paralleled to form the input to a digital to analog converter, providing a programmable analog output.
A scalable and multi-purpose point cloud server (PCS) for easier and faster point cloud data management and processing

NASA Astrophysics Data System (ADS)

Cura, Rémi; Perret, Julien; Paparoditis, Nicolas

2017-05-01

In addition to more traditional geographical data such as images (rasters) and vectors, point cloud data are becoming increasingly available. Such data are appreciated for their precision and true three-Dimensional (3D) nature. However, managing point clouds can be difficult due to scaling problems and specificities of this data type. Several methods exist but are usually fairly specialised and solve only one aspect of the management problem. In this work, we propose a comprehensive and efficient point cloud management system based on a database server that works on groups of points (patches) rather than individual points. This system is specifically designed to cover the basic needs of point cloud users: fast loading, compressed storage, powerful patch and point filtering, easy data access and exporting, and integrated processing. Moreover, the proposed system fully integrates metadata (like sensor position) and can conjointly use point clouds with other geospatial data, such as images, vectors, topology and other point clouds. Point cloud (parallel) processing can be done in-base with fast prototyping capabilities. Lastly, the system is built on open source technologies; therefore it can be easily extended and customised. We test the proposed system with several billion points obtained from Lidar (aerial and terrestrial) and stereo-vision. We demonstrate loading speeds in the ˜50 million pts/h per process range, transparent-for-user and greater than 2 to 4:1 compression ratio, patch filtering in the 0.1 to 1 s range, and output in the 0.1 million pts/s per process range, along with classical processing methods, such as object detection.
Combining the Complete Active Space Self-Consistent Field Method and the Full Configuration Interaction Quantum Monte Carlo within a Super-CI Framework, with Application to Challenging Metal-Porphyrins.

PubMed

Li Manni, Giovanni; Smart, Simon D; Alavi, Ali

2016-03-08

A novel stochastic Complete Active Space Self-Consistent Field (CASSCF) method has been developed and implemented in the Molcas software package. A two-step procedure is used, in which the CAS configuration interaction secular equations are solved stochastically with the Full Configuration Interaction Quantum Monte Carlo (FCIQMC) approach, while orbital rotations are performed using an approximated form of the Super-CI method. This new method does not suffer from the strong combinatorial limitations of standard MCSCF implementations using direct schemes and can handle active spaces well in excess of those accessible to traditional CASSCF approaches. The density matrix formulation of the Super-CI method makes this step independent of the size of the CI expansion, depending exclusively on one- and two-body density matrices with indices restricted to the relatively small number of active orbitals. No sigma vectors need to be stored in memory for the FCIQMC eigensolver--a substantial gain in comparison to implementations using the Davidson method, which require three or more vectors of the size of the CI expansion. Further, no orbital Hessian is computed, circumventing limitations on basis set expansions. Like the parent FCIQMC method, the present technique is scalable on massively parallel architectures. We present in this report the method and its application to the free-base porphyrin, Mg(II) porphyrin, and Fe(II) porphyrin. In the present study, active spaces up to 32 electrons and 29 orbitals in orbital expansions containing up to 916 contracted functions are treated with modest computational resources. Results are quite promising even without accounting for the correlation outside the active space. The systems here presented clearly demonstrate that large CASSCF calculations are possible via FCIQMC-CASSCF without limitations on basis set size.
Best Practices for Preventing Vector-Borne Diseases in Dogs and Humans.

PubMed

Dantas-Torres, Filipe; Otranto, Domenico

2016-01-01

Vector-borne diseases constitute a diversified group of illnesses, which are caused by a multitude of pathogens transmitted by arthropod vectors, such as mosquitoes, fleas, ticks, and sand flies. Proper management of these diseases is important from both human and veterinary medicine standpoints, given that many of these pathogens are transmissible to humans and dogs, which often live in close contact. In this review, we summarize the most important vector-borne diseases of dogs and humans and the best practices for their prevention. The control of these diseases would ultimately improve animal and human health and wellbeing, particularly in developing countries in the tropics, where the risk of these diseases is high and access to health care is poor. Copyright © 2015 Elsevier Ltd. All rights reserved.
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
PlasmoGEM, a database supporting a community resource for large-scale experimental genetics in malaria parasites.

PubMed

Schwach, Frank; Bushell, Ellen; Gomes, Ana Rita; Anar, Burcu; Girling, Gareth; Herd, Colin; Rayner, Julian C; Billker, Oliver

2015-01-01

The Plasmodium Genetic Modification (PlasmoGEM) database (http://plasmogem.sanger.ac.uk) provides access to a resource of modular, versatile and adaptable vectors for genome modification of Plasmodium spp. parasites. PlasmoGEM currently consists of >2000 plasmids designed to modify the genome of Plasmodium berghei, a malaria parasite of rodents, which can be requested by non-profit research organisations free of charge. PlasmoGEM vectors are designed with long homology arms for efficient genome integration and carry gene specific barcodes to identify individual mutants. They can be used for a wide array of applications, including protein localisation, gene interaction studies and high-throughput genetic screens. The vector production pipeline is supported by a custom software suite that automates both the vector design process and quality control by full-length sequencing of the finished vectors. The PlasmoGEM web interface allows users to search a database of finished knock-out and gene tagging vectors, view details of their designs, download vector sequence in different formats and view available quality control data as well as suggested genotyping strategies. We also make gDNA library clones and intermediate vectors available for researchers to produce vectors for themselves. © The Author(s) 2014. Published by Oxford University Press on behalf of Nucleic Acids Research.
Extreme-Scale Algorithms & Software Resilience (EASIR) Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures

DOE Office of Scientific and Technical Information (OSTI.GOV)

Demmel, James W.

This project addresses both communication-avoiding algorithms, and reproducible floating-point computation. Communication, i.e. moving data, either between levels of memory or processors over a network, is much more expensive per operation than arithmetic (measured in time or energy), so we seek algorithms that greatly reduce communication. We developed many new algorithms for both dense and sparse, and both direct and iterative linear algebra, attaining new communication lower bounds, and getting large speedups in many cases. We also extended this work in several ways: (1) We minimize writes separately from reads, since writes may be much more expensive than reads on emergingmore » memory technologies, like Flash, sometimes doing asymptotically fewer writes than reads. (2) We extend the lower bounds and optimal algorithms to arbitrary algorithms that may be expressed as perfectly nested loops accessing arrays, where the array subscripts may be arbitrary affine functions of the loop indices (eg A(i), B(i,j+k, k+3*m-7, …) etc.). (3) We extend our communication-avoiding approach to some machine learning algorithms, such as support vector machines. This work has won a number of awards. We also address reproducible floating-point computation. We define reproducibility to mean getting bitwise identical results from multiple runs of the same program, perhaps with different hardware resources or other changes that should ideally not change the answer. Many users depend on reproducibility for debugging or correctness. However, dynamic scheduling of parallel computing resources, combined with nonassociativity of floating point addition, makes attaining reproducibility a challenge even for simple operations like summing a vector of numbers, or more complicated operations like the Basic Linear Algebra Subprograms (BLAS). We describe an algorithm that computes a reproducible sum of floating point numbers, independent of the order of summation. The algorithm depends only on a subset of the IEEE Floating Point Standard 754-2008, uses just 6 words to represent a “reproducible accumulator,” and requires just one read-only pass over the data, or one reduction in parallel. New instructions based on this work are being considered for inclusion in the future IEEE 754-2018 floating-point standard, and new reproducible BLAS are being considered for the next version of the BLAS standard.« less
Characterizing parallel file-access patterns on a large-scale multiprocessor

NASA Technical Reports Server (NTRS)

Purakayastha, A.; Ellis, Carla; Kotz, David; Nieuwejaar, Nils; Best, Michael L.

1995-01-01

High-performance parallel file systems are needed to satisfy tremendous I/O requirements of parallel scientific applications. The design of such high-performance parallel file systems depends on a comprehensive understanding of the expected workload, but so far there have been very few usage studies of multiprocessor file systems. This paper is part of the CHARISMA project, which intends to fill this void by measuring real file-system workloads on various production parallel machines. In particular, we present results from the CM-5 at the National Center for Supercomputing Applications. Our results are unique because we collect information about nearly every individual I/O request from the mix of jobs running on the machine. Analysis of the traces leads to various recommendations for parallel file-system design.
Recognizing Activities via Bag of Words for Attribute Dynamics (Open Access)

DTIC Science & Technology

2013-10-03

56.4% 73.4% clean-jerk 83.2% 84.1% 78.2% 79.4% 85.1% 78.2% 85.4% javelin throw 61.1% 74.6% 79.5% 62.1% 87.5% 56.6% 76.7% ham. throw 65.1% 77.5% 70.5...1 ∈ Rτ is the vector of all ones. Each column of C is a basis vector of a latent subspace and the t-th column of X contains the coordinates of the yt...binary PCA is first applied to all attribute score vectors in P ′. The parame- ters of the hidden Gauss-Markov process are then learned by solving a least
Efficiency Analysis of the Parallel Implementation of the SIMPLE Algorithm on Multiprocessor Computers

NASA Astrophysics Data System (ADS)

Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.

2017-12-01

This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
A privacy-preserving parallel and homomorphic encryption scheme

NASA Astrophysics Data System (ADS)

Min, Zhaoe; Yang, Geng; Shi, Jingqi

2017-04-01

In order to protect data privacy whilst allowing efficient access to data in multi-nodes cloud environments, a parallel homomorphic encryption (PHE) scheme is proposed based on the additive homomorphism of the Paillier encryption algorithm. In this paper we propose a PHE algorithm, in which plaintext is divided into several blocks and blocks are encrypted with a parallel mode. Experiment results demonstrate that the encryption algorithm can reach a speed-up ratio at about 7.1 in the MapReduce environment with 16 cores and 4 nodes.
Parallelization of Program to Optimize Simulated Trajectories (POST3D)

NASA Technical Reports Server (NTRS)

Hammond, Dana P.; Korte, John J. (Technical Monitor)

2001-01-01

This paper describes the parallelization of the Program to Optimize Simulated Trajectories (POST3D). POST3D uses a gradient-based optimization algorithm that reaches an optimum design point by moving from one design point to the next. The gradient calculations required to complete the optimization process, dominate the computational time and have been parallelized using a Single Program Multiple Data (SPMD) on a distributed memory NUMA (non-uniform memory access) architecture. The Origin2000 was used for the tests presented.
Unstructured Adaptive Meshes: Bad for Your Memory?

NASA Technical Reports Server (NTRS)

Biswas, Rupak; Feng, Hui-Yu; VanderWijngaart, Rob

2003-01-01

This viewgraph presentation explores the need for a NASA Advanced Supercomputing (NAS) parallel benchmark for problems with irregular dynamical memory access. This benchmark is important and necessary because: 1) Problems with localized error source benefit from adaptive nonuniform meshes; 2) Certain machines perform poorly on such problems; 3) Parallel implementation may provide further performance improvement but is difficult. Some examples of problems which use irregular dynamical memory access include: 1) Heat transfer problem; 2) Heat source term; 3) Spectral element method; 4) Base functions; 5) Elemental discrete equations; 6) Global discrete equations. Nonconforming Mesh and Mortar Element Method are covered in greater detail in this presentation.
Staging memory for massively parallel processor

NASA Technical Reports Server (NTRS)

Batcher, Kenneth E. (Inventor)

1988-01-01

The invention herein relates to a computer organization capable of rapidly processing extremely large volumes of data. A staging memory is provided having a main stager portion consisting of a large number of memory banks which are accessed in parallel to receive, store, and transfer data words simultaneous with each other. Substager portions interconnect with the main stager portion to match input and output data formats with the data format of the main stager portion. An address generator is coded for accessing the data banks for receiving or transferring the appropriate words. Input and output permutation networks arrange the lineal order of data into and out of the memory banks.
Enhancing Application Performance Using Mini-Apps: Comparison of Hybrid Parallel Programming Paradigms

NASA Technical Reports Server (NTRS)

Lawson, Gary; Poteat, Michael; Sosonkina, Masha; Baurle, Robert; Hammond, Dana

2016-01-01

In this work, several mini-apps have been created to enhance a real-world application performance, namely the VULCAN code for complex flow analysis developed at the NASA Langley Research Center. These mini-apps explore hybrid parallel programming paradigms with Message Passing Interface (MPI) for distributed memory access and either Shared MPI (SMPI) or OpenMP for shared memory accesses. Performance testing shows that MPI+SMPI yields the best execution performance, while requiring the largest number of code changes. A maximum speedup of 23X was measured for MPI+SMPI, but only 10X was measured for MPI+OpenMP.
Multiradio Resource Management: Parallel Transmission for Higher Throughput?

NASA Astrophysics Data System (ADS)

Bazzi, Alessandro; Pasolini, Gianni; Andrisano, Oreste

2008-12-01

Mobile communication systems beyond the third generation will see the interconnection of heterogeneous radio access networks (UMTS, WiMax, wireless local area networks, etc.) in order to always provide the best quality of service (QoS) to users with multimode terminals. This scenario poses a number of critical issues, which have to be faced in order to get the best from the integrated access network. In this paper, we will investigate the issue of parallel transmission over multiple radio access technologies (RATs), focusing the attention on the QoS perceived by final users. We will show that the achievement of a real benefit from parallel transmission over multiple RATs is conditioned to the fulfilment of some requirements related to the kind of RATs, the multiradio resource management (MRRM) strategy, and the transport-level protocol behaviour. All these aspects will be carefully considered in our investigation, which will be carried out partly adopting an analytical approach and partly by means of simulations. In this paper, in particular, we will propose a simple but effective MRRM algorithm, whose performance will be investigated in IEEE802.11a-UMTS and IEEE802.11a-IEEE802.16e heterogeneous networks (adopted as case studies).
Spiking Neural P Systems With Rules on Synapses Working in Maximum Spiking Strategy.

PubMed

Tao Song; Linqiang Pan

2015-06-01

Spiking neural P systems (called SN P systems for short) are a class of parallel and distributed neural-like computation models inspired by the way the neurons process information and communicate with each other by means of impulses or spikes. In this work, we introduce a new variant of SN P systems, called SN P systems with rules on synapses working in maximum spiking strategy, and investigate the computation power of the systems as both number and vector generators. Specifically, we prove that i) if no limit is imposed on the number of spikes in any neuron during any computation, such systems can generate the sets of Turing computable natural numbers and the sets of vectors of positive integers computed by k-output register machine; ii) if an upper bound is imposed on the number of spikes in each neuron during any computation, such systems can characterize semi-linear sets of natural numbers as number generating devices; as vector generating devices, such systems can only characterize the family of sets of vectors computed by sequential monotonic counter machine, which is strictly included in family of semi-linear sets of vectors. This gives a positive answer to the problem formulated in Song et al., Theor. Comput. Sci., vol. 529, pp. 82-95, 2014.

Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.

PubMed

Hoffmann, Thomas J

2011-03-01

It is often useful to rerun a command line R script with some slight change in the parameters used to run it - a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to pass in multiple command line options, including vectors of values in the usual R format, easily into R. The same script can be setup to run things in parallel via different command line arguments. The R package batch also provides a means to simplify this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally it provides a means to aggregate the results together of multiple processes run on a cluster.
Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code

NASA Astrophysics Data System (ADS)

Payne, J.; McCune, D.; Prater, R.

2010-11-01

NUBEAM is a comprehensive computational Monte Carlo based model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters. However, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. Recent implementation of double precision capabilities for Graphical Processing Units (GPUs) presents a cost effective and high performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on implementation on a GPU will be presented.
Optimizing Excited-State Electronic-Structure Codes for Intel Knights Landing: A Case Study on the BerkeleyGW Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek

2016-10-06

We profile and optimize calculations performed with the BerkeleyGW code on the Xeon-Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW methodmore » is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.« less
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

NASA Technical Reports Server (NTRS)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Summary of research in applied mathematics, numerical analysis, and computer sciences

NASA Technical Reports Server (NTRS)

1986-01-01

The major categories of current ICASE research programs addressed include: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effective numerical methods; computational problems in engineering and physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and computer systems and software, especially vector and parallel computers.
Multiple-algorithm parallel fusion of infrared polarization and intensity images based on algorithmic complementarity and synergy

NASA Astrophysics Data System (ADS)

Zhang, Lei; Yang, Fengbao; Ji, Linna; Lv, Sheng

2018-01-01

Diverse image fusion methods perform differently. Each method has advantages and disadvantages compared with others. One notion is that the advantages of different image methods can be effectively combined. A multiple-algorithm parallel fusion method based on algorithmic complementarity and synergy is proposed. First, in view of the characteristics of the different algorithms and difference-features among images, an index vector-based feature-similarity is proposed to define the degree of complementarity and synergy. This proposed index vector is a reliable evidence indicator for algorithm selection. Second, the algorithms with a high degree of complementarity and synergy are selected. Then, the different degrees of various features and infrared intensity images are used as the initial weights for the nonnegative matrix factorization (NMF). This avoids randomness of the NMF initialization parameter. Finally, the fused images of different algorithms are integrated using the NMF because of its excellent data fusing performance on independent features. Experimental results demonstrate that the visual effect and objective evaluation index of the fused images obtained using the proposed method are better than those obtained using traditional methods. The proposed method retains all the advantages that individual fusion algorithms have.
Theoretical study of the generation of terahertz radiation by the interaction of two laser beams with graphite nanoparticles

NASA Astrophysics Data System (ADS)

Sepehri Javan, N.; Rouhi Erdi, F.

2017-12-01

In this theoretical study, we investigate the generation of terahertz radiation by considering the beating of two similar Gaussian laser beams with different frequencies of ω1 and ω2 in a spatially modulated medium of graphite nanoparticles. The medium is assumed to contain spherical graphite nanoparticles of two different configurations: in the first configuration, the electric fields of the laser beams are parallel to the normal vector of the basal plane of the graphite structure, whereas in the second configuration, the electric fields are perpendicular to the normal vector of the basal plane. The interaction of the electric fields of lasers with the electronic clouds of the nanoparticles generates a ponderomotive force that in turn leads to the creation of a macroscopic electron current in the direction of laser polarizations and at the beat frequency ω1-ω2 , which can generate terahertz radiation. We show that, when the beat frequency lies near the effective plasmon frequency of the nanoparticles and the electric fields are parallel to the basal-plane normal, a resonant interaction of the laser beams causes intense terahertz radiation.
Effects of mechanostimulation on gravitropism and signal persistence in flax roots.

PubMed

John, Susan P; Hasenstein, Karl H

2011-09-01

Gravitropism describes curvature of plants in response to gravity or differential acceleration and clinorotation is commonly used to compensate unilateral effect of gravity. We report on experiments that examine the persistence of the gravity signal and separate mechanostimulation from gravistimulation. Flax roots were reoriented (placed horizontally for 5, 10 or 15 min) and clinorotated at a rate of 0.5 to 5 rpm either vertically (parallel to the gravity vector and root axis) or horizontally (perpendicular to the gravity vector and parallel to the root axis). Image sequences showed that horizontal clinorotation did not affect root growth rate (0.81 ± 0.03 mm h-1) but vertical clinorotation reduced root growth by about 7%. The angular velocity (speed of clinorotation) did not affect growth for either direction. However, maximal curvature for vertical clinorotation decreased with increasing rate of rotation and produced straight roots at 5 rpm. In contrast, horizontal clinorotation increased curvature with increasing angular velocity. The point of maximal curvature was used to determine the longevity (memory) of the gravity signal, which lasted about 120 min. The data indicate that mechanostimulation modifies the magnitude of the graviresponse but does not affect memory persistence.
Anisotropic surface-state-mediated RKKY interaction between adatoms on a hexagonal lattice

NASA Astrophysics Data System (ADS)

Patrone, Paul N.; Einstein, T. L.

2012-01-01

Motivated by recent numerical studies of Ag on Pt(111), we derive an expression for the RKKY interaction mediated by surface states, considering the effect of anisotropy in the Fermi edge. Our analysis is based on a stationary phase approximation. The main contribution to the interaction comes from electrons whose Fermi velocity vF is parallel to the vector R connecting the interacting adatoms; we show that, in general, the corresponding Fermi wave vector kF is not parallel to R. The interaction is oscillatory; the amplitude and wavelength of oscillations have angular dependence arising from the anisotropy of the surface-state band structure. The wavelength, in particular, is determined by the projection of this kF (corresponding to vF) onto the direction of R. Our analysis is easily generalized to other systems. For Ag on Pt(111), our results indicate that the RKKY interaction between pairs of adatoms should be nearly isotropic and so cannot account for the anisotropy found in the studies motivating our work. However, for metals with surface-state dispersions similar to Be(101¯0), we show that the RKKY interaction should have considerable anisotropy.
Development of iterative techniques for the solution of unsteady compressible viscous flows

NASA Technical Reports Server (NTRS)

Hixon, Duane; Sankar, L. N.

1993-01-01

During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist such as the transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculation; from preliminary calculations, this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has ben tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architecture of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
Advancing vector biology research: a community survey for future directions, research applications and infrastructure requirements

PubMed Central

Kohl, Alain; Pondeville, Emilie; Schnettler, Esther; Crisanti, Andrea; Supparo, Clelia; Christophides, George K.; Kersey, Paul J.; Maslen, Gareth L.; Takken, Willem; Koenraadt, Constantianus J. M.; Oliva, Clelia F.; Busquets, Núria; Abad, F. Xavier; Failloux, Anna-Bella; Levashina, Elena A.; Wilson, Anthony J.; Veronesi, Eva; Pichard, Maëlle; Arnaud Marsh, Sarah; Simard, Frédéric; Vernick, Kenneth D.

2016-01-01

Vector-borne pathogens impact public health, animal production, and animal welfare. Research on arthropod vectors such as mosquitoes, ticks, sandflies, and midges which transmit pathogens to humans and economically important animals is crucial for development of new control measures that target transmission by the vector. While insecticides are an important part of this arsenal, appearance of resistance mechanisms is increasingly common. Novel tools for genetic manipulation of vectors, use of Wolbachia endosymbiotic bacteria, and other biological control mechanisms to prevent pathogen transmission have led to promising new intervention strategies, adding to strong interest in vector biology and genetics as well as vector–pathogen interactions. Vector research is therefore at a crucial juncture, and strategic decisions on future research directions and research infrastructure investment should be informed by the research community. A survey initiated by the European Horizon 2020 INFRAVEC-2 consortium set out to canvass priorities in the vector biology research community and to determine key activities that are needed for researchers to efficiently study vectors, vector-pathogen interactions, as well as access the structures and services that allow such activities to be carried out. We summarize the most important findings of the survey which in particular reflect the priorities of researchers in European countries, and which will be of use to stakeholders that include researchers, government, and research organizations. PMID:27677378
Optical memories in digital computing

NASA Technical Reports Server (NTRS)

Alford, C. O.; Gaylord, T. K.

1979-01-01

High capacity optical memories with relatively-high data-transfer rate and multiport simultaneous access capability may serve as basis for new computer architectures. Several computer structures that might profitably use memories are: a) simultaneous record-access system, b) simultaneously-shared memory computer system, and c) parallel digital processing structure.
Research on parallel algorithm for sequential pattern mining

NASA Astrophysics Data System (ADS)

Zhou, Lijuan; Qin, Bai; Wang, Yu; Hao, Zhongxiao

2008-03-01

Sequential pattern mining is the mining of frequent sequences related to time or other orders from the sequence database. Its initial motivation is to discover the laws of customer purchasing in a time section by finding the frequent sequences. In recent years, sequential pattern mining has become an important direction of data mining, and its application field has not been confined to the business database and has extended to new data sources such as Web and advanced science fields such as DNA analysis. The data of sequential pattern mining has characteristics as follows: mass data amount and distributed storage. Most existing sequential pattern mining algorithms haven't considered the above-mentioned characteristics synthetically. According to the traits mentioned above and combining the parallel theory, this paper puts forward a new distributed parallel algorithm SPP(Sequential Pattern Parallel). The algorithm abides by the principal of pattern reduction and utilizes the divide-and-conquer strategy for parallelization. The first parallel task is to construct frequent item sets applying frequent concept and search space partition theory and the second task is to structure frequent sequences using the depth-first search method at each processor. The algorithm only needs to access the database twice and doesn't generate the candidated sequences, which abates the access time and improves the mining efficiency. Based on the random data generation procedure and different information structure designed, this paper simulated the SPP algorithm in a concrete parallel environment and implemented the AprioriAll algorithm. The experiments demonstrate that compared with AprioriAll, the SPP algorithm had excellent speedup factor and efficiency.
Automated vector selection of SIVQ and parallel computing integration MATLAB™: Innovations supporting large-scale and high-throughput image analysis studies.

PubMed

Cheng, Jerome; Hipp, Jason; Monaco, James; Lucas, David R; Madabhushi, Anant; Balis, Ulysses J

2011-01-01

Spatially invariant vector quantization (SIVQ) is a texture and color-based image matching algorithm that queries the image space through the use of ring vectors. In prior studies, the selection of one or more optimal vectors for a particular feature of interest required a manual process, with the user initially stochastically selecting candidate vectors and subsequently testing them upon other regions of the image to verify the vector's sensitivity and specificity properties (typically by reviewing a resultant heat map). In carrying out the prior efforts, the SIVQ algorithm was noted to exhibit highly scalable computational properties, where each region of analysis can take place independently of others, making a compelling case for the exploration of its deployment on high-throughput computing platforms, with the hypothesis that such an exercise will result in performance gains that scale linearly with increasing processor count. An automated process was developed for the selection of optimal ring vectors to serve as the predicate matching operator in defining histopathological features of interest. Briefly, candidate vectors were generated from every possible coordinate origin within a user-defined vector selection area (VSA) and subsequently compared against user-identified positive and negative "ground truth" regions on the same image. Each vector from the VSA was assessed for its goodness-of-fit to both the positive and negative areas via the use of the receiver operating characteristic (ROC) transfer function, with each assessment resulting in an associated area-under-the-curve (AUC) figure of merit. Use of the above-mentioned automated vector selection process was demonstrated in two cases of use: First, to identify malignant colonic epithelium, and second, to identify soft tissue sarcoma. For both examples, a very satisfactory optimized vector was identified, as defined by the AUC metric. Finally, as an additional effort directed towards attaining high-throughput capability for the SIVQ algorithm, we demonstrated the successful incorporation of it with the MATrix LABoratory (MATLAB™) application interface. The SIVQ algorithm is suitable for automated vector selection settings and high throughput computation.
Asymmetry in the Farley-Buneman dispersion relation caused by parallel electric fields

NASA Astrophysics Data System (ADS)

Forsythe, Victoriya V.; Makarevich, Roman A.

2016-11-01

An implicit assumption utilized in studies of E region plasma waves generated by the Farley-Buneman instability (FBI) is that the FBI dispersion relation and its solutions for the growth rate and phase velocity are perfectly symmetric with respect to the reversal of the wave propagation component parallel to the magnetic field. In the present study, a recently derived general dispersion relation that describes fundamental plasma instabilities in the lower ionosphere including FBI is considered and it is demonstrated that the dispersion relation is symmetric only for background electric fields that are perfectly perpendicular to the magnetic field. It is shown that parallel electric fields result in significant differences between the growth rates and phase velocities for propagation of parallel components of opposite signs. These differences are evaluated using numerical solutions of the general dispersion relation and shown to exhibit an approximately linear relationship with the parallel electric field near the E region peak altitude of 110 km. An analytic expression for the differences is also derived from an approximate version of the dispersion relation, with comparisons between numerical and analytic results agreeing near 110 km. It is further demonstrated that parallel electric fields do not change the overall symmetry when the full 3-D wave propagation vector is reversed, with no symmetry seen when either the perpendicular or parallel component is reversed. The present results indicate that moderate-to-strong parallel electric fields of 0.1-1.0 mV/m can result in experimentally measurable differences between the characteristics of plasma waves with parallel propagation components of opposite polarity.
Checkpoint-Restart in User Space

DOE Office of Scientific and Technical Information (OSTI.GOV)

CRUISE implements a user-space file system that stores data in main memory and transparently spills over to other storage, like local flash memory or the parallel file system, as needed. CRUISE also exposes file contents fo remote direct memory access, allowing external tools to copy files to the parallel file system in the background with reduced CPU interruption.
Unstructured Adaptive (UA) NAS Parallel Benchmark. Version 1.0

NASA Technical Reports Server (NTRS)

Feng, Huiyu; VanderWijngaart, Rob; Biswas, Rupak; Mavriplis, Catherine

2004-01-01

We present a complete specification of a new benchmark for measuring the performance of modern computer systems when solving scientific problems featuring irregular, dynamic memory accesses. It complements the existing NAS Parallel Benchmark suite. The benchmark involves the solution of a stylized heat transfer problem in a cubic domain, discretized on an adaptively refined, unstructured mesh.
76 FR 73513 - Fisheries of the Exclusive Economic Zone Off Alaska; Revisions to Pacific Cod Fishing in the...

Federal Register 2010, 2011, 2012, 2013, 2014

2011-11-29

... miles of shore adjacent to the Bering Sea and Aleutian Islands management area (BSAI). The affected... Cod Fishing in the Parallel Fishery in the Bering Sea and Aleutian Islands Management Area AGENCY... limits access by Federally permitted vessels to the parallel fishery for Pacific cod in three ways. First...
Implementation of High-Order Multireference Coupled-Cluster Methods on Intel Many Integrated Core Architecture.

PubMed

Aprà, E; Kowalski, K

2016-03-08

In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.
Antiparallel spin does not always contain more information

NASA Astrophysics Data System (ADS)

Ghosh, Sibasish; Roy, Anirban; Sen, Ujjwal

2001-01-01

We show that the Bloch vectors lying on any great circle comprise the largest set SL for which the parallel states \\|n-->,n-->> can always be exactly transformed into the antiparallel states \\|n-->,-n-->>. Thus more information about n--> is not extractable from \\|n-->,-n-->> than from \\|n-->,n-->> by any measuring strategy, for n-->∈SL. Surprisingly this most general transformation reduces to just a flip operation on the second particle. We also show here that a probabilistic exact parallel to antiparallel transformation is not possible if the corresponding antiparallel states span the whole Hilbert space of the two qubits. These considerations allow us to generalize a conjecture of Gisin and Popescu [Phys. Rev. Lett. 83, 432 (1999)].

Matrix Product Operator Simulations of Quantum Algorithms

DTIC Science & Technology

2015-02-01

parallel to the Grover subspace parametrically: (Zi|φ〉)‖ = s cos γ|α〉+ s sin γ|β〉, s = √ a(k)2 (N − 1)2 + b(k)2, γ = tan −1 ( b(k)(N − 1) a(k) ) (6.32) Each...of this vector parallel to the Grover subspace in parametric form: (XiZi|φ〉)‖ = s cos(γ)|α〉+ s sin(γ)|β〉, s = 1√ N − 1 , γ = tan −1 ( cot (( k + 1 2 ) θ...quant- ph/0001106, 2000. Bibliography 146 [30] Jérémie Roland and Nicolas J Cerf. Quantum search by local adiabatic evolution. Physical Review A, 65(4
High performance compression of science data

NASA Technical Reports Server (NTRS)

Storer, James A.; Cohn, Martin

1994-01-01

Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that with no training or prior knowledge of the data, for a given fidelity, the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in the interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.
Brief announcement: Hypergraph parititioning for parallel sparse matrix-matrix multiplication

DOE PAGES

Ballard, Grey; Druinsky, Alex; Knight, Nicholas; ...

2015-01-01

The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Furthermore, our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computationmore » to improve application-specific algorithms for multiplying sparse matrices.« less
An Efficient Wait-Free Vector

DOE PAGES

Feldman, Steven; Valera-Leon, Carlos; Dechev, Damian

2016-03-01

The vector is a fundamental data structure, which provides constant-time access to a dynamically-resizable range of elements. Currently, there exist no wait-free vectors. The only non-blocking version supports only a subset of the sequential vector API and exhibits significant synchronization overhead caused by supporting opposing operations. Since many applications operate in phases of execution, wherein each phase only a subset of operations are used, this overhead is unnecessary for the majority of the application. To address the limitations of the non-blocking version, we present a new design that is wait-free, supports more of the operations provided by the sequential vector,more » and provides alternative implementations of key operations. These alternatives allow the developer to balance the performance and functionality of the vector as requirements change throughout execution. Compared to the known non-blocking version and the concurrent vector found in Intel’s TBB library, our design outperforms or provides comparable performance in the majority of tested scenarios. Over all tested scenarios, the presented design performs an average of 4.97 times more operations per second than the non-blocking vector and 1.54 more than the TBB vector. In a scenario designed to simulate the filling of a vector, performance improvement increases to 13.38 and 1.16 times. This work presents the first ABA-free non-blocking vector. Finally, unlike the other non-blocking approach, all operations are wait-free and bounds-checked and elements are stored contiguously in memory.« less
Thermo-optic microring resonator switching elements made of dielectric-loaded plasmonic waveguides

NASA Astrophysics Data System (ADS)

Tsilipakos, Odysseas; Kriezis, Emmanouil E.; Bozhevolnyi, Sergey I.

2011-04-01

Thermo-optic switching elements made of dielectric-loaded plasmonic (DLSPP) waveguides are theoretically investigated by utilizing the three-dimensional vector finite element method. The configurations considered employ microring resonators, whose resonant frequency is varied by means of thermal tuning. First, a classic add-drop filter with parallel access waveguides is examined. Such a component features very poor drop port extinction ratio (ER). We therefore extend the analysis to add-drop filters with perpendicular access waveguides, which are found to exhibit superior drop port ERs, due to interference effects associated with the drop port transmission. In the process, the performance of a DLSPP waveguide crossing is also assessed, since it is a building block of those filters whose bus waveguides intersect. An elliptic tapering scheme is proposed for minimizing cross talk and its effect on the filter performance is explored. The dual-resonator add-drop filter with perpendicular bus waveguides and an untreated waveguide crossing of Sec. V can act as an efficient 2×2 switching element (the single-resonator variant can only act as a 1×2 switch due to structure asymmetry), possessing two equivalent input ports and featuring high ERs for both output ports over a broad wavelength range. Specifically, an extinction ratio of at least 8 dB can be attained for both output ports over a wavelength range of 3.2 nm, accommodating four 100-GHz-spaced channels. Switching times are in the order of a few microseconds, rendering the aforementioned structure capable of handling real-world routing scenarios.
Amplitude and dynamics of polarization-plane signaling in the central complex of the locust brain

PubMed Central

Bockhorst, Tobias

2015-01-01

The polarization pattern of skylight provides a compass cue that various insect species use for allocentric orientation. In the desert locust, Schistocerca gregaria, a network of neurons tuned to the electric field vector (E-vector) angle of polarized light is present in the central complex of the brain. Preferred E-vector angles vary along slices of neuropils in a compasslike fashion (polarotopy). We studied how the activity in this polarotopic population is modulated in ways suited to control compass-guided locomotion. To this end, we analyzed tuning profiles using measures of correlation between spike rate and E-vector angle and, furthermore, tested for adaptation to stationary angles. The results suggest that the polarotopy is stabilized by antagonistic integration across neurons with opponent tuning. Downstream to the input stage of the network, responses to stationary E-vector angles adapted quickly, which may correlate with a tendency to steer a steady course previously observed in tethered flying locusts. By contrast, rotating E-vectors corresponding to changes in heading direction under a natural sky elicited nonadapting responses. However, response amplitudes were particularly variable at the output stage, covarying with the level of ongoing activity. Moreover, the responses to rotating E-vector angles depended on the direction of rotation in an anticipatory manner. Our observations support a view of the central complex as a substrate of higher-stage processing that could assign contextual meaning to sensory input for motor control in goal-driven behaviors. Parallels to higher-stage processing of sensory information in vertebrates are discussed. PMID:25609107
Multiprocessing MCNP on an IBN RS/6000 cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKinney, G.W.; West, J.T.

1993-01-01

The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major benefactors of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization.more » Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors P and the fraction f of task time that multiprocesses, can be formulated using Amdahl's law: S(f, P) =1/(1-f+f/P). However, for most applications, this theoretical limit cannot be achieved because of additional terms (e.g., multitasking overhead, memory overlap, etc.) that are not included in Amdahl's law. Monte Carlo transport is a natural candidate for multiprocessing because the particle tracks are generally independent, and the precision of the result increases as the square Foot of the number of particles tracked.« less
Multiprocessing MCNP on an IBM RS/6000 cluster

DOE Office of Scientific and Technical Information (OSTI.GOV)

McKinney, G.W.; West, J.T.

1993-03-01

The advent of high-performance computer systems has brought to maturity programming concepts like vectorization, multiprocessing, and multitasking. While there are many schools of thought as to the most significant factor in obtaining order-of-magnitude increases in performance, such speedup can only be achieved by integrating the computer system and application code. Vectorization leads to faster manipulation of arrays by overlapping instruction CPU cycles. Discrete ordinates codes, which require the solving of large matrices, have proved to be major benefactors of vectorization. Monte Carlo transport, on the other hand, typically contains numerous logic statements and requires extensive redevelopment to benefit from vectorization.more » Multiprocessing and multitasking provide additional CPU cycles via multiple processors. Such systems are generally designed with either common memory access (multitasking) or distributed memory access. In both cases, theoretical speedup, as a function of the number of processors (P) and the fraction of task time that multiprocesses (f), can be formulated using Amdahl`s Law S ((f,P) = 1 f + f/P). However, for most applications this theoretical limit cannot be achieved, due to additional terms not included in Amdahl`s Law. Monte Carlo transport is a natural candidate for multiprocessing, since the particle tracks are generally independent and the precision of the result increases as the square root of the number of particles tracked.« less
Parallel seed-based approach to multiple protein structure similarities detection

DOE PAGES

Chapuis, Guillaume; Le Boudic-Jamin, Mathilde; Andonov, Rumen; ...

2015-01-01

Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a quality guarantee—the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distances based) lower than a given threshold, if such exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makesmore » our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as to detect structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: a coarse-grain level parallelism making use of available CPU cores for computation and a fine-grain level parallelism exploiting bit-level concurrency as well as vector instructions.« less
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80

NASA Astrophysics Data System (ADS)

Kamat, Manohar P.; Watson, Brian C.

1992-11-01

The finite element method has proven to be an invaluable tool for analysis and design of complex, high performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increase, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built-up from flat thin shell elements. SAPNEW provides a stand alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format to make SAPNEW more readily used by researchers at NASA Lewis Research Center.
Parallel Mutual Information Based Construction of Genome-Scale Networks on the Intel® Xeon Phi™ Coprocessor.

PubMed

Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas

2015-01-01

Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
Algorithms and programming tools for image processing on the MPP, part 2

NASA Technical Reports Server (NTRS)

Reeves, Anthony P.

1986-01-01

A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.
Giotto magnetic field observations at the outbound quasi-parallel bow shock of Comet Halley

NASA Technical Reports Server (NTRS)

Neubauer, F. M.; Glassmeier, K. H.; Acuna, M. H.; Mariani, F.; Musmann, G.

1990-01-01

The investigation of the outbound bow shock of Comet Halley using Giotto magnetometer data leads to the following results: the shock is characterized by strong magnetic turbulence associated with an increasing background magnetic field and a change in direction by 60 deg as one goes inward. In HSE-coordinates, the observed normal turned out to be (0.544, - 0.801, 0.249). The thickness of the quasi-parallel shock was 120,000 km. The shock is shown to be a new type of shock transition called a 'draping shock'. In a draping shock with high beta in the transonic transition region, the transonic region is characterized by strong directional variations of the magnetic field. The magnetic turbulence ahead of the shock is characterized by k-vectors parallel or antiparallel to the average field (and, therefore, also to the normal of the quasi-parallel shock) and almost isotropic magnetic turbulence in the shock transition region. A model of the draping shock is proposed which also includes a hypothetical subshock in which the supersonic-subsonic transition is accomplished.
FPGA-based protein sequence alignment : A review

NASA Astrophysics Data System (ADS)

Isa, Mohd. Nazrin Md.; Muhsen, Ku Noor Dhaniah Ku; Saiful Nurdin, Dayana; Ahmad, Muhammad Imran; Anuar Zainol Murad, Sohiful; Nizam Mohyar, Shaiful; Harun, Azizi; Hussin, Razaidi

2017-11-01

Sequence alignment have been optimized using several techniques in order to accelerate the computation time to obtain the optimal score by implementing DP-based algorithm into hardware such as FPGA-based platform. During hardware implementation, there will be performance challenges such as the frequent memory access and highly data dependent in computation process. Therefore, investigation in processing element (PE) configuration where involves more on memory access in load or access the data (substitution matrix, query sequence character) and the PE configuration time will be the main focus in this paper. There are various approaches to enhance the PE configuration performance that have been done in previous works such as by using serial configuration chain and parallel configuration chain i.e. the configuration data will be loaded into each PEs sequentially and simultaneously respectively. Some researchers have proven that the performance using parallel configuration chain has optimized both the configuration time and area.
Gregarious Data Re-structuring in a Many Core Architecture

DOE Office of Scientific and Technical Information (OSTI.GOV)

Shrestha, Sunil; Manzano Franco, Joseph B.; Marquez, Andres

this paper, we have developed a new methodology that takes in consideration the access patterns from a single parallel actor (e.g. a thread), as well as, the access patterns of “grouped” parallel actors that share a resource (e.g. a distributed Level 3 cache). We start with a hierarchical tile code for our target machine and apply a series of transformations at the tile level to improve data residence in a given memory hierarchy level. The contribution of this paper includes (a) collaborative data restructuring for group reuse and (b) low overhead transformation technique to improve access pattern and bring closelymore » connected data elements together. Preliminary results in a many core architecture, Tilera TileGX, shows promising improvements over optimized OpenMP code (up to 31% increase in GFLOPS) and over our own previous work on fine grained runtimes (up to 16%) for selected kernels« less
Transcriptome of the adult female malaria mosquito vector Anopheles albimanus.

PubMed

Martínez-Barnetche, Jesús; Gómez-Barreto, Rosa E; Ovilla-Muñoz, Marbella; Téllez-Sosa, Juan; García López, David E; Dinglasan, Rhoel R; Ubaida Mohien, Ceereena; MacCallum, Robert M; Redmond, Seth N; Gibbons, John G; Rokas, Antonis; Machado, Carlos A; Cazares-Raga, Febe E; González-Cerón, Lilia; Hernández-Martínez, Salvador; Rodríguez López, Mario H

2012-05-30

Human Malaria is transmitted by mosquitoes of the genus Anopheles. Transmission is a complex phenomenon involving biological and environmental factors of humans, parasites and mosquitoes. Among more than 500 anopheline species, only a few species from different branches of the mosquito evolutionary tree transmit malaria, suggesting that their vectorial capacity has evolved independently. Anopheles albimanus (subgenus Nyssorhynchus) is an important malaria vector in the Americas. The divergence time between Anopheles gambiae, the main malaria vector in Africa, and the Neotropical vectors has been estimated to be 100 My. To better understand the biological basis of malaria transmission and to develop novel and effective means of vector control, there is a need to explore the mosquito biology beyond the An. gambiae complex. We sequenced the transcriptome of the An. albimanus adult female. By combining Sanger, 454 and Illumina sequences from cDNA libraries derived from the midgut, cuticular fat body, dorsal vessel, salivary gland and whole body, we generated a single, high-quality assembly containing 16,669 transcripts, 92% of which mapped to the An. darlingi genome and covered 90% of the core eukaryotic genome. Bidirectional comparisons between the An. gambiae, An. darlingi and An. albimanus predicted proteomes allowed the identification of 3,772 putative orthologs. More than half of the transcripts had a match to proteins in other insect vectors and had an InterPro annotation. We identified several protein families that may be relevant to the study of Plasmodium-mosquito interaction. An open source transcript annotation browser called GDAV (Genome-Delinked Annotation Viewer) was developed to facilitate public access to the data generated by this and future transcriptome projects. We have explored the adult female transcriptome of one important New World malaria vector, An. albimanus. We identified protein-coding transcripts involved in biological processes that may be relevant to the Plasmodium lifecycle and can serve as the starting point for searching targets for novel control strategies. Our data increase the available genomic information regarding An. albimanus several hundred-fold, and will facilitate molecular research in medical entomology, evolutionary biology, genomics and proteomics of anopheline mosquito vectors. The data reported in this manuscript is accessible to the community via the VectorBase website (http://www.vectorbase.org/Other/AdditionalOrganisms/).
Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

NASA Astrophysics Data System (ADS)

Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.

2018-05-01

Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi co-processor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7 ×) and Xeon Phi coprocessor (4.7-4.9 ×) compared to the baseline parallel code. The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
Recent improvements in the NASA technical report server

NASA Technical Reports Server (NTRS)

Maa, Ming-Hokng; Nelson, Michael L.

1995-01-01

The NASA Technical Report Server (NTRS), a World Wide Web (WWW) report distribution service, has been modified to allow parallel database queries, significantly decreasing user access time by an average factor of 2.3, access from clients behind firewalls and/or proxies which truncate excessively long Uniform Resource Locators (URL's), access to non-Wide Area Information Server (WAIS) databases, and compatibility with the Z39-50.3 protocol.
40 CFR 257.3-8 - Safety.

Code of Federal Regulations, 2012 CFR

2012-07-01

... application and compaction of soil or other suitable material over disposed solid waste at the end of each... disease vectors' access to the waste. (7) Putrescible wastes means solid waste which contains organic...
40 CFR 257.3-8 - Safety.

Code of Federal Regulations, 2014 CFR

2014-07-01

... application and compaction of soil or other suitable material over disposed solid waste at the end of each... disease vectors' access to the waste. (7) Putrescible wastes means solid waste which contains organic...

40 CFR 257.3-8 - Safety.

Code of Federal Regulations, 2010 CFR

2010-07-01

... application and compaction of soil or other suitable material over disposed solid waste at the end of each... disease vectors' access to the waste. (7) Putrescible wastes means solid waste which contains organic...
40 CFR 257.3-8 - Safety.

Code of Federal Regulations, 2011 CFR

2011-07-01

... application and compaction of soil or other suitable material over disposed solid waste at the end of each... disease vectors' access to the waste. (7) Putrescible wastes means solid waste which contains organic...
40 CFR 257.3-8 - Safety.

Code of Federal Regulations, 2013 CFR

2013-07-01

... application and compaction of soil or other suitable material over disposed solid waste at the end of each... disease vectors' access to the waste. (7) Putrescible wastes means solid waste which contains organic...
Demand-supply gaps in human resources to combat vector-borne disease in India: capacity-building measures in medical entomology.

PubMed

Pandey, Anuja; Zodpey, Sanjay; Kumar, Raj

2015-01-01

Vector-borne diseases account for a significant proportion of the global burden of infectious disease. They are one of the greatest contributors to human mortality and morbidity in tropical settings, including India. The World Health Organization declared vector-borne diseases as theme for the year 2014, and thus called for renewed commitment to their prevention and control. Human resources are critical to support public health systems, and medical entomologists play a crucial role in public health efforts to combat vector-borne diseases. This paper aims to review the capacity-building initiatives in medical entomology in India, to understand the demand and supply of medical entomologists, and to give future direction for the initiation of need-based training in the country. A systematic, predefined approach, with three parallel strategies, was used to collect and assemble the data regarding medical entomology training in India and assess the demand-supply gap in medical entomologists in the country. The findings suggest that, considering the high burden of vector-borne diseases in the country and the growing need of health manpower specialized in medical entomology, the availability of specialized training in medical entomology is insufficient in terms of number and intake capacity. The demand analysis of medical entomologists in India suggests a wide gap in demand and supply, which needs to be addressed to cater for the burden of vector-borne diseases in the country.
Pinning, rotation, and metastability of BiFeO 3 cycloidal domains in a magnetic field

DOE PAGES

Fishman, Randy S.

2018-01-03

Earlier models for the room-temperature multiferroic BiFeO 3 implicitly assumed that a very strong anisotropy restricts the domain wave vectors q to the threefold-symmetric axis normal to the static polarization P. However, recent measurements demonstrate that the domain wave vectors q rotate within the hexagonal plane normal to P away from the magnetic field orientation m. In this paper, we show that the previously neglected threefold anisotropy K 3 restricts the wave vectors to lie along the threefold axis in zero field. Taking m to lie along a threefold axis, the domain with q parallel to m remains metastable belowmore » B c1≈7 T. Due to the pinning of domains by nonmagnetic impurities, the wave vectors of the other two domains start to rotate away from m above 5.6 T, when the component of the torque τ=M×B along P exceeds a threshold value τ pin. Since τ=0 when m⊥q, the wave vectors of those domains never become completely perpendicular to the magnetic field. Our results explain recent measurements of the critical field as a function of field orientation, small-angle neutron scattering measurements of the wave vectors, as well as spectroscopic measurements with m along a threefold axis. Finally, the model developed in this paper also explains how the three multiferroic domains of BiFeO 3 for a fixed P can be manipulated by a magnetic field.« less
Pinning, rotation, and metastability of BiFeO3 cycloidal domains in a magnetic field

NASA Astrophysics Data System (ADS)

Fishman, Randy S.

2018-01-01

Earlier models for the room-temperature multiferroic BiFeO3 implicitly assumed that a very strong anisotropy restricts the domain wave vectors q to the threefold-symmetric axis normal to the static polarization P . However, recent measurements demonstrate that the domain wave vectors q rotate within the hexagonal plane normal to P away from the magnetic field orientation m . We show that the previously neglected threefold anisotropy K3 restricts the wave vectors to lie along the threefold axis in zero field. Taking m to lie along a threefold axis, the domain with q parallel to m remains metastable below Bc 1≈7 T. Due to the pinning of domains by nonmagnetic impurities, the wave vectors of the other two domains start to rotate away from m above 5.6 T, when the component of the torque τ =M ×B along P exceeds a threshold value τpin. Since τ =0 when m ⊥q , the wave vectors of those domains never become completely perpendicular to the magnetic field. Our results explain recent measurements of the critical field as a function of field orientation, small-angle neutron scattering measurements of the wave vectors, as well as spectroscopic measurements with m along a threefold axis. The model developed in this paper also explains how the three multiferroic domains of BiFeO3 for a fixed P can be manipulated by a magnetic field.
Pinning, rotation, and metastability of BiFeO 3 cycloidal domains in a magnetic field

DOE Office of Scientific and Technical Information (OSTI.GOV)

Fishman, Randy S.

Earlier models for the room-temperature multiferroic BiFeO 3 implicitly assumed that a very strong anisotropy restricts the domain wave vectors q to the threefold-symmetric axis normal to the static polarization P. However, recent measurements demonstrate that the domain wave vectors q rotate within the hexagonal plane normal to P away from the magnetic field orientation m. In this paper, we show that the previously neglected threefold anisotropy K 3 restricts the wave vectors to lie along the threefold axis in zero field. Taking m to lie along a threefold axis, the domain with q parallel to m remains metastable belowmore » B c1≈7 T. Due to the pinning of domains by nonmagnetic impurities, the wave vectors of the other two domains start to rotate away from m above 5.6 T, when the component of the torque τ=M×B along P exceeds a threshold value τ pin. Since τ=0 when m⊥q, the wave vectors of those domains never become completely perpendicular to the magnetic field. Our results explain recent measurements of the critical field as a function of field orientation, small-angle neutron scattering measurements of the wave vectors, as well as spectroscopic measurements with m along a threefold axis. Finally, the model developed in this paper also explains how the three multiferroic domains of BiFeO 3 for a fixed P can be manipulated by a magnetic field.« less
Adsorption and dissociation of molecular oxygen on α-Pu (0 2 0) surface: A density functional study

NASA Astrophysics Data System (ADS)

Wang, Jianguang; Ray, Asok K.

2011-09-01

Molecular and dissociative oxygen adsorptions on the α-Pu (0 2 0) surface have been systematically studied using the full-potential linearized augmented-plane-wave plus local orbitals (FP-LAPW+lo) basis method and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional. Chemisorption energies have been optimized for the distance of the admolecule from the Pu surface and the bond length of O-O atoms for four adsorption sites and three approaches of O 2 admolecule to the (0 2 0) surface. Chemisorption energies have been calculated at the scalar relativistic level with no spin-orbit coupling (NSOC) and at the fully relativistic level with spin-orbit coupling (SOC). Dissociative adsorptions are found at the two horizontal approaches (O 2 is parallel to the surface and perpendicular/parallel to a lattice vector). Hor2 (O 2 is parallel to the surface and perpendicular to a lattice vector) approach at the one-fold top site is the most stable adsorption site, with chemisorption energies of 8.048 and 8.415 eV for the NSOC and SOC cases, respectively, and an OO separation of 3.70 Å. Molecular adsorption occurs at the Vert (O 2 is vertical to the surface) approach of each adsorption site. The calculated work functions and net spin magnetic moments, respectively, increase and decrease in all cases upon chemisorption compared to the clean surface. The partial charges inside the muffin-tins, the difference charge density distributions, and the local density of states have been used to investigate the Pu-admolecule electronic structures and bonding mechanisms.
Accurate treatment of total photoabsorption cross sections by an ab initio time-dependent method

NASA Astrophysics Data System (ADS)

Daud, Mohammad Noh

2014-09-01

A detailed discussion of parallel and perpendicular transitions required for the photoabsorption of a molecule is presented within a time-dependent view. Total photoabsorption cross sections for the first two ultraviolet absorption bands of the N2O molecule corresponding to transitions from the X1 A' state to the 21 A' and 11 A'' states are calculated to test the reliability of the method. By fully considering the property of the electric field polarization vector of the incident light, the method treats the coupling of angular momentum and the parity differently for two kinds of transitions depending on the direction of the vector whether it is: (a) situated parallel in a molecular plane for an electronic transition between states with the same symmetry; (b) situated perpendicular to a molecular plane for an electronic transition between states with different symmetry. Through this, for those transitions, we are able to offer an insightful picture of the dynamics involved and to characterize some new aspects in the photoabsorption process of N2O. Our calculations predicted that the parallel transition to the 21 A' state is the major dissociation pathway which is in qualitative agreement with the experimental observations. Most importantly, a significant improvement in the absolute value of the total cross section over previous theoretical results [R. Schinke, J. Chem. Phys. 134, 064313 (2011), M.N. Daud, G.G. Balint-Kurti, A. Brown, J. Chem. Phys. 122, 054305 (2005), S. Nanbu, M.S. Johnson, J. Phys. Chem. A 108, 8905 (2004)] was obtained.
Pressure-constrained, reduced-DOF, interconnected parallel manipulators with applications to space suit design

NASA Astrophysics Data System (ADS)

Jacobs, Shane Earl

This dissertation presents the concept of a Morphing Upper Torso, an innovative pressure suit design that incorporates robotic elements to enable a resizable, highly mobile and easy to don/doff spacesuit. The torso is modeled as a system of interconnected, pressure-constrained, reduced-DOF, wire-actuated parallel manipulators, that enable the dimensions of the suit to be reconfigured to match the wearer. The kinematics, dynamics and control of wire-actuated manipulators are derived and simulated, along with the Jacobian transforms, which relate the total twist vector of the system to the vector of actuator velocities. Tools are developed that allow calculation of the workspace for both single and interconnected reduced-DOF robots of this type, using knowledge of the link lengths. The forward kinematics and statics equations are combined and solved to produce the pose of the platforms along with the link tensions. These tools allow analysis of the full Morphing Upper Torso design, in which the back hatch of a rear-entry torso is interconnected with the waist ring, helmet ring and two scye bearings. Half-scale and full-scale experimental models are used along with analytical models to examine the feasibility of this novel space suit concept. The analytical and experimental results demonstrate that the torso could be expanded to facilitate donning and doffng, and then contracted to match different wearer's body dimensions. Using the system of interconnected parallel manipulators, suit components can be accurately repositioned to different desired configurations. The demonstrated feasibility of the Morphing Upper Torso concept makes it an exciting candidate for inclusion in a future planetary suit architecture.
Virus Database and Online Inquiry System Based on Natural Vectors.

PubMed

Dong, Rui; Zheng, Hui; Tian, Kun; Yau, Shek-Chung; Mao, Weiguang; Yu, Wenping; Yin, Changchuan; Yu, Chenglong; He, Rong Lucy; Yang, Jie; Yau, Stephen St

2017-01-01

We construct a virus database called VirusDB (http://yaulab.math.tsinghua.edu.cn/VirusDB/) and an online inquiry system to serve people who are interested in viral classification and prediction. The database stores all viral genomes, their corresponding natural vectors, and the classification information of the single/multiple-segmented viral reference sequences downloaded from National Center for Biotechnology Information. The online inquiry system serves the purpose of computing natural vectors and their distances based on submitted genomes, providing an online interface for accessing and using the database for viral classification and prediction, and back-end processes for automatic and manual updating of database content to synchronize with GenBank. Submitted genomes data in FASTA format will be carried out and the prediction results with 5 closest neighbors and their classifications will be returned by email. Considering the one-to-one correspondence between sequence and natural vector, time efficiency, and high accuracy, natural vector is a significant advance compared with alignment methods, which makes VirusDB a useful database in further research.
A graphics to scalable vector graphics adaptation framework for progressive remote line rendering on mobile devices

NASA Astrophysics Data System (ADS)

Le, Minh Tuan; Nguyen, Congdu; Yoon, Dae-Il; Jung, Eun Ku; Kim, Hae-Kwang

2007-12-01

In this paper, we introduce a graphics to Scalable Vector Graphics (SVG) adaptation framework with a mechanism of vector graphics transmission to overcome the shortcoming of real-time representation and interaction experiences of 3D graphics application running on mobile devices. We therefore develop an interactive 3D visualization system based on the proposed framework for rapidly representing a 3D scene on mobile devices without having to download it from the server. Our system scenario is composed of a client viewer and a graphic to SVG adaptation server. The client viewer offers the user to access to the same 3D contents with different devices according to consumer interactions.
Road Damage Extraction from Post-Earthquake Uav Images Assisted by Vector Data

NASA Astrophysics Data System (ADS)

Chen, Z.; Dou, A.

2018-04-01

Extraction of road damage information after earthquake has been regarded as urgent mission. To collect information about stricken areas, Unmanned Aerial Vehicle can be used to obtain images rapidly. This paper put forward a novel method to detect road damage and bring forward a coefficient to assess road accessibility. With the assistance of vector road data, image data of the Jiuzhaigou Ms7.0 Earthquake is tested. In the first, the image is clipped according to vector buffer. Then a large-scale segmentation is applied to remove irrelevant objects. Thirdly, statistics of road features are analysed, and damage information is extracted. Combining with the on-filed investigation, the extraction result is effective.
Analysis of wave propagation in a two-dimensional photonic crystal with negative index of refraction: plane wave decomposition of the Bloch modes.

PubMed

Martínez, Alejandro; Míguez, Hernán; Sánchez-Dehesa, José; Martí, Javier

2005-05-30

This work presents a comprehensive analysis of electromagnetic wave propagation inside a two-dimensional photonic crystal in a spectral region in which the crystal behaves as an effective medium to which a negative effective index of refraction can be associated. It is obtained that the main plane wave component of the Bloch mode that propagates inside the photonic crystal has its wave vector k' out of the first Brillouin zone and it is parallel to the Poynting vector ( S' ? k'> 0 ), so light propagation in these composites is different from that reported for left-handed materials despite the fact that negative refraction can take place at the interface between air and both kinds of composites. However, wave coupling at the interfaces is well explained using the reduced wave vector ( k' ) in the first Brillouin zone, which is opposed to the energy flow, and agrees well with previous works dealing with negative refraction in photonic crystals.
Current harmonics elimination control method for six-phase PM synchronous motor drives.

PubMed

Yuan, Lei; Chen, Ming-liang; Shen, Jian-qing; Xiao, Fei

2015-11-01

To reduce the undesired 5th and 7th stator harmonic current in the six-phase permanent magnet synchronous motor (PMSM), an improved vector control algorithm was proposed based on vector space decomposition (VSD) transformation method, which can control the fundamental and harmonic subspace separately. To improve the traditional VSD technology, a novel synchronous rotating coordinate transformation matrix was presented in this paper, and only using the traditional PI controller in d-q subspace can meet the non-static difference adjustment, the controller parameter design method is given by employing internal model principle. Moreover, the current PI controller parallel with resonant controller is employed in x-y subspace to realize the specific 5th and 7th harmonic component compensation. In addition, a new six-phase SVPWM algorithm based on VSD transformation theory is also proposed. Simulation and experimental results verify the effectiveness of current decoupling vector controller. Copyright © 2015 ISA. Published by Elsevier Ltd. All rights reserved.
Phase space analysis in anisotropic optical systems

NASA Technical Reports Server (NTRS)

Rivera, Ana Leonor; Chumakov, Sergey M.; Wolf, Kurt Bernardo

1995-01-01

From the minimal action principle follows the Hamilton equations of evolution for geometric optical rays in anisotropic media. As in classical mechanics of velocity-dependent potentials, the velocity and the canonical momentum are not parallel, but differ by an anisotropy vector potential, similar to that of linear electromagnetism. Descartes' well known diagram for refraction is generalized and a factorization theorem holds for interfaces between two anisotropic media.
Optical computation using residue arithmetic.

PubMed

Huang, A; Tsunoda, Y; Goodman, J W; Ishihara, S

1979-01-15

Using residue arithmetic it is possible to perform additions, subtractions, multiplications, and polynomial evaluation without the necessity for carry operations. Calculations can, therefore, be performed in a fully parallel manner. Several different optical methods for performing residue arithmetic operations are described. A possible combination of such methods to form a matrix vector multiplier is considered. The potential advantages of optics in performing these kinds of operations are discussed.
Geometrization of the Dirac theory of the electron

NASA Technical Reports Server (NTRS)

Fock, V.

1977-01-01

Using the concept of parallel displacement of a half vector, the Dirac equations are generally written in invariant form. The energy tensor is formed and both the macroscopic and quantum mechanic equations of motion are set up. The former have the usual form: divergence of the energy tensor equals the Lorentz force and the latter are essentially identical with those of the geodesic line.
Generalizing on Multiple Grounds: Performance Learning in Model-Based Troubleshooting

DTIC Science & Technology

1989-02-01

Aritificial Intelligence , 24, 1984. [Ble88] Guy E. Blelloch. Scan Primitives and Parallel Vector Models. PhD thesis, Artificial Intelligence Laboratory...Diagnostic reasoning based on strcture and behavior. Aritificial Intelligence , 24, 1984. [dK86] J. de Kleer. An assumption-based truth maintenance system...diagnosis. Aritificial Intelligence , 24. �. )3 94 BIBLIOGRAPHY [Ham87] Kristian J. Hammond. Learning to anticipate and avoid planning prob- lems
Milestone Completion Report WBS 1.3.5.05 ECP/VTK-m FY17Q3 [MS-17/02] Faceted Surface Normals STDA05-3.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moreland, Kenneth D.

2017-07-01

The FY17Q3 milestone of the ECP/VTK-m project includes the completion of a VTK-m filter that computes normal vectors for surfaces. Normal vectors are those that point perpendicular to the surface and are an important direction when rendering the surface. The implementation includes the parallel algorithm itself, a filter module to simplify integrating it into other software, and documentation in the VTK-m Users’ Guide. With the completion of this milestone, we are able to necessary information to rendering systems to provide appropriate shading of surfaces. This milestone also feeds into subsequent milestones that progressively improve the approximation of surface direction.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.