Exploiting Vector and Multicore Parallelism for Recursive, Data- and Task-Parallel Programs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ren, Bin; Krishnamoorthy, Sriram; Agrawal, Kunal
Modern hardware contains parallel execution resources that are well suited for data parallelism (vector units) and for task parallelism (multicores). However, most work on parallel scheduling focuses on one type of hardware or the other. In this work, we present a scheduling framework that allows for a unified treatment of task and data parallelism. Our key insight is an abstraction, task blocks, that uniformly handles data-parallel iterations and task-parallel tasks, allowing them to be scheduled on vector units or executed independently on multicores. Our framework allows us to define schedulers that can dynamically select between executing task blocks on vector units or multicores. We show that these schedulers are asymptotically optimal, and deliver the maximum amount of parallelism available in computation trees. To evaluate our schedulers, we develop program transformations that can convert mixed data- and task-parallel programs into task-block-based programs. Using a prototype instantiation of our scheduling framework, we show that, on an 8-core system, we can simultaneously exploit vector and multicore parallelism to achieve 14×-108× speedup over sequential baselines.
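The task-block idea above can be sketched in a few lines: a block of independent iterations is either executed as one batched ("vectorized") sweep or fanned out as separate tasks. This is an illustrative toy; `run_task_block`, the size threshold, and the thread pool are assumptions standing in for the paper's schedulers and real vector units.

```python
from concurrent.futures import ThreadPoolExecutor

def run_task_block(items, work, vector_threshold=4):
    """Hypothetical sketch of a 'task block': a batch of independent
    iterations that a scheduler may either 'vectorize' (process as one
    contiguous batch) or fan out as separate tasks across cores.
    The threshold heuristic is an illustrative assumption."""
    if len(items) >= vector_threshold:
        # "Vector" path: execute the whole block as one batched loop,
        # standing in for a SIMD sweep over the iterations.
        return [work(x) for x in items]
    # "Multicore" path: treat each iteration as an independent task.
    with ThreadPoolExecutor(max_workers=2) as ex:
        return list(ex.map(work, items))

# Both paths must produce identical results; only the execution
# strategy differs.
print(run_task_block([1, 2, 3, 4, 5], lambda x: x * x))  # batched path
print(run_task_block([1, 2], lambda x: x * x))           # task path
```

The scheduler's job in the paper is choosing between these two paths dynamically; here the choice is a fixed size threshold.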
Parallel-vector solution of large-scale structural analysis problems on supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1989-01-01
A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
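A minimal sequential sketch of the underlying numerics: Choleski (Cholesky) factorization followed by forward and back substitution. The paper's contribution is parallelizing and vectorizing loops of exactly this shape; the function name and the small example are illustrative.

```python
import math

def cholesky_solve(a, b):
    """Solve A x = b for a symmetric positive definite A via the
    factorization A = L L^T, then forward/back substitution.
    Plain sequential sketch of the numerics only."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for i in range(n):                      # factorization
        for j in range(i + 1):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            if i == j:
                l[i][j] = math.sqrt(a[i][i] - s)
            else:
                l[i][j] = (a[i][j] - s) / l[j][j]
    y = [0.0] * n                           # forward solve: L y = b
    for i in range(n):
        y[i] = (b[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    x = [0.0] * n                           # back solve: L^T x = y
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(l[k][i] * x[k] for k in range(i + 1, n))) / l[i][i]
    return x

# 2x2 SPD system: [[4,2],[2,3]] x = [10, 9]  ->  x = [1.5, 2.0]
print(cholesky_solve([[4.0, 2.0], [2.0, 3.0]], [10.0, 9.0]))
```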
Evaluation of a new parallel numerical parameter optimization algorithm for a dynamical system
NASA Astrophysics Data System (ADS)
Duran, Ahmet; Tuncel, Mehmet
2016-10-01
It is important to have a scalable parallel numerical parameter optimization algorithm for a dynamical system used in financial applications where time limitation is crucial. We use Message Passing Interface (MPI) parallel programming and present such a new parallel algorithm for parameter estimation. For example, we apply the algorithm to the asset flow differential equations that have been developed and analyzed since 1989 (see [3-6] and references contained therein). We achieved speed-up for some time series on runs of up to 512 cores (see [10]). Unlike [10], in this work we consider more extensive financial market situations, for example in the presence of low volatility, high volatility, and a stock market price at a discount/premium to its net asset value with varying magnitude. Moreover, we evaluated the convergence of the model parameter vector, the nonlinear least-squares error, and the maximum improvement factor to quantify the success of the optimization process depending on the number of initial parameter vectors.
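The independence of candidate parameter vectors is what makes the search parallel-friendly. A hedged sketch, substituting a toy linear model for the asset-flow equations and a thread pool for MPI ranks; `sse`, `fit`, and the grids are assumptions, not the paper's method.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

def sse(params, series):
    """Nonlinear least-squares error of a toy model y = a*t + b."""
    a, b = params
    return sum((a * t + b - y) ** 2 for t, y in series)

def fit(series, a_grid, b_grid):
    """Evaluate every initial parameter vector independently and keep
    the best; this independence is exactly what an MPI version
    exploits. A hypothetical stand-in model, not the asset-flow
    equations."""
    candidates = list(product(a_grid, b_grid))
    with ThreadPoolExecutor(max_workers=4) as ex:
        errors = list(ex.map(lambda p: sse(p, series), candidates))
    best = min(range(len(candidates)), key=errors.__getitem__)
    return candidates[best], errors[best]

series = [(0, 1.0), (1, 3.0), (2, 5.0)]      # generated by y = 2t + 1
grid = [x / 2 for x in range(0, 9)]          # 0.0, 0.5, ..., 4.0
print(fit(series, grid, grid))               # ((2.0, 1.0), 0.0)
```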
Electromagnetic Physics Models for Parallel Computing Architectures
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Duhem, L.; Elvira, D.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2016-10-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and the multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as part of the GeantV project. Results of preliminary performance evaluation and physics validation are presented as well.
NASA Technical Reports Server (NTRS)
Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.
1991-01-01
Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
Evaluation of Cache-based Superscalar and Cacheless Vector Architectures for Scientific Computations
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Carter, Jonathan; Shalf, John; Skinner, David; Ethier, Stephane; Biswas, Rupak; Djomehri, Jahed; VanderWijngaart, Rob
2003-01-01
The growing gap between sustained and peak performance for scientific applications has become a well-known problem in high performance computing. The recent development of parallel vector systems offers the potential to bridge this gap for a significant number of computational science codes and deliver a substantial increase in computing capabilities. This paper examines the intranode performance of the NEC SX6 vector processor and the cache-based IBM Power3/4 superscalar architectures across a number of key scientific computing areas. First, we present the performance of a microbenchmark suite that examines a full spectrum of low-level machine characteristics. Next, we study the behavior of the NAS Parallel Benchmarks using some simple optimizations. Finally, we evaluate the performance of several numerical codes from key scientific computing domains. Overall results demonstrate that the SX6 achieves high performance on a large fraction of our application suite and in many cases significantly outperforms the RISC-based architectures. However, certain classes of applications are not easily amenable to vectorization and would likely require extensive reengineering of both algorithm and implementation to utilize the SX6 effectively.
High-performance computing — an overview
NASA Astrophysics Data System (ADS)
Marksteiner, Peter
1996-08-01
An overview of high-performance computing (HPC) is given. Different types of computer architectures used in HPC are discussed: vector supercomputers, high-performance RISC processors, various parallel computers like symmetric multiprocessors, workstation clusters, massively parallel processors. Software tools and programming techniques used in HPC are reviewed: vectorizing compilers, optimization and vector tuning, optimization for RISC processors; parallel programming techniques like shared-memory parallelism, message passing and data parallelism; and numerical libraries.
Electromagnetic physics models for parallel computing architectures
Amadio, G.; Ananya, A.; Apostolakis, J.; ...
2016-11-21
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. GeantV, a next generation detector simulation, has been designed to exploit both the vector capability of mainstream CPUs and the multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth and type of parallelization needed to achieve optimal performance. In this paper we describe the implementation of electromagnetic physics models developed for parallel computing architectures as part of the GeantV project. Finally, the results of preliminary performance evaluation and physics validation are presented as well.
Implementation of a parallel unstructured Euler solver on the CM-5
NASA Technical Reports Server (NTRS)
Morano, Eric; Mavriplis, D. J.
1995-01-01
An efficient unstructured 3D Euler solver is parallelized on a Thinking Machines Corporation Connection Machine CM-5, a distributed-memory computer with vector capability. In this paper, the single instruction multiple data (SIMD) strategy is employed through the use of the CM Fortran language and the CMSSL scientific library. The performance of the CMSSL mesh partitioner is evaluated and the overall efficiency of the parallel flow solver is discussed.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX/80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-02-01
The results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment are summarized. Analysis of aircraft turbofan engines is very computationally intensive. The performance limit of modern-day computers with a single processing unit was estimated at 3 billion floating-point operations per second (3 gigaflops). In view of this limit of a sequential unit, performance rates higher than 3 gigaflops can be achieved only through vectorization and/or parallelization, as on the Alliant FX/80. Accordingly, the efforts of this critically needed research were geared toward developing and evaluating parallel finite element methods for static and vibration analysis. A special-purpose code, named with the acronym SAPNEW, performs static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements.
A new parallel-vector finite element analysis software on distributed-memory computers
NASA Technical Reports Server (NTRS)
Qin, Jiangning; Nguyen, Duc T.
1993-01-01
A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. A block-skyline storage scheme along with vector-unrolling techniques is used to enhance vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
NASA Astrophysics Data System (ADS)
Murni, Bustamam, A.; Ernastuti, Handhika, T.; Kerami, D.
2017-07-01
Calculation of matrix-vector multiplication in real-world problems often involves large matrices of arbitrary size. Therefore, parallelization is needed to speed up the calculation process, which usually takes a long time. Graph partitioning techniques discussed in previous studies cannot be used to parallelize matrix-vector multiplication of arbitrary size, because graph partitioning assumes a square, symmetric matrix. Hypergraph partitioning techniques overcome this shortcoming of graph partitioning. This paper addresses the efficient parallelization of matrix-vector multiplication through hypergraph partitioning techniques using CUDA GPU-based parallel computing. CUDA (compute unified device architecture) is a parallel computing platform and programming model that was created by NVIDIA and implemented on the GPU (graphics processing unit).
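Row-partitioned sparse matrix-vector multiplication can be sketched as follows; the partitioning here is given by hand where a hypergraph partitioner would compute it, and CSR arrays stand in for the GPU data layout (all names are illustrative).

```python
def csr_matvec_partitioned(indptr, indices, data, x, parts):
    """Sketch of partitioned sparse matrix-vector multiplication:
    rows are split into partitions (as a hypergraph partitioner would
    assign them) and each partition's rows could be computed by a
    different GPU block or process. Partitioning quality affects only
    communication volume, never the numerical result."""
    n = len(indptr) - 1
    y = [0.0] * n
    for rows in parts:                 # each partition is independent
        for i in rows:
            acc = 0.0
            for k in range(indptr[i], indptr[i + 1]):
                acc += data[k] * x[indices[k]]
            y[i] = acc
    return y

# 3x3 matrix [[1,0,2],[0,3,0],[4,0,5]] in CSR form
indptr, indices, data = [0, 2, 3, 5], [0, 2, 1, 0, 2], [1.0, 2.0, 3.0, 4.0, 5.0]
print(csr_matvec_partitioned(indptr, indices, data, [1.0, 1.0, 1.0],
                             parts=[[0, 2], [1]]))  # [3.0, 3.0, 9.0]
```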
Automatic recognition of vector and parallel operations in a higher level language
NASA Technical Reports Server (NTRS)
Schneck, P. B.
1971-01-01
A compiler for recognizing statements of a FORTRAN program which are suited for fast execution on a parallel or pipeline machine such as Illiac-4, Star, or ASC is described. The technique employs interval analysis to provide flow information to the vector/parallel recognizer. Where profitable, the compiler changes scalar variables to subscripted variables. The output of the compiler is an extension to FORTRAN which shows parallel and vector operations explicitly.
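The recognizer's core question, whether a statement carries a dependence across loop iterations, can be sketched on statement strings; real interval analysis is far more involved, and `vectorizable` is a hypothetical stand-in, not the paper's algorithm.

```python
def vectorizable(loop_body):
    """Toy dependence test in the spirit of the recognizer: a statement
    'a[i] = expr' is flagged vector-safe only if expr never reads the
    array being written (no loop-carried dependence through 'a').
    Operates on strings of the form 'lhs[i] = rhs'."""
    report = []
    for stmt in loop_body:
        lhs, rhs = (s.strip() for s in stmt.split("=", 1))
        array = lhs.split("[")[0].strip()
        report.append((stmt, array + "[" not in rhs))
    return report

for stmt, ok in vectorizable(["a[i] = b[i] + c[i]",     # independent: vector
                              "a[i] = a[i-1] * 2.0"]):  # recurrence: scalar
    print(stmt, "->", "VECTOR" if ok else "SCALAR")
```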
Hirano, Toshiyuki; Sato, Fumitoshi
2014-07-28
We used grid-free modified Cholesky decomposition (CD) to develop a density-functional-theory (DFT)-based method for calculating the canonical molecular orbitals (CMOs) of large molecules. Our method can be used to calculate standard CMOs, analytically compute exchange-correlation terms, and maximise the capacity of next-generation supercomputers. Cholesky vectors were first analytically downscaled using low-rank pivoted CD and CD with adaptive metric (CDAM). The obtained Cholesky vectors were distributed and stored on each computer node in a parallel computer, and the Coulomb, Fock exchange, and pure exchange-correlation terms were calculated by multiplying the Cholesky vectors without evaluating molecular integrals in self-consistent field iterations. Our method enables DFT and massively distributed memory parallel computers to be used in order to very efficiently calculate the CMOs of large molecules.
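The low-rank pivoted Cholesky decomposition mentioned above can be sketched for a small symmetric positive semidefinite matrix; the molecular-integral machinery (CDAM, Coulomb and Fock terms) is not reproduced, and `pivoted_cholesky` is an illustrative name.

```python
import math

def pivoted_cholesky(a, tol=1e-10):
    """Low-rank pivoted Cholesky decomposition of a symmetric positive
    semidefinite matrix: returns vectors l_1..l_r such that
    A ~= sum_k outer(l_k, l_k). These play the role of the 'Cholesky
    vectors' in the abstract."""
    n = len(a)
    d = [a[i][i] for i in range(n)]            # residual diagonal
    vecs = []
    for _ in range(n):
        p = max(range(n), key=d.__getitem__)   # pivot: largest diagonal
        if d[p] <= tol:
            break                              # rank found; stop early
        l = [0.0] * n
        l[p] = math.sqrt(d[p])
        for i in range(n):
            if i != p:
                l[i] = (a[i][p] - sum(v[i] * v[p] for v in vecs)) / l[p]
        for i in range(n):
            d[i] -= l[i] * l[i]
        vecs.append(l)
    return vecs

a = [[4.0, 2.0], [2.0, 5.0]]
vecs = pivoted_cholesky(a)
recon = [[sum(v[i] * v[j] for v in vecs) for j in range(2)] for i in range(2)]
print(recon)  # reconstruction is close to the original matrix
```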
Soto-Quiros, Pablo
2015-01-01
This paper presents a parallel implementation of a kind of discrete Fourier transform (DFT): the vector-valued DFT. The vector-valued DFT is a novel tool to analyze the spectra of vector-valued discrete-time signals. This parallel implementation is developed in terms of a mathematical framework with a set of block matrix operations. These block matrix operations contribute to the analysis, design, and implementation of parallel algorithms on multicore processors. In this work, an implementation and experimental investigation of the mathematical framework are performed using MATLAB with the Parallel Computing Toolbox. We found that there is an advantage to using multicore processors and a parallel computing environment to reduce the high execution time. Additionally, speedup increases when the number of logical processors and the length of the signal increase.
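One way to read the vector-valued DFT, as an ordinary DFT applied per component of the samples, makes the available parallelism explicit: every component and every output frequency is independent work. A sketch under that assumption (not necessarily the paper's exact operator):

```python
import cmath

def vector_valued_dft(signal):
    """DFT of a sequence of d-dimensional samples, computed
    componentwise. Each (k, c) entry is independent, which is what a
    block-matrix formulation hands to separate cores."""
    n, d = len(signal), len(signal[0])
    out = []
    for k in range(n):
        row = []
        for c in range(d):
            row.append(sum(signal[t][c] * cmath.exp(-2j * cmath.pi * k * t / n)
                           for t in range(n)))
        out.append(row)
    return out

# constant signal: all energy lands in frequency bin k = 0
spec = vector_valued_dft([[1.0, 2.0], [1.0, 2.0]])
print(spec[0])  # ~[(2+0j), (4+0j)]
```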
Evaluating a Pivot-Based Approach for Bilingual Lexicon Extraction
Kim, Jae-Hoon; Kwon, Hong-Seok; Seo, Hyeong-Won
2015-01-01
A pivot-based approach for bilingual lexicon extraction is based on the similarity of context vectors represented by words in a pivot language like English. In this paper, in order to show the validity and usability of the pivot-based approach, we evaluate the approach with two different methods for estimating context vectors: one estimates them from two parallel corpora based on word association between source words (resp., target words) and pivot words, and the other estimates them from two parallel corpora based on word-alignment tools for statistical machine translation. Empirical results on two language pairs (Korean-Spanish and Korean-French) show that the pivot-based approach is very promising for resource-poor languages, confirming its validity and usability. Furthermore, our method also performs well for words with low frequency. PMID:25983745
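The context-vector comparison at the heart of the approach reduces to a similarity ranking over vectors indexed by pivot words. A sketch with made-up counts over three pivot dimensions; `best_translation` and all data are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity of two context vectors."""
    num = sum(a * b for a, b in zip(u, v))
    den = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return num / den if den else 0.0

def best_translation(source_vec, target_vecs):
    """Both source and target words are represented as context vectors
    over the same pivot (English) vocabulary, so a cosine ranking
    surfaces translation candidates."""
    return max(target_vecs, key=lambda w: cosine(source_vec, target_vecs[w]))

# made-up co-occurrence counts over three pivot words
targets = {"gato": [9.0, 1.0, 0.0], "perro": [0.0, 1.0, 8.0]}
print(best_translation([10.0, 2.0, 0.0], targets))  # gato
```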
An OpenACC-Based Unified Programming Model for Multi-accelerator Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Jungwon; Lee, Seyong; Vetter, Jeffrey S
2015-01-01
This paper proposes a novel SPMD programming model of OpenACC. Our model integrates the different granularities of parallelism from vector-level parallelism to node-level parallelism into a single, unified model based on OpenACC. It allows programmers to write programs for multiple accelerators using a uniform programming model whether they are in shared or distributed memory systems. We implement a prototype of our model and evaluate its performance with a GPU-based supercomputer using three benchmark applications.
Vectorization and parallelization of the finite strip method for dynamic Mindlin plate problems
NASA Technical Reports Server (NTRS)
Chen, Hsin-Chu; He, Ai-Fang
1993-01-01
The finite strip method is a semi-analytical finite element process which allows for a discrete analysis of certain types of physical problems by discretizing the domain of the problem into finite strips. This method decomposes a single large problem into m smaller independent subproblems when m harmonic functions are employed, thus yielding natural parallelism at a very high level. In this paper we address vectorization and parallelization strategies for the dynamic analysis of simply-supported Mindlin plate bending problems and show how to prevent potential conflicts in memory access during the assemblage process. The vector and parallel implementations of this method and the performance results of a test problem under scalar, vector, and vector-concurrent execution modes on the Alliant FX/80 are also presented.
NASA Astrophysics Data System (ADS)
Sheykina, Nadiia; Bogatina, Nina
The following variants of root orientation relative to the static and alternating components of the magnetic field were studied. In the first variant, the static magnetic field was directed parallel to the gravitation vector, the alternating magnetic field was directed perpendicular to the static one, and the roots were directed perpendicular to both field components and to the gravitation vector; in this variant, negative gravitropism was observed for cress roots. In the second variant, the static magnetic field was directed parallel to the gravitation vector, the alternating magnetic field was directed perpendicular to the static one, and the roots were directed parallel to the alternating magnetic field. In the third variant, the alternating magnetic field was directed parallel to the gravitation vector, the static magnetic field was directed perpendicular to the gravitation vector, and the roots were directed perpendicular to both field components and to the gravitation vector. In the fourth variant, the alternating magnetic field was directed parallel to the gravitation vector, the static magnetic field was directed perpendicular to the gravitation vector, and the roots were directed parallel to the static magnetic field. In all cases studied, the frequency of the alternating magnetic field was equal to the cyclotron frequency of Ca ions. In the second, third, and fourth variants gravitropism was positive, but the gravitropic reaction speeds differed: in the second and fourth variants the speed coincided, within error limits, with the gravitropic reaction speed under Earth's conditions, while in the third variant the gravitropic reaction was essentially slowed.
Vectorization for Molecular Dynamics on Intel Xeon Phi Coprocessors
NASA Astrophysics Data System (ADS)
Yi, Hongsuk
2014-03-01
Many modern processors are capable of exploiting data-level parallelism through the use of single instruction multiple data (SIMD) execution. The new Intel Xeon Phi coprocessor supports 512-bit vector registers for high performance computing. In this paper, we have developed a hierarchical parallelization scheme for accelerated molecular dynamics simulations with the Tersoff potential for covalent-bond solid crystals on Intel Xeon Phi coprocessor systems. The scheme exploits multiple levels of parallelism, combining tightly coupled thread-level and task-level parallelism with the 512-bit vector registers. The simulation results show that the parallel performance of the SIMD implementations on Xeon Phi is clearly superior to that of the x86 CPU architecture.
Cache-Oblivious parallel SIMD Viterbi decoding for sequence search in HMMER.
Ferreira, Miguel; Roma, Nuno; Russo, Luis M S
2014-05-30
HMMER is a commonly used bioinformatics tool based on Hidden Markov Models (HMMs) to analyze and process biological sequences. One of its main homology engines is based on the Viterbi decoding algorithm, which was already highly parallelized and optimized using Farrar's striped processing pattern with Intel SSE2 instruction set extension. A new SIMD vectorization of the Viterbi decoding algorithm is proposed, based on an SSE2 inter-task parallelization approach similar to the DNA alignment algorithm proposed by Rognes. Besides this alternative vectorization scheme, the proposed implementation also introduces a new partitioning of the Markov model that allows a significantly more efficient exploitation of the cache locality. Such optimization, together with an improved loading of the emission scores, allows the achievement of a constant processing throughput, regardless of the innermost-cache size and of the dimension of the considered model. The proposed optimized vectorization of the Viterbi decoding algorithm was extensively evaluated and compared with the HMMER3 decoder to process DNA and protein datasets, proving to be a rather competitive alternative implementation. Being always faster than the already highly optimized ViterbiFilter implementation of HMMER3, the proposed Cache-Oblivious Parallel SIMD Viterbi (COPS) implementation provides a constant throughput and offers a processing speedup as high as two times faster, depending on the model's size.
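The recurrence COPS vectorizes is the classic Viterbi dynamic program. A plain scalar sketch with toy probabilities follows; real HMMER works in log space with striped SIMD lanes, and the two-state model here is hypothetical.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Plain scalar Viterbi decoding: the dynamic-programming
    recurrence that the SIMD versions compute lane-parallel."""
    v = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        v.append({})
        back.append({})
        for s in states:
            # best predecessor state for s at time t
            prob, prev = max((v[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                             for p in states)
            v[t][s], back[t][s] = prob, prev
    last = max(states, key=lambda s: v[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):    # traceback
        path.append(back[t][path[-1]])
    return list(reversed(path))

states = ("match", "insert")
start_p = {"match": 0.6, "insert": 0.4}
trans_p = {"match": {"match": 0.7, "insert": 0.3},
           "insert": {"match": 0.4, "insert": 0.6}}
emit_p = {"match": {"A": 0.9, "B": 0.1},
          "insert": {"A": 0.2, "B": 0.8}}
print(viterbi("AAB", states, start_p, trans_p, emit_p))
```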
NASA Astrophysics Data System (ADS)
Lee, Dukhyung; Kim, Dai-Sik
2016-01-01
We study light scattering off rectangular slot nanoantennas on a metal film, varying incident polarization and incident angle, to examine which field vector of light is more important: the electric vector perpendicular to, or the magnetic vector parallel to, the long axis of the rectangle. While vector Babinet's principle would prefer the magnetic field along the long axis for optimizing slot antenna function, convention and intuition most often refer to the electric field perpendicular to it. Here, we demonstrate experimentally that, in accordance with vector Babinet's principle, the incident magnetic vector parallel to the long axis is the dominant component, with the perpendicular incident electric field making a small contribution of a factor of 1/|ε|, the reciprocal of the absolute value of the dielectric constant of the metal, owing to the non-perfectness of metals at optical frequencies.
Global Magnetohydrodynamic Simulation Using High Performance FORTRAN on Parallel Computers
NASA Astrophysics Data System (ADS)
Ogino, T.
High Performance Fortran (HPF) is a modern and widely used technique for achieving high-performance parallel computation. We have translated a 3-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code was fully vectorized and fully parallelized in VPP Fortran. The entire performance and capability of the HPF MHD code could be shown to be almost comparable to that of VPP Fortran. A 3-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops with an efficiency of 76.5% using 56 PEs of the Fujitsu VPP5000/56 in vector and parallel computation that permitted comparison with catalog values. We have concluded that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Parallel processing in finite element structural analysis
NASA Technical Reports Server (NTRS)
Noor, Ahmed K.
1987-01-01
A brief review is made of the fundamental concepts and basic issues of parallel processing. Discussion focuses on parallel numerical algorithms, performance evaluation of machines and algorithms, and parallelism in finite element computations. A computational strategy is proposed for maximizing the degree of parallelism at different levels of the finite element analysis process including: 1) formulation level (through the use of mixed finite element models); 2) analysis level (through additive decomposition of the different arrays in the governing equations into the contributions to a symmetrized response plus correction terms); 3) numerical algorithm level (through the use of operator splitting techniques and application of iterative processes); and 4) implementation level (through the effective combination of vectorization, multitasking and microtasking, whenever available).
Parallel processors and nonlinear structural dynamics algorithms and software
NASA Technical Reports Server (NTRS)
Belytschko, Ted
1990-01-01
Techniques are discussed for the implementation and improvement of vectorization and concurrency in nonlinear explicit structural finite element codes. In explicit integration methods, the computation of the element internal force vector consumes the bulk of the computer time. The program can be efficiently vectorized by subdividing the elements into blocks and executing all computations in vector mode. The structuring of elements into blocks also provides a convenient way to implement concurrency by creating tasks which can be assigned to available processors for evaluation. The techniques were implemented in a 3-D nonlinear program with one-point quadrature shell elements. Concurrency and vectorization were first implemented in a single time step version of the program. Techniques were developed to minimize processor idle time and to select the optimal vector length. A comparison of run times between the program executed in scalar, serial mode and the fully vectorized code executed concurrently using eight processors shows speed-ups of over 25. Conjugate gradient methods for solving nonlinear algebraic equations are also readily adapted to a parallel environment. A new technique for improving convergence properties of conjugate gradients in nonlinear problems is developed in conjunction with other techniques such as diagonal scaling. A significant reduction in the number of iterations required for convergence is shown for a statically loaded rigid bar suspended by three equally spaced springs.
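The conjugate gradient method with diagonal scaling mentioned above can be sketched as Jacobi-preconditioned CG; dense, sequential, and on a toy 2x2 SPD system rather than a shell-element model, so all names and data are illustrative.

```python
def cg_diag_scaled(a, b, iters=50, tol=1e-12):
    """Conjugate gradients with diagonal (Jacobi) scaling as the
    preconditioner, for a symmetric positive definite system A x = b."""
    n = len(b)
    x = [0.0] * n
    matvec = lambda v: [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]
    m_inv = [1.0 / a[i][i] for i in range(n)]        # diagonal scaling
    r = list(b)                                      # residual (x = 0)
    z = [m_inv[i] * r[i] for i in range(n)]
    p = z[:]
    rz = sum(r[i] * z[i] for i in range(n))
    for _ in range(iters):
        ap = matvec(p)
        alpha = rz / sum(p[i] * ap[i] for i in range(n))
        x = [x[i] + alpha * p[i] for i in range(n)]
        r = [r[i] - alpha * ap[i] for i in range(n)]
        if sum(v * v for v in r) < tol:
            break
        z = [m_inv[i] * r[i] for i in range(n)]
        rz_new = sum(r[i] * z[i] for i in range(n))
        p = [z[i] + (rz_new / rz) * p[i] for i in range(n)]
        rz = rz_new
    return x

# toy SPD system: [[4,2],[2,3]] x = [10, 9]  ->  x close to [1.5, 2.0]
print(cg_diag_scaled([[4.0, 2.0], [2.0, 3.0]], [10.0, 9.0]))
```

In the paper's setting the matrix-vector products and vector updates are the pieces that vectorize and distribute across processors.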
Evaluation of the power consumption of a high-speed parallel robot
NASA Astrophysics Data System (ADS)
Han, Gang; Xie, Fugui; Liu, Xin-Jun
2018-06-01
An inverse dynamic model of a high-speed parallel robot is established based on the virtual work principle. With this dynamic model, a new evaluation method is proposed to measure the power consumption of the robot during pick-and-place tasks. The power vector is extended in this method and used to represent the collinear velocity and acceleration of the moving platform. Afterward, several dynamic performance indices, which are homogeneous and possess obvious physical meanings, are proposed. These indices can evaluate the power input and output transmissibility of the robot in a workspace. The distributions of the power input and output transmissibility of the high-speed parallel robot are derived with these indices and clearly illustrated in atlases. Further, a low-power-consumption workspace is selected for the robot.
Parallel-vector out-of-core equation solver for computational mechanics
NASA Technical Reports Server (NTRS)
Qin, J.; Agarwal, T. K.; Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.
1993-01-01
A parallel/vector out-of-core equation solver is developed for shared-memory computers, such as the Cray Y-MP machine. The input/output (I/O) time is reduced by using asynchronous BUFFER IN and BUFFER OUT, which can be executed simultaneously with CPU instructions. The parallel and vector capability provided by the supercomputers is also exploited to enhance performance. Numerical applications in large-scale structural analysis are given to demonstrate the efficiency of the present out-of-core solver.
Optical systolic array processor using residue arithmetic
NASA Technical Reports Server (NTRS)
Jackson, J.; Casasent, D.
1983-01-01
The use of residue arithmetic to increase the accuracy and reduce the dynamic range requirements of optical matrix-vector processors is evaluated. It is determined that matrix-vector operations and iterative algorithms can be performed totally in residue notation. A new parallel residue quantizer circuit is developed which significantly improves the performance of the systolic array feedback processor. Results are presented of a computer simulation of this system used to solve a set of three simultaneous equations.
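Carrying a matrix-vector product entirely in residue notation can be sketched as follows: each modulus defines an independent channel (one processor lane in the optical setup), combined only at the end by Chinese-remainder reconstruction. The moduli are illustrative; they must be pairwise coprime with product exceeding every true result. The modular inverse uses `pow(x, -1, m)`, which requires Python 3.8+.

```python
from math import prod

def crt(residues, moduli):
    """Chinese-remainder reconstruction from residue digits."""
    big = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        n_i = big // m
        total += r * n_i * pow(n_i, -1, m)   # modular inverse of n_i mod m
    return total % big

def matvec_residue(a, x, moduli):
    """Matrix-vector product computed independently in each residue
    channel; channels never exchange data until the final CRT step."""
    n = len(x)
    out = []
    for row in a:
        chans = [sum((row[j] % m) * (x[j] % m) for j in range(n)) % m
                 for m in moduli]
        out.append(crt(chans, moduli))
    return out

print(matvec_residue([[1, 2], [3, 4]], [5, 6], (7, 11, 13)))  # [17, 39]
```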
2012-05-22
tabulation of the reduced space is performed using the In Situ Adaptive Tabulation (ISAT) algorithm. In addition, we use x2f_mpi – a Fortran library ... for parallel vector-valued function evaluation (used with ISAT in this context) – to efficiently redistribute the chemistry workload among the ... Constrained-Equilibrium (RCCE) method.
Global MHD simulation of magnetosphere using HPF
NASA Astrophysics Data System (ADS)
Ogino, T.
We have translated a three-dimensional magnetohydrodynamic (MHD) simulation code of the Earth's magnetosphere from VPP Fortran to HPF/JA on the Fujitsu VPP5000/56 vector-parallel supercomputer; the MHD code was fully vectorized and fully parallelized in VPP Fortran. The overall performance and capability of the HPF MHD code proved almost comparable to those of the VPP Fortran version. A three-dimensional global MHD simulation of the Earth's magnetosphere was performed at a speed of over 400 Gflops, with an efficiency of 76.5% relative to catalog values, using 56 PEs of the Fujitsu VPP5000/56 in vector and parallel computation. We conclude that fluid and MHD codes that are fully vectorized and fully parallelized in VPP Fortran can be translated with relative ease to HPF/JA, and that a code in HPF/JA may be expected to perform comparably to the same code written in VPP Fortran.
Vectorized and multitasked solution of the few-group neutron diffusion equations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zee, S.K.; Turinsky, P.J.; Shayer, Z.
1989-03-01
A numerical algorithm with parallelism was used to solve the two-group, multidimensional neutron diffusion equations on computers characterized by shared memory, vector pipeline, and multi-CPU architecture features. Specifically, solutions were obtained on the Cray X/MP-48, the IBM-3090 with vector facilities, and the FPS-164. The material-centered mesh finite difference method approximation and outer-inner iteration method were employed. Parallelism was introduced in the inner iterations using the cyclic line successive overrelaxation iterative method and solving in parallel across lines. The outer iterations were completed using the Chebyshev semi-iterative method, which allows parallelism to be introduced in both space and energy groups. For the three-dimensional model, power, soluble boron, and transient fission product feedbacks were included. Concentrating on the pressurized water reactor (PWR), the thermal-hydraulic calculation of moderator density assumed single-phase flow and a closed flow channel, allowing parallelism to be introduced in the solution across the radial plane. Using a pinwise detail, quarter-core model of a typical PWR in cycle 1, for the two-dimensional model without feedback the measured million floating point operations per second (MFLOPS)/vector speedups were 83/11.7, 18/2.2, and 2.4/5.6 on the Cray, IBM, and FPS without multitasking, respectively. Lower performance was observed with a coarser mesh, i.e., shorter vector length, due to vector pipeline start-up. For an 18 x 18 x 30 (x-y-z) three-dimensional model with feedback of the same core, MFLOPS/vector speedups of ~61/6.7 and an execution time of 0.8 CPU seconds on the Cray without multitasking were measured. Finally, using two CPUs and the vector pipelines of the Cray, a multitasking efficiency of 81% was noted for the three-dimensional model.
An efficient parallel algorithm for matrix-vector multiplication
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hendrickson, B.; Leland, R.; Plimpton, S.
The multiplication of a vector by a matrix is the kernel computation of many algorithms in scientific computation. A fast parallel algorithm for this calculation is therefore necessary if one is to make full use of the new generation of parallel supercomputers. This paper presents a high performance, parallel matrix-vector multiplication algorithm that is particularly well suited to hypercube multiprocessors. For an n x n matrix on p processors, the communication cost of this algorithm is O(n/√p + log(p)), independent of the matrix sparsity pattern. The performance of the algorithm is demonstrated by employing it as the kernel in the well-known NAS conjugate gradient benchmark, where a run time of 6.09 seconds was observed. This is the best published performance on this benchmark achieved to date using a massively parallel supercomputer.
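A communication cost of O(n/√p + log(p)) is characteristic of a two-dimensional processor grid: each processor holds an n/√p x n/√p block, multiplies it by the matching slice of the vector, and partial results are summed across each processor row. The serial simulation below sketches that general decomposition (the q x q grid is emulated by loops); it is not the paper's exact algorithm.

```python
import numpy as np

def grid_matvec(A, x, q):
    """Simulated 2D (q x q grid) block decomposition of y = A @ x.
    'Processor' (i, j) multiplies its block by its slice of x; the
    partial results are summed across each processor row, mimicking
    the log(p)-step reduction."""
    n = A.shape[0]
    b = n // q                                   # block size; assumes q divides n
    y = np.zeros(n)
    for i in range(q):
        partial = np.zeros(b)                    # row-wise reduction accumulator
        for j in range(q):
            Aij = A[i*b:(i+1)*b, j*b:(j+1)*b]
            xj = x[j*b:(j+1)*b]
            partial += Aij @ xj                  # local work on processor (i, j)
        y[i*b:(i+1)*b] = partial
    return y

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8))
x = rng.standard_normal(8)
assert np.allclose(grid_matvec(A, x, 2), A @ x)
```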
Gyroscope precession along bound equatorial plane orbits around a Kerr black hole
NASA Astrophysics Data System (ADS)
Bini, Donato; Geralico, Andrea; Jantzen, Robert T.
2016-09-01
The precession of a test gyroscope along stable bound equatorial plane orbits around a Kerr black hole is analyzed, and the precession angular velocity of the gyro's parallel transported spin vector and the increment in the precession angle after one orbital period is evaluated. The parallel transported Marck frame which enters this discussion is shown to have an elegant geometrical explanation in terms of the electric and magnetic parts of the Killing-Yano 2-form and a Wigner rotation effect.
Computational mechanics analysis tools for parallel-vector supercomputers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning
1993-01-01
Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
NASA Astrophysics Data System (ADS)
Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo
2016-07-01
Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple right-hand-side vectors by solving them simultaneously, that is, by operating on the block of vectors as a matrix. We implemented several iterative methods and compared their performance. The maximum performance on the SPARC64 VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence behaviour of the linear systems, we introduced a control method that eliminates the calculation of already converged vectors.
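The abstract's two ideas, solving all right-hand sides as one matrix and dropping already-converged vectors, can be sketched with a plain Jacobi iteration. The paper's actual solvers are not specified here; Jacobi is chosen only because it makes the block update and the convergence mask easy to see.

```python
import numpy as np

def jacobi_multi(A, B, tol=1e-10, max_iter=500):
    """Jacobi iteration applied to all right-hand sides at once
    (each column of X is one solution vector), skipping columns
    whose residual has already converged."""
    D = np.diag(A)
    R = A - np.diag(D)
    X = np.zeros_like(B)
    active = np.ones(B.shape[1], dtype=bool)     # unconverged columns
    for _ in range(max_iter):
        # update only the still-active columns, as a single matrix op
        X[:, active] = (B[:, active] - R @ X[:, active]) / D[:, None]
        res = np.linalg.norm(A @ X - B, axis=0)
        active = res > tol
        if not active.any():
            break
    return X

A = np.array([[4.0, 1.0], [1.0, 3.0]])           # diagonally dominant
B = np.array([[1.0, 2.0], [2.0, 1.0]])           # two right-hand sides
X = jacobi_multi(A, B)
assert np.allclose(A @ X, B, atol=1e-8)
```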
Spatial data analytics on heterogeneous multi- and many-core parallel architectures using python
Laura, Jason R.; Rey, Sergio J.
2017-01-01
Parallel vector spatial analysis concerns the application of parallel computational methods to facilitate vector-based spatial analysis. The history of parallel computation in spatial analysis is reviewed, and this work is placed into the broader context of high-performance computing (HPC) and parallelization research. The rise of cyberinfrastructure and its manifestation in spatial analysis as CyberGIScience is seen as a main driver of renewed interest in parallel computation in the spatial sciences. Key problems in spatial analysis that have been the focus of parallel computing are covered. Chief among these are spatial optimization problems, computational geometric problems including polygonization and spatial contiguity detection, the use of Markov chain Monte Carlo simulation in spatial statistics, and parallel implementations of spatial econometric methods. Future directions for research on parallelization in computational spatial analysis are outlined.
The Utility of SAR to Monitor Ocean Processes.
1981-11-01
echo received from ocean waves include the motion of the scattering surfaces, the so... a horizontally polarized wave will have its E vector parallel to... radiation is defined by the direction of the electric field intensity vector, E. For example, a horizontally polarized wave will have its E vector... Oil Spill Off the East Coast of the United States ... 55 19. L-band Parallel and Cross Polarized SAR Imagery of Ice in the Beaufort
Parallel implementation of an adaptive and parameter-free N-body integrator
NASA Astrophysics Data System (ADS)
Pruett, C. David; Ingham, William H.; Herman, Ralph D.
2011-05-01
Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summary: Program title: PNB.f90. Catalogue identifier: AEIK_v1_0. Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html. Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland. Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html. No. of lines in distributed program, including test data, etc.: 3052. No. of bytes in distributed program, including test data, etc.: 68 600. Distribution format: tar.gz. Programming language: Fortran 90 and OpenMPI. Computer: All shared or distributed memory parallel processors. Operating system: Unix/Linux. Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N. Classification: 4.3, 4.12, 6.5. Nature of problem: High-accuracy numerical evaluation of trajectories of N point masses, each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.
Massively Parallel Solution of Poisson Equation on Coarse Grain MIMD Architectures
NASA Technical Reports Server (NTRS)
Fijany, A.; Weinberger, D.; Roosta, R.; Gulati, S.
1998-01-01
In this paper a new algorithm, designated the Fast Invariant Imbedding algorithm, for the solution of the Poisson equation on vector and massively parallel MIMD architectures is presented. This algorithm achieves the same optimal computational efficiency as other fast Poisson solvers while offering a much better structure for vector and parallel implementation. Our implementation on the Intel Delta and Paragon shows that a speedup of over two orders of magnitude can be achieved even for moderate-size problems.
Sentence alignment using feed forward neural network.
Fattah, Mohamed Abdel; Ren, Fuji; Kuroiwa, Shingo
2006-12-01
Parallel corpora have become an essential resource for work in multilingual natural language processing. However, sentence-aligned parallel corpora are more efficient than non-aligned parallel corpora for cross-language information retrieval and machine translation applications. In this paper, we present a new approach to align sentences in bilingual parallel corpora based on a feed-forward neural network classifier. A feature parameter vector is extracted from the text pair under consideration. This vector contains text features such as length, punctuation score, and cognate score values. A set of manually prepared training data was used to train the feed-forward neural network; another set of data was used for testing. Using this new approach, we achieved an error reduction of 60% over the length-based approach when applied to English-Arabic parallel documents. Moreover, this new approach is valid for any language pair, and it is quite flexible, since the feature parameter vector may contain more, fewer, or different features than those we used in our system, such as a lexical match feature.
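A minimal sketch of the kind of feature parameter vector described, assuming three features (length ratio, punctuation overlap, and a crude prefix-based cognate score); the exact features and their definitions in the paper may differ.

```python
import string

def alignment_features(src, tgt):
    """Hypothetical feature vector for a candidate sentence pair.
    All three features are normalized to [0, 1]; they are invented
    stand-ins for the paper's length, punctuation, and cognate scores."""
    # length ratio of the shorter to the longer sentence
    len_ratio = min(len(src), len(tgt)) / max(len(src), len(tgt))
    # overlap of punctuation marks (Jaccard index)
    punct = lambda s: {c for c in s if c in string.punctuation}
    p_src, p_tgt = punct(src), punct(tgt)
    punct_score = len(p_src & p_tgt) / max(1, len(p_src | p_tgt))
    # crude cognate score: shared 4-character word prefixes
    pref = lambda s: {w[:4].lower() for w in s.split() if len(w) >= 4}
    c_src, c_tgt = pref(src), pref(tgt)
    cognate_score = len(c_src & c_tgt) / max(1, len(c_src | c_tgt))
    return [len_ratio, punct_score, cognate_score]

f = alignment_features("Parallel corpora are essential.",
                       "Parallel corpora have become essential resources.")
```

A classifier (here, the paper's feed-forward network) would then map such vectors to an aligned/not-aligned decision.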
Vectoring of parallel synthetic jets: A parametric study
NASA Astrophysics Data System (ADS)
Berk, Tim; Gomit, Guillaume; Ganapathisubramani, Bharathram
2016-11-01
The vectoring of a pair of parallel synthetic jets can be described using five dimensionless parameters: the aspect ratio of the slots, the Strouhal number, the Reynolds number, the phase difference between the jets and the spacing between the slots. In the present study, the influence of the latter four on the vectoring behaviour of the jets is examined experimentally using particle image velocimetry. Time-averaged velocity maps are used to study the variations in vectoring behaviour for a parametric sweep of each of the four parameters independently. A topological map is constructed for the full four-dimensional parameter space. The vectoring behaviour is described both qualitatively and quantitatively. A vectoring mechanism is proposed, based on measured vortex positions. We acknowledge the financial support from the European Research Council (ERC Grant Agreement No. 277472).
Vector processing efficiency of plasma MHD codes by use of the FACOM 230-75 APU
NASA Astrophysics Data System (ADS)
Matsuura, T.; Tanaka, Y.; Naraoka, K.; Takizuka, T.; Tsunematsu, T.; Tokuda, S.; Azumi, M.; Kurita, G.; Takeda, T.
1982-06-01
In the framework of pipelined vector architecture, the efficiency of vector processing is assessed with respect to plasma MHD codes in nuclear fusion research. By using a vector processor, the FACOM 230-75 APU, the limit of the enhancement factor due to parallelism of current vector machines is examined for three numerical codes based on a fluid model. Reasonable speed-up factors of approximately 6, 6, and 4 over the highly optimized scalar version are obtained for ERATO (linear stability code), AEOLUS-R1 (nonlinear stability code) and APOLLO (1-1/2D transport code), respectively. Problems of the pipelined vector processors are discussed from the viewpoint of restructuring, optimization and choice of algorithms. In conclusion, the important concept of "concurrency within pipelined parallelism" is emphasized.
A Performance Evaluation of the Cray X1 for Scientific Applications
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David
2004-01-01
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors in high-end systems, owing to their generality and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
2006-08-23
For TE polarization the electric field vector is parallel to the substrate; for TM polarization the magnetic field vector is parallel to the substrate. Figure...section can be obtained for the case of the two electromagnetic field polarization vectors λ and µ describing the two photons being absorbed (of the same or... polarization effects on two-photon absorption as investigated by the technique of thermal lensing detected absorption of a mode-locked laser beam. This
A GaAs vector processor based on parallel RISC microprocessors
NASA Astrophysics Data System (ADS)
Misko, Tim A.; Rasset, Terry L.
A vector processor architecture based on the development of a 32-bit microprocessor using gallium arsenide (GaAs) technology has been developed. The McDonnell Douglas vector processor (MVP) will be fabricated entirely from GaAs digital integrated circuits. The MVP architecture includes a vector memory of 1 megabyte, a parallel bus architecture with eight processing elements connected in parallel, and a control processor. The processing elements consist of a reduced instruction set computer (RISC) CPU with four floating-point coprocessor units and the necessary memory interface functions. This architecture has been simulated for several benchmark programs, including a complex fast Fourier transform (FFT), a complex inner product, trigonometric functions, and a sort-merge routine. The results of this study indicate that the MVP can process a 1024-point complex FFT in 112 microseconds (389 megaflops) while consuming approximately 618 W of power in a volume of approximately 0.1 cubic feet.
Soares, Rodrigo Pedro; Altoé, Ellen Cristina Félix; Ennes-Vidal, Vítor; da Costa, Simone M; Rangel, Elizabeth Ferreira; de Souza, Nataly Araújo; da Silva, Vanderlei Campos; Volf, Petr; d'Avila-Levy, Claudia Masini
2017-07-01
Leishmania braziliensis and Leishmania infantum are the causative agents of cutaneous and visceral leishmaniasis, respectively. Several aspects of the vector-parasite interaction involving gp63 and phosphoglycans have been individually assayed in different studies. However, their roles under the same experimental conditions had not yet been studied. Here, the roles of divalent metal chelators, anti-gp63 antibodies and purified type I phosphoglycans (PGs) were evaluated during in vitro parasite attachment to the midgut of the vector. Parasites were treated with divalent metal chelators or anti-gp63 antibodies prior to interaction with Lutzomyia longipalpis/Lutzomyia intermedia midguts or sand fly LL-5 cells. An in vitro binding system was used to examine the roles of PG and gp63 in parallel. Treatment with divalent metal chelators reduced Le. infantum adhesion to the Lu. longipalpis midguts. The most effective compound (Phen) inhibited binding in both vectors. Similar results were observed in the interaction between both Leishmania species and the LL-5 cell line. Finally, parallel experiments using anti-gp63-treated parasites and PG-incubated midguts demonstrated that both approaches substantially inhibited attachment in the natural parasite-vector pairs Le. infantum/Lu. longipalpis and Le. braziliensis/Lu. intermedia. Our results suggest that gp63 and/or PG are involved in parasite attachment to the midgut of these important vectors. Copyright © 2017 Elsevier GmbH. All rights reserved.
Three-dimensional Hybrid Simulation Study of Anisotropic Turbulence in the Proton Kinetic Regime
NASA Astrophysics Data System (ADS)
Vasquez, Bernard J.; Markovskii, Sergei A.; Chandran, Benjamin D. G.
2014-06-01
Three-dimensional numerical hybrid simulations with particle protons and quasi-neutralizing fluid electrons are conducted for a freely decaying turbulence that is anisotropic with respect to the background magnetic field. The turbulence evolution is determined both by the combined root-mean-square (rms) amplitude of the fluctuating proton bulk velocity and magnetic field and by the ratio of perpendicular to parallel wavenumbers. This kind of relationship had been considered in the past with regard to interplanetary turbulence. The fluctuations nonlinearly evolve to a turbulent phase whose net wave vector anisotropy is usually more perpendicular than the initial one, irrespective of the initial ratio of perpendicular to parallel wavenumbers. Self-similar anisotropy evolution is found as a function of the rms amplitude and parallel wavenumber. Proton heating rates in the turbulent phase vary strongly with the rms amplitude but only weakly with the initial wave vector anisotropy. Even in the limit where wave vectors are confined to the plane perpendicular to the background magnetic field, the heating rate remains close to the corresponding case with finite parallel wave vector components. Simulation results obtained as a function of the proton plasma to background magnetic pressure ratio β_p in the range 0.1-0.5 show that the wave vector anisotropy also depends, weakly, on β_p.
Vectoring of parallel synthetic jets
NASA Astrophysics Data System (ADS)
Berk, Tim; Ganapathisubramani, Bharathram; Gomit, Guillaume
2015-11-01
A pair of parallel synthetic jets can be vectored by applying a phase difference between the two driving signals. The resulting jet can be merged or bifurcated and either vectored towards the actuator leading in phase or the actuator lagging in phase. In the present study, the influence of phase difference and Strouhal number on the vectoring behaviour is examined experimentally. Phase-locked vorticity fields, measured using Particle Image Velocimetry (PIV), are used to track vortex pairs. The physical mechanisms that explain the diversity in vectoring behaviour are observed based on the vortex trajectories. For a fixed phase difference, the vectoring behaviour is shown to be primarily influenced by pinch-off time of vortex rings generated by the synthetic jets. Beyond a certain formation number, the pinch-off timescale becomes invariant. In this region, the vectoring behaviour is determined by the distance between subsequent vortex rings. We acknowledge the financial support from the European Research Council (ERC grant agreement no. 277472).
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, T.
Progress in parallel/vector computers has driven us to develop suitable numerical integrators that utilize their computational power to the full extent while remaining independent of the size of the system to be integrated. Unfortunately, parallel versions of Runge-Kutta type integrators are known to be not so efficient. Recently we developed a parallel version of the extrapolation method (Ito and Fukushima 1997), which allows variable timesteps and still gives an acceleration factor of 3-4 for general problems, while the vector-mode usage of the Picard-Chebyshev method (Fukushima 1997a, 1997b) leads to an acceleration factor on the order of 1000 for smooth problems such as planetary/satellite orbit integration. The success of the multiple-correction PECE mode of the time-symmetric implicit Hermitian integrator (Kokubo 1998) seems to highlight Milankar's so-called "pipelined predictor corrector method", which is expected to yield an acceleration factor of 3-4. We will review these directions and discuss future prospects.
Automated Vectorization of Decision-Based Algorithms
NASA Technical Reports Server (NTRS)
James, Mark
2006-01-01
Virtually all existing vectorization algorithms are designed to analyze only the numeric properties of an algorithm and distribute those elements across multiple processors. This work advances the state of the practice because it is the only known system, at the time of this reporting, that takes high-level statements, analyzes them for their decision properties, and converts them to a form that allows them to be executed automatically in parallel. The software takes a high-level source program that describes a complex decision-based condition and rewrites it as a disjunctive set of component Boolean relations that can then be executed in parallel. This is important because parallel architectures are becoming more commonplace in conventional systems and they have always been present in NASA flight systems. This technology allows one to take existing condition-based code and automatically vectorize it so it naturally decomposes across parallel architectures.
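The rewrite the abstract describes, turning a chain of decisions into disjoint Boolean relations that all data lanes evaluate at once, can be sketched as follows; the branch conditions and result values are invented examples, not the reported system's output.

```python
import numpy as np

# A branchy decision such as:
#   if x > 0:        y = 1
#   elif x % 2 == 0: y = 2
#   else:            y = 3
# rewritten as a disjunctive set of mutually exclusive Boolean
# relations, each evaluated for every element in parallel.
def decide_vectorized(x):
    c1 = x > 0                        # first branch condition
    c2 = ~c1 & (x % 2 == 0)           # second branch, made disjoint from c1
    c3 = ~c1 & ~c2                    # fall-through branch
    return np.select([c1, c2, c3], [1, 2, 3])

x = np.array([5, -4, -3])
assert list(decide_vectorized(x)) == [1, 2, 3]
```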
Generic accelerated sequence alignment in SeqAn using vectorization and multi-threading.
Rahn, René; Budach, Stefan; Costanza, Pascal; Ehrhardt, Marcel; Hancox, Jonny; Reinert, Knut
2018-05-03
Pairwise sequence alignment is undoubtedly a central tool in many bioinformatics analyses. In this paper, we present a generically accelerated module for pairwise sequence alignments applicable to a broad range of applications. In our module, we unified the standard dynamic programming kernel used for pairwise sequence alignments and extended it with a generalized inter-sequence vectorization layout, such that many alignments can be computed simultaneously by exploiting the SIMD (Single Instruction Multiple Data) instructions of modern processors. We then extended the module by adding two layers of thread-level parallelization, where we a) distribute many independent alignments on multiple threads and b) inherently parallelize a single alignment computation using a work-stealing approach producing a dynamic wavefront progressing along the minor diagonal. We evaluated our alignment vectorization and parallelization on different processors and use cases, including the newest Intel® Xeon® (Skylake) and Intel® Xeon Phi™ (KNL) processors. The instruction set AVX512-BW (Byte and Word), available on Skylake processors, can genuinely improve the performance of vectorized alignments. We could run single alignments 1600 times faster on the Xeon Phi™ and 1400 times faster on the Xeon® than executing them with our previous sequential alignment module. The module is programmed in C++ using the SeqAn (Reinert et al., 2017) library and distributed with version 2.4 under the BSD license. We support SSE4, AVX2, and AVX512 instructions and included UME::SIMD, a SIMD-instruction wrapper library, to extend our module for further instruction sets. We thoroughly test all alignment components with all major C++ compilers on various platforms. rene.rahn@fu-berlin.de.
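The inter-sequence layout can be sketched in miniature: the pair axis of the arrays plays the role of the SIMD lanes, so one dynamic-programming sweep scores many alignments at once. This is a plain Needleman-Wunsch score recurrence under assumed scoring parameters, not SeqAn's implementation.

```python
import numpy as np

def nw_scores_batch(A, B, match=2, mismatch=-1, gap=-1):
    """Global-alignment scores for many sequence pairs at once.
    A, B: (P, L) integer-encoded sequences of equal length; the pair
    axis P stands in for the SIMD lanes of the inter-sequence layout."""
    P, L = A.shape
    prev = np.arange(L + 1)[None, :] * gap * np.ones((P, 1))    # DP row i = 0
    for i in range(1, L + 1):
        cur = np.empty_like(prev)
        cur[:, 0] = i * gap
        # substitution scores of A[:, i-1] against every position of B
        sub = np.where(A[:, i - 1:i] == B, match, mismatch)     # (P, L)
        for j in range(1, L + 1):
            cur[:, j] = np.maximum.reduce([
                prev[:, j - 1] + sub[:, j - 1],   # diagonal (match/mismatch)
                prev[:, j] + gap,                 # gap in B
                cur[:, j - 1] + gap,              # gap in A
            ])
        prev = cur
    return prev[:, -1]

A = np.array([[0, 1, 2], [0, 1, 2]])
B = np.array([[0, 1, 2], [0, 1, 3]])
assert list(nw_scores_batch(A, B)) == [6.0, 3.0]   # perfect match vs one mismatch
```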
Lu, Zhao; Sun, Jing; Butts, Kenneth
2014-05-01
Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
Associative Pattern Recognition In Analog VLSI Circuits
NASA Technical Reports Server (NTRS)
Tawel, Raoul
1995-01-01
Winner-take-all circuit selects best-match stored pattern. Prototype cascadable very-large-scale integrated (VLSI) circuit chips built and tested to demonstrate concept of electronic associative pattern recognition. Based on low-power, sub-threshold analog complementary metal-oxide-semiconductor (CMOS) VLSI circuitry, each chip can store 128 sets (vectors) of 16 analog values (vector components), vectors representing known patterns as diverse as spectra, histograms, graphs, or brightnesses of pixels in images. Chips exploit parallel nature of vector quantization architecture to implement highly parallel processing in relatively simple computational cells. Through collective action, cells classify input pattern in fraction of microsecond while consuming power of few microwatts.
Compute Server Performance Results
NASA Technical Reports Server (NTRS)
Stockdale, I. E.; Barton, John; Woodrow, Thomas (Technical Monitor)
1994-01-01
Parallel-vector supercomputers have been the workhorses of high performance computing. As expectations of future computing needs have risen faster than projected vector supercomputer performance, much work has been done investigating the feasibility of using Massively Parallel Processor systems as supercomputers. An even more recent development is the availability of high performance workstations which have the potential, when clustered together, to replace parallel-vector systems. We present a systematic comparison of floating point performance and price-performance for various compute server systems. A suite of highly vectorized programs was run on systems including traditional vector systems such as the Cray C90, and RISC workstations such as the IBM RS/6000 590 and the SGI R8000. The C90 system delivers 460 million floating point operations per second (FLOPS), the highest single-processor rate of any vendor. However, if the price-performance ratio (PPR) is considered to be most important, then the IBM and SGI processors are superior to the C90 processors. Even without code tuning, the IBM and SGI PPRs of 260 and 220 FLOPS per dollar exceed the C90 PPR of 160 FLOPS per dollar when running our highly vectorized suite.
Parallel/Vector Integration Methods for Dynamical Astronomy
NASA Astrophysics Data System (ADS)
Fukushima, Toshio
1999-01-01
This paper reviews three recent works on numerical methods to integrate ordinary differential equations (ODE), specially designed for parallel, vector, and/or multi-processor-unit (PU) computers. The first is the Picard-Chebyshev method (Fukushima, 1997a). It obtains a global solution of the ODE in the form of a Chebyshev polynomial of large (> 1000) degree by applying the Picard iteration repeatedly. The iteration converges for smooth problems and/or perturbed dynamics. The method runs around 100-1000 times faster in the vector mode than in the scalar mode of a certain computer with vector processors (Fukushima, 1997b). The second is a parallelization of a symplectic integrator (Saha et al., 1997). It regards the implicit midpoint rules covering thousands of timesteps as large-scale nonlinear equations and solves them by fixed-point iteration. The method is applicable to Hamiltonian systems and is expected to yield an acceleration factor of around 50 on parallel computers with more than 1000 PUs. The last is a parallelization of the extrapolation method (Ito and Fukushima, 1997). It performs trial integrations in parallel; the trial integrations are further accelerated by balancing the computational load among PUs by the technique of folding. The method is all-purpose and achieves an acceleration factor of around 3.5 using several PUs. Finally, we give a perspective on the parallelization of some implicit integrators which require multiple corrections in solving implicit formulas, like the implicit Hermitian integrators (Makino and Aarseth, 1992), (Hut et al., 1995) or the implicit symmetric multistep methods (Fukushima, 1998), (Fukushima, 1999).
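The Picard iteration at the heart of the first method updates the whole solution curve at once, which is what makes it vector-friendly. A sketch, with the Chebyshev polynomial basis replaced by a uniform grid and a trapezoidal quadrature for brevity (so this illustrates only the fixed-point structure, not the actual Picard-Chebyshev scheme):

```python
import numpy as np

def picard(f, y0, t, iters=40):
    """Vector-mode Picard iteration: y_{k+1}(t) = y0 + ∫_0^t f(y_k) ds,
    with the quadrature applied to every node of the time grid in one
    sweep, so each iteration is a whole-curve (vector) update."""
    y = np.full_like(t, y0)
    for _ in range(iters):
        fy = f(y)
        # cumulative trapezoid over the entire grid at once
        integral = np.concatenate(([0.0],
            np.cumsum(0.5 * (fy[1:] + fy[:-1]) * np.diff(t))))
        y = y0 + integral
    return y

t = np.linspace(0.0, 1.0, 101)
y = picard(lambda y: y, 1.0, t)     # y' = y with y(0) = 1, i.e. y = exp(t)
assert abs(y[-1] - np.e) < 1e-3
```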
DOE Office of Scientific and Technical Information (OSTI.GOV)
Naito, O.
2015-08-15
An analytic formula has been derived for the relativistic incoherent Thomson backscattering spectrum for a drifting anisotropic plasma when the scattering vector is parallel to the drifting direction. The shape of the scattering spectrum is insensitive to the electron temperature perpendicular to the scattering vector, but its amplitude may be modulated. As a result, while the measured temperature correctly represents the electron distribution parallel to the scattering vector, the electron density may be underestimated when the perpendicular temperature is higher than the parallel temperature. Since the scattering spectrum in shorter wavelengths is greatly enhanced by the existence of drift, the diagnostics might be used to measure local electron current density in fusion plasmas.
A complete set of two-dimensional harmonic vortices on a spherical surface
NASA Astrophysics Data System (ADS)
Esparza, Christian; Rendón, Pablo Luis; Ley Koo, Eugenio
2018-03-01
The solutions of the Euler equations on a spherical surface are constructed, starting from a vector velocity potential A in the radial direction and with a two-dimensional spherical harmonic variation of order m and well-defined parity under φ ↦ -φ. The solutions are well-behaved on the entire surface and continuous at the position of a parallel circle θ = θ₀, where the vorticity is shown to be harmonically distributed. The velocity field is evaluated as the curl of the vector potential: it is shown that the velocity is divergenceless and distributed on the spherical surface. Its polar components at the parallel circle are shown to be continuous, confirming its divergenceless nature, while its azimuthal components are discontinuous at the circle, and their discontinuity is a measure of the vorticity in the radial direction. A closed form for the velocity field lines is also obtained in terms of fixed values of the scalar harmonic function associated with the vector potential. Additionally, the connections of the solutions on a spherical surface with their circular, elliptic and bipolar counterparts on the equatorial plane are implemented via stereographic projections.
Reduced Order Model Basis Vector Generation: Generates Basis Vectors for ROMs
DOE Office of Scientific and Technical Information (OSTI.GOV)
Arrighi, Bill
2016-03-03
libROM is a library that implements order reduction via singular value decomposition (SVD) of sampled state vectors. It implements 2 parallel, incremental SVD algorithms and one serial, non-incremental algorithm. It also provides a mechanism for adaptive sampling of basis vectors.
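At a high level, basis-vector generation from sampled state vectors reduces to a truncated SVD. A minimal serial sketch of the idea (illustrative names only, not libROM's API):

```python
import numpy as np

def pod_basis(snapshots, tol=1e-10):
    """Serial, non-incremental basis generation: SVD of the matrix whose
    columns are sampled state vectors, keeping the left singular vectors
    whose singular values exceed tol times the largest one."""
    U, s, _ = np.linalg.svd(snapshots, full_matrices=False)
    k = int(np.sum(s > tol * s[0]))
    return U[:, :k], s[:k]

# States sampled from a rank-2 system: the basis has exactly 2 vectors.
rng = np.random.default_rng(0)
modes = rng.standard_normal((50, 2))
coeffs = rng.standard_normal((2, 30))
basis, sv = pod_basis(modes @ coeffs)
```

The incremental variants update U and s one snapshot at a time instead of refactoring the whole snapshot matrix, which is what makes parallel, on-the-fly sampling practical.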
He, Xianzhi; Zhang, Lei; Liu, Pengchong; Liu, Li; Deng, Hui; Huang, Jinhai
2015-03-01
Staphylococcal enterotoxins (SEs) produced by Staphylococcus aureus pose a growing threat to human health and food safety. Genetically engineered small molecular antibodies are useful tools for immunodetection and for treating clinical illness caused by SEs. In this study, we constructed a V(L)-V(H) tail-parallel genetically engineered antibody against SEs using the repertoire of rearranged germ-line immunoglobulin variable region genes. Total RNA was extracted from six hybridoma cell lines that stably express anti-SEs antibodies. The variable region genes of the light chain (V(L)) and heavy chain (V(H)) were cloned by reverse transcription PCR, and their classical murine antibody structure and functional V(D)J gene rearrangement were analyzed. To construct the eukaryotic V(H)-V(L) tail-parallel co-expression vectors based on the "5'-V(H)-ivs-IRES-V(L)-3'" mode, the ivs-IRES fragment and V(L) genes were spliced by two-step overlap extension PCR, and the recombined gene fragment and V(H) genes were then inserted sequentially into the pcDNA3.1(+) expression vector. The resulting eukaryotic expression clones, termed p2C2HILO and p5C12HILO, were each transfected into the baby hamster kidney 21 cell line. Two clonal cell lines stably expressing V(L)-V(H) tail-parallel antibodies against SEs were obtained, and the intracytoplasmically expressed antibodies were evaluated by enzyme-linked immunosorbent assay, immunofluorescence assay, and flow cytometry. SEs can stimulate the expression of some chemokines and chemokine receptors in porcine IPEC-J2 cells; the mRNA transcription levels of four chemokines and chemokine receptors can be blocked by the recombinant SE antibody prepared in this study. Our results show that it is possible to obtain functional V(L)-V(H) tail-parallel genetically engineered antibodies in the same vector using a eukaryotic expression system.
A Parallel Vector Machine for the PM Programming Language
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2016-04-01
PM is a new programming language which aims to make the writing of computational geoscience models on parallel hardware accessible to scientists who are not themselves expert parallel programmers. It is based around the concept of communicating operators: language constructs that enable variables local to a single invocation of a parallelised loop to be viewed as if they were arrays spanning the entire loop domain. This mechanism enables different loop invocations (which may or may not be executing on different processors) to exchange information in a manner that extends the successful Communicating Sequential Processes idiom from single messages to collective communication. Communicating operators avoid the additional synchronisation mechanisms, such as atomic variables, required when programming using the Partitioned Global Address Space (PGAS) paradigm. Using a single loop invocation as the fundamental unit of concurrency enables PM to uniformly represent different levels of parallelism from vector operations through shared memory systems to distributed grids. This paper describes an implementation of PM based on a vectorised virtual machine. On a single processor node, concurrent operations are implemented using masked vector operations. Virtual machine instructions operate on vectors of values and may be unmasked, masked using a Boolean field, or masked using an array of active vector cell locations. Conditional structures (such as if-then-else or while statement implementations) calculate and apply masks to the operations they control. A shift in mask representation from Boolean to location-list occurs when active locations become sufficiently sparse. Parallel loops unfold data structures (or vectors of data structures for nested loops) into vectors of values that may additionally be distributed over multiple computational nodes and then split into micro-threads compatible with the size of the local cache. 
Inter-node communication is accomplished using standard OpenMP and MPI. Performance analyses of the PM vector machine, demonstrating its scaling properties with respect to domain size and the number of processor nodes, will be presented for a range of hardware configurations. The PM software and language definition are being made available under unrestrictive MIT and Creative Commons Attribution licenses respectively: www.pm-lang.org.
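The mask-representation switch described above (Boolean mask versus list of active locations) can be illustrated with a toy numpy sketch. The 10% density threshold is our assumption for illustration, not PM's actual heuristic:

```python
import numpy as np

DENSITY_SWITCH = 0.1  # assumed threshold, not PM's documented value

def masked_add(dest, src, mask):
    """Apply dest += src only at active cells, as a vectorised VM might.
    Dense masks use a Boolean-masked vector op; sparse masks switch to an
    explicit list of active locations, cheaper when few cells are active."""
    active = np.count_nonzero(mask)
    if active > DENSITY_SWITCH * mask.size:
        np.add(dest, src, out=dest, where=mask)   # Boolean-mask representation
    else:
        idx = np.flatnonzero(mask)                # location-list representation
        dest[idx] += src[idx]
    return dest

x = np.zeros(8)
masked_add(x, np.ones(8), np.array([1, 0, 1, 0, 0, 0, 0, 0], dtype=bool))

y = np.zeros(16)
m = np.zeros(16, dtype=bool)
m[5] = True                      # 1/16 active: takes the location-list path
masked_add(y, np.full(16, 2.0), m)
```

Conditional constructs (if-then-else, while) then reduce to computing such masks and applying them to the operations they control.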
AZTEC: Parallel Iterative Method Software for Solving Linear Systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hutchinson, S.; Shadid, J.; Tuminaro, R.
1995-07-01
AZTEC is an iterative-solver library that greatly simplifies the parallelization process when solving linear systems of equations Ax = b, where A is a user-supplied n × n sparse matrix, b is a user-supplied vector of length n, and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools is provided that allows for easy creation of distributed sparse unstructured matrices for parallel solution.
A parallel Jacobson-Oksman optimization algorithm [parallel processing (computers)]
NASA Technical Reports Server (NTRS)
Straeter, T. A.; Markos, A. T.
1975-01-01
A gradient-dependent optimization technique which exploits the vector-streaming or parallel-computing capabilities of some modern computers is presented. The algorithm, derived by assuming that the function to be minimized is homogeneous, is a modification of the Jacobson-Oksman serial minimization method. In addition to describing the algorithm, conditions ensuring the convergence of the iterates of the algorithm and the results of numerical experiments on a group of sample test functions are presented. The results of these experiments indicate that this algorithm will solve optimization problems in less computing time than conventional serial methods on machines having vector-streaming or parallel-computing capabilities.
NASA Astrophysics Data System (ADS)
Shah, Syed Muhammad Saqlain; Batool, Safeera; Khan, Imran; Ashraf, Muhammad Usman; Abbas, Syed Hussnain; Hussain, Syed Adnan
2017-09-01
Automatic diagnosis of human diseases is mostly achieved through decision support systems. The performance of these systems is mainly dependent on the selection of the most relevant features. This becomes harder when the dataset contains missing values for some features. Probabilistic Principal Component Analysis (PPCA) is well suited to dealing with missing attribute values. This research presents a methodology which uses the results of medical tests as input, extracts a reduced-dimensional feature subset, and provides a diagnosis of heart disease. The proposed methodology extracts high-impact features in a new projection using PPCA, which yields the projection vectors contributing the highest covariance; these projection vectors are used to reduce the feature dimension. The selection of projection vectors is done through Parallel Analysis (PA). The reduced-dimension feature subset is provided to radial basis function (RBF) kernel based Support Vector Machines (SVM), which classify subjects into two categories: Heart Patient (HP) and Normal Subject (NS). The proposed methodology is evaluated through accuracy, specificity and sensitivity over three UCI datasets: Cleveland, Switzerland and Hungarian. The statistical results achieved through the proposed technique are presented in comparison to existing research, showing its impact. The proposed technique achieved an accuracy of 82.18%, 85.82% and 91.30% for the Cleveland, Hungarian and Switzerland datasets respectively.
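As a rough illustration of the dimension-reduction step, here is plain (non-probabilistic) PCA standing in for PPCA, with the component count k fixed by hand rather than chosen by Parallel Analysis:

```python
import numpy as np

def pca_project(X, k):
    """Project centred data onto the k eigenvectors of the covariance
    matrix with the largest eigenvalues. A stand-in for the PPCA step;
    the paper selects k via Parallel Analysis instead of fixing it."""
    Xc = X - X.mean(axis=0)
    cov = np.cov(Xc, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)            # ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # top-k directions
    return Xc @ top

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))   # 100 subjects, 5 test results
Z = pca_project(X, 2)               # reduced feature subset, 2 features
```

The reduced matrix Z is what would then be handed to the RBF-kernel SVM classifier.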
A Performance Evaluation of the Cray X1 for Scientific Applications
NASA Technical Reports Server (NTRS)
Oliker, Leonid; Biswas, Rupak; Borrill, Julian; Canning, Andrew; Carter, Jonathan; Djomehri, M. Jahed; Shan, Hongzhang; Skinner, David
2003-01-01
The last decade has witnessed a rapid proliferation of superscalar cache-based microprocessors to build high-end capability and capacity computers because of their generality, scalability, and cost effectiveness. However, the recent development of massively parallel vector systems is having a significant effect on the supercomputing landscape. In this paper, we compare the performance of the recently-released Cray X1 vector system with that of the cacheless NEC SX-6 vector machine, and the superscalar cache-based IBM Power3 and Power4 architectures for scientific applications. Overall results demonstrate that the X1 is quite promising, but performance improvements are expected as the hardware, systems software, and numerical libraries mature. Code reengineering to effectively utilize the complex architecture may also lead to significant efficiency enhancements.
NASA Astrophysics Data System (ADS)
Finsterbusch, Jürgen
2011-01-01
Experiments with two diffusion weightings applied in direct succession in a single acquisition, so-called double- or two-wave-vector diffusion-weighting (DWV) experiments at short mixing times, have been shown to be a promising tool to estimate cell or compartment sizes, e.g. in living tissue. The basic theory for such experiments predicts that the signal decays for parallel and antiparallel wave vector orientations differ by a factor of three for small wave vectors. This seems to be surprising because in standard, single-wave-vector experiments the polarity of the diffusion weighting has no influence on the signal attenuation. Thus, the question of how this difference can be understood more pictorially is often raised. In this rather educational manuscript, the phase evolution during a DWV experiment for simple geometries, e.g. diffusion between parallel, impermeable planes oriented perpendicular to the wave vectors, is considered step-by-step and demonstrates how the signal difference develops. Considering the populations of the phase distributions obtained, the factor of three between the signal decays which is predicted by the theory can be reproduced. Furthermore, the intermediate signal decay for orthogonal wave vector orientations can be derived when investigating diffusion in a box. Thus, the presented “phase gymnastics” approach may help to understand the signal modulation observed in DWV experiments at short mixing times.
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romano, Paul K.; Siegel, Andrew R.
2017-07-01
The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC was then used in conjunction with the models to calculate the speedup due to vectorization as a function of the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than vector size to achieve vector efficiency greater than 90%. Lastly, when the execution times for events are allowed to vary, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration.
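The qualitative dependence of vector efficiency on bank size can be illustrated with a toy draining-bank model. This is our own simplification under the constant-event-time assumption, not the paper's actual model: every banked particle is processed each event-iteration in full-width vector sweeps, and each particle survives an iteration with some fixed probability.

```python
import math
import random

def vector_efficiency(bank_size, vector_width, survival=0.9, seed=1):
    """Toy model: process all live particles per iteration in
    ceil(alive / W) vector sweeps; a particle survives an iteration
    with probability `survival`. Efficiency = useful lanes / total lanes."""
    rng = random.Random(seed)
    alive = bank_size
    useful = lanes = 0
    while alive > 0:
        sweeps = math.ceil(alive / vector_width)
        lanes += sweeps * vector_width   # lanes issued, full or not
        useful += alive                  # lanes doing real work
        alive = sum(rng.random() < survival for _ in range(alive))
    return useful / lanes

eff_small = vector_efficiency(bank_size=64,   vector_width=64)
eff_large = vector_efficiency(bank_size=2048, vector_width=64)
```

A bank much larger than the vector width keeps the sweeps nearly full for most of the run, while a bank equal to the vector width spends most of its time issuing partially empty vectors, echoing the paper's observation about the required bank-to-vector-size ratio.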
Parallel and fault-tolerant algorithms for hypercube multiprocessors
DOE Office of Scientific and Technical Information (OSTI.GOV)
Aykanat, C.
1988-01-01
Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multiprocessor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube-connected message-passing multiprocessor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that the hypercube topology is scalable for this class of FE problems. The SCG algorithm is also shown to be suitable for vectorization, and near-supercomputer performance is achieved on a vector hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.
A hybrid algorithm for parallel molecular dynamics simulations
NASA Astrophysics Data System (ADS)
Mangiardi, Chris M.; Meyer, R.
2017-10-01
This article describes algorithms for the hybrid parallelization and SIMD vectorization of molecular dynamics simulations with short-range forces. The parallelization method combines domain decomposition with a thread-based parallelization approach. The goal of the work is to enable efficient simulations of very large (tens of millions of atoms) and inhomogeneous systems on many-core processors with hundreds or thousands of cores and SIMD units with large vector sizes. In order to test the efficiency of the method, simulations of a variety of configurations with up to 74 million atoms have been performed. Results are shown that were obtained on multi-core systems with Sandy Bridge and Haswell processors as well as systems with Xeon Phi many-core processors.
Essential issues in multiprocessor systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gajski, D.D.; Peir, J.K.
1985-06-01
During the past several years, a great number of proposals have been made with the objective to increase supercomputer performance by an order of magnitude on the basis of a utilization of new computer architectures. The present paper is concerned with a suitable classification scheme for comparing these architectures. It is pointed out that there are basically four schools of thought as to the most important factor for an enhancement of computer performance. According to one school, the development of faster circuits will make it possible to retain present architectures, except, possibly, for a mechanism providing synchronization of parallel processes. A second school assigns priority to the optimization and vectorization of compilers, which will detect parallelism and help users to write better parallel programs. A third school believes in the predominant importance of new parallel algorithms, while the fourth school supports new models of computation. The merits of the four approaches are critically evaluated. 50 references.
Reviving the shear-free perfect fluid conjecture in general relativity
NASA Astrophysics Data System (ADS)
Sikhonde, Muzikayise E.; Dunsby, Peter K. S.
2017-12-01
Employing a Mathematica symbolic computer algebra package called xTensor, we present (1+3) -covariant special case proofs of the shear-free perfect fluid conjecture in general relativity. We first present the case where the pressure is constant, and where the acceleration is parallel to the vorticity vector. These cases were first presented in their covariant form by Senovilla et al. We then provide a covariant proof for the case where the acceleration and vorticity vectors are orthogonal, which leads to the existence of a Killing vector along the vorticity. This Killing vector satisfies the new constraint equations resulting from the vanishing of the shear. Furthermore, it is shown that in order for the conjecture to be true, this Killing vector must have a vanishing spatially projected directional covariant derivative along the velocity vector field. This in turn implies the existence of another basic vector field along the direction of the vorticity for the conjecture to hold. Finally, we show that in general, there exists a basic vector field parallel to the acceleration for which the conjecture is true.
Hypercluster - Parallel processing for computational mechanics
NASA Technical Reports Server (NTRS)
Blech, Richard A.
1988-01-01
An account is given of the development status, performance capabilities and implications for further development of NASA-Lewis' testbed 'hypercluster' parallel computer network, in which multiple processors communicate through a shared memory. Processors have local as well as shared memory; the hypercluster is expanded in the same manner as the hypercube, with processor clusters replacing the normal single processor node. The NASA-Lewis machine has three nodes with a vector personality and one node with a scalar personality. Each of the vector nodes uses four board-level vector processors, while the scalar node uses four general-purpose microcomputer boards.
Reversible vector ratchets for skyrmion systems
NASA Astrophysics Data System (ADS)
Ma, X.; Reichhardt, C. J. Olson; Reichhardt, C.
2017-03-01
We show that ac driven skyrmions interacting with an asymmetric substrate provide a realization of a class of ratchet system which we call a vector ratchet that arises due to the effect of the Magnus term on the skyrmion dynamics. In a vector ratchet, the dc motion induced by the ac drive can be described as a vector that can be rotated clockwise or counterclockwise relative to the substrate asymmetry direction. Up to a full 360° rotation is possible for varied ac amplitudes or skyrmion densities. In contrast to overdamped systems, in which ratchet motion is always parallel to the substrate asymmetry direction, vector ratchets allow the ratchet motion to be in any direction relative to the substrate asymmetry. It is also possible to obtain a reversal in the direction of rotation of the vector ratchet, permitting the creation of a reversible vector ratchet. We examine vector ratchets for ac drives applied parallel or perpendicular to the substrate asymmetry direction, and show that reverse ratchet motion can be produced by collective effects. No reversals occur for an isolated skyrmion on an asymmetric substrate. Since a vector ratchet can produce motion in any direction, it could represent a method for controlling skyrmion motion for spintronic applications.
Optical computation using residue arithmetic.
Huang, A; Tsunoda, Y; Goodman, J W; Ishihara, S
1979-01-15
Using residue arithmetic it is possible to perform additions, subtractions, multiplications, and polynomial evaluation without the necessity for carry operations. Calculations can, therefore, be performed in a fully parallel manner. Several different optical methods for performing residue arithmetic operations are described. A possible combination of such methods to form a matrix vector multiplier is considered. The potential advantages of optics in performing these kinds of operations are discussed.
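The carry-free property of residue arithmetic is easy to demonstrate in software (the moduli here are chosen for illustration; the paper implements these digit-wise operations optically):

```python
from math import prod

MODULI = (7, 11, 13)  # pairwise coprime; dynamic range = 7 * 11 * 13 = 1001

def to_residue(x):
    """Encode an integer as independent residue digits; addition and
    multiplication then need no carries between digits."""
    return tuple(x % m for m in MODULI)

def add(a, b):
    return tuple((x + y) % m for x, y, m in zip(a, b, MODULI))

def mul(a, b):
    return tuple((x * y) % m for x, y, m in zip(a, b, MODULI))

def from_residue(r):
    """Chinese Remainder Theorem reconstruction (valid for 0 <= x < 1001)."""
    M = prod(MODULI)
    total = 0
    for ri, mi in zip(r, MODULI):
        Mi = M // mi
        total += ri * Mi * pow(Mi, -1, mi)   # pow(..., -1, m): modular inverse
    return total % M

x, y = 123, 45
s = from_residue(add(to_residue(x), to_residue(y)))   # 168
p = from_residue(mul(to_residue(x), to_residue(y)))   # 5535 mod 1001 = 530
```

Because every digit is computed independently, each modulus can be handled by a separate parallel channel, which is exactly what makes the scheme attractive for optical matrix-vector multipliers.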
NASA Astrophysics Data System (ADS)
Lashkin, S. V.; Kozelkov, A. S.; Yalozo, A. V.; Gerasimov, V. Yu.; Zelensky, D. K.
2017-12-01
This paper describes the details of the parallel implementation of the SIMPLE algorithm for numerical solution of the Navier-Stokes system of equations on arbitrary unstructured grids. The iteration schemes for the serial and parallel versions of the SIMPLE algorithm are implemented. In the description of the parallel implementation, special attention is paid to computational data exchange among processors under the condition of the grid model decomposition using fictitious cells. We discuss the specific features for the storage of distributed matrices and implementation of vector-matrix operations in parallel mode. It is shown that the proposed way of matrix storage reduces the number of interprocessor exchanges. A series of numerical experiments illustrates the effect of the multigrid SLAE solver tuning on the general efficiency of the algorithm; the tuning involves the types of the cycles used (V, W, and F), the number of iterations of a smoothing operator, and the number of cells for coarsening. Two ways (direct and indirect) of efficiency evaluation for parallelization of the numerical algorithm are demonstrated. The paper presents the results of solving some internal and external flow problems with the evaluation of parallelization efficiency by two algorithms. It is shown that the proposed parallel implementation enables efficient computations for the problems on a thousand processors. Based on the results obtained, some general recommendations are made for the optimal tuning of the multigrid solver, as well as for selecting the optimal number of cells per processor.
Effective Vectorization with OpenMP 4.5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Huber, Joseph N.; Hernandez, Oscar R.; Lopez, Matthew Graham
This paper describes how the Single Instruction Multiple Data (SIMD) model and its extensions in OpenMP work, and how these are implemented in different compilers. Modern processors are highly parallel computational machines which often include multiple processors capable of executing several instructions in parallel. Understanding SIMD and executing instructions in parallel allows the processor to achieve higher performance without increasing the power required to run it. SIMD instructions can significantly reduce the runtime of code by executing a single operation on large groups of data. The SIMD model is so integral to the processor's potential performance that, if SIMD is not utilized, less than half of the processor is ever actually used. Unfortunately, using SIMD instructions is a challenge in higher level languages because most programming languages do not have a way to describe them. Most compilers are capable of vectorizing code by using the SIMD instructions, but there are many code features important for SIMD vectorization that the compiler cannot determine at compile time. OpenMP attempts to solve this by extending the C++/C and Fortran programming languages with compiler directives that express SIMD parallelism. OpenMP is used to pass hints to the compiler about the code to be executed in SIMD. This is a key resource for making optimized code, but it does not change whether or not the code can use SIMD operations. However, in many cases critical functions are limited by a poor understanding of how SIMD instructions are actually implemented, as SIMD can be implemented through vector instructions or simultaneous multi-threading (SMT). We have found that it is often the case that code cannot be vectorized, or is vectorized poorly, because the programmer does not have sufficient knowledge of how SIMD instructions work.
NASA Technical Reports Server (NTRS)
Verber, C. M.; Kenan, R. P.; Hartman, N. F.; Chapman, C. M.
1980-01-01
A laboratory model of a 16 channel integrated optical data preprocessor was fabricated and tested in response to a need for a device to evaluate the outputs of a set of remote sensors. It does this by accepting the outputs of these sensors, in parallel, as the components of a multidimensional vector descriptive of the data and comparing this vector to one or more reference vectors which are used to classify the data set. The comparison is performed by taking the difference between the signal and reference vectors. The preprocessor is wholly integrated upon the surface of a LiNbO3 single crystal with the exceptions of the source and the detector. He-Ne laser light is coupled in and out of the waveguide by prism couplers. The integrated optical circuit consists of a titanium-diffused waveguide pattern, electrode structures and grating beam splitters. The waveguide and electrode patterns, by virtue of their complexity, make the vector subtraction device the most complex integrated optical structure fabricated to date.
GaAs Supercomputing: Architecture, Language, And Algorithms For Image Processing
NASA Astrophysics Data System (ADS)
Johl, John T.; Baker, Nick C.
1988-10-01
The application of high-speed GaAs processors in a parallel system matches the demanding computational requirements of image processing. The architecture of the McDonnell Douglas Astronautics Company (MDAC) vector processor is described along with the algorithms and language translator. Most image and signal processing algorithms can utilize parallel processing and show a significant performance improvement over sequential versions. The parallelization performed by this system is within each vector instruction. Since each vector has many elements, each requiring some computation, useful concurrent arithmetic operations can easily be performed. Balancing the memory bandwidth with the computation rate of the processors is an important design consideration for high efficiency and utilization. The architecture features a bus-based execution unit consisting of four to eight 32-bit GaAs RISC microprocessors running at a 200 MHz clock rate for a peak performance of 1.6 BOPS. The execution unit is connected to a vector memory with three buses capable of transferring two input words and one output word every 10 nsec. The address generators inside the vector memory perform different vector addressing modes and feed the data to the execution unit. The functions discussed in this paper include basic MATRIX OPERATIONS, 2-D SPATIAL CONVOLUTION, HISTOGRAM, and FFT. For each of these algorithms, assembly language programs were run on a behavioral model of the system to obtain performance figures.
Partitioning Rectangular and Structurally Nonsymmetric Sparse Matrices for Parallel Processing
DOE Office of Scientific and Technical Information (OSTI.GOV)
B. Hendrickson; T.G. Kolda
1998-09-01
A common operation in scientific computing is the multiplication of a sparse, rectangular or structurally nonsymmetric matrix and a vector. In many applications the matrix-transpose-vector product is also required. This paper addresses the efficient parallelization of these operations. We show that the problem can be expressed in terms of partitioning bipartite graphs. We then introduce several algorithms for this partitioning problem and compare their performance on a set of test matrices.
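The operation being partitioned can be sketched as follows. This uses a naive contiguous row partition purely for illustration; the paper's contribution is choosing the partition via a bipartite-graph model to balance load and minimize communication:

```python
import numpy as np

def rowwise_partition_matvec(A_rows, x, nparts):
    """Sketch: partition the rows of a sparse matrix (given as a list of
    (col, val) pairs per row) into contiguous blocks, one per 'processor',
    and compute y = A x block by block. Each block is independent work."""
    n = len(A_rows)
    bounds = np.linspace(0, n, nparts + 1).astype(int)
    y = np.zeros(n)
    for p in range(nparts):
        for i in range(bounds[p], bounds[p + 1]):
            y[i] = sum(v * x[j] for j, v in A_rows[i])
    return y

# 3x2 rectangular sparse matrix [[2, 0], [0, 3], [1, 1]]
rows = [[(0, 2.0)], [(1, 3.0)], [(0, 1.0), (1, 1.0)]]
y = rowwise_partition_matvec(rows, np.array([1.0, 2.0]), nparts=2)
```

A contiguous split ignores both the nonzero counts per row and which entries of x each block touches; the bipartite-graph formulation models both, for A and its transpose simultaneously.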
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Storaasli, Olaf O.; Qin, Jiangning; Qamar, Ramzi
1994-01-01
An automatic differentiation tool (ADIFOR) is incorporated into a finite element based structural analysis program for shape and non-shape design sensitivity analysis of structural systems. The entire analysis and sensitivity procedures are parallelized and vectorized for high performance computation. Small scale examples to verify the accuracy of the proposed program and a medium scale example to demonstrate the parallel vector performance on multiple CRAY C90 processors are included.
Krylov subspace methods on supercomputers
NASA Technical Reports Server (NTRS)
Saad, Youcef
1988-01-01
A short survey of recent research on Krylov subspace methods with emphasis on implementation on vector and parallel computers is presented. Conjugate gradient methods have proven very useful on traditional scalar computers, and their popularity is likely to increase as three-dimensional models gain importance. A conservative approach to derive effective iterative techniques for supercomputers has been to find efficient parallel/vector implementations of the standard algorithms. The main source of difficulty in the incomplete factorization preconditionings is in the solution of the triangular systems at each step. A few approaches consisting of implementing efficient forward and backward triangular solutions are described in detail. Polynomial preconditioning as an alternative to standard incomplete factorization techniques is also discussed. Another efficient approach is to reorder the equations so as to improve the structure of the matrix to achieve better parallelism or vectorization. An overview of these and other ideas and their effectiveness or potential for different types of architectures is given.
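For reference, the unpreconditioned conjugate gradient iteration that such implementations build on (a textbook numpy sketch):

```python
import numpy as np

def cg(A, b, tol=1e-10, maxiter=200):
    """Plain conjugate gradient for symmetric positive definite A.
    Each iteration consists of a matvec, dot products, and axpys: the
    kernels that vectorize and parallelize well. The hard part on
    supercomputers is the preconditioner, e.g. the triangular solves
    of an incomplete factorization."""
    x = np.zeros_like(b)
    r = b.copy()
    p = r.copy()
    rr = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:
            break
        p = r + (rr_new / rr) * p
        rr = rr_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
x = cg(A, np.array([1.0, 2.0]))
```

Polynomial preconditioning replaces the sequential triangular solves with extra matvecs, trading iteration count for kernels that vectorize cleanly, which is the tradeoff the survey discusses.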
Parallel-vector unsymmetric Eigen-Solver on high performance computers
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.; Jiangning, Qin
1993-01-01
The popular QR algorithm for solving all eigenvalues of an unsymmetric matrix is reviewed. Among the basic components in the QR algorithm, it was concluded from this study, that the reduction of an unsymmetric matrix to a Hessenberg form (before applying the QR algorithm itself) can be done effectively by exploiting the vector speed and multiple processors offered by modern high-performance computers. Numerical examples of several test cases have indicated that the proposed parallel-vector algorithm for converting a given unsymmetric matrix to a Hessenberg form offers computational advantages over the existing algorithm. The time saving obtained by the proposed methods is increased as the problem size increased.
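For reference, the Hessenberg reduction step can be sketched with Householder reflections; the bulk of the work is in the row and column updates, which are the vectorizable part. A compact numpy sketch (not the authors' parallel-vector algorithm):

```python
import numpy as np

def hessenberg(A):
    """Reduce A to upper Hessenberg form H with H = Q.T @ A @ Q via
    Householder reflections; returns (H, Q)."""
    A = A.astype(float).copy()
    n = A.shape[0]
    Q = np.eye(n)
    for k in range(n - 2):
        v = A[k + 1:, k].copy()
        v[0] += np.copysign(np.linalg.norm(v), v[0])
        norm = np.linalg.norm(v)
        if norm == 0:
            continue
        v /= norm
        # apply P = I - 2 v v^T from the left and the right (BLAS-2 updates)
        A[k + 1:, :] -= 2.0 * np.outer(v, v @ A[k + 1:, :])
        A[:, k + 1:] -= 2.0 * np.outer(A[:, k + 1:] @ v, v)
        Q[:, k + 1:] -= 2.0 * np.outer(Q[:, k + 1:] @ v, v)
    return A, Q
```

The similarity transform preserves the eigenvalues, so the QR iteration applied to H yields the eigenvalues of the original unsymmetric A.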
An object-oriented approach to nested data parallelism
NASA Technical Reports Server (NTRS)
Sheffler, Thomas J.; Chatterjee, Siddhartha
1994-01-01
This paper describes an implementation technique for integrating nested data parallelism into an object-oriented language. Data-parallel programming employs sets of data called 'collections' and expresses parallelism as operations performed over the elements of a collection. When the elements of a collection are also collections, then there is the possibility for 'nested data parallelism.' Few current programming languages support nested data parallelism however. In an object-oriented framework, a collection is a single object. Its type defines the parallel operations that may be applied to it. Our goal is to design and build an object-oriented data-parallel programming environment supporting nested data parallelism. Our initial approach is built upon three fundamental additions to C++. We add new parallel base types by implementing them as classes, and add a new parallel collection type called a 'vector' that is implemented as a template. Only one new language feature is introduced: the 'foreach' construct, which is the basis for exploiting elementwise parallelism over collections. The strength of the method lies in the compilation strategy, which translates nested data-parallel C++ into ordinary C++. Extracting the potential parallelism in nested 'foreach' constructs is called 'flattening' nested parallelism. We show how to flatten 'foreach' constructs using a simple program transformation. Our prototype system produces vector code which has been successfully run on workstations, a CM-2, and a CM-5.
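The flattening idea can be illustrated in miniature: a ragged nested collection becomes a flat value array plus segment lengths, and a nested foreach becomes one elementwise operation over the flat array. A Python sketch (the paper's actual transformation targets C++):

```python
import numpy as np

def flatten(nested):
    """Represent a nested collection as flat values plus segment lengths."""
    values = np.concatenate([np.asarray(seg, dtype=float) for seg in nested])
    lengths = np.array([len(seg) for seg in nested])
    return values, lengths

def unflatten(values, lengths):
    """Rebuild the nested structure from the segment descriptor."""
    splits = np.cumsum(lengths)[:-1]
    return [list(seg) for seg in np.split(values, splits)]

# a nested 'foreach' over a ragged collection becomes ONE flat vector op
nested = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
values, lengths = flatten(nested)
result = unflatten(values * 2.0, lengths)   # elementwise op applied flat
```

The segment descriptor is what lets segmented operations (scans, reductions per inner collection) run on flat vector hardware without ever materializing the nesting.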
GPU-accelerated adjoint algorithmic differentiation
NASA Astrophysics Data System (ADS)
Gremse, Felix; Höfter, Andreas; Razik, Lukas; Kiessling, Fabian; Naumann, Uwe
2016-03-01
Many scientific problems such as classifier training or medical image reconstruction can be expressed as minimization of differentiable real-valued cost functions and solved with iterative gradient-based methods. Adjoint algorithmic differentiation (AAD) enables automated computation of gradients of such cost functions implemented as computer programs. To backpropagate adjoint derivatives, excessive memory is potentially required to store the intermediate partial derivatives on a dedicated data structure, referred to as the "tape". Parallelization is difficult because threads need to synchronize their accesses during taping and backpropagation. This situation is aggravated for many-core architectures, such as Graphics Processing Units (GPUs), because of the large number of light-weight threads and the limited memory size in general as well as per thread. We show how these limitations can be mediated if the cost function is expressed using GPU-accelerated vector and matrix operations which are recognized as intrinsic functions by our AAD software. We compare this approach with naive and vectorized implementations for CPUs. We use four increasingly complex cost functions to evaluate the performance with respect to memory consumption and gradient computation times. Using vectorization, CPU and GPU memory consumption could be substantially reduced compared to the naive reference implementation, in some cases even by an order of complexity. The vectorization allowed usage of optimized parallel libraries during forward and reverse passes which resulted in high speedups for the vectorized CPU version compared to the naive reference implementation. The GPU version achieved an additional speedup of 7.5 ± 4.4, showing that the processing power of GPUs can be utilized for AAD using this concept. Furthermore, we show how this software can be systematically extended for more complex problems such as nonlinear absorption reconstruction for fluorescence-mediated tomography.
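The tape mechanism described here can be illustrated with a toy reverse-mode tape (a deliberately minimal sketch, unrelated to the authors' AAD software): each operation records its local partial derivatives, and backpropagation walks the tape in reverse.

```python
import math

class Var:
    """Minimal reverse-mode AD: every operation appends (output, partials)
    to a global tape; backward() accumulates adjoints in reverse order."""
    tape = []
    def __init__(self, value):
        self.value, self.grad = value, 0.0
    def __mul__(self, other):
        out = Var(self.value * other.value)
        Var.tape.append((out, [(self, other.value), (other, self.value)]))
        return out
    def __add__(self, other):
        out = Var(self.value + other.value)
        Var.tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

def sin(x):
    out = Var(math.sin(x.value))
    Var.tape.append((out, [(x, math.cos(x.value))]))
    return out

def backward(out):
    out.grad = 1.0
    for node, partials in reversed(Var.tape):
        for parent, d in partials:
            parent.grad += d * node.grad
```

The memory pressure the abstract describes is visible even here: the tape grows with every scalar operation, whereas recognizing whole vector or matrix operations as single tape entries (the paper's approach) keeps it proportional to the number of high-level operations.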
Limits on the Efficiency of Event-Based Algorithms for Monte Carlo Neutron Transport
DOE Office of Scientific and Technical Information (OSTI.GOV)
Romano, Paul K.; Siegel, Andrew R.
The traditional form of parallelism in Monte Carlo particle transport simulations, wherein each individual particle history is considered a unit of work, does not lend itself well to data-level parallelism. Event-based algorithms, which were originally used for simulations on vector processors, may offer a path toward better utilizing data-level parallelism in modern computer architectures. In this study, a simple model is developed for estimating the efficiency of the event-based particle transport algorithm under two sets of assumptions. Data collected from simulations of four reactor problems using OpenMC were then used in conjunction with the models to calculate the speedup due to vectorization as a function of two parameters: the size of the particle bank and the vector width. When each event type is assumed to have constant execution time, the achievable speedup is directly related to the particle bank size. We observed that the bank size generally needs to be at least 20 times greater than the vector size in order to achieve a vector efficiency greater than 90%. When the execution times for events are allowed to vary, however, the vector speedup is also limited by differences in execution time for events being carried out in a single event-iteration. For some problems, this implies that vector efficiencies over 50% may not be attainable. While there are many factors impacting the performance of an event-based algorithm that are not captured by our model, it nevertheless provides insights into factors that may be limiting in a real implementation.
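A toy version of such an efficiency model can be simulated directly. The sketch below is a simplification of my own, not the paper's model: each sweep performs one event per live particle in SIMD chunks of the vector width, and a particle survives to another event with a fixed probability.

```python
import random

def vector_efficiency(bank_size, vector_width, survival=0.9, seed=0):
    """Toy draining-bank model: efficiency = useful lanes / lanes issued.
    `survival` (assumed geometric event count) is a modeling assumption."""
    rng = random.Random(seed)
    alive, useful, total = bank_size, 0, 0
    while alive > 0:
        passes = -(-alive // vector_width)     # ceil(alive / vector_width)
        useful += alive                        # one event per live particle
        total += passes * vector_width         # lanes issued, full or not
        alive = sum(rng.random() < survival for _ in range(alive))
    return useful / total
```

Even this crude model reproduces the qualitative point: a bank much larger than the vector width keeps lanes full for most of the run, while a small bank wastes lanes on every pass.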
Reversible vector ratchets for skyrmion systems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ma, Xiu; Reichhardt, Cynthia Jane Olson; Reichhardt, Charles
2017-03-03
In this paper, we show that ac driven skyrmions interacting with an asymmetric substrate provide a realization of a class of ratchet system which we call a vector ratchet, arising from the effect of the Magnus term on the skyrmion dynamics. In a vector ratchet, the dc motion induced by the ac drive can be described as a vector that can be rotated clockwise or counterclockwise relative to the substrate asymmetry direction. Up to a full 360° rotation is possible for varied ac amplitudes or skyrmion densities. In contrast to overdamped systems, in which ratchet motion is always parallel to the substrate asymmetry direction, vector ratchets allow the ratchet motion to be in any direction relative to the substrate asymmetry. It is also possible to obtain a reversal in the direction of rotation of the vector ratchet, permitting the creation of a reversible vector ratchet. We examine vector ratchets for ac drives applied parallel or perpendicular to the substrate asymmetry direction, and show that reverse ratchet motion can be produced by collective effects. No reversals occur for an isolated skyrmion on an asymmetric substrate. Finally, since a vector ratchet can produce motion in any direction, it could represent a method for controlling skyrmion motion for spintronic applications.
A Connectionist Simulation of Attention and Vector Comparison: The Need for Serial Processing in Parallel Hardware
1991-01-01
Technical Report AIP (AD-A242 225). [Abstract not cleanly recoverable; surviving fragments describe a three-layer connectionist network with serial memory processing in parallel visual hardware, and cite "Selective attention gates visual processing in the extrastriate cortex," Science, 229:782-784, and Treisman, A. M. (1985).]
Kinematic sensitivity of robot manipulators
NASA Technical Reports Server (NTRS)
Vuskovic, Marko I.
1989-01-01
Kinematic sensitivity vectors and matrices for open-loop, n degrees-of-freedom manipulators are derived. First-order sensitivity vectors are defined as partial derivatives of the manipulator's position and orientation with respect to its geometrical parameters. The four-parameter kinematic model is considered, as well as the five-parameter model in case of nominally parallel joint axes. Sensitivity vectors are expressed in terms of coordinate axes of manipulator frames. Second-order sensitivity vectors, the partial derivatives of first-order sensitivity vectors, are also considered. It is shown that second-order sensitivity vectors can be expressed as vector products of the first-order sensitivity vectors.
NASA Astrophysics Data System (ADS)
Hauth, T.; Innocente, V.; Piparo, D.
2012-12-01
The processing of data acquired by the CMS detector at LHC is carried out with an object-oriented C++ software framework: CMSSW. With the increasing luminosity delivered by the LHC, the treatment of recorded data requires extraordinarily large computing resources, also in terms of CPU usage. A possible solution to cope with this task is the exploitation of the features offered by the latest microprocessor architectures. Modern CPUs present several vector units, the capacity of which is growing steadily with the introduction of new processor generations. Moreover, an increasing number of cores per die is offered by the main vendors, even on consumer hardware. Most recent C++ compilers provide facilities to take advantage of such innovations, either by explicit statements in the program sources or by automatically adapting the generated machine instructions to the available hardware, without the need of modifying the existing code base. Programming techniques to implement reconstruction algorithms and optimised data structures are presented that aim at scalable vectorization and parallelization of the calculations. One of their features is the usage of new language features of the C++11 standard. Portions of the CMSSW framework are illustrated which have been found to be especially profitable for the application of vectorization and multi-threading techniques. Specific utility components have been developed to help vectorization and parallelization. They can easily become part of a larger common library. To conclude, careful measurements are described, which show the execution speedups achieved via vectorised and multi-threaded code in the context of CMSSW.
Multiple mechanisms in the perception of face gender: Effect of sex-irrelevant features.
Komori, Masashi; Kawamura, Satoru; Ishihara, Shigekazu
2011-06-01
Effects of sex-relevant and sex-irrelevant facial features on the evaluation of facial gender were investigated. Participants rated masculinity of 48 male facial photographs and femininity of 48 female facial photographs. Eighty feature points were measured on each of the facial photographs. Using a generalized Procrustes analysis, facial shapes were converted into multidimensional vectors, with the average face as a starting point. Each vector was decomposed into a sex-relevant subvector and a sex-irrelevant subvector which were, respectively, parallel and orthogonal to the main male-female axis. Principal components analysis (PCA) was performed on the sex-irrelevant subvectors. One principal component was negatively correlated with both perceived masculinity and femininity, and another was correlated only with femininity, though both components were orthogonal to the male-female dimension (and thus by definition sex-irrelevant). These results indicate that evaluation of facial gender depends on sex-irrelevant as well as sex-relevant facial features.
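The decomposition used here is plain vector projection: the sex-relevant subvector is the projection of a face-shape vector onto the male-female axis, and the sex-irrelevant subvector is the orthogonal remainder. A short numpy sketch:

```python
import numpy as np

def decompose(v, axis):
    """Split v into components parallel and orthogonal to `axis`
    (the male-female direction in the study)."""
    u = axis / np.linalg.norm(axis)
    parallel = np.dot(v, u) * u        # sex-relevant subvector
    orthogonal = v - parallel          # sex-irrelevant subvector
    return parallel, orthogonal
```

By construction the two parts sum back to v and the orthogonal part has zero inner product with the axis, which is what makes the PCA on the orthogonal subvectors "sex-irrelevant by definition".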
Parallel-Vector Algorithm For Rapid Structural Analysis
NASA Technical Reports Server (NTRS)
Agarwal, Tarun R.; Nguyen, Duc T.; Storaasli, Olaf O.
1993-01-01
New algorithm developed to overcome deficiency of skyline storage scheme by use of variable-band storage scheme. Exploits both parallel and vector capabilities of modern high-performance computers. Gives engineers and designers opportunity to include more design variables and constraints during optimization of structures. Enables use of more refined finite-element meshes to obtain improved understanding of complex behaviors of aerospace structures leading to better, safer designs. Not only attractive for current supercomputers but also for next generation of shared-memory supercomputers.
Transferring ecosystem simulation codes to supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1995-01-01
Many ecosystem simulation computer codes have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Supercomputing platforms (both parallel and distributed systems) have been largely unused, however, because of the perceived difficulty in accessing and using the machines. Also, significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers must be considered. We have transferred a grassland simulation model (developed on a VAX) to a Cray Y-MP/C90. We describe porting the model to the Cray and the changes we made to exploit the parallelism in the application and improve code execution. The Cray executed the model 30 times faster than the VAX and 10 times faster than a Unix workstation. We achieved an additional speedup of 30 percent by using the compiler's vectoring and 'in-line' capabilities. The code runs at only about 5 percent of the Cray's peak speed because it ineffectively uses the vector and parallel processing capabilities of the Cray. We expect that by restructuring the code, it could execute an additional six to ten times faster.
Brian hears: online auditory processing using vectorization over channels.
Fontaine, Bertrand; Goodman, Dan F M; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in "Brian Hears," a library for the spiking neural network simulator package "Brian." This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations.
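Vectorization over channels can be illustrated with a bank of one-pole filters: the time recursion stays sequential, but each time step updates every channel in a single vector operation. An illustrative numpy sketch (much simpler than the cochlear filters in Brian Hears):

```python
import numpy as np

def filterbank_vectorized(sound, a):
    """Run N one-pole lowpass filters (one per channel, smoothing
    coefficients `a`) over a mono input; the inner update touches all
    channels at once, so interpretation overhead is amortized."""
    n_channels = len(a)
    y = np.zeros((len(sound), n_channels))
    state = np.zeros(n_channels)
    for t, x in enumerate(sound):
        state = a * state + (1.0 - a) * x   # all channels in one vector op
        y[t] = state
    return y
```

With thousands of channels, the per-sample Python overhead is shared across the whole channel vector, which is exactly the amortization argument the abstract makes for high-level languages.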
Signal processing applications of massively parallel charge domain computing devices
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Barhen, Jacob (Inventor); Toomarian, Nikzad (Inventor)
1999-01-01
The present invention is embodied in a charge coupled device (CCD)/charge injection device (CID) architecture capable of performing a Fourier transform by simultaneous matrix vector multiplication (MVM) operations in respective plural CCD/CID arrays in parallel in O(1) steps. For example, in one embodiment, a first CCD/CID array stores charge packets representing a first matrix operator based upon permutations of a Hartley transform and computes the Fourier transform of an incoming vector. A second CCD/CID array stores charge packets representing a second matrix operator based upon different permutations of a Hartley transform and computes the Fourier transform of an incoming vector. The incoming vector is applied to the inputs of the two CCD/CID arrays simultaneously, and the real and imaginary parts of the Fourier transform are produced simultaneously in the time required to perform a single MVM operation in a CCD/CID array.
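The real and imaginary DFT parts can indeed be recovered from two Hartley-style matrix-vector products, since with the cas kernel cas θ = cos θ + sin θ one has Re F[k] = (h[k] + h[N−k])/2 and Im F[k] = (h[N−k] − h[k])/2. A numpy sketch of the arithmetic (a software stand-in for the two CCD/CID arrays):

```python
import numpy as np

def fourier_via_hartley(x):
    """Compute the DFT of a real vector from two Hartley-like MVMs,
    mirroring the two arrays: one fixed cas matrix, one row permutation."""
    n = len(x)
    k, m = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    cas = np.cos(2 * np.pi * k * m / n) + np.sin(2 * np.pi * k * m / n)
    h = cas @ x                        # first MVM: Hartley transform
    h_rev = h[(-np.arange(n)) % n]     # second array: permuted-row variant
    return (h + h_rev) / 2 + 1j * (h_rev - h) / 2
```

In the hardware, both matrix-vector products happen simultaneously as charge accumulation, so the real and imaginary parts emerge in the time of a single MVM, the O(1)-step claim of the patent.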
Adaptive proxy map server for efficient vector spatial data rendering
NASA Astrophysics Data System (ADS)
Sayar, Ahmet
2013-01-01
The rapid transmission of vector map data over the Internet is becoming a bottleneck of spatial data delivery and visualization in web-based environments because of increasing data amounts and limited network bandwidth. In order to improve both the transmission and rendering performance of vector spatial data over the Internet, we propose a proxy map server enabling parallel vector data fetching as well as caching to improve the performance of web-based map servers in a dynamic environment. The proxy map server is placed seamlessly anywhere between the client and the final services, intercepting users' requests. It employs an efficient parallelization technique based on spatial proximity and data density in case distributed replicas exist for the same spatial data. The effectiveness of the proposed technique is demonstrated at the end of the article by creating map images enriched with earthquake seismic data records.
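In software terms, the parallel-fetching idea reduces to fanning requests out to worker threads while preserving request order; `fetch` below is a hypothetical stand-in for the remote map-service call, and the proximity/density-based request splitting of the article is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_parallel(fetch, requests, max_workers=4):
    """Issue map-data requests concurrently and return results in
    request order (Executor.map preserves input ordering)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, requests))
```

Because vector-tile fetches are I/O-bound, threads suffice here; a cache layer in front of `fetch` would give the proxy's second benefit.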
Wiimote Experiments: 3-D Inclined Plane Problem for Reinforcing the Vector Concept
ERIC Educational Resources Information Center
Kawam, Alae; Kouh, Minjoon
2011-01-01
In an introductory physics course where students first learn about vectors, they oftentimes struggle with the concept of vector addition and decomposition. For example, the classic physics problem involving a mass on an inclined plane requires the decomposition of the force of gravity into two directions that are parallel and perpendicular to the…
A Data Type for Efficient Representation of Other Data Types
NASA Technical Reports Server (NTRS)
James, Mark
2008-01-01
A self-organizing, monomorphic data type denoted a sequence has been conceived to address certain concerns that arise in programming parallel computers. A sequence in the present sense can be regarded abstractly as a vector, set, bag, queue, or other construct. Heretofore, in programming a parallel computer, it has been necessary for the programmer to state explicitly, at the outset, what parts of the program and the underlying data structures must be represented in parallel form. Not only is this requirement not optimal from the perspective of implementation; it entails an additional requirement that the programmer have intimate understanding of the underlying parallel structure. The present sequence data type overcomes both the implementation and parallel structure obstacles. In so doing, the sequence data type provides unified means by which the programmer can represent a data structure for natural and automatic decomposition to a parallel computing architecture. Sequences exhibit the behavioral and structural characteristics of vectors, but the underlying representations are automatically synthesized from combinations of programmer advice and execution-use metrics. Sequences can vary bidirectionally between sparseness and density, making them excellent choices for many kinds of algorithms. The novelty and benefit of this behavior lies in the fact that it can relieve programmers of the details of implementations. The creation of a sequence enables decoupling of a conceptual representation from an implementation. The underlying representation of a sequence is a hybrid of representations composed of vectors, linked lists, connected blocks, and hash tables. The internal structure of a sequence can automatically change from time to time on the basis of how it is being used. Those portions of a sequence where elements have not been added or removed can be as efficient as vectors.
As elements are inserted and removed in a given portion, then different methods are utilized to provide both an access and memory strategy that is optimized for that portion and the use to which it is put.
Super and parallel computers and their impact on civil engineering
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kamat, M.P.
1986-01-01
This book presents the papers given at a conference on the use of supercomputers in civil engineering. Topics considered at the conference included solving nonlinear equations on a hypercube, a custom architectured parallel processing system, distributed data processing, algorithms, computer architecture, parallel processing, vector processing, computerized simulation, and cost benefit analysis.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.
1995-10-01
Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparse unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.
Modeling Cooperative Threads to Project GPU Performance for Adaptive Parallelism
DOE Office of Scientific and Technical Information (OSTI.GOV)
Meng, Jiayuan; Uram, Thomas; Morozov, Vitali A.
Most accelerators, such as graphics processing units (GPUs) and vector processors, are particularly suitable for accelerating massively parallel workloads. On the other hand, conventional workloads are developed for multi-core parallelism, which often scale to only a few dozen OpenMP threads. When hardware threads significantly outnumber the degree of parallelism in the outer loop, programmers are challenged with efficient hardware utilization. A common solution is to further exploit the parallelism hidden deep in the code structure. Such parallelism is less structured: parallel and sequential loops may be imperfectly nested within each other, neighboring inner loops may exhibit different concurrency patterns (e.g., reduction vs. forall), yet have to be parallelized in the same parallel section. Many input-dependent transformations have to be explored. A programmer often employs a larger group of hardware threads to cooperatively walk through a smaller outer loop partition and adaptively exploit any encountered parallelism. This process is time-consuming and error-prone, yet the risk of gaining little or no performance remains high for such workloads. To reduce risk and guide implementation, we propose a technique to model workloads with limited parallelism that can automatically explore and evaluate transformations involving cooperative threads. Eventually, our framework projects the best achievable performance and the most promising transformations without implementing GPU code or using physical hardware. We envision our technique to be integrated into future compilers or optimization frameworks for autotuning.
NASA Astrophysics Data System (ADS)
Dey, T.; Rodrigue, P.
2015-07-01
We aim to evaluate the Intel Xeon Phi coprocessor for acceleration of 3D Positron Emission Tomography (PET) image reconstruction. We focus on the sensitivity map calculation as one computationally intensive part of PET image reconstruction, since it is a promising candidate for acceleration with the Many Integrated Core (MIC) architecture of the Xeon Phi. The computation of the voxels in the field of view (FoV) can be done in parallel and the 103 to 104 samples needed to calculate the detection probability of each voxel can take advantage of vectorization. We use the ray tracing kernels of the Embree project to calculate the hit points of the sample rays with the detector and in a second step the sum of the radiological path taking into account attenuation is determined. The core components are implemented using the Intel single instruction multiple data compiler (ISPC) to enable a portable implementation showing efficient vectorization on both the Xeon Phi and the host platform. On the Xeon Phi, the calculation of the radiological path is also implemented in hardware-specific intrinsic instructions (so-called `intrinsics') to allow manually-optimized vectorization. For parallelization, both OpenMP and ISPC tasking (based on pthreads) are evaluated. Our implementation achieved a scalability factor of 0.90 on the Xeon Phi coprocessor (model 5110P) with 60 cores at 1 GHz. Only minor differences were found between parallelization with OpenMP and the ISPC tasking feature. The implementation using intrinsics was found to be about 12% faster than the portable ISPC version. With this version, a speedup of 1.43 was achieved on the Xeon Phi coprocessor compared to the host system (HP SL250s Gen8) equipped with two Xeon (E5-2670) CPUs, with 8 cores at 2.6 to 3.3 GHz each. Using a second Xeon Phi card the speedup could be further increased to 2.77. No significant differences were found between the results of the different Xeon Phi and the host implementations.
The examination showed that a reasonable speedup of sensitivity map calculation could be achieved on the Xeon Phi either by a portable or a hardware specific implementation.
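The radiological-path step amounts to a sampled line integral of attenuation along each ray, followed by a Beer-Lambert survival probability. A minimal scalar sketch (illustrative function names and uniform midpoint sampling are assumptions; this is not the Embree/ISPC implementation):

```python
import math

def radiological_path(mu, entry, exit_, n_samples):
    """Approximate the line integral of the attenuation field mu(x, y, z)
    along a ray from `entry` to `exit_` using midpoint sampling."""
    dx = [(b - a) / n_samples for a, b in zip(entry, exit_)]
    step = math.sqrt(sum(d * d for d in dx))  # length of one sample segment
    total = 0.0
    for i in range(n_samples):
        # midpoint of the i-th segment along the ray
        p = [a + (i + 0.5) * d for a, d in zip(entry, dx)]
        total += mu(*p) * step
    return total

def detection_probability(path_integral):
    # Beer-Lambert law: fraction of photons surviving the attenuating path
    return math.exp(-path_integral)
```

Because each sample is independent, the inner loop is exactly the part that maps onto SIMD lanes in an ISPC or intrinsics version.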
NASA Technical Reports Server (NTRS)
Manohar, Mareboyana; Tilton, James C.
1994-01-01
A progressive vector quantization (VQ) compression approach is discussed which decomposes image data into a number of levels using full search VQ. The final level is losslessly compressed, enabling lossless reconstruction. The computational difficulties are addressed by implementation on a massively parallel SIMD machine. We demonstrate progressive VQ on multispectral imagery obtained from the Advanced Very High Resolution Radiometer instrument and other Earth observation image data, and investigate the trade-offs in selecting the number of decomposition levels and codebook training method.
A compositional reservoir simulator on distributed memory parallel computers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rame, M.; Delshad, M.
1995-12-31
This paper presents the application of distributed memory parallel computers to field-scale reservoir simulations using a parallel version of UTCHEM, The University of Texas Chemical Flooding Simulator. The model is a general-purpose, highly vectorized chemical compositional simulator that can simulate a wide range of displacement processes at both field and laboratory scales. The original simulator was modified to run on both distributed memory parallel machines (Intel iPSC/860 and Delta, Connection Machine 5, Kendall Square 1 and 2, and Cray T3D) and a cluster of workstations. A domain decomposition approach has been taken towards parallelization of the code. A portion of the discrete reservoir model is assigned to each processor by a set-up routine that attempts a data layout as even as possible from the load-balance standpoint. Each of these subdomains is extended so that data can be shared between adjacent processors for stencil computation. The added routines that make parallel execution possible are written in a modular fashion that makes porting to new parallel platforms straightforward. Results of the distributed memory computing performance of the parallel simulator are presented for field-scale applications such as tracer flood and polymer flood. A comparison of the wall-clock times for the same problems on a vector supercomputer is also presented.
Pairwise Sequence Alignment Library
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jeff Daily, PNNL
2015-05-20
Vector extensions, such as SSE, have been part of x86 CPUs since the 1990s, with applications in graphics, signal processing, and scientific computing. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, many others remain difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. The trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. Therefore, a novel SIMD implementation of a parallel scan-based sequence alignment algorithm that can better exploit wider SIMD units was implemented as part of the Parallel Sequence Alignment Library (parasail). Parasail features: reference implementations of all known vectorized sequence alignment approaches; implementations of the Smith-Waterman (SW), semi-global (SG), and Needleman-Wunsch (NW) sequence alignment algorithms; implementations across modern CPU instruction sets including AVX2 and KNC; and language interfaces for C/C++ and Python.
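As a reference point for the recurrence these libraries vectorize, here is a scalar Smith-Waterman local alignment score in plain Python (a minimal sketch with illustrative scoring parameters; it is not parasail's striped or scan-based SIMD kernel):

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-1):
    """Return the optimal local alignment score between s1 and s2.
    H[i][j] is the best score of an alignment ending at s1[i-1], s2[j-1];
    the dependency of each cell on its left, upper, and diagonal
    neighbours is what makes SIMD vectorization nontrivial."""
    rows, cols = len(s1) + 1, len(s2) + 1
    H = [[0] * cols for _ in range(rows)]
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = H[i-1][j-1] + (match if s1[i-1] == s2[j-1] else mismatch)
            # local alignment: scores never drop below zero
            H[i][j] = max(0, diag, H[i-1][j] + gap, H[i][j-1] + gap)
            best = max(best, H[i][j])
    return best
```

Striped and scan-based SIMD layouts reorganize exactly this anti-diagonal dependency so that many cells of H can be updated per vector instruction.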
Conceptual design of a hybrid parallel mechanism for mask exchanging of TMT
NASA Astrophysics Data System (ADS)
Wang, Jianping; Zhou, Hongfei; Li, Kexuan; Zhou, Zengxiang; Zhai, Chao
2015-10-01
The mask exchange system is an important part of the Multi-Object Broadband Imaging Echellette (MOBIE) on the Thirty Meter Telescope (TMT). To solve the problem of stiffness changing with the gravity vector in the MOBIE mask exchange system, the hybrid parallel mechanism design method was introduced. By combining the high stiffness and precision of a parallel structure with the large moving range of a serial structure, a conceptual design of a hybrid parallel mask exchange system based on a 3-RPS parallel mechanism is presented. According to the position requirements of the MOBIE, a SolidWorks structure model of the hybrid parallel mask exchange robot was established, and an appropriate installation position that does not interfere with the related components and light path in the MOBIE of TMT was identified. Simulation results in SolidWorks suggest that the 3-RPS parallel platform has good stiffness properties in different gravity vector directions. Furthermore, through analysis of the mechanism theory, the inverse kinematics of the 3-RPS parallel platform was solved and the mathematical relationship between the attitude angle of the moving platform and the angles of the ball hinges on the moving platform was established, in order to analyze the attitude adjustment ability of the hybrid parallel mask exchange robot. The proposed conceptual design offers guidance for the design of the mask exchange system of the MOBIE on TMT.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Dupertuis, M.A.; Proctor, M.; Acklin, B.
Energy balance and reciprocity relations are studied for harmonic inhomogeneous plane waves that are incident upon a stack of continuous absorbing dielectric media that are macroscopically characterized by their electric and magnetic permittivities and their conductivities. New cross terms between parallel electric and parallel magnetic modes are identified in the fully generalized Poynting vector. The symmetry and the relations between the general Fresnel coefficients are investigated in the context of energy balance at the interface. The contributions of the so-called mixed Poynting vector are discussed in detail. In particular, a new transfer matrix is introduced for energy fluxes in thin-film optics based on the Poynting and mixed Poynting vectors. Finally, the study of reciprocity relations leads to a generalization of a theorem of reversibility for conducting and dielectric media. 16 refs.
NASA Astrophysics Data System (ADS)
Stone, Christopher P.; Alferman, Andrew T.; Niemeyer, Kyle E.
2018-05-01
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs governing the chemical kinetics at each mesh point are decoupled by operator-splitting, allowing each to be solved concurrently. An efficient ODE solver must then take into account the available thread and instruction-level parallelism of the underlying hardware, especially on many-core coprocessors, as well as the numerical efficiency. A stiff Rosenbrock and a nonstiff Runge-Kutta ODE solver are both implemented using the single instruction, multiple thread (SIMT) and single instruction, multiple data (SIMD) paradigms within OpenCL. Both methods solve multiple ODEs concurrently within the same instruction stream. The performance of these parallel implementations was measured on three chemical kinetic models of increasing size across several multicore and many-core platforms. Two separate benchmarks were conducted to clearly determine any performance advantage offered by either method. The first benchmark measured the run-time of evaluating the right-hand-side source terms in parallel, and the second benchmark integrated a series of constant-pressure, homogeneous reactors using the Rosenbrock and Runge-Kutta solvers. The right-hand-side evaluations with SIMD parallelism on the host multicore Xeon CPU and many-core Xeon Phi coprocessor performed approximately three times faster than the baseline multithreaded C++ code. The SIMT parallel model on the host and Phi was 13%-35% slower than the baseline, while the SIMT model on the NVIDIA Kepler GPU provided approximately the same performance as the SIMD model on the Phi. The runtimes for both ODE solvers decreased significantly with the SIMD implementations on the host CPU (2.5-2.7×) and Xeon Phi coprocessor (4.7-4.9×) compared to the baseline parallel code.
The SIMT implementations on the GPU ran 1.5-1.6 times faster than the baseline multithreaded CPU code; however, this was significantly slower than the SIMD versions on the host CPU or the Xeon Phi. The performance difference between the three platforms was attributed to thread divergence caused by the adaptive step-sizes within the ODE integrators. Analysis showed that the wider vector width of the GPU incurs a higher level of divergence than the narrower Sandy Bridge or Xeon Phi. The significant performance improvement provided by the SIMD parallel strategy motivates further research into more ODE solver methods that are both SIMD-friendly and computationally efficient.
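The strategy of advancing many independent ODEs within one instruction stream can be illustrated with a batched explicit integrator: the state is an array, and one arithmetic expression updates every ODE at once. A simplified non-stiff RK4 sketch in NumPy (the Rosenbrock solver, adaptive stepping, and OpenCL mapping of the paper are not reproduced):

```python
import numpy as np

def rk4_batch(f, y0, t0, t1, n_steps):
    """Integrate dy/dt = f(t, y) for a whole batch of independent ODEs
    simultaneously: y0 is an array and f operates elementwise, so every
    arithmetic operation advances all ODEs in lockstep (the SIMD layout)."""
    y = np.asarray(y0, dtype=float).copy()
    h = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        k1 = f(t, y)
        k2 = f(t + h/2, y + h/2 * k1)
        k3 = f(t + h/2, y + h/2 * k2)
        k4 = f(t + h, y + h * k3)
        y += (h / 6) * (k1 + 2*k2 + 2*k3 + k4)
        t += h
    return y

# batch of decay problems dy/dt = -k*y with different rates per "lane"
k = np.array([0.5, 1.0, 2.0])
y = rk4_batch(lambda t, y: -k * y, np.ones(3), 0.0, 1.0, 100)
```

With fixed steps all lanes stay synchronized; the thread divergence discussed above arises precisely when adaptive step-size control gives each lane a different step sequence.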
NASA Astrophysics Data System (ADS)
Zhang, Lei; Yang, Fengbao; Ji, Linna; Lv, Sheng
2018-01-01
Diverse image fusion methods perform differently, and each has advantages and disadvantages relative to the others. One notion is that the advantages of different fusion methods can be effectively combined. A multiple-algorithm parallel fusion method based on algorithmic complementarity and synergy is proposed. First, in view of the characteristics of the different algorithms and the difference-features among images, a feature-similarity index vector is proposed to define the degree of complementarity and synergy; this index vector provides reliable evidence for algorithm selection. Second, the algorithms with a high degree of complementarity and synergy are selected. Then, the different degrees of various features and infrared intensity images are used as the initial weights for nonnegative matrix factorization (NMF), which avoids the randomness of the NMF initialization parameters. Finally, the fused images of the different algorithms are integrated using the NMF because of its excellent performance at fusing independent features. Experimental results demonstrate that the visual effect and objective evaluation indices of the fused images obtained using the proposed method are better than those obtained using traditional methods. The proposed method retains the advantages that the individual fusion algorithms have.
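The seeded-NMF step can be sketched with the standard Lee-Seung multiplicative updates (a generic Frobenius-norm sketch; the paper's feature-based seeding is only mimicked here by the optional W0/H0 arguments, which are an illustrative assumption):

```python
import numpy as np

def nmf(V, rank, W0=None, H0=None, n_iter=200, eps=1e-9):
    """Nonnegative matrix factorization V ~= W @ H via Lee-Seung
    multiplicative updates. Supplying W0/H0 seeds the factors, which is
    how a method can avoid the randomness of NMF initialization."""
    rng = np.random.default_rng(0)
    m, n = V.shape
    W = np.abs(W0).astype(float) if W0 is not None else rng.random((m, rank))
    H = np.abs(H0).astype(float) if H0 is not None else rng.random((rank, n))
    for _ in range(n_iter):
        # multiplicative updates keep W and H elementwise nonnegative
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The multiplicative form guarantees nonnegativity at every iteration, which is why NMF parts decompose naturally into additive, independent features.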
Ju, Jia; Huan, Meng-Lei; Wan, Ning; Hou, Yi-Lin; Ma, Xi-Xi; Jia, Yi-Yang; Li, Chen; Zhou, Si-Yuan; Zhang, Bang-Le
2016-05-15
Cholesterol derivatives M1-M6 were designed as synthetic cationic lipids, and the biological evaluation of cationic liposomes based on them as non-viral gene delivery vectors is described. Plasmid pEGFP-N1, used as a model gene, was transferred into 293T cells by cationic liposomes formed with M1-M6, and transfection efficiency and GFP expression were tested. Cationic liposomes prepared with cationic lipids M1-M6 exhibited good transfection activity, parallel (M2 and M4) or superior (M1 and M6) to that of DC-Chol derived from the same backbone. Among them, the transfection efficiency of cationic lipid M6 was comparable to that of the commercially available Lipofectamine2000. The optimal formulations of M1 and M6 were found at a molar ratio of 1:0.5 for cationic lipid/DOPE and an N/P charge molar ratio of 3:1 for liposome/DNA. Under optimized conditions, the efficiency of M1 and M6 was greater than that of both tested commercial reagents, DC-Chol and Lipofectamine2000, even in the presence of serum. The results indicated that M1 and M6 exhibit low cytotoxicity, good serum compatibility, and efficient transfection performance, and have the potential to be excellent non-viral vectors for gene delivery. Copyright © 2016 Elsevier Ltd. All rights reserved.
Brian Hears: Online Auditory Processing Using Vectorization Over Channels
Fontaine, Bertrand; Goodman, Dan F. M.; Benichoux, Victor; Brette, Romain
2011-01-01
The human cochlea includes about 3000 inner hair cells which filter sounds at frequencies between 20 Hz and 20 kHz. This massively parallel frequency analysis is reflected in models of auditory processing, which are often based on banks of filters. However, existing implementations do not exploit this parallelism. Here we propose algorithms to simulate these models by vectorizing computation over frequency channels, which are implemented in “Brian Hears,” a library for the spiking neural network simulator package “Brian.” This approach allows us to use high-level programming languages such as Python, because with vectorized operations, the computational cost of interpretation represents a small fraction of the total cost. This makes it possible to define and simulate complex models in a simple way, while all previous implementations were model-specific. In addition, we show that these algorithms can be naturally parallelized using graphics processing units, yielding substantial speed improvements. We demonstrate these algorithms with several state-of-the-art cochlear models, and show that they compare favorably with existing, less flexible, implementations. PMID:21811453
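The vectorization-over-channels idea can be sketched with a bank of first-order recursive filters whose states are all advanced by one vector operation per time step (illustrative leaky-integrator filters only, not Brian Hears' gammatone models):

```python
import numpy as np

def filterbank(sound, b0, a1):
    """Run a bank of first-order recursive filters
        y[n] = b0 * x[n] - a1 * y[n-1]
    over one input signal, vectorizing across channels: b0 and a1 hold
    one coefficient per channel, so each time step updates the state of
    every channel with a single array operation."""
    n_channels = len(b0)
    out = np.empty((len(sound), n_channels))
    y = np.zeros(n_channels)
    for n, x in enumerate(sound):
        y = b0 * x - a1 * y      # one vectorized update for all channels
        out[n] = y
    return out
```

The loop over time is sequential (the recursion forbids vectorizing it), but the per-sample cost of interpretation is amortized over all channels, which is the key to making a high-level language viable here.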
NASA Astrophysics Data System (ADS)
Solano-Altamirano, J. M.; Hernández-Pérez, Julio M.
2015-11-01
DensToolKit is a suite of cross-platform, optionally parallelized programs for analyzing the molecular electron density (ρ) and several fields derived from it. Scalar and vector fields, such as the gradient of the electron density (∇ρ), electron localization function (ELF) and its gradient, localized orbital locator (LOL), region of slow electrons (RoSE), reduced density gradient, localized electrons detector (LED), information entropy, molecular electrostatic potential, and kinetic energy densities K and G, among others, can be evaluated on zero-, one-, two-, and three-dimensional grids. The suite includes a program for searching critical points and bond paths of the electron density, under the framework of the Quantum Theory of Atoms in Molecules. DensToolKit also evaluates the momentum space electron density on spatial grids, and the reduced density matrix of order one along lines joining two arbitrary atoms of a molecule. The source code is distributed under the GNU-GPLv3 license, and we release the code with the intent of establishing an open-source collaborative project. The style of DensToolKit's code follows some of the guidelines of object-oriented programming. This allows us to supply the user with a simple way to implement new scalar or vector fields, provided they are derived from any of the fields already implemented in the code. In this paper, we present some of the most salient features of the programs contained in the suite, some examples of how to run them, and the mathematical definitions of the implemented fields, along with hints on how we optimized their evaluation. We benchmarked our suite against both a freely available program and a commercial package. Speed-ups of ~2×, and up to 12×, were obtained using a non-parallel compilation of DensToolKit for the evaluation of fields. DensToolKit takes similar times for finding critical points, compared to a commercial package.
Finally, we present some perspectives for the future development and growth of the suite.
Efficient solution of parabolic equations by Krylov approximation methods
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1990-01-01
Numerical techniques for solving parabolic equations by the method of lines are addressed. The main motivation for the proposed approach is the possibility of exploiting a high degree of parallelism in a simple manner. The basic idea of the method is to approximate the action of the evolution operator on a given state vector by means of a projection process onto a Krylov subspace. Thus, the resulting approximation consists of applying an evolution operator of a very small dimension to a known vector, which is, in turn, computed accurately by exploiting well-known rational approximations to the exponential. Because the rational approximation is only applied to a small matrix, the only operations required with the original large matrix are matrix-by-vector multiplications, and as a result the algorithm can easily be parallelized and vectorized. Some relevant approximation and stability issues are discussed. We present some numerical experiments with the method and compare its performance with a few explicit and implicit algorithms.
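The method can be sketched as an Arnoldi projection followed by a small dense exponential: only matrix-vector products touch the large matrix A, and the exponential is applied to the tiny projected matrix. A minimal NumPy sketch (`expm_taylor` is a naive Taylor-series stand-in for the rational approximations the paper actually uses):

```python
import numpy as np

def expm_taylor(M, terms=30):
    """Dense matrix exponential via truncated Taylor series; adequate
    only for the small, well-scaled projected matrix."""
    E = np.eye(M.shape[0])
    term = np.eye(M.shape[0])
    for j in range(1, terms):
        term = term @ M / j
        E = E + term
    return E

def krylov_expv(A, v, t, m):
    """Approximate exp(t*A) @ v from an m-dimensional Krylov subspace.
    The Arnoldi process builds an orthonormal basis V and a small
    Hessenberg matrix H with A @ V[:, :m] ~= V @ H."""
    v = np.asarray(v, dtype=float)
    n = len(v)
    V = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(v)
    V[:, 0] = v / beta
    for j in range(m):                      # Arnoldi iteration
        w = A @ V[:, j]                     # the only use of the large A
        for i in range(j + 1):
            H[i, j] = V[:, i] @ w
            w -= H[i, j] * V[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-12:             # happy breakdown: exact subspace
            m = j + 1
            break
        V[:, j + 1] = w / H[j + 1, j]
    Hm = H[:m, :m]
    e1 = np.zeros(m)
    e1[0] = 1.0
    # exp(t*A) @ v ~= beta * V_m @ exp(t*H_m) @ e1
    return beta * V[:, :m] @ (expm_taylor(t * Hm) @ e1)
```

The matrix-vector products in the Arnoldi loop are exactly the operations that parallelize and vectorize well, which is the point of the approach.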
DOE Office of Scientific and Technical Information (OSTI.GOV)
Amadio, G.; et al.
An intensive R&D and programming effort is required to accomplish the new challenges posed by future experimental high-energy particle physics (HEP) programs. The GeantV project aims to narrow the gap between the performance of existing HEP detector simulation software and the ideal achievable performance by exploiting the latest advances in computing technology. The project has developed a particle detector simulation prototype capable of transporting particles in parallel through complex geometries, exploiting instruction-level microparallelism (SIMD and SIMT), task-level parallelism (multithreading), and high-level parallelism (MPI), leveraging both multi-core and many-core opportunities. We present preliminary verification results concerning the electromagnetic (EM) physics models developed for parallel computing architectures within the GeantV project. In order to exploit the potential of vectorization and accelerators and to make the physics models effectively parallelizable, advanced sampling techniques have been implemented and tested. In this paper we introduce a set of automated statistical tests in order to verify the vectorized models by checking their consistency with the corresponding Geant4 models and to validate them against experimental data.
Method and means for measuring the anisotropy of a plasma in a magnetic field
Shohet, J.L.; Greene, D.G.S.
1973-10-23
Anisotropy of a free-free bremsstrahlung radiation-generating plasma in a magnetic field is measured by collimating the free-free bremsstrahlung radiation in a direction normal to the magnetic field and scattering the collimated radiation to resolve it into its vector components in a plane parallel to the electric field of the bremsstrahlung radiation. The scattered vector components are counted at particular energy levels in a direction parallel to the magnetic field and also normal to the magnetic field of the plasma to provide a measure of the anisotropy of the plasma. (Official Gazette)
CFD code evaluation for internal flow modeling
NASA Technical Reports Server (NTRS)
Chung, T. J.
1990-01-01
Research on the computational fluid dynamics (CFD) code evaluation with emphasis on supercomputing in reacting flows is discussed. Advantages of unstructured grids, multigrids, adaptive methods, improved flow solvers, vector processing, parallel processing, and reduction of memory requirements are discussed. As examples, researchers include applications of supercomputing to reacting flow Navier-Stokes equations including shock waves and turbulence and combustion instability problems associated with solid and liquid propellants. Evaluation of codes developed by other organizations are not included. Instead, the basic criteria for accuracy and efficiency have been established, and some applications on rocket combustion have been made. Research toward an ultimate goal, the most accurate and efficient CFD code, is in progress and will continue for years to come.
NASA Astrophysics Data System (ADS)
Olson, Richard F.
2013-05-01
Rendering of point scatterer based radar scenes for millimeter wave (mmW) seeker tests in real-time hardware-in-the-loop (HWIL) scene generation requires efficient algorithms and vector-friendly computer architectures for complex signal synthesis. New processor technology from Intel implements an extended 256-bit vector SIMD instruction set (AVX, AVX2) in a multi-core CPU design providing peak execution rates of hundreds of GigaFLOPS (GFLOPS) on one chip. Real world mmW scene generation code can approach peak SIMD execution rates only after careful algorithm and source code design. An effective software design will maintain high computing intensity emphasizing register-to-register SIMD arithmetic operations over data movement between CPU caches or off-chip memories. Engineers at the U.S. Army Aviation and Missile Research, Development and Engineering Center (AMRDEC) applied two basic parallel coding methods to assess new 256-bit SIMD multi-core architectures for mmW scene generation in HWIL. These include use of POSIX threads built on vector library functions and more portable, highlevel parallel code based on compiler technology (e.g. OpenMP pragmas and SIMD autovectorization). Since CPU technology is rapidly advancing toward high processor core counts and TeraFLOPS peak SIMD execution rates, it is imperative that coding methods be identified which produce efficient and maintainable parallel code. This paper describes the algorithms used in point scatterer target model rendering, the parallelization of those algorithms, and the execution performance achieved on an AVX multi-core machine using the two basic parallel coding methods. The paper concludes with estimates for scale-up performance on upcoming multi-core technology.
The International Conference on Vector and Parallel Computing (2nd)
1989-01-17
[Garbled proceedings table of contents; recoverable entries include "Computation of the SVD of Bidiagonal Matrices" and "Lattice QCD as a Large Scale Scientific Computation", the latter vectorized for the IBM 3090 Vector Facility and benchmarked on a large number of computers, including the Cray X-MP and Cray 2.]
NASA Technical Reports Server (NTRS)
Ortega, J. M.
1986-01-01
Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian elimination on parallel computers; (3) three-dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete Cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
Spin wave filtering and guiding in Permalloy/iron nanowires
NASA Astrophysics Data System (ADS)
Silvani, R.; Kostylev, M.; Adeyeye, A. O.; Gubbiotti, G.
2018-03-01
We have investigated the spin wave filtering and guiding properties of periodic arrays of single-layer (Permalloy and Fe) and bi-layer (Py/Fe) nanowires (NWs) by means of Brillouin light scattering measurements and micromagnetic simulations. For all the nanowire arrays, the thickness of the layers is 10 nm, while all NWs have the same width of 340 nm and edge-to-edge separation of 100 nm. Spin wave dispersion has been measured in the Damon-Eshbach configuration for wave vector either parallel or perpendicular to the nanowire length. This study reveals the filtering property of the spin waves when the wave vector is perpendicular to the NW length, with frequency ranges where spin wave propagation is permitted separated by frequency band gaps, and the guiding property of the NW when the wave vector is oriented parallel to the NW, with spin wave modes propagating in parallel channels in the central and edge regions of the NW. The measured dispersions were well reproduced by micromagnetic simulations, which also deliver the spatial profiles of the modes at zero wave vector. To reproduce the dispersion of the modes localized close to the NW edges, uniaxial anisotropy was introduced. In the case of Permalloy/iron NWs, the obtained results were compared with those for a 20 nm thick effective NW having average magnetic properties of the two materials.
NASA Technical Reports Server (NTRS)
Mayo, W. T., Jr.; Smart, A. E.
1979-01-01
A laser transit anemometer measured a two-dimensional vector velocity, using the transit time of scattering particles between two focused and parallel laser beams. The objectives were: (1) the determination of the concentration levels and light scattering efficiencies of naturally occurring, submicron particles in the NASA/Ames unitary wind tunnel and (2) the evaluation based on these measured data of a laser transit anemometer with digital correlation processing for nonintrusive velocity measurement in this facility. The evaluation criteria were the speeds at which point velocity measurements could be realized with this technique (as determined from computer simulations) for given accuracy requirements.
Extendability of parallel sections in vector bundles
NASA Astrophysics Data System (ADS)
Kirschner, Tim
2016-01-01
I address the following question: Given a differentiable manifold M, what are the open subsets U of M such that, for all vector bundles E over M and all linear connections ∇ on E, any ∇-parallel section in E defined on U extends to a ∇-parallel section in E defined on M? For simply connected manifolds M (among others) I describe the entirety of all such sets U which are, in addition, the complement of a C1 submanifold, boundary allowed, of M. This delivers a partial positive answer to a problem posed by Antonio J. Di Scala and Gianni Manno (2014). Furthermore, in case M is an open submanifold of Rn, n ≥ 2, I prove that the complement of U in M, not required to be a submanifold now, can have arbitrarily large n-dimensional Lebesgue measure.
A Computer Simulation of the System-Wide Effects of Parallel-Offset Route Maneuvers
NASA Technical Reports Server (NTRS)
Lauderdale, Todd A.; Santiago, Confesor; Pankok, Carl
2010-01-01
Most aircraft managed by air-traffic controllers in the National Airspace System are capable of flying parallel-offset routes. This paper presents the results of two related studies on the effects of increased use of offset routes as a conflict resolution maneuver. The first study analyzes offset routes in the context of all standard resolution types which air-traffic controllers currently use. This study shows that by utilizing parallel-offset route maneuvers, significant system-wide savings in delay due to conflict resolution of up to 30% are possible. It also shows that most offset resolutions replace horizontal-vectoring resolutions. The second study builds on the results of the first and directly compares offset resolutions and standard horizontal-vectoring maneuvers to determine that in-trail conflicts are often more efficiently resolved by offset maneuvers.
Optical computing and image processing using photorefractive gallium arsenide
NASA Technical Reports Server (NTRS)
Cheng, Li-Jen; Liu, Duncan T. H.
1990-01-01
Recent experimental results on matrix-vector multiplication and multiple four-wave mixing using GaAs are presented. Attention is given to a simple concept of using two overlapping holograms in GaAs to do two matrix-vector multiplication processes operating in parallel with a common input vector. This concept can be used to construct high-speed, high-capacity, reconfigurable interconnection and multiplexing modules, important for optical computing and neural-network applications.
Solution of partial differential equations on vector and parallel computers
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1985-01-01
The present status of numerical methods for partial differential equations on vector and parallel computers was reviewed. The relevant aspects of these computers are discussed and a brief review of their development is included, with particular attention paid to those characteristics that influence algorithm selection. Both direct and iterative methods are given for elliptic equations as well as explicit and implicit methods for initial boundary value problems. The intent is to point out attractive methods as well as areas where this class of computer architecture cannot be fully utilized because of either hardware restrictions or the lack of adequate algorithms. Application areas utilizing these computers are briefly discussed.
Using domain decomposition in the multigrid NAS parallel benchmark on the Fujitsu VPP500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wang, J.C.H.; Lung, H.; Katsumata, Y.
1995-12-01
In this paper, we demonstrate how domain decomposition can be applied to the multigrid algorithm to convert the code for MPP architectures. We also discuss the performance and scalability of this implementation on the new product line of Fujitsu's vector parallel computer, the VPP500. This computer uses Fujitsu's well-known vector processor as the processing element (PE), each rated at 1.6 GFLOPS. A high-speed crossbar network rated at 800 MB/s provides the inter-PE communication. The results show that physical domain decomposition is the best way to solve multigrid problems on the VPP500.
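The halo-exchange idea behind the domain decomposition can be sketched in one dimension: each subdomain updates its interior from ghost values read from its neighbours, and the result reproduces a global smoothing sweep exactly. A schematic serial sketch for Jacobi smoothing of the 1-D Poisson problem (illustrative only, not the VPP500 implementation or the full multigrid cycle):

```python
import numpy as np

def jacobi_global(u, f, h, sweeps):
    """Plain Jacobi sweeps for -u'' = f on a uniform 1-D grid."""
    u = u.copy()
    for _ in range(sweeps):
        u[1:-1] = 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def jacobi_decomposed(u, f, h, sweeps, n_sub):
    """The same sweeps, but with the interior split into n_sub subdomains.
    Each subdomain reads the values just outside its range ("ghost cells")
    from the current iterate before updating -- mimicking the halo
    exchange between adjacent processors for stencil computation."""
    u = u.copy()
    for _ in range(sweeps):
        new = u.copy()
        # distribute the interior indices among the "processors"
        chunks = np.array_split(np.arange(1, len(u) - 1), n_sub)
        for idx in chunks:
            # u[idx-1] and u[idx+1] include the ghost values at the ends
            new[idx] = 0.5 * (u[idx - 1] + u[idx + 1] + h * h * f[idx])
        u = new
    return u
```

Because every cell is still updated from the previous iterate, the decomposed sweep is bit-for-bit equivalent to the global one; only the communication pattern changes.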
Asymmetry in the Farley-Buneman dispersion relation caused by parallel electric fields
NASA Astrophysics Data System (ADS)
Forsythe, Victoriya V.; Makarevich, Roman A.
2016-11-01
An implicit assumption utilized in studies of E region plasma waves generated by the Farley-Buneman instability (FBI) is that the FBI dispersion relation and its solutions for the growth rate and phase velocity are perfectly symmetric with respect to the reversal of the wave propagation component parallel to the magnetic field. In the present study, a recently derived general dispersion relation that describes fundamental plasma instabilities in the lower ionosphere including FBI is considered and it is demonstrated that the dispersion relation is symmetric only for background electric fields that are perfectly perpendicular to the magnetic field. It is shown that parallel electric fields result in significant differences between the growth rates and phase velocities for propagation of parallel components of opposite signs. These differences are evaluated using numerical solutions of the general dispersion relation and shown to exhibit an approximately linear relationship with the parallel electric field near the E region peak altitude of 110 km. An analytic expression for the differences is also derived from an approximate version of the dispersion relation, with comparisons between numerical and analytic results agreeing near 110 km. It is further demonstrated that parallel electric fields do not change the overall symmetry when the full 3-D wave propagation vector is reversed, with no symmetry seen when either the perpendicular or parallel component is reversed. The present results indicate that moderate-to-strong parallel electric fields of 0.1-1.0 mV/m can result in experimentally measurable differences between the characteristics of plasma waves with parallel propagation components of opposite polarity.
Sparse matrix-vector multiplication on network-on-chip
NASA Astrophysics Data System (ADS)
Sun, C.-C.; Götze, J.; Jheng, H.-Y.; Ruan, S.-J.
2010-12-01
In this paper, we present an approach to performing matrix-vector multiplication using a Network-on-Chip (NoC) architecture. In traditional IC design, on-chip communication has been realized with dedicated point-to-point interconnections, so regular local data transfer is the central assumption of many parallel implementations. However, in the parallel implementation of sparse matrix-vector multiplication (SMVM), the main step of all iterative algorithms for solving systems of linear equations, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. Using the NoC architecture makes it possible to deal with an arbitrary structure of data transfers, i.e., with the irregular structure of sparse matrices. So far, we have implemented the proposed SMVM-NoC architecture in sizes 4×4 and 5×5, with IEEE 754 single-precision floating point, on an FPGA.
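The irregular access pattern at issue is visible in a scalar sparse matrix-vector product over the common compressed sparse row (CSR) format (a plain-Python sketch; the paper's FPGA/NoC design is not reproduced):

```python
def csr_matvec(data, indices, indptr, x):
    """Compute y = A @ x for a sparse matrix A in CSR form: row i owns
    the nonzeros data[indptr[i]:indptr[i+1]] with column positions in
    the matching slice of indices. The gather x[indices[k]] is the
    irregular, matrix-dependent data transfer that defeats fixed
    point-to-point interconnects."""
    y = [0.0] * (len(indptr) - 1)
    for i in range(len(y)):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y
```

For example, the matrix [[1,0,2],[0,3,0],[4,0,5]] is stored as data=[1,2,3,4,5], indices=[0,2,1,0,2], indptr=[0,2,3,5].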
Satellite Angular Rate Estimation From Vector Measurements
NASA Technical Reports Server (NTRS)
Azor, Ruth; Bar-Itzhack, Itzhack Y.; Harman, Richard R.
1996-01-01
This paper presents an algorithm for estimating the angular rate vector of a satellite based on the time derivatives of vector measurements expressed in reference and body coordinates. The computed derivatives are fed into a special Kalman filter which yields an estimate of the spacecraft angular velocity. The filter, named the Extended Interlaced Kalman Filter (EIKF), is an extension of the Kalman filter which, although linear, estimates the state of a nonlinear dynamic system. It consists of two or three parallel Kalman filters whose individual estimates are fed to one another and are treated as known inputs by the other parallel filter(s). The nonlinear dynamics stem from the nonlinear differential equation that describes the rotation of a three-dimensional body. Initial results, using simulated data and real Rossi X-ray Timing Explorer (RXTE) data, indicate that the algorithm is efficient and robust.
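The geometric relation behind the approach is that the body-frame components of an inertially fixed direction satisfy b_dot = -omega x b. That alone already yields a rate estimate by least squares; the sketch below (not the EIKF itself, which adds Kalman filtering) uses two simulated, non-parallel vector measurements:

```python
import numpy as np

def estimate_omega(vectors, derivatives):
    """Least-squares angular rate from b_dot = -omega x b = [b]_x omega,
    stacking one 3x3 cross-product system per measured vector. At least
    two non-parallel vectors are needed for full observability."""
    rows, rhs = [], []
    for b, bdot in zip(vectors, derivatives):
        bx = np.array([[0.0, -b[2], b[1]],    # skew matrix [b]_x
                       [b[2], 0.0, -b[0]],
                       [-b[1], b[0], 0.0]])
        rows.append(bx)
        rhs.append(bdot)
    omega, *_ = np.linalg.lstsq(np.vstack(rows), np.concatenate(rhs),
                                rcond=None)
    return omega

# Simulated truth: two body-frame direction measurements and their rates.
omega_true = np.array([0.01, -0.02, 0.03])   # rad/s, made-up value
b1, b2 = np.array([1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0])
derivs = [-np.cross(omega_true, b) for b in (b1, b2)]
omega_est = estimate_omega([b1, b2], derivs)
print(omega_est)
```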
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters.
Lan, Haidong; Chan, Yuandong; Xu, Kai; Schmidt, Bertil; Peng, Shaoliang; Liu, Weiguo
2016-07-19
Computing alignments between two or more sequences is a common operation frequently performed in computational molecular biology. The continuing growth of biological sequence databases establishes the need for their efficient parallel implementation on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data parallelism, thread-level coarse-grained parallelism, and vector-level fine-grained parallelism. Furthermore, we re-organize the sequence datasets and use Xeon Phi shuffle operations to improve I/O efficiency. Evaluations show that our method achieves a peak overall performance up to 220 GCUPS for scanning real protein sequence databanks on a single node consisting of two Intel E5-2620 CPUs and two Intel Xeon Phi 7110P cards. It also exhibits good scalability in terms of sequence length and size, and number of compute nodes for both database scanning and multiple sequence alignment. Furthermore, the achieved performance is highly competitive in comparison to optimized Xeon Phi and GPU implementations. Our implementation is available at https://github.com/turbo0628/LSDBS-mpi.
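For reference, the Smith-Waterman recurrence that the database-scanning stage vectorizes can be written in a few lines of scalar, unoptimized Python; the scoring parameters below are illustrative, not the paper's:

```python
def smith_waterman(s1, s2, match=2, mismatch=-1, gap=-2):
    """Local alignment score via the Smith-Waterman recurrence:
    H[i][j] = max(0, H[i-1][j-1] + sub, H[i-1][j] + gap, H[i][j-1] + gap).
    The clamp at 0 is what makes the alignment local."""
    n, m = len(s1), len(s2)
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best = 0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if s1[i - 1] == s2[j - 1] else mismatch
            H[i][j] = max(0,
                          H[i - 1][j - 1] + sub,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            best = max(best, H[i][j])
    return best

score = smith_waterman("ACGT", "TACG")  # best local match is "ACG"
print(score)  # 6 (three matches at +2 each)
```

The anti-diagonal cells of H are mutually independent, which is the property the vector-level parallelization exploits.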
Full-field drift Hamiltonian particle orbits in 3D geometry
NASA Astrophysics Data System (ADS)
Cooper, W. A.; Graves, J. P.; Brunner, S.; Isaev, M. Yu
2011-02-01
A Hamiltonian/Lagrangian theory to describe guiding centre orbit drift motion which is canonical in the Boozer coordinate frame has been extended to include full electromagnetic perturbed fields in anisotropic pressure 3D equilibria with nested magnetic flux surfaces. A redefinition of the guiding centre velocity to eliminate the motion due to finite equilibrium radial magnetic fields and the choice of a gauge condition that sets the radial component of the electromagnetic vector potential to zero are invoked to guarantee that the Boozer angular coordinates retain the canonical structure. The canonical momenta are identified and the guiding centre particle radial drift motion and parallel gyroradius evolution are derived. The particle coordinate position is linearly modified by wave-particle interactions. All the nonlinear wave-wave interactions appear explicitly only in the evolution of the parallel gyroradius. The radial variation of the electrostatic potential is related to the binormal component of the displacement vector for MHD-type perturbations. The electromagnetic vector potential projections can then be determined from the electrostatic potential and the radial component of the MHD displacement vector.
Comparison of serological and molecular panels for diagnosis of vector-borne diseases in dogs
2014-01-01
Background Canine vector-borne diseases (CVBD) are caused by a diverse array of pathogens with varying biological behaviors that result in a wide spectrum of clinical presentations and laboratory abnormalities. For many reasons, the diagnosis of canine vector-borne infectious diseases can be challenging for clinicians. The aim of the present study was to compare CVBD serological and molecular testing as the two most common methodologies used for screening healthy dogs or diagnosing sick dogs in which a vector-borne disease is suspected. Methods We used serological (Anaplasma species, Babesia canis, Bartonella henselae, Bartonella vinsonii subspecies berkhoffii, Borrelia burgdorferi, Ehrlichia canis, and SFG Rickettsia) and molecular assays to assess for exposure to, or infection with, 10 genera of organisms that cause CVBDs (Anaplasma, Babesia, Bartonella, Borrelia, Ehrlichia, Francisella, hemotropic Mycoplasma, Neorickettsia, Rickettsia, and Dirofilaria). Paired serum and EDTA blood samples from 30 clinically healthy dogs (Group I) and from 69 sick dogs suspected of having one or more canine vector-borne diseases (Groups II-IV), were tested in parallel to establish exposure to or infection with the specific CVBDs targeted in this study. Results Among all dogs tested (Groups I-IV), the molecular prevalences for individual CVBD pathogens ranged between 23.3 and 39.1%. Similarly, pathogen-specific seroprevalences ranged from 43.3% to 59.4% among healthy and sick dogs (Groups I-IV). Among these representative sample groupings, a panel combining serological and molecular assays run in parallel resulted in a 4-58% increase in the recognition of exposure to or infection with CVBD. Conclusions We conclude that serological and PCR assays should be used in parallel to maximize CVBD diagnosis. PMID:24670154
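The reported gain from running the two panels in parallel is, at its core, the union of the per-assay positives. A toy calculation with made-up dog IDs (not the study's data) shows the bookkeeping:

```python
# Hypothetical positives per assay for the same cohort of dogs,
# identified by ID. A dog counts as "detected" if either assay flags it.
serology_pos = {1, 4, 7, 9, 12, 15, 21}
pcr_pos = {4, 5, 9, 13, 21, 27}

combined = serology_pos | pcr_pos          # parallel panel = union
gain = (len(combined) - len(serology_pos)) / len(serology_pos) * 100
print(len(combined), f"{gain:.0f}%")       # 10 detected, ~43% increase
```

PCR-only positives (here IDs 5, 13, 27) are what drive the 4-58% increases the study reports when the panels are combined.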
Comparison of serological and molecular panels for diagnosis of vector-borne diseases in dogs.
Maggi, Ricardo G; Birkenheuer, Adam J; Hegarty, Barbara C; Bradley, Julie M; Levy, Michael G; Breitschwerdt, Edward B
2014-03-26
NASA Technical Reports Server (NTRS)
Reif, John H.
1987-01-01
A parallel compression algorithm for the 16,384-processor MPP machine was developed. The serial version of the algorithm can be viewed as a combination of on-line dynamic lossless text compression techniques (which employ simple learning strategies) and vector quantization. These concepts are described. How these concepts are combined to form a new strategy for performing dynamic on-line lossy compression is discussed. Finally, the implementation of this algorithm in a massively parallel fashion on the MPP is discussed.
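The vector-quantization half of the strategy amounts to nearest-codeword encoding: only codebook indices are transmitted, which is the lossy step. The codebook and input blocks below are made-up toy values:

```python
import numpy as np

def vq_encode(blocks, codebook):
    """Map each input block to the index of its nearest codeword (L2).
    Transmitting indices instead of blocks is what loses information."""
    d = ((blocks[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

codebook = np.array([[0.0, 0.0],
                     [1.0, 1.0],
                     [0.0, 1.0]])
blocks = np.array([[0.1, -0.1],
                   [0.9, 1.2],
                   [0.2, 0.8]])
idx = vq_encode(blocks, codebook)
decoded = codebook[idx]          # lossy reconstruction at the receiver
print(idx)  # [0 1 2]
```

The per-block searches are independent, so a massively parallel machine like the MPP can encode all blocks simultaneously.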
Fast adaptive composite grid methods on distributed parallel architectures
NASA Technical Reports Server (NTRS)
Lemke, Max; Quinlan, Daniel
1992-01-01
The fast adaptive composite (FAC) grid method is compared with the asynchronous fast adaptive composite method (AFAC) under a variety of conditions, including vectorization and parallelization. Results are given for distributed-memory multiprocessor architectures (SUPRENUM, Intel iPSC/2, and iPSC/860). It is shown that the good performance of AFAC and its superiority over FAC in a parallel environment are properties of the algorithm and not dependent on peculiarities of any machine.
Algorithms for parallel and vector computations
NASA Technical Reports Server (NTRS)
Ortega, James M.
1995-01-01
This is a final report on work performed under NASA grant NAG-1-1112-FOP during the period March 1990 through February 1995. Four major topics are covered: (1) solution of nonlinear Poisson-type equations; (2) a parallel reduced-system conjugate gradient method; (3) orderings for conjugate gradient preconditioners; and (4) SOR as a preconditioner.
Application of high-performance computing to numerical simulation of human movement
NASA Technical Reports Server (NTRS)
Anderson, F. C.; Ziegler, J. M.; Pandy, M. G.; Whalen, R. T.
1995-01-01
We have examined the feasibility of using massively-parallel and vector-processing supercomputers to solve large-scale optimization problems for human movement. Specifically, we compared the computational expense of determining the optimal controls for the single support phase of gait using a conventional serial machine (SGI Iris 4D25), a MIMD parallel machine (Intel iPSC/860), and a parallel-vector-processing machine (Cray Y-MP 8/864). With the human body modeled as a 14 degree-of-freedom linkage actuated by 46 musculotendinous units, computation of the optimal controls for gait could take up to 3 months of CPU time on the Iris. Both the Cray and the Intel are able to reduce this time to practical levels. The optimal solution for gait can be found with about 77 hours of CPU on the Cray and with about 88 hours of CPU on the Intel. Although the overall speeds of the Cray and the Intel were found to be similar, the unique capabilities of each machine are better suited to different portions of the computational algorithm used. The Intel was best suited to computing the derivatives of the performance criterion and the constraints whereas the Cray was best suited to parameter optimization of the controls. These results suggest that the ideal computer architecture for solving very large-scale optimal control problems is a hybrid system in which a vector-processing machine is integrated into the communication network of a MIMD parallel machine.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hau, L.-N.; Department of Physics, National Central University, Jhongli, Taiwan; Lai, Y.-T.
Harris-type current sheets with the magnetic field model B = B_x(z) x̂ + B_y(z) ŷ have many important applications to space, astrophysical, and laboratory plasmas, for which the temperature or pressure usually exhibits the gyrotropic form p = p_∥ b̂b̂ + p_⊥ (I - b̂b̂). Here, p_∥ and p_⊥ are, respectively, the pressure components along and perpendicular to the local magnetic field, with b̂ = B/B. This study presents the general formulation for magnetohydrodynamic (MHD) wave propagation, fire-hose, and mirror instabilities in general Harris-type current sheets. The wave equations are expressed in terms of the four MHD characteristic speeds of fast, intermediate, slow, and cusp waves, in the local (k_∥, k_⊥, z) coordinates. Here, k_∥ and k_⊥ are, respectively, the wave-vector components along and perpendicular to the local magnetic field. The parameter regimes for the existence of discrete and resonant modes are identified; these modes may become unstable at the local fire-hose and mirror instability thresholds. Numerical solutions for discrete eigenmodes are shown for stable and unstable cases. The results have important implications for the anomalous heating and stability of thin current sheets.
Parallel algorithm for determining motion vectors in ice floe images by matching edge features
NASA Technical Reports Server (NTRS)
Manohar, M.; Ramapriyan, H. K.; Strong, J. P.
1988-01-01
A parallel algorithm is described to determine motion vectors of ice floes using time sequences of images of the Arctic Ocean obtained from the Synthetic Aperture Radar (SAR) instrument flown on board the SEASAT spacecraft. The researchers describe a parallel algorithm, implemented on the MPP, for locating corresponding objects based on their translationally and rotationally invariant features. The algorithm first approximates the edges in the images by polygons or sets of connected straight-line segments. Each such edge structure is then reduced to a seed point. Associated with each seed point are the descriptions (lengths, orientations, and sequence numbers) of the lines constituting the corresponding edge structure. A parallel matching algorithm is used to match packed arrays of such descriptions to identify corresponding seed points in the two images. The matching algorithm is designed so that fragmentation and merging of ice floes are taken into account by accepting partial matches. The technique has been demonstrated to work on synthetic test patterns and real image pairs from SEASAT in times ranging from 0.5 to 0.7 seconds for 128 x 128 images.
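A rough sketch of the matching idea, with illustrative tolerances rather than the MPP implementation: segment lengths and inter-segment turn angles are invariant under translation and rotation, and a partial match still scores, which tolerates floe fragmentation and merging:

```python
def match_score(desc_a, desc_b, len_tol=0.1, ang_tol=0.1):
    """Fraction of segments in desc_a that find a compatible segment in
    desc_b. Each descriptor is a (length, turn_angle) pair; both are
    translation- and rotation-invariant. A score below 1.0 is accepted
    as a partial match (fragmented/merged floes)."""
    matched = 0
    for la, ta in desc_a:
        for lb, tb in desc_b:
            if (abs(la - lb) <= len_tol * max(la, lb)
                    and abs(ta - tb) <= ang_tol):
                matched += 1
                break
    return matched / len(desc_a)

floe1 = [(5.0, 0.3), (3.0, 1.2), (4.0, -0.7)]
floe2 = [(5.1, 0.28), (3.0, 1.18)]   # same floe, one edge segment lost
score = match_score(floe1, floe2)
print(score)  # 2 of 3 segments match
```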
NASA Astrophysics Data System (ADS)
Chang, Faliang; Liu, Chunsheng
2017-09-01
The high variability of sign colors and shapes in uncontrolled environments has made the detection of traffic signs a challenging problem in computer vision. We propose a traffic sign detection (TSD) method based on a coarse-to-fine cascade and parallel support vector machine (SVM) detectors to detect Chinese warning and danger traffic signs. First, a region of interest (ROI) extraction method is proposed to extract ROIs using color contrast features in local regions. The ROI extraction reduces the scanned regions and saves detection time. For multiclass TSD, we propose a structure that combines a coarse-to-fine cascaded tree with a parallel structure of histogram of oriented gradients (HOG) + SVM detectors. The cascaded tree is designed to detect different types of traffic signs in a coarse-to-fine process. The parallel HOG + SVM detectors are designed to perform fine detection of the different types of traffic signs. The experiments demonstrate that the proposed TSD method can rapidly detect multiclass traffic signs of different colors and shapes with high accuracy.
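A toy sketch of the feature-plus-parallel-detectors structure follows. A whole-patch orientation histogram stands in for real per-cell HOG with block normalization, and the detector weights are placeholders standing in for trained SVMs:

```python
import numpy as np

def toy_hog(img, bins=9):
    """Minimal gradient-orientation histogram over the whole patch.
    Real HOG uses per-cell histograms with block normalization; this
    keeps only the orientation-binning idea."""
    gy, gx = np.gradient(img.astype(float))
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)   # unsigned gradients
    hist, _ = np.histogram(ang, bins=bins, range=(0, np.pi), weights=mag)
    return hist / (np.linalg.norm(hist) + 1e-9)

# Parallel detectors: one linear scorer per sign class. The weights are
# illustrative placeholders; in the paper each branch is a trained SVM.
detectors = {
    "warning": np.ones(9) / 3.0,
    "danger": np.linspace(1, -1, 9),
}

img = np.add.outer(np.arange(8), np.arange(8))   # toy diagonal-ramp patch
feat = toy_hog(img)
scores = {name: float(w @ feat) for name, w in detectors.items()}
best = max(scores, key=scores.get)               # class with top score
print(best, scores)
```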
NASA Astrophysics Data System (ADS)
Hara, Tatsuhiko
2004-08-01
We implement the Direct Solution Method (DSM) on a vector-parallel supercomputer and show that it is possible to significantly improve its computational efficiency through parallel computing. We apply the parallel DSM calculation to waveform inversion of long-period (250-500 s) surface wave data for three-dimensional (3-D) S-wave velocity structure in the upper and uppermost lower mantle. We use a spherical harmonic expansion to represent lateral variation, with maximum angular degree 16. We find significant low velocities under South Pacific hot spots in the transition zone. This is consistent with other seismological studies conducted in the Superplume project, which suggests deep roots of these hot spots. We also perform simultaneous waveform inversion for 3-D S-wave velocity and Q structure. Since the resolution for Q is not good, we develop a new technique in which power spectra are used as data for inversion. We find good correlation between long-wavelength patterns of Vs and Q in the transition zone, such as high Vs and high Q under the western Pacific.
Implementation and analysis of a Navier-Stokes algorithm on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1988-01-01
The results of the implementation of a Navier-Stokes algorithm on three parallel/vector computers are presented. The object of this research is to determine how well, or poorly, a single numerical algorithm would map onto three different architectures. The algorithm is a compact difference scheme for the solution of the incompressible, two-dimensional, time-dependent Navier-Stokes equations. The computers were chosen so as to encompass a variety of architectures. They are the following: the MPP, an SIMD machine with 16K bit serial processors; Flex/32, an MIMD machine with 20 processors; and Cray/2. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. The basic comparison is among SIMD instruction parallelism on the MPP, MIMD process parallelism on the Flex/32, and vectorization of a serial code on the Cray/2. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
Parallelization of the Physical-Space Statistical Analysis System (PSAS)
NASA Technical Reports Server (NTRS)
Larson, J. W.; Guo, J.; Lyster, P. M.
1999-01-01
Atmospheric data assimilation is a method of combining observations with model forecasts to produce a more accurate description of the atmosphere than the observations or forecast alone can provide. Data assimilation plays an increasingly important role in the study of climate and atmospheric chemistry. The NASA Data Assimilation Office (DAO) has developed the Goddard Earth Observing System Data Assimilation System (GEOS DAS) to create assimilated datasets. The core computational components of the GEOS DAS include the GEOS General Circulation Model (GCM) and the Physical-space Statistical Analysis System (PSAS). The need for timely validation of scientific enhancements to the data assimilation system poses computational demands that are best met by distributed parallel software. PSAS is implemented in Fortran 90 using object-based design principles. The analysis portions of the code solve two equations. The first of these is the "innovation" equation, which is solved on the unstructured observation grid using a preconditioned conjugate gradient (CG) method. The "analysis" equation is a transformation from the observation grid back to a structured grid, and is solved by a direct matrix-vector multiplication. Use of a factored-operator formulation reduces the computational complexity of both the CG solver and the matrix-vector multiplication, rendering the matrix-vector multiplications as a successive product of operators on a vector. Sparsity is introduced to these operators by partitioning the observations using an icosahedral decomposition scheme. PSAS builds a large (approx. 128MB) run-time database of parameters used in the calculation of these operators. Implementing a message passing parallel computing paradigm into an existing yet developing computational system as complex as PSAS is nontrivial. One of the technical challenges is balancing the requirements for computational reproducibility with the need for high performance. 
The problem of computational reproducibility is well known in the parallel computing community. It is a requirement that the parallel code perform calculations in a fashion that will yield identical results on different configurations of processing elements on the same platform. In some cases this problem can be solved by sacrificing performance; meeting the requirement while still achieving high performance is very difficult. Topics to be discussed include: the current PSAS design and parallelization strategy; reproducibility issues; load balance vs. database memory demands; and possible solutions to these problems.
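The innovation-equation solver described above is a preconditioned conjugate gradient iteration. A minimal Jacobi-preconditioned version on a small dense SPD system is sketched below; the dense matrix is a stand-in for PSAS's factored operators, which apply A as a chain of sparse operators rather than forming it explicitly:

```python
import numpy as np

def pcg(A, b, M_inv_diag, tol=1e-10, maxit=200):
    """Jacobi-preconditioned conjugate gradient for an SPD system.
    A appears only through products A @ p, so any factored-operator
    chain can be substituted for the explicit matrix."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv_diag * r            # apply diagonal preconditioner
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = M_inv_diag * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x

rng = np.random.default_rng(0)
B = rng.standard_normal((6, 6))
A = B @ B.T + 6 * np.eye(6)       # small SPD test matrix
b = rng.standard_normal(6)
x = pcg(A, b, 1.0 / np.diag(A))
print(np.linalg.norm(A @ x - b))  # residual ~0
```

Note that the summation order inside `A @ p` is exactly the kind of detail that makes bitwise reproducibility across processor configurations hard.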
Analysis of ground-motion simulation big data
NASA Astrophysics Data System (ADS)
Maeda, T.; Fujiwara, H.
2016-12-01
We developed a parallel distributed processing system which applies big data analysis to large-scale ground motion simulation data. The system uses ground-motion index values and earthquake scenario parameters as input. We used peak ground velocity and velocity response spectra as the ground-motion indices. The ground-motion index values are calculated from our simulation data. We used simulated long-period ground motion waveforms at about 80,000 meshes calculated by a three-dimensional finite difference method based on 369 earthquake scenarios of a great earthquake in the Nankai Trough. These scenarios were constructed by considering the uncertainty of source model parameters such as source area, rupture starting point, asperity location, rupture velocity, fmax, and slip function. We used these parameters as the earthquake scenario parameters. The system first carries out the clustering of the earthquake scenarios in each mesh by the k-means method. The number of clusters is determined in advance using hierarchical clustering by Ward's method. The scenario clustering results are converted to a 1-D feature vector whose dimension is the number of scenario combinations. If two scenarios belong to the same cluster, the corresponding component of the feature vector is 1; otherwise it is 0. The feature vector thus represents the `response' of a mesh to the assumed earthquake scenario group. Next, the system performs the clustering of the meshes by the k-means method using the feature vector of each mesh obtained previously. Here the number of clusters is given arbitrarily. The clustering of scenarios and meshes is performed by parallel distributed processing with Hadoop and Spark, respectively. In this study, we divided the meshes into 20 clusters. The meshes in each cluster are geometrically concentrated. Thus this system can extract regions, in which the meshes have similar `response', as clusters.
For each cluster, it is possible to determine the particular scenario parameters which characterize the cluster. In other words, by utilizing this system, we can objectively obtain the critical scenario parameters of the ground-motion simulation for each evaluation point. This research was supported by CREST, JST.
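The two-stage clustering can be sketched end-to-end on toy data. The pairwise same-cluster encoding below is one plausible reading of the feature vector described above (dimension = number of scenario pairs, component 1 when the two scenarios share a cluster):

```python
import numpy as np

def kmeans(X, k, iters=20):
    """Plain Lloyd's k-means, initialized on the first k distinct rows
    (enough for this toy data; a real system would use k-means++)."""
    centers = np.unique(X, axis=0)[:k].astype(float)
    for _ in range(iters):
        labels = ((X[:, None, :] - centers[None]) ** 2).sum(2).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(0)
    return labels

# Stage 1 (per mesh): scenarios have already been clustered by their
# ground-motion index; encode the result over all scenario pairs.
scenario_labels = np.array([0, 0, 1, 1])      # toy k-means output, 4 scenarios
pairs = [(i, j) for i in range(4) for j in range(i + 1, 4)]
feature = np.array([int(scenario_labels[i] == scenario_labels[j])
                    for i, j in pairs])
print(feature)  # [1 0 0 0 0 1]

# Stage 2: cluster meshes by their per-mesh feature vectors.
mesh_features = np.array([[1, 0, 0, 0, 0, 1],
                          [1, 0, 0, 0, 0, 1],
                          [0, 1, 0, 0, 1, 0],
                          [0, 1, 0, 0, 1, 0]])
mesh_labels = kmeans(mesh_features, 2)
```

Meshes that "respond" the same way to the scenario set end up in the same cluster, which is what makes the extracted regions geographically coherent.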
Parallel integer sorting with medium and fine-scale parallelism
NASA Technical Reports Server (NTRS)
Dagum, Leonardo
1993-01-01
Two new parallel integer sorting algorithms, queue-sort and barrel-sort, are presented and analyzed in detail. These algorithms do not have optimal parallel complexity, yet they show very good performance in practice. Queue-sort is designed for fine-scale parallel architectures which allow the queueing of multiple messages to the same destination. Barrel-sort is designed for medium-scale parallel architectures with a high message-passing overhead. Performance results are given for the implementation of queue-sort on a Connection Machine CM-2 and barrel-sort on a 128-processor iPSC/860. The two implementations are found to be comparable in performance but not as good as a fully vectorized bucket sort on the Cray Y-MP.
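Barrel-sort's structure resembles a textbook bucket sort: distribute keys by range, sort each bucket locally, then concatenate. A serial sketch (illustrative bucket count, not the iPSC/860 code):

```python
def bucket_sort_ints(keys, nbuckets=16):
    """Distribute integer keys into range buckets, sort each bucket,
    and concatenate. In the parallel version each bucket lives on a
    different processor, so distribution is the only communication."""
    lo, hi = min(keys), max(keys)
    width = (hi - lo) // nbuckets + 1
    buckets = [[] for _ in range(nbuckets)]
    for k in keys:
        buckets[(k - lo) // width].append(k)
    out = []
    for b in buckets:          # buckets cover increasing key ranges
        out.extend(sorted(b))  # small independent local sorts
    return out

data = [42, 7, 99, 3, 57, 23, 88, 15]
print(bucket_sort_ints(data))  # [3, 7, 15, 23, 42, 57, 88, 99]
```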
Three-Dimensional Route Planning for a Cruise Missile for Minimal Detection by Observer
1989-06-01
Interaction of upgoing auroral H(+) and O(+) beams
NASA Technical Reports Server (NTRS)
Kaufmann, R. L.; Ludlow, G. R.; Collin, H. L.; Peterson, W. K.; Burch, J. L.
1986-01-01
Data from the S3-3 and DE 1 satellites are analyzed to study the interaction between H(+) and O(+) ions in upgoing auroral beams. Every data set analyzed showed some evidence of an interaction. The measured plasma was found to be unstable to a low-frequency electrostatic wave that propagates at an oblique angle to the background magnetic field B₀. A second wave, which can propagate parallel to B₀, is weakly damped in the plasma studied in most detail. It is likely that the upgoing ion beams generate this parallel wave at lower altitudes. The resulting wave-particle interactions can qualitatively explain most of the features observed in the ion distribution functions.
Real time display Fourier-domain OCT using multi-thread parallel computing with data vectorization
NASA Astrophysics Data System (ADS)
Eom, Tae Joong; Kim, Hoon Seop; Kim, Chul Min; Lee, Yeung Lak; Choi, Eun-Seo
2011-03-01
We demonstrate real-time display of processed OCT images using multi-threaded parallel computing on the quad-core CPU of a personal computer. The data of each A-line are treated as one vector to maximize the data transfer rate between the CPU cores and the image data stored in RAM. A display rate of 29.9 frames/sec for processed OCT data (4096 FFT size x 500 A-scans) is achieved in our system using a wavelength-swept source with a 52-kHz sweep frequency. The data processing times for the OCT image and for a Doppler OCT image with a 4-time average are 23.8 ms and 91.4 ms, respectively.
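The per-A-line FFT that dominates the processing maps naturally onto one vectorized transform over the whole frame. The array sizes below follow the paper; the data are random placeholders for real interferograms:

```python
import numpy as np

# One frame of swept-source interferograms: 500 A-scans x 4096 samples.
# Treating each A-line as a contiguous vector and transforming along a
# single axis processes the whole frame in one call, which a runtime
# can further split across threads.
rng = np.random.default_rng(1)
frame = rng.standard_normal((500, 4096))

spectrum = np.fft.rfft(frame, axis=1)      # FFT of every A-line at once
ascan = np.abs(spectrum)                   # depth profiles
log_img = 20 * np.log10(ascan + 1e-12)     # dB scale for display
print(log_img.shape)                       # (500, 2049)
```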
Some fast elliptic solvers on parallel architectures and their complexities
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Y.
1989-01-01
The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm, which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR onto distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
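For a scalar tridiagonal system, cyclic reduction (the kernel that BCR applies blockwise) looks like the sketch below. All updates within one forward level are independent, which is the source of the parallelism; this NumPy version assumes n = 2**k - 1:

```python
import numpy as np

def cyclic_reduction(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b,
    super-diagonal c) by odd-even cyclic reduction. Each forward level
    eliminates every other remaining unknown; the loop over i at a
    given level could run fully in parallel."""
    n = len(b)
    k = (n + 1).bit_length() - 1
    assert 2 ** k - 1 == n, "this sketch needs n = 2**k - 1"
    a, b, c, d = (np.asarray(v, float).copy() for v in (a, b, c, d))
    a[0] = c[-1] = 0.0            # entries outside the matrix
    h = 1
    for _ in range(k - 1):        # forward elimination
        for i in range(2 * h - 1, n, 2 * h):   # independent across i
            al, be = a[i] / b[i - h], c[i] / b[i + h]
            b[i] -= al * c[i - h] + be * a[i + h]
            d[i] -= al * d[i - h] + be * d[i + h]
            a[i] = -al * a[i - h]
            c[i] = -be * c[i + h]
        h *= 2
    x = np.zeros(n)
    x[n // 2] = d[n // 2] / b[n // 2]          # middle unknown decoupled
    for lvl in range(k - 1, 0, -1):            # back substitution
        h = 2 ** (lvl - 1)
        for i in range(h - 1, n, 2 * h):
            xl = x[i - h] if i - h >= 0 else 0.0
            xr = x[i + h] if i + h < n else 0.0
            x[i] = (d[i] - a[i] * xl - c[i] * xr) / b[i]
    return x

n = 7
a = -np.ones(n); b = 2.0 * np.ones(n); c = -np.ones(n)
T = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
x_true = np.arange(1.0, n + 1)
x = cyclic_reduction(a, b, c, T @ x_true)
print(np.max(np.abs(x - x_true)))  # ~0
```

In BCR the scalars a, b, c become blocks and the divisions become block solves, but the elimination pattern is the same.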
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™
Gomes, Jeremias M.; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H.
2016-01-01
We investigate the execution of the Irregular Wavefront Propagation Pattern (IWPP), a fundamental computing structure used in several image analysis operations, on the Intel® Xeon Phi™ co-processor. An efficient implementation of IWPP on the Xeon Phi is a challenging problem because of IWPP’s irregularity and the use of atomic instructions in the original IWPP algorithm to resolve race conditions. On the Xeon Phi, the use of SIMD and vectorization instructions is critical to attain high performance. However, SIMD atomic instructions are not supported. Therefore, we propose a new IWPP algorithm that can take advantage of the supported SIMD instruction set. We also evaluate an alternate storage container (priority queue) to track active elements in the wavefront in an effort to improve the parallel algorithm efficiency. The new IWPP algorithm is evaluated with Morphological Reconstruction and Imfill operations as use cases. Our results show performance improvements of up to 5.63× on top of the original IWPP due to vectorization. Moreover, the new IWPP achieves speedups of 45.7× and 1.62×, respectively, as compared to efficient CPU and GPU implementations. PMID:27298591
Efficient irregular wavefront propagation algorithms on Intel® Xeon Phi™.
Gomes, Jeremias M; Teodoro, George; de Melo, Alba; Kong, Jun; Kurc, Tahsin; Saltz, Joel H
2015-10-01
Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria
DOE Office of Scientific and Technical Information (OSTI.GOV)
Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan
Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach for discovering the functions of bacterial genes. However, the development of a suitable TnSeq strategy for a given bacterium can be costly and time-consuming. To meet this challenge, we describe a part-based strategy for constructing libraries of hundreds of transposon delivery vectors, which we term “magic pools.” Within a magic pool, each transposon vector has a different combination of upstream sequences (promoters and ribosome binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows the tracking of each vector during mutagenesis experiments. To identify an efficient vector for a given bacterium, we mutagenize it with a magic pool and sequence the resulting insertions; we then use this efficient vector to generate a large mutant library. We used the magic pool strategy to construct transposon mutant libraries in five genera of bacteria, including three genera of the phylum Bacteroidetes. IMPORTANCE Molecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.
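The barcode-based vector tracking reduces to counting insertion events per vector design. A toy tally with hypothetical barcodes and design names (not actual magic-pool parts):

```python
from collections import Counter

# Hypothetical mapping from DNA barcode to the vector design it encodes
# (promoter / RBS / resistance-marker combination), plus reads observed
# after mutagenizing one bacterium with the pool.
barcode_to_design = {"ACGT": "pA-rbs1-kan",
                     "GGCA": "pA-rbs2-kan",
                     "TTAC": "pB-rbs1-cat"}
reads = ["ACGT", "ACGT", "TTAC", "ACGT", "GGCA", "ACGT", "TTAC"]

insertions = Counter(barcode_to_design[b] for b in reads)
best_design = insertions.most_common(1)[0][0]   # most productive vector
print(best_design, insertions[best_design])     # pA-rbs1-kan 4
```

The design with the most insertion events is the one reassembled from archived parts to build the full mutant library.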
Magic Pools: Parallel Assessment of Transposon Delivery Vectors in Bacteria
Liu, Hualan; Price, Morgan N.; Waters, Robert Jordan; ...
2018-01-16
Transposon mutagenesis coupled to next-generation sequencing (TnSeq) is a powerful approach for discovering the functions of bacterial genes. However, the development of a suitable TnSeq strategy for a given bacterium can be costly and time-consuming. To meet this challenge, we describe a part-based strategy for constructing libraries of hundreds of transposon delivery vectors, which we term “magic pools.” Within a magic pool, each transposon vector has a different combination of upstream sequences (promoters and ribosome binding sites) and antibiotic resistance markers as well as a random DNA barcode sequence, which allows the tracking of each vector during mutagenesis experiments. To identify an efficient vector for a given bacterium, we mutagenize it with a magic pool and sequence the resulting insertions; we then use this efficient vector to generate a large mutant library. We used the magic pool strategy to construct transposon mutant libraries in five genera of bacteria, including three genera of the phylum Bacteroidetes. IMPORTANCE Molecular genetics is indispensable for interrogating the physiology of bacteria. However, the development of a functional genetic system for any given bacterium can be time-consuming. Here, we present a streamlined approach for identifying an effective transposon mutagenesis system for a new bacterium. Our strategy first involves the construction of hundreds of different transposon vector variants, which we term a “magic pool.” The efficacy of each vector in a magic pool is monitored in parallel using a unique DNA barcode that is introduced into each vector design. Using archived DNA “parts,” we next reassemble an effective vector for making a whole-genome transposon mutant library that is suitable for large-scale interrogation of gene function using competitive growth assays. Here, we demonstrate the utility of the magic pool system to make mutant libraries in five genera of bacteria.
A conservative scheme for electromagnetic simulation of magnetized plasmas with kinetic electrons
NASA Astrophysics Data System (ADS)
Bao, J.; Lin, Z.; Lu, Z. X.
2018-02-01
A conservative scheme has been formulated and verified for gyrokinetic particle simulations of electromagnetic waves and instabilities in magnetized plasmas. An electron continuity equation derived from the drift kinetic equation is used to time advance the electron density perturbation by using the perturbed mechanical flow calculated from the parallel vector potential, and the parallel vector potential is solved by using the perturbed canonical flow from the perturbed distribution function. In gyrokinetic particle simulations using this new scheme, the shear Alfvén wave dispersion relation in the shearless slab and continuum damping in the sheared cylinder have been recovered. The new scheme overcomes the stringent requirement in the conventional perturbative simulation method that the perpendicular grid size needs to be as small as the electron collisionless skin depth even for long wavelength Alfvén waves. The new scheme also avoids the problem in the conventional method that an unphysically large parallel electric field arises due to the inconsistency between the electrostatic potential calculated from the perturbed density and the vector potential calculated from the perturbed canonical flow. Finally, the gyrokinetic particle simulations of the Alfvén waves in a sheared cylinder have superior numerical properties compared with the fluid simulations, which suffer from numerical difficulties associated with singular mode structures.
An M-step preconditioned conjugate gradient method for parallel computation
NASA Technical Reports Server (NTRS)
Adams, L.
1983-01-01
This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
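The m-step preconditioning idea can be sketched in a few lines: each application of the preconditioner is just m Jacobi sweeps, i.e. matrix-vector products plus diagonal scalings, operations that map well onto vector and parallel hardware. This is an illustrative reconstruction, not the paper's implementation; the function names and the choice m = 3 are ours.

```python
import numpy as np

def m_step_jacobi(A, r, m):
    """Approximate A^{-1} r with m Jacobi sweeps from a zero initial guess.

    Each sweep is one matrix-vector product plus a diagonal scaling, so the
    preconditioner vectorizes and parallelizes well; the resulting operator
    is symmetric positive definite whenever A is suitably scaled.
    """
    d = np.diag(A)
    z = r / d
    for _ in range(m - 1):
        z = z + (r - A @ z) / d
    return z

def pcg(A, b, m=3, tol=1e-10, maxit=500):
    """Conjugate gradients for SPD A, preconditioned by m Jacobi sweeps."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = m_step_jacobi(A, r, m)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol * np.linalg.norm(b):
            break
        z = m_step_jacobi(A, r, m)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

On a machine like the CYBER 205, the attraction is that the m sweeps replace the sparse triangular solves of an incomplete-factorization preconditioner, which vectorize poorly.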
Wavelet Transforms in Parallel Image Processing
1994-01-27
Keywords: Object Segmentation, Texture Segmentation, Image Compression, Image Halftoning, Neural Networks, Parallel Algorithms, 2D and 3D Vector Quantization of Wavelet Transform Coefficients, Adaptive Image Halftoning based on Wavelets. One application has been directed to adaptive image halftoning: the gray information at a pixel, including its gray value and gradient, is represented by …
Some Problems and Solutions in Transferring Ecosystem Simulation Codes to Supercomputers
NASA Technical Reports Server (NTRS)
Skiles, J. W.; Schulbach, C. H.
1994-01-01
Many computer codes for the simulation of ecological systems have been developed in the last twenty-five years. This development took place initially on main-frame computers, then mini-computers, and more recently, on micro-computers and workstations. Recent recognition of ecosystem science as a High Performance Computing and Communications Program Grand Challenge area emphasizes supercomputers (both parallel and distributed systems) as the next set of tools for ecological simulation. Transferring ecosystem simulation codes to such systems is not a matter of simply compiling and executing existing code on the supercomputer since there are significant differences in the system architectures of sequential, scalar computers and parallel and/or vector supercomputers. To more appropriately match the application to the architecture (necessary to achieve reasonable performance), the parallelism (if it exists) of the original application must be exploited. We discuss our work in transferring a general grassland simulation model (developed on a VAX in the FORTRAN computer programming language) to a Cray Y-MP. We show the Cray shared-memory vector-architecture, and discuss our rationale for selecting the Cray. We describe porting the model to the Cray and executing and verifying a baseline version, and we discuss the changes we made to exploit the parallelism in the application and to improve code execution. As a result, the Cray executed the model 30 times faster than the VAX 11/785 and 10 times faster than a Sun 4 workstation. We achieved an additional speed-up of approximately 30 percent over the original Cray run by using the compiler's vectorizing capabilities and the machine's ability to put subroutines and functions "in-line" in the code. With the modifications, the code still runs at only about 5% of the Cray's peak speed because it makes ineffective use of the vector processing capabilities of the Cray. We conclude with a discussion and future plans.
On the parallel solution of parabolic equations
NASA Technical Reports Server (NTRS)
Gallopoulos, E.; Saad, Youcef
1989-01-01
Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
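The partial-fraction device behind the second pair of methods can be illustrated with the lowest-order case. The (1,1) Padé approximant satisfies e^z ≈ (1 + z/2)/(1 − z/2) = −1 − 4/(z − 2), so applying it to a matrix costs one shifted linear solve; higher-order approximants decompose into several such solves, one per pole, which can run concurrently. A minimal sketch of our own, not the paper's code:

```python
import numpy as np

def expm_pade11(A, v):
    """Apply the (1,1) Pade approximant of exp(A) to a vector v.

    Partial fractions give e^z ~ (1 + z/2)/(1 - z/2) = -1 - 4/(z - 2),
    hence exp(A) v ~ -v - 4 * (A - 2 I)^{-1} v.  Higher-order approximants
    split into several independent shifted solves (one per pole), which is
    the source of parallelism; this sketch has a single pole.
    """
    n = A.shape[0]
    return -v - 4.0 * np.linalg.solve(A - 2.0 * np.eye(n), v)
```

The same decomposition applies verbatim to the Chebyshev rational approximations mentioned above, only with different (complex) poles and weights.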
First experience of vectorizing electromagnetic physics models for detector simulation
NASA Astrophysics Data System (ADS)
Amadio, G.; Apostolakis, J.; Bandieramonte, M.; Bianchini, C.; Bitzes, G.; Brun, R.; Canal, P.; Carminati, F.; de Fine Licht, J.; Duhem, L.; Elvira, D.; Gheata, A.; Jun, S. Y.; Lima, G.; Novak, M.; Presbyterian, M.; Shadura, O.; Seghal, R.; Wenzel, S.
2015-12-01
The recent emergence of hardware architectures characterized by many-core or accelerated processors has opened new opportunities for concurrent programming models taking advantage of both SIMD and SIMT architectures. The GeantV vector prototype for detector simulations has been designed to exploit both the vector capability of mainstream CPUs and multi-threading capabilities of coprocessors including NVidia GPUs and Intel Xeon Phi. The characteristics of these architectures are very different in terms of the vectorization depth, parallelization needed to achieve optimal performance or memory access latency and speed. An additional challenge is to avoid the code duplication often inherent to supporting heterogeneous platforms. In this paper we present the first experience of vectorizing electromagnetic physics models developed for the GeantV project.
Hypercluster Parallel Processor
NASA Technical Reports Server (NTRS)
Blech, Richard A.; Cole, Gary L.; Milner, Edward J.; Quealy, Angela
1992-01-01
The Hypercluster computer system includes multiple digital processors whose operation is coordinated through specialized software. It is configurable according to various parallel-computing architectures of the shared-memory or distributed-memory class, including scalar, vector, reduced-instruction-set, and complex-instruction-set computers. It is designed as a flexible, relatively inexpensive system that provides a single programming and operating environment within which one can investigate the effects of various parallel-computing architectures and their combinations on performance in the solution of complicated problems, such as three-dimensional flows in turbomachines. The Hypercluster software and architectural concepts are in the public domain.
Research on Parallel Three Phase PWM Converters base on RTDS
NASA Astrophysics Data System (ADS)
Xia, Yan; Zou, Jianxiao; Li, Kai; Liu, Jingbo; Tian, Jun
2018-01-01
Parallel operation of converters can increase the capacity of the system, but it may lead to zero-sequence circulating current, so controlling the circulating current is an important goal in the design of parallel inverters. In this paper, the Real Time Digital Simulator (RTDS) is used to model the parallel converter system in real time and to study circulating current suppression. The equivalent model of two parallel converters and the zero-sequence circulating current (ZSCC) were established and analyzed, and a strategy using variable zero-vector control was proposed to suppress the circulating current. For two parallel modular converters, a hardware-in-the-loop (HIL) study based on RTDS and a practical experiment were carried out; the results prove that the proposed control strategy is feasible and effective.
Multivectored Superficial Muscular Aponeurotic System Suspension for Facial Paralysis.
Leach, Garrison; Kurnik, Nicole; Joganic, Jessica; Joganic, Edward
2017-06-01
Facial paralysis is a devastating condition that may cause severe cosmetic and functional deformities. In this study we describe our technique for superficial muscular aponeurotic system (SMAS) suspension using barbed suture and compare the vectors of suspension in relation to the underlying musculature. This study also quantifies the improvements in postoperative symmetry using traditional anthropologic landmarks. The efficacy of this procedure for improving facial paralysis was determined by comparing anthropometric indices and using Procrustes distance between 4 groupings of homologous landmarks plotted on each patient's preoperative and postoperative photos. Geometric morphometrics was used to evaluate change in facial shape and improvement in symmetry postoperatively.To analyze the vector of suspension in relation to the underlying musculature, specific anthropologic landmarks were used to calculate the vector of the musculature in 3 facial hemispheres from cadaveric controls against the vector of repair in our patients. Ten patients were included in our study. Subjectively, great improvement in functional status was achieved. Geometric morphometric analysis demonstrated a statistically significant improvement in facial symmetry. Cadaveric dissection demonstrated that the suture should be placed in the SMAS in vectors parallel to the underlying musculature to achieve these results. There were no complications in our study to date. In conclusion, multivectored SMAS suture suspension is an effective method for restoring static suspension of the face after facial paralysis. This method has the benefit of producing quick, reliable results with improved function, low cost, and low morbidity.
Anisotropic Surface State Mediated RKKY Interaction Between Adatoms on a Hexagonal Lattice
NASA Astrophysics Data System (ADS)
Einstein, Theodore; Patrone, Paul
2012-02-01
Motivated by recent numerical studies of Ag on Pt(111), we derive a far-field expression for the RKKY interaction mediated by surface states on a (111) FCC surface, considering the effect of anisotropy in the Fermi edge. The main contribution to the interaction comes from electrons whose Fermi velocity vF is parallel to the vector R connecting the interacting adatoms; we show that in general, the corresponding Fermi wave-vector kF is not parallel to R. The interaction is oscillatory; the amplitude and wavelength of oscillations have angular dependence arising from the anisotropy of the surface state band structure. The wavelength, in particular, is determined by the component of the aforementioned kF that is parallel to R. Our analysis is easily generalized to other systems. For Ag on Pt(111), our results indicate that the RKKY interaction between pairs of adatoms should be nearly isotropic and so cannot account for the anisotropy found in the studies motivating our work.
Equation solvers for distributed-memory computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.
1994-01-01
A large number of scientific and engineering problems require the rapid solution of large systems of simultaneous equations. The performance of parallel computers in this area now dwarfs traditional vector computers by nearly an order of magnitude. This talk describes the major issues involved in parallel equation solvers with particular emphasis on the Intel Paragon, IBM SP-1 and SP-2 processors.
NASA Astrophysics Data System (ADS)
Georgiev, K.; Zlatev, Z.
2010-11-01
The Danish Eulerian Model (DEM) is an Eulerian model for studying the transport of air pollutants on a large scale. Originally, the model was developed at the National Environmental Research Institute of Denmark. The model's computational domain covers Europe and neighbouring parts of the Atlantic Ocean, Asia and Africa. If the DEM is applied on fine grids, its discretization leads to a huge computational problem, which implies that such a model must be run on high-performance computer architectures. The implementation and tuning of such a complex large-scale model on each different computer is a non-trivial task. Here, comparison results are presented from running this model on different kinds of vector computers (CRAY C92A, Fujitsu, etc.), parallel computers with distributed memory (IBM SP, CRAY T3E, Beowulf clusters, Macintosh G4 clusters, etc.), parallel computers with shared memory (SGI Origin, SUN, etc.) and parallel computers with two levels of parallelism (IBM SMP, IBM BlueGene/P, clusters of multiprocessor nodes, etc.). The main idea in the parallel version of DEM is a domain partitioning approach. The effective use of the cache and hierarchical memories of modern computers is discussed, as well as the performance, speed-ups and efficiency achieved. The parallel code of DEM, created by using the MPI standard library, appears to be highly portable and shows good efficiency and scalability on different kinds of vector and parallel computers. Some important applications of the computer model output are presented in brief.
Can we estimate total magnetization directions from aeromagnetic data using Helbig's integrals?
Phillips, J.D.
2005-01-01
An algorithm that implements Helbig's (1963) integrals for estimating the vector components (mx, my, mz) of the magnetic dipole moment from the first-order moments of the vector magnetic field components (ΔX, ΔY, ΔZ) is tested on real and synthetic data. After a grid of total field aeromagnetic data is converted to vector component grids using Fourier filtering, Helbig's infinite integrals are evaluated as finite integrals in small moving windows using a quadrature algorithm based on the 2-D trapezoidal rule. Prior to integration, best-fit planar surfaces must be removed from the component data within the data windows in order to make the results independent of the coordinate system origin. Two different approaches are described for interpreting the results of the integration. In the "direct" method, results from pairs of different window sizes are compared to identify grid nodes where the angular difference between solutions is small. These solutions provide valid estimates of total magnetization directions for compact sources such as spheres or dipoles, but not for horizontally elongated or 2-D sources. In the "indirect" method, which is more forgiving of source geometry, results of the quadrature analysis are scanned for solutions that are parallel to a specified total magnetization direction.
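The windowed quadrature step, removing a best-fit plane and then forming first moments with the 2-D trapezoidal rule, can be sketched as follows. This is an illustrative reconstruction with hypothetical names; a uniform grid spacing is assumed.

```python
import numpy as np

def detrended_first_moments(f, x, y):
    """Remove the best-fit plane from window data f on the grid (x, y),
    then return the first moments (integrals of x*f and y*f) via the
    composite 2-D trapezoidal rule.

    The plane removal mirrors the detrending step described above, which
    makes the moments independent of the coordinate-system origin.
    """
    X, Y = np.meshgrid(x, y, indexing="ij")
    G = np.column_stack([np.ones(X.size), X.ravel(), Y.ravel()])
    coef, *_ = np.linalg.lstsq(G, f.ravel(), rcond=None)
    resid = f - (G @ coef).reshape(f.shape)
    hx, hy = x[1] - x[0], y[1] - y[0]

    def trap2(g):
        # composite trapezoid along y, then along x (uniform spacing)
        gy = hy * (g.sum(axis=1) - 0.5 * (g[:, 0] + g[:, -1]))
        return hx * (gy.sum() - 0.5 * (gy[0] + gy[-1]))

    return trap2(X * resid), trap2(Y * resid)
```

A quick sanity check of the detrending: for purely planar window data the residual, and hence both moments, should vanish.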
Kindlmann, Gordon; Chiw, Charisee; Seltzer, Nicholas; Samuels, Lamont; Reppy, John
2016-01-01
Many algorithms for scientific visualization and image analysis are rooted in the world of continuous scalar, vector, and tensor fields, but are programmed in low-level languages and libraries that obscure their mathematical foundations. Diderot is a parallel domain-specific language that is designed to bridge this semantic gap by providing the programmer with a high-level, mathematical programming notation that allows direct expression of mathematical concepts in code. Furthermore, Diderot provides parallel performance that takes advantage of modern multicore processors and GPUs. The high-level notation allows a concise and natural expression of the algorithms and the parallelism allows efficient execution on real-world datasets.
Kalman Filter Tracking on Parallel Architectures
NASA Astrophysics Data System (ADS)
Cerati, Giuseppe; Elmer, Peter; Krutelyov, Slava; Lantz, Steven; Lefebvre, Matthieu; McDermott, Kevin; Riley, Daniel; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2016-11-01
Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors such as GPGPU, ARM and Intel MIC. In order to achieve the theoretical performance gains of these processors, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High-Luminosity Large Hadron Collider (HL-LHC), for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques such as Cellular Automata or Hough Transforms. The most common track finding techniques in use today, however, are those based on a Kalman filter approach. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are in use today at the LHC. Given the utility of the Kalman filter in track finding, we have begun to port these algorithms to parallel architectures, namely Intel Xeon and Xeon Phi. We report here on our progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a simplified experimental environment.
Scaling Support Vector Machines On Modern HPC Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Fu, Haohuan; Song, Shuaiwen
2015-02-01
We designed and implemented MIC-SVM, a highly efficient parallel SVM for x86 based multicore and many-core architectures, such as the Intel Ivy Bridge CPUs and Intel Xeon Phi co-processor (MIC). We propose various novel analysis methods and optimization techniques to fully utilize the multilevel parallelism provided by these architectures and serve as general optimization methods for other machine learning tools.
Using a constraint on the parallel velocity when determining electric fields with EISCAT
NASA Technical Reports Server (NTRS)
Caudal, G.; Blanc, M.
1988-01-01
A method is proposed to determine the perpendicular components of the ion velocity vector (and hence the perpendicular electric field) from EISCAT tristatic measurements, in which one introduces an additional constraint on the parallel velocity, in order to take account of our knowledge that the parallel velocity of ions is small. This procedure removes some artificial features introduced when the tristatic geometry becomes too unfavorable. It is particularly well suited for the southernmost or northernmost positions of the tristatic measurements performed by meridian scan experiments (CP3 mode).
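The constraint can be viewed as one extra row in a least-squares system: the line-of-sight projections are fitted while a weighted row pulls the field-parallel velocity component toward zero. Below is a minimal sketch under that interpretation; the weight value and function names are illustrative, not EISCAT's actual analysis code.

```python
import numpy as np

def fit_velocity(los_dirs, v_los, b_hat, weight=10.0):
    """Least-squares ion velocity from line-of-sight Doppler measurements,
    with a soft constraint pulling the field-parallel component to zero.

    los_dirs: (m, 3) unit line-of-sight vectors; v_los: (m,) measured
    projections v . los_dir; b_hat: unit vector along the magnetic field.
    `weight` sets how strongly the prior v_parallel ~ 0 is enforced.
    """
    A = np.vstack([los_dirs, weight * b_hat])   # extra row encodes the prior
    d = np.concatenate([v_los, [0.0]])
    v, *_ = np.linalg.lstsq(A, d, rcond=None)
    return v
```

When the tristatic geometry is nearly degenerate, the extra row keeps the system well conditioned instead of letting noise leak into an unphysical parallel flow.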
Vector tomography for reconstructing electric fields with non-zero divergence in bounded domains
NASA Astrophysics Data System (ADS)
Koulouri, Alexandra; Brookes, Mike; Rimpiläinen, Ville
2017-01-01
In vector tomography (VT), the aim is to reconstruct an unknown multi-dimensional vector field using line integral data. In the case of a 2-dimensional VT, two types of line integral data are usually required. These data correspond to integration of the parallel and perpendicular projection of the vector field along the integration lines and are called the longitudinal and transverse measurements, respectively. In most cases, however, the transverse measurements cannot be physically acquired. Therefore, the VT methods are typically used to reconstruct divergence-free (or source-free) velocity and flow fields that can be reconstructed solely from the longitudinal measurements. In this paper, we show how vector fields with non-zero divergence in a bounded domain can also be reconstructed from the longitudinal measurements without the need of explicitly evaluating the transverse measurements. To the best of our knowledge, VT has not previously been used for this purpose. In particular, we study low-frequency, time-harmonic electric fields generated by dipole sources in convex bounded domains which arise, for example, in electroencephalography (EEG) source imaging. We explain in detail the theoretical background, the derivation of the electric field inverse problem and the numerical approximation of the line integrals. We show that fields with non-zero divergence can be reconstructed from the longitudinal measurements with the help of two sparsity constraints that are constructed from the transverse measurements and the vector Laplace operator. As a comparison to EEG source imaging, we note that VT does not require mathematical modeling of the sources. By numerical simulations, we show that the pattern of the electric field can be correctly estimated using VT and the location of the source activity can be determined accurately from the reconstructed magnitudes of the field.
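The longitudinal measurement itself, the line integral of the parallel projection F · t̂ along the integration line, is easy to state in code. The sketch below is a generic measurement model, not the paper's reconstruction algorithm; the field is passed in as a callable for illustration.

```python
import numpy as np

def longitudinal_measurement(field, p0, p1, n=200):
    """Numerically integrate the parallel (longitudinal) projection of a
    2-D vector field along the segment p0 -> p1: integral of F . t ds.

    `field(x, y)` returns (Fx, Fy); `t` is the unit tangent of the segment.
    """
    p0, p1 = np.asarray(p0, float), np.asarray(p1, float)
    ts = np.linspace(0.0, 1.0, n)
    pts = p0[None, :] + ts[:, None] * (p1 - p0)[None, :]
    length = np.linalg.norm(p1 - p0)
    tangent = (p1 - p0) / length
    F = np.array([field(x, y) for x, y in pts])
    integrand = F @ tangent
    # composite trapezoid in the arc-length parameter
    ds = length / (n - 1)
    return ds * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
```

For a gradient field F = ∇φ this reduces to φ(p1) − φ(p0), which is the property that makes curl-free components invisible to transverse data and motivates the sparsity constraints described above.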
NASA Technical Reports Server (NTRS)
Gentzsch, W.
1982-01-01
Problems which can arise with vector and parallel computers are discussed in a user-oriented context. Emphasis is placed on the algorithms used and the programming techniques adopted. Three recently developed supercomputers are examined and typical application examples are given in CRAY FORTRAN, CYBER 205 FORTRAN and DAP (distributed array processor) FORTRAN. The systems' performance is compared. The addition of parts of two N x N arrays is considered. The influence of the architecture on the algorithms and programming language is demonstrated. Numerical analysis of magnetohydrodynamic differential equations by an explicit difference method is illustrated, showing very good results for all three systems. The prognosis for supercomputer development is assessed.
Jafarpour, Farshid; Angheluta, Luiza; Goldenfeld, Nigel
2013-10-01
The dynamics of edge dislocations with parallel Burgers vectors, moving in the same slip plane, is mapped onto Dyson's model of a two-dimensional Coulomb gas confined in one dimension. We show that the tail distribution of the velocity of dislocations is power law in form, as a consequence of the pair interaction of nearest neighbors in one dimension. In two dimensions, we show the presence of a pairing phase transition in a system of interacting dislocations with parallel Burgers vectors. The scaling exponent of the velocity distribution at effective temperatures well below this pairing transition temperature can be derived from the nearest-neighbor interaction, while near the transition temperature, the distribution deviates from the form predicted by the nearest-neighbor interaction, suggesting the presence of collective effects.
CFD Research, Parallel Computation and Aerodynamic Optimization
NASA Technical Reports Server (NTRS)
Ryan, James S.
1995-01-01
During the last five years, CFD has matured substantially. Pure CFD research remains to be done, but much of the focus has shifted to integration of CFD into the design process. The work under these cooperative agreements reflects this trend. The recent work, and work which is planned, is designed to enhance the competitiveness of the US aerospace industry. CFD and optimization approaches are being developed and tested, so that the industry can better choose which methods to adopt in their design processes. The range of computer architectures has been dramatically broadened, as the assumption that only huge vector supercomputers could be useful has faded. Today, researchers and industry can trade off time, cost, and availability, choosing vector supercomputers, scalable parallel architectures, networked workstations, or heterogeneous combinations of these to complete required computations efficiently.
Solving the Cauchy-Riemann equations on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
Discussed is the implementation of a single algorithm on three parallel-vector computers. The algorithm is a relaxation scheme for the solution of the Cauchy-Riemann equations, a set of coupled first-order partial differential equations. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit-serial processors; the FLEX/32, an MIMD machine with 20 processors; and the CRAY/2, an MIMD machine with four vector processors. The machine architectures are briefly described. The implementation of the algorithm is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Conclusions are presented.
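The paper's specific relaxation scheme is not reproduced here; as a simple serial stand-in, one can exploit the fact that solutions of the Cauchy-Riemann equations have harmonic components, relax two Laplace problems with boundary data taken from f(z) = z², and then check the discrete Cauchy-Riemann residual. Grid size and iteration count below are illustrative choices.

```python
import numpy as np

def relax_cauchy_riemann(n=16, iters=4000):
    """Solve u_x = v_y, u_y = -v_x on the unit square by Jacobi-relaxing
    the two (equivalent) Laplace problems, with boundary data from
    f(z) = z^2, i.e. u = x^2 - y^2, v = 2xy.  Returns the maximum
    centered-difference Cauchy-Riemann residual on the interior.
    """
    xs = np.linspace(0.0, 1.0, n)
    X, Y = np.meshgrid(xs, xs, indexing="ij")
    u_exact, v_exact = X**2 - Y**2, 2 * X * Y
    u, v = np.zeros_like(X), np.zeros_like(X)
    for w, w_exact in ((u, u_exact), (v, v_exact)):   # Dirichlet boundaries
        w[0, :], w[-1, :], w[:, 0], w[:, -1] = (
            w_exact[0, :], w_exact[-1, :], w_exact[:, 0], w_exact[:, -1])
    for _ in range(iters):   # Jacobi sweeps on the interior points
        u[1:-1, 1:-1] = 0.25 * (u[2:, 1:-1] + u[:-2, 1:-1]
                                + u[1:-1, 2:] + u[1:-1, :-2])
        v[1:-1, 1:-1] = 0.25 * (v[2:, 1:-1] + v[:-2, 1:-1]
                                + v[1:-1, 2:] + v[1:-1, :-2])
    h = xs[1] - xs[0]
    r1 = ((u[2:, 1:-1] - u[:-2, 1:-1]) - (v[1:-1, 2:] - v[1:-1, :-2])) / (2 * h)
    r2 = ((u[1:-1, 2:] - u[1:-1, :-2]) + (v[2:, 1:-1] - v[:-2, 1:-1])) / (2 * h)
    return max(np.abs(r1).max(), np.abs(r2).max())
```

Each Jacobi sweep touches every interior point independently, which is exactly the grain of parallelism the three machines above exploit in different ways.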
NASA Technical Reports Server (NTRS)
Nguyen, Duc T.
1990-01-01
Practical engineering applications can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design; a finite element structural analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for static, eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE, into a widely popular finite-element production code, such as SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.
A Domain Decomposition Parallelization of the Fast Marching Method
NASA Technical Reports Server (NTRS)
Herrmann, M.
2003-01-01
In this paper, the first domain decomposition parallelization of the Fast Marching Method for level sets has been presented. Parallel speedup has been demonstrated in both the optimal and non-optimal domain decomposition case. The parallel performance of the proposed method is strongly dependent on load balancing separately the number of nodes on each side of the interface. A load imbalance of nodes on either side of the domain leads to an increase in communication and rollback operations. Furthermore, the amount of inter-domain communication can be reduced by aligning the inter-domain boundaries with the interface normal vectors. In the case of optimal load balancing and aligned inter-domain boundaries, the proposed parallel FMM algorithm is highly efficient, reaching efficiency factors of up to 0.98. Future work will focus on the extension of the proposed parallel algorithm to higher order accuracy. Also, to further enhance parallel performance, the coupling of the domain decomposition parallelization to the G₀-based parallelization will be investigated.
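For reference, the serial Fast Marching Method that the paper decomposes across subdomains can be sketched as a heap-driven sweep. This is a first-order, unit-speed version of our own; the domain decomposition and rollback machinery described above are not shown.

```python
import heapq
import numpy as np

def fast_march(n, src, h=1.0):
    """First-order Fast Marching on an n x n grid with unit speed:
    computes arrival times T satisfying |grad T| = 1 from a point source.
    """
    INF = np.inf
    T = np.full((n, n), INF)
    accepted = np.zeros((n, n), dtype=bool)
    T[src] = 0.0
    heap = [(0.0, src)]
    while heap:
        t, (i, j) = heapq.heappop(heap)
        if accepted[i, j]:
            continue                      # stale heap entry
        accepted[i, j] = True
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            a, b = i + di, j + dj
            if not (0 <= a < n and 0 <= b < n) or accepted[a, b]:
                continue
            # smallest neighbor value along each axis (upwind values)
            tx = min(T[a - 1, b] if a > 0 else INF,
                     T[a + 1, b] if a < n - 1 else INF)
            ty = min(T[a, b - 1] if b > 0 else INF,
                     T[a, b + 1] if b < n - 1 else INF)
            lo, hi = sorted((tx, ty))
            if hi == INF or hi - lo >= h:     # one-sided update
                t_new = lo + h
            else:                             # two-sided quadratic update
                t_new = 0.5 * (lo + hi + np.sqrt(2 * h * h - (hi - lo) ** 2))
            if t_new < T[a, b]:
                T[a, b] = t_new
                heapq.heappush(heap, (t_new, (a, b)))
    return T
```

The global heap is what makes the method inherently sequential; the domain decomposition replaces it with per-subdomain heaps plus the rollback operations whose cost the abstract discusses.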
NASA Astrophysics Data System (ADS)
Bao, J.; Liu, D.; Lin, Z.
2017-10-01
A conservative scheme of drift kinetic electrons for gyrokinetic simulations of kinetic-magnetohydrodynamic processes in toroidal plasmas has been formulated and verified. Both vector potential and electron perturbed distribution function are decomposed into adiabatic part with analytic solution and non-adiabatic part solved numerically. The adiabatic parallel electric field is solved directly from the electron adiabatic response, resulting in a high degree of accuracy. The consistency between electrostatic potential and parallel vector potential is enforced by using the electron continuity equation. Since particles are only used to calculate the non-adiabatic response, which is used to calculate the non-adiabatic vector potential through Ohm's law, the conservative scheme minimizes the electron particle noise and mitigates the cancellation problem. Linear dispersion relations of the kinetic Alfvén wave and the collisionless tearing mode in cylindrical geometry have been verified in gyrokinetic toroidal code simulations, which show that the perpendicular grid size can be larger than the electron collisionless skin depth when the mode wavelength is longer than the electron skin depth.
Fluctuation dynamo and turbulent induction at small Prandtl number.
Eyink, Gregory L
2010-10-01
We study the Lagrangian mechanism of the fluctuation dynamo at zero Prandtl number and infinite magnetic Reynolds number, in the Kazantsev-Kraichnan model of white-noise advection. With a rough velocity field corresponding to a turbulent inertial range, flux freezing holds only in a stochastic sense. We show that field lines arriving to the same point which were initially separated by many resistive lengths are important to the dynamo. Magnetic vectors of the seed field that point parallel to the initial separation vector arrive anticorrelated and produce an "antidynamo" effect. We also study the problem of "magnetic induction" of a spatially uniform seed field. We find no essential distinction between this process and fluctuation dynamo, both producing the same growth rates and small-scale magnetic correlations. In the regime of very rough velocity fields where fluctuation dynamo fails, we obtain the induced magnetic energy spectra. We use these results to evaluate theories proposed for magnetic spectra in laboratory experiments of turbulent induction.
Research in Parallel Algorithms and Software for Computational Aerosciences
NASA Technical Reports Server (NTRS)
Domel, Neal D.
1996-01-01
Phase I is complete for the development of a Computational Fluid Dynamics parallel code with automatic grid generation and adaptation for the Euler analysis of flow over complex geometries. SPLITFLOW, an unstructured Cartesian grid code developed at Lockheed Martin Tactical Aircraft Systems, has been modified for a distributed memory/massively parallel computing environment. The parallel code is operational on an SGI network, Cray J90 and C90 vector machines, SGI Power Challenge, and Cray T3D and IBM SP2 massively parallel machines. Parallel Virtual Machine (PVM) is the message passing protocol for portability to various architectures. A domain decomposition technique was developed which enforces dynamic load balancing to improve solution speed and memory requirements. A host/node algorithm distributes the tasks. The solver parallelizes very well, and scales with the number of processors. Partially parallelized and non-parallelized tasks consume most of the wall clock time in a very fine grain environment. Timing comparisons on a Cray C90 demonstrate that Parallel SPLITFLOW runs 2.4 times faster on 8 processors than its non-parallel counterpart autotasked over 8 processors.
Performance of GeantV EM Physics Models
NASA Astrophysics Data System (ADS)
Amadio, G.; Ananya, A.; Apostolakis, J.; Aurora, A.; Bandieramonte, M.; Bhattacharyya, A.; Bianchini, C.; Brun, R.; Canal, P.; Carminati, F.; Cosmo, G.; Duhem, L.; Elvira, D.; Folger, G.; Gheata, A.; Gheata, M.; Goulas, I.; Iope, R.; Jun, S. Y.; Lima, G.; Mohanty, A.; Nikitina, T.; Novak, M.; Pokorski, W.; Ribon, A.; Seghal, R.; Shadura, O.; Vallecorsa, S.; Wenzel, S.; Zhang, Y.
2017-10-01
The recent progress in parallel hardware architectures with deeper vector pipelines or many-core technologies brings opportunities for HEP experiments to take advantage of SIMD and SIMT computing models. Launched in 2013, the GeantV project studies performance gains in propagating multiple particles in parallel, improving instruction throughput and data locality in HEP event simulation on modern parallel hardware architectures. Due to the complexity of geometry description and physics algorithms of a typical HEP application, performance analysis is indispensable in identifying factors limiting parallel execution. In this report, we will present design considerations and preliminary computing performance of GeantV physics models on coprocessors (Intel Xeon Phi and NVidia GPUs) as well as on mainstream CPUs.
Experimental studies of susceptibility of Italian Aedes albopictus to Zika virus.
Di Luca, Marco; Severini, Francesco; Toma, Luciano; Boccolini, Daniela; Romi, Roberto; Remoli, Maria Elena; Sabbatucci, Michela; Rizzo, Caterina; Venturi, Giulietta; Rezza, Giovanni; Fortuna, Claudia
2016-05-05
We report a study on vector competence of an Italian population of Aedes albopictus for Zika virus (ZIKV). Ae. albopictus was susceptible to ZIKV infection (infection rate: 10%), and the virus could disseminate and was secreted in the mosquito's saliva (dissemination rate: 29%; transmission rate: 29%) after an extrinsic incubation period of 11 days. The observed vector competence was lower than that of an Ae. aegypti colony tested in parallel.
Rational calculation accuracy in acousto-optical matrix-vector processor
NASA Astrophysics Data System (ADS)
Oparin, V. V.; Tigin, Dmitry V.
1994-01-01
The high speed of parallel computations for a comparatively small-size processor and acceptable power consumption makes the usage of acousto-optic matrix-vector multiplier (AOMVM) attractive for processing of large amounts of information in real time. The limited accuracy of computations is an essential disadvantage of such a processor. The reduced accuracy requirements allow for considerable simplification of the AOMVM architecture and the reduction of the demands on its components.
A parallel-vector algorithm for rapid structural analysis on high-performance computers
NASA Technical Reports Server (NTRS)
Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.
1990-01-01
A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
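The column-height idea above can be sketched in a few lines: a Cholesky factorization that skips the known zeros above each column's first nonzero entry. This is an illustrative dense-storage sketch with a hypothetical helper name, not the variable-band Force implementation described in the abstract.

```python
import numpy as np

def skyline_cholesky(A):
    """Cholesky factorization that skips work above each column's 'height'
    (the row index of its first nonzero), as in variable-band/skyline
    storage schemes. Illustrative sketch only: real codes store just the
    profile, not the full dense matrix."""
    n = A.shape[0]
    # column height: index of the first nonzero entry in each column
    height = [next(i for i in range(j + 1) if A[i, j] != 0 or i == j)
              for j in range(n)]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):                         # outer loop: parallel in the paper
        for i in range(j, n):
            lo = max(height[i], height[j])     # skip known zeros in the profile
            s = A[i, j] - L[i, lo:j] @ L[j, lo:j]
            L[i, j] = np.sqrt(s) if i == j else s / L[j, j]
    return L
```

The inner dot product is where the paper applies loop unrolling for vectorization; here NumPy's `@` plays that role.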
Improved parallel data partitioning by nested dissection with applications to information retrieval.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolf, Michael M.; Chevalier, Cedric; Boman, Erik Gunnar
The computational work in many information retrieval and analysis algorithms is based on sparse linear algebra. Sparse matrix-vector multiplication is a common kernel in many of these computations. Thus, an important related combinatorial problem in parallel computing is how to distribute the matrix and the vectors among processors so as to minimize the communication cost. We focus on minimizing the total communication volume while keeping the computation balanced across processes. In [1], the first two authors presented a new 2D partitioning method, the nested dissection partitioning algorithm. In this paper, we improve on that algorithm and show that it is a good option for data partitioning in information retrieval. We also show that partitioning time can be substantially reduced by using the SCOTCH software, and that quality improves in some cases, too.
Efficient Parallel Formulations of Hierarchical Methods and Their Applications
NASA Astrophysics Data System (ADS)
Grama, Ananth Y.
1996-01-01
Hierarchical methods such as the Fast Multipole Method (FMM) and Barnes-Hut (BH) are used for rapid evaluation of potential (gravitational, electrostatic) fields in particle systems. They are also used for solving integral equations using boundary element methods. The linear systems arising from these methods are dense and are solved iteratively. Hierarchical methods reduce the complexity of the core matrix-vector product from O(n^2) to O(n log n) and the memory requirement from O(n^2) to O(n). We have developed highly scalable parallel formulations of a hybrid FMM/BH method that are capable of handling arbitrarily irregular distributions. We apply these formulations to astrophysical simulations of Plummer and Gaussian galaxies. We have used our parallel formulations to solve the integral form of the Laplace equation. We show that our parallel hierarchical mat-vecs yield high efficiency and overall performance even on relatively small problems. A problem containing approximately 200K nodes takes under a second to compute on 256 processors and yet yields over 85% efficiency. The efficiency and raw performance are expected to increase for bigger problems. For the 200K node problem, our code delivers about 5 GFLOPS of performance on a 256 processor T3D. This is impressive considering that the problem has floating point divides and roots, and very little locality resulting in poor cache performance. A dense matrix-vector product of the same dimensions would require about 0.5 TeraBytes of memory and about 770 TeraFLOPS of computing speed. Clearly, if the loss in accuracy resulting from the use of hierarchical methods is acceptable, our code yields significant savings in time and memory. We also study the convergence of a GMRES solver built around this mat-vec. We accelerate the convergence of the solver using three preconditioning techniques: diagonal scaling, block-diagonal preconditioning, and inner-outer preconditioning.
We study the performance and parallel efficiency of these preconditioned solvers. Using this solver, we solve dense linear systems with hundreds of thousands of unknowns. Solving a 105K unknown problem takes about 10 minutes on a 64 processor T3D. Until very recently, boundary element problems of this magnitude could not even be generated, let alone solved.
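Of the three preconditioners mentioned, diagonal scaling is the simplest to illustrate. Below is a minimal sketch of a Jacobi-preconditioned conjugate-gradient solver, assuming a symmetric positive definite system; it is not the authors' GMRES code, just the preconditioning idea in its plainest setting.

```python
import numpy as np

def pcg_diag(A, b, tol=1e-10, maxit=200):
    """Conjugate gradients with diagonal (Jacobi) scaling: the preconditioner
    application is a single O(n) elementwise multiply. Illustrative sketch."""
    Minv = 1.0 / np.diag(A)            # inverse of the diagonal preconditioner
    x = np.zeros_like(b, dtype=float)
    r = b - A @ x
    z = Minv * r
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = A @ p                     # the mat-vec dominates the cost
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) < tol:
            break
        z = Minv * r
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```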
Scale dependence of the alignment between strain rate and rotation in turbulent shear flow
NASA Astrophysics Data System (ADS)
Fiscaletti, D.; Elsinga, G. E.; Attili, A.; Bisetti, F.; Buxton, O. R. H.
2016-10-01
The scale dependence of the statistical alignment tendencies of the strain-rate eigenvectors e_i with the vorticity vector ω is examined in the self-preserving region of a planar turbulent mixing layer. Data from a direct numerical simulation are filtered at various length scales and the probability density functions of the magnitude of the alignment cosines between the two unit vectors, |e_i · ω̂|, are examined. It is observed that the alignment tendencies are insensitive to the concurrent large-scale velocity fluctuations, but are quantitatively affected by the nature of the concurrent large-scale velocity-gradient fluctuations. It is confirmed that the small-scale (local) vorticity vector is preferentially aligned in parallel with the large-scale (background) extensive strain-rate eigenvector e_1, in contrast to the global tendency for ω to be aligned in parallel with the intermediate strain-rate eigenvector [Hamlington et al., Phys. Fluids 20, 111703 (2008), 10.1063/1.3021055]. When only data from regions of the flow that exhibit strong swirling are included, the so-called high-enstrophy worms, the alignment tendencies are exaggerated with respect to the global picture. These findings support the notion that the production of enstrophy, responsible for a net cascade of turbulent kinetic energy from large scales to small scales, is driven by vorticity stretching due to the preferential parallel alignment between ω and the nonlocal e_1, and that the strongly swirling worms are kinematically significant to this process.
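The alignment diagnostic can be sketched for a single velocity-gradient tensor: take the eigenvectors of the symmetric (strain-rate) part and form |e_i · ω̂| against the unit vorticity vector. The helper name is hypothetical, not code from the study.

```python
import numpy as np

def alignment_cosines(grad_u):
    """Return |e_i . omega_hat| for one 3x3 velocity-gradient tensor:
    the alignment cosines between the strain-rate eigenvectors (ordered
    e_1 = most extensive first) and the unit vorticity vector."""
    S = 0.5 * (grad_u + grad_u.T)                    # strain-rate tensor
    omega = np.array([grad_u[2, 1] - grad_u[1, 2],   # vorticity = curl(u)
                      grad_u[0, 2] - grad_u[2, 0],
                      grad_u[1, 0] - grad_u[0, 1]])
    omega_hat = omega / np.linalg.norm(omega)
    w, e = np.linalg.eigh(S)                         # ascending eigenvalues
    e = e[:, ::-1]                                   # reorder: e_1, e_2, e_3
    return np.abs(e.T @ omega_hat)
```

For a simple plane shear (strain plus rotation in the x-y plane) the cosines reproduce the classic result that ω aligns with the intermediate eigenvector.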
On the impact of communication complexity in the design of parallel numerical algorithms
NASA Technical Reports Server (NTRS)
Gannon, D.; Vanrosendale, J.
1984-01-01
This paper describes two models of the cost of data movement in parallel numerical algorithms. One model is a generalization of an approach due to Hockney, and is suitable for shared memory multiprocessors where each processor has vector capabilities. The other model is applicable to highly parallel nonshared memory MIMD systems. In the second model, algorithm performance is characterized in terms of the communication network design. Techniques used in VLSI complexity theory are also brought in, and algorithm independent upper bounds on system performance are derived for several problems that are important to scientific computation.
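Hockney's model, which the first cost model above generalizes, characterizes an operation on n data items by an asymptotic rate r_inf and a half-performance length n_half (the length at which half the asymptotic rate is achieved). A minimal sketch, with illustrative parameter names:

```python
def hockney_time(n, r_inf, n_half):
    """Hockney's two-parameter cost model: time to process n items at
    asymptotic rate r_inf with half-performance length n_half.
    Equivalent to a startup latency t0 = n_half / r_inf plus n / r_inf."""
    return (n + n_half) / r_inf

def effective_rate(n, r_inf, n_half):
    """Achieved rate n / t(n); reaches r_inf / 2 exactly at n = n_half."""
    return n / hockney_time(n, r_inf, n_half)
```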
A parallel variable metric optimization algorithm
NASA Technical Reports Server (NTRS)
Straeter, T. A.
1973-01-01
An algorithm designed to exploit the parallel computing or vector streaming (pipeline) capabilities of computers is presented. When p is the degree of parallelism, one cycle of the parallel variable metric algorithm is defined as follows: first, the function and its gradient are computed in parallel at p different values of the independent variable; then the metric is modified by p rank-one corrections; and finally, a single univariate minimization is carried out in the Newton-like direction. Several properties of this algorithm are established. The convergence of the iterates to the solution is proved for a quadratic functional on a real separable Hilbert space. For a finite-dimensional space the convergence occurs in one cycle when p equals the dimension of the space. Results of numerical experiments indicate that the new algorithm will exploit parallel or pipeline computing capabilities to effect faster convergence than serial techniques.
Progress report on Nuclear Density project with Lawrence Livermore National Lab Year 2010
DOE Office of Scientific and Technical Information (OSTI.GOV)
Johnson, C W; Krastev, P; Ormand, W E
2011-03-11
The main goal for year 2010 was to improve parallelization of the configuration interaction code BIGSTICK, co-written by W. Erich Ormand (LLNL) and Calvin W. Johnson (SDSU), with the parallelization carried out primarily by Plamen Krastev, a postdoc at SDSU and funded in part by this grant. The central computational algorithm is the Lanczos algorithm, which consists of a matrix-vector multiplication (matvec), followed by a Gram-Schmidt reorthogonalization.
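The Lanczos kernel named above, a matrix-vector multiplication followed by Gram-Schmidt reorthogonalization, can be sketched serially. This is a toy dense version with full reorthogonalization, not BIGSTICK's parallel sparse implementation.

```python
import numpy as np

def lanczos(A, v0, m):
    """m steps of Lanczos on a symmetric matrix A: each step is one matvec
    followed by Gram-Schmidt reorthogonalization against all previous basis
    vectors. Returns the orthonormal basis V and the tridiagonal T."""
    n = len(v0)
    V = np.zeros((n, m))
    T = np.zeros((m, m))
    V[:, 0] = v0 / np.linalg.norm(v0)
    for j in range(m):
        w = A @ V[:, j]                  # matvec: the dominant cost
        for k in range(j + 1):           # full Gram-Schmidt reorthogonalization
            h = V[:, k] @ w
            w -= h * V[:, k]
            if k >= j - 1:               # only tridiagonal entries survive
                T[k, j] = h
        beta = np.linalg.norm(w)
        if j + 1 < m:
            T[j + 1, j] = beta
            V[:, j + 1] = w / beta
    return V, T
```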
NASA Technical Reports Server (NTRS)
Mcardle, Jack G.; Esker, Barbara S.
1993-01-01
Many conceptual designs for advanced short-takeoff, vertical landing (ASTOVL) aircraft need exhaust nozzles that can vector the jet to provide forces and moments for controlling the aircraft's movement or attitude in flight near the ground. A type of nozzle that can both vector the jet and vary the jet flow area is called a vane nozzle. Basically, the nozzle consists of parallel, spaced-apart flow passages formed by pairs of vanes (vanesets) that can be rotated on axes perpendicular to the flow. Two important features of this type of nozzle are the abilities to vector the jet rearward up to 45 degrees and to produce less harsh pressure and velocity footprints during vertical landing than does an equivalent single jet. A one-third-scale model of a generic vane nozzle was tested with unheated air at the NASA Lewis Research Center's Powered Lift Facility. The model had three parallel flow passages. Each passage was formed by a vaneset consisting of a long and a short vane. The longer vanes controlled the jet vector angle, and the shorter controlled the flow area. Nozzle performance for three nominal flow areas (basic and plus or minus 21 percent of basic area), each at nominal jet vector angles from -20 deg (forward of vertical) to +45 deg (rearward of vertical) are presented. The tests were made with the nozzle mounted on a model tailpipe with a blind flange on the end to simulate a closed cruise nozzle, at tailpipe-to-ambient pressure ratios from 1.8 to 4.0. Also included are jet wake data, single-vaneset vector performance for long/short and equal-length vane designs, and pumping capability. The pumping capability arises from the subambient pressure developed in the cavities between the vanesets, which could be used to aspirate flow from a source such as the engine compartment. Some of the performance characteristics are compared with characteristics of a single-jet nozzle previously reported.
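The tailpipe-to-ambient pressure ratios quoted above determine whether each vane passage is choked. As a small side calculation (isentropic flow of air, γ = 1.4, an assumption not stated in the abstract), a nozzle passage chokes once the pressure ratio exceeds the critical value:

```python
def critical_pressure_ratio(gamma=1.4):
    """Isentropic critical (choking) total-to-ambient pressure ratio for a
    converging passage; about 1.893 for air with gamma = 1.4. Illustrative
    side calculation, not from the reported test data."""
    return ((gamma + 1.0) / 2.0) ** (gamma / (gamma - 1.0))

def is_choked(pressure_ratio, gamma=1.4):
    return pressure_ratio >= critical_pressure_ratio(gamma)
```

At the reported test range of 1.8 to 4.0, the passages span both unchoked and choked operation.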
2002-01-01
their expression profile and for classification of cells into tumorous and non-tumorous classes. Then we will present a parallel tree method for... cancerous cells. We will use the same dataset and use tree-structured classifiers with multi-resolution analysis for classifying cancerous from non-cancerous ...cells. We have the expressions of 4096 genes from 98 different cell types. Of these 98, 72 are cancerous while 26 are non-cancerous. We are interested
Lock, Martin; Alvira, Mauricio R.
2012-01-01
Abstract Advances in adeno-associated virus (AAV)-mediated gene therapy have brought the possibility of commercial manufacturing of AAV vectors one step closer. To realize this prospect, a parallel effort with the goal of ever-increasing sophistication for AAV vector production technology and supporting assays will be required. Among the important release assays for a clinical gene therapy product, those monitoring potentially hazardous contaminants are most critical for patient safety. A prominent contaminant in many AAV vector preparations is vector particles lacking a genome, which can substantially increase the dose of AAV capsid proteins and lead to possible unwanted immunological consequences. Current methods to determine empty particle content suffer from inconsistency, are adversely affected by contaminants, or are not applicable to all serotypes. Here we describe the development of an ion-exchange chromatography-based assay that permits the rapid separation and relative quantification of AAV8 empty and full vector particles through the application of shallow gradients and a strong anion-exchange monolith chromatography medium. PMID:22428980
Discontinuous finite element method for vector radiative transfer
NASA Astrophysics Data System (ADS)
Wang, Cun-Hai; Yi, Hong-Liang; Tan, He-Ping
2017-03-01
The discontinuous finite element method (DFEM) is applied to solve vector radiative transfer in participating media. The derivation of a discrete form of the vector radiation governing equations is presented, in which the angular space is discretized by the discrete-ordinates approach with a local refined modification, and the spatial domain is discretized into finite non-overlapped discontinuous elements. The elements in the whole solution domain are connected by modelling the boundary numerical flux between adjacent elements, which makes the DFEM numerically stable for solving radiative transfer equations. Several vector radiative transfer problems are tested to verify the performance of the developed DFEM, including vector radiative transfer in a one-dimensional parallel slab containing a Mie/Rayleigh/strongly forward scattering medium and in a two-dimensional square medium. The DFEM results agree very well with benchmark solutions in published references, which shows that the DFEM developed in this paper is accurate and effective for solving vector radiative transfer problems.
Automotive applications of supercomputers
DOE Office of Scientific and Technical Information (OSTI.GOV)
Ginsberg, M.
1987-01-01
These proceedings compile papers on supercomputers in the automobile industry. Titles include: An automotive engineer's guide to the effective use of scalar, vector, and parallel computers; fluid mechanics, finite elements, and supercomputers; and Automotive crashworthiness performance on a supercomputer.
Parallelization of Lower-Upper Symmetric Gauss-Seidel Method for Chemically Reacting Flow
NASA Technical Reports Server (NTRS)
Yoon, Seokkwan; Jost, Gabriele; Chang, Sherry
2005-01-01
Development of technologies for exploration of the solar system has revived an interest in computational simulation of chemically reacting flows since planetary probe vehicles exhibit non-equilibrium phenomena during the atmospheric entry of a planet or a moon as well as the reentry to the Earth. Stability in combustion is essential for new propulsion systems. Numerical solution of real-gas flows often increases computational work by an order-of-magnitude compared to perfect gas flow partly because of the increased complexity of equations to solve. Recently, as part of Project Columbia, NASA has integrated a cluster of interconnected SGI Altix systems to provide a ten-fold increase in current supercomputing capacity that includes an SGI Origin system. Both the new and existing machines are based on cache coherent non-uniform memory access architecture. Lower-Upper Symmetric Gauss-Seidel (LU-SGS) relaxation method has been implemented into both perfect and real gas flow codes including Real-Gas Aerodynamic Simulator (RGAS). However, the vectorized RGAS code runs inefficiently on cache-based shared-memory machines such as SGI system. Parallelization of a Gauss-Seidel method is nontrivial due to its sequential nature. The LU-SGS method has been vectorized on an oblique plane in INS3D-LU code that has been one of the base codes for NAS Parallel benchmarks. The oblique plane has been called a hyperplane by computer scientists. It is straightforward to parallelize a Gauss-Seidel method by partitioning the hyperplanes once they are formed. Another way of parallelization is to schedule processors like a pipeline using software. Both hyperplane and pipeline methods have been implemented using openMP directives. The present paper reports the performance of the parallelized RGAS code on SGI Origin and Altix systems.
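The hyperplane idea can be illustrated on a 2D grid: cells with equal i + j form planes whose members are mutually independent under a 5-point-stencil Gauss-Seidel sweep, so each plane can be updated in parallel. This is a sketch of the ordering only, not the RGAS or INS3D-LU code.

```python
def hyperplanes(nx, ny):
    """Group the cells of an nx-by-ny grid into hyperplanes i + j = const.
    In a lexicographic 5-point Gauss-Seidel sweep, every cell on one plane
    depends only on cells in earlier planes, so no two cells within a plane
    depend on each other: each plane may be partitioned among processors."""
    planes = [[] for _ in range(nx + ny - 1)]
    for i in range(nx):
        for j in range(ny):
            planes[i + j].append((i, j))
    return planes
```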
NASA Astrophysics Data System (ADS)
Bellerby, Tim
2014-05-01
Model Integration System (MIST) is an open-source environmental modelling programming language that directly incorporates data parallelism. The language is designed to enable straightforward programming structures, such as nested loops and conditional statements, to be directly translated into sequences of whole-array (or more generally whole-data-structure) operations. MIST thus enables the programmer to use well-understood constructs, directly relating to the mathematical structure of the model, without having to explicitly vectorize code or worry about details of parallelization. A range of common modelling operations is supported by dedicated language structures operating on cell neighbourhoods rather than individual cells (e.g. the 3x3 local neighbourhood needed to implement an averaging image filter can be simply accessed from within a simple loop traversing all image pixels). This facility hides details of inter-process communication behind more mathematically relevant descriptions of model dynamics. The MIST automatic vectorization/parallelization process serves both to distribute work among available nodes and to control storage requirements for intermediate expressions, enabling operations on very large domains for which memory availability may be an issue. MIST is designed to facilitate efficient interpreter-based implementations. A prototype open-source interpreter is available, coded in standard FORTRAN 95, with tools to rapidly integrate existing FORTRAN 77 or 95 code libraries. The language is formally specified and thus not limited to a FORTRAN implementation or to an interpreter-based approach. A MIST-to-FORTRAN compiler is under development, and volunteers are sought to create an ANSI C implementation. Parallel processing is currently implemented using OpenMP. However, the parallelization code is fully modularised and could be replaced with implementations using other libraries. GPU implementation is potentially possible.
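The 3x3 averaging-filter example from the abstract can be written as nine whole-array shifted-slice additions, the kind of neighbourhood operation MIST expresses without per-pixel loops. The sketch below is an illustrative NumPy analogue, not MIST syntax.

```python
import numpy as np

def box_filter_3x3(img):
    """3x3 averaging filter as whole-array operations: pad by replicating
    edges, then sum the nine shifted views and divide. No explicit loop
    over pixels, only over the nine stencil offsets."""
    padded = np.pad(img, 1, mode='edge')
    out = np.zeros(img.shape, dtype=float)
    for di in (0, 1, 2):
        for dj in (0, 1, 2):
            out += padded[di:di + img.shape[0], dj:dj + img.shape[1]]
    return out / 9.0
```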
Adaptive track scheduling to optimize concurrency and vectorization in GeantV
Apostolakis, J.; Bandieramonte, M.; Bitzes, G.; ...
2015-05-22
The GeantV project is focused on the R&D of new particle transport techniques to maximize parallelism on multiple levels, profiting from the use of both SIMD instructions and co-processors for the CPU-intensive calculations specific to this type of application. In our approach, vectors of tracks belonging to multiple events and matching different locality criteria must be gathered and dispatched to algorithms having vector signatures. While the transport propagates tracks and changes their individual states, data locality becomes harder to maintain. The scheduling policy has to be changed to maintain efficient vectors while keeping an optimal level of concurrency. The model has complex dynamics requiring tuning of the thresholds to switch between the normal regime and special modes, i.e. prioritizing events to allow flushing memory, adding new events in the transport pipeline to boost locality, dynamically adjusting the particle vector size, or switching from vector to single-track mode when vectorization causes only overhead. Lastly, this work requires a comprehensive study for optimizing these parameters to make the behaviour of the scheduler self-adapting; we present here its initial results.
PCTDSE: A parallel Cartesian-grid-based TDSE solver for modeling laser-atom interactions
NASA Astrophysics Data System (ADS)
Fu, Yongsheng; Zeng, Jiaolong; Yuan, Jianmin
2017-01-01
We present a parallel Cartesian-grid-based time-dependent Schrödinger equation (TDSE) solver for modeling laser-atom interactions. It can simulate the single-electron dynamics of atoms in arbitrary time-dependent vector potentials. We use a split-operator method combined with fast Fourier transforms (FFT), on a three-dimensional (3D) Cartesian grid. Parallelization is realized using a 2D decomposition strategy based on the Message Passing Interface (MPI) library, which results in a good parallel scaling on modern supercomputers. We give simple applications for the hydrogen atom using the benchmark problems coming from the references and obtain repeatable results. The extensions to other laser-atom systems are straightforward with minimal modifications of the source code.
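A 1D analogue of the split-operator/FFT step can be sketched as follows (atomic units with m = ħ = 1, hypothetical function name; the actual solver works on a 3D Cartesian grid with MPI decomposition):

```python
import numpy as np

def split_operator_step(psi, V, dx, dt):
    """One Strang split-operator step for the 1D TDSE: half phase kick in
    the potential V, full kinetic drift applied in k-space via FFT, then
    the second half kick. Each factor is unitary, so the norm is conserved."""
    n = len(psi)
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)          # grid wavenumbers
    psi = np.exp(-0.5j * V * dt) * psi                  # half potential kick
    psi = np.fft.ifft(np.exp(-0.5j * k**2 * dt) * np.fft.fft(psi))  # drift
    psi = np.exp(-0.5j * V * dt) * psi                  # half potential kick
    return psi
```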
NASA Technical Reports Server (NTRS)
OKeefe, Matthew (Editor); Kerr, Christopher L. (Editor)
1998-01-01
This report contains the abstracts and technical papers from the Second International Workshop on Software Engineering and Code Design in Parallel Meteorological and Oceanographic Applications, held June 15-18, 1998, in Scottsdale, Arizona. The purpose of the workshop is to bring together software developers in meteorology and oceanography to discuss software engineering and code design issues for parallel architectures, including Massively Parallel Processors (MPP's), Parallel Vector Processors (PVP's), Symmetric Multi-Processors (SMP's), Distributed Shared Memory (DSM) multi-processors, and clusters. Issues to be discussed include: (1) code architectures for current parallel models, including basic data structures, storage allocation, variable naming conventions, coding rules and styles, i/o and pre/post-processing of data; (2) designing modular code; (3) load balancing and domain decomposition; (4) techniques that exploit parallelism efficiently yet hide the machine-related details from the programmer; (5) tools for making the programmer more productive; and (6) the proliferation of programming models (F--, OpenMP, MPI, and HPF).
Capabilities of Fully Parallelized MHD Stability Code MARS
NASA Astrophysics Data System (ADS)
Svidzinski, Vladimir; Galkin, Sergei; Kim, Jin-Soo; Liu, Yueqiang
2016-10-01
Results of full parallelization of the plasma stability code MARS will be reported. MARS calculates eigenmodes in 2D axisymmetric toroidal equilibria in MHD-kinetic plasma models. Parallel version of MARS, named PMARS, has been recently developed at FAR-TECH. Parallelized MARS is an efficient tool for simulation of MHD instabilities with low, intermediate and high toroidal mode numbers within both fluid and kinetic plasma models, implemented in MARS. Parallelization of the code included parallelization of the construction of the matrix for the eigenvalue problem and parallelization of the inverse vector iterations algorithm, implemented in MARS for the solution of the formulated eigenvalue problem. Construction of the matrix is parallelized by distributing the load among processors assigned to different magnetic surfaces. Parallelization of the solution of the eigenvalue problem is made by repeating steps of the MARS algorithm using parallel libraries and procedures. Parallelized MARS is capable of calculating eigenmodes with significantly increased spatial resolution: up to 5,000 adapted radial grid points with up to 500 poloidal harmonics. Such resolution is sufficient for simulation of kink, tearing and peeling-ballooning instabilities with physically relevant parameters. Work is supported by the U.S. DOE SBIR program.
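Inverse vector iteration, the algorithm parallelized in PMARS, can be sketched serially for a standard eigenvalue problem. This is a toy dense version; MARS solves a generalized MHD eigenproblem with distributed matrices.

```python
import numpy as np

def inverse_iteration(A, shift, tol=1e-10, maxit=100):
    """Inverse vector iteration: repeatedly solve (A - shift*I) y = x and
    normalize. Converges to the eigenpair whose eigenvalue lies closest to
    the shift; the solve is the step distributed across processors in a
    parallel implementation."""
    n = A.shape[0]
    B = A - shift * np.eye(n)
    x = np.ones(n) / np.sqrt(n)
    lam = shift
    for _ in range(maxit):
        y = np.linalg.solve(B, x)
        x_new = y / np.linalg.norm(y)
        lam_new = x_new @ A @ x_new          # Rayleigh quotient estimate
        if abs(lam_new - lam) < tol:
            return lam_new, x_new
        lam, x = lam_new, x_new
    return lam, x
```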
Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao
2012-01-01
Biologically-inspired models and algorithms are considered as promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with the increase of the dimensions of input feature vector (outer factor) as well as its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets of three classes of wine derived from different cultivars and five classes of green tea derived from five different provinces of China were used for experiments. In the former case the results showed that the average correct classification rate increased as more principal components were put in to feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We concluded that 6~8 channels of the model with principal component feature vector values of at least 90% cumulative variance is adequate for a classification task of 3~5 pattern classes considering the trade-off between time consumption and classification rate.
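The PCA step described above, keeping enough principal components to reach at least 90% cumulative variance, can be sketched via an SVD of the centered data. The helper name is illustrative, not from the paper.

```python
import numpy as np

def pca_components_for_variance(X, frac=0.90):
    """Return the number of principal components needed so that their
    cumulative explained-variance fraction reaches `frac` (e.g. 0.90),
    computed from the singular values of the centered data matrix."""
    Xc = X - X.mean(axis=0)                      # center each feature
    s = np.linalg.svd(Xc, compute_uv=False)      # singular values
    var = s**2 / np.sum(s**2)                    # explained-variance fractions
    cum = np.cumsum(var)
    return int(np.searchsorted(cum, frac) + 1)
```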
Acoustic 3D modeling by the method of integral equations
NASA Astrophysics Data System (ADS)
Malovichko, M.; Khokhlov, N.; Yavich, N.; Zhdanov, M.
2018-02-01
This paper presents a parallel algorithm for frequency-domain acoustic modeling by the method of integral equations (IE). The algorithm is applied to seismic simulation. The IE method reduces the size of the problem but leads to a dense system matrix. Tolerable memory consumption and numerical complexity were achieved by applying an iterative solver, accompanied by an efficient matrix-vector multiplication operation based on the fast Fourier transform (FFT). We demonstrate that the IE system matrix is better conditioned than that of the finite-difference (FD) method, and discuss its relation to a specially preconditioned FD matrix. We considered several methods of matrix-vector multiplication for free-space and layered host models. The developed algorithm and computer code were benchmarked against an FD time-domain solution. It was demonstrated that the method can accurately calculate the seismic field for models with sharp material boundaries and a point source and receiver located close to the free surface. We used OpenMP to speed up the matrix-vector multiplication, while MPI was used to speed up the solution of the system of equations and to parallelize across multiple sources. Practical examples and efficiency tests are presented as well.
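The FFT-accelerated matrix-vector product at the heart of such IE solvers can be illustrated in one dimension: a generic Toeplitz-times-vector product via circulant embedding, costing O(n log n) instead of O(n^2). The actual seismic code works with block-structured 3D operators; this is only the underlying idea.

```python
import numpy as np

def toeplitz_matvec_fft(c, r, x):
    """Multiply a Toeplitz matrix (first column c, first row r) by x
    via circulant embedding and the FFT."""
    n = len(c)
    # Embed the Toeplitz matrix in a circulant of size 2n - 1.
    col = np.concatenate([c, r[:0:-1]])
    y = np.fft.ifft(np.fft.fft(col) *
                    np.fft.fft(np.concatenate([x, np.zeros(n - 1)])))
    return y[:n].real

# Check against the dense product for a small example.
n = 6
rng = np.random.default_rng(0)
c, r, x = rng.standard_normal(n), rng.standard_normal(n), rng.standard_normal(n)
r[0] = c[0]  # diagonal entry shared by first row and first column
T = np.array([[c[i - j] if i >= j else r[j - i] for j in range(n)]
              for i in range(n)])
dense = T @ x
fast = toeplitz_matvec_fft(c, r, x)
```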
NASA Astrophysics Data System (ADS)
Mills, R. T.
2014-12-01
As the high performance computing (HPC) community pushes towards the exascale horizon, the importance and prevalence of fine-grained parallelism in new computer architectures is increasing. This is perhaps most apparent in the proliferation of so-called "accelerators" such as the Intel Xeon Phi or NVIDIA GPGPUs, but the trend also holds for CPUs, where serial performance has grown slowly and effective use of hardware threads and vector units are becoming increasingly important to realizing high performance. This has significant implications for weather, climate, and Earth system modeling codes, many of which display impressive scalability across MPI ranks but take relatively little advantage of threading and vector processing. In addition to increasing parallelism, next generation codes will also need to address increasingly deep hierarchies for data movement: NUMA/cache levels, on node vs. off node, local vs. wide neighborhoods on the interconnect, and even in the I/O system. We will discuss some approaches (grounded in experiences with the Intel Xeon Phi architecture) for restructuring Earth science codes to maximize concurrency across multiple levels (vectors, threads, MPI ranks), and also discuss some novel approaches for minimizing expensive data movement/communication.
Morphological evidence for parallel processing of information in rat macula.
Ross, M D
1988-01-01
Study of montages, tracings and reconstructions prepared from a series of 570 consecutive ultrathin sections shows that rat maculas are morphologically organized for parallel processing of linear acceleratory information. Type II cells of one terminal field distribute information to neighboring terminals as well. The findings are examined in light of physiological data which indicate that macular receptor fields have a preferred directional vector, and are interpreted by analogy to a computer technology known as an information network.
2010-08-09
44 9 A photograph of a goniophotometer used by Bell and a schematic of a goniophotometer used by Mian et al...plane is called the parallel field component because it lies parallel to the specular plane. The incident electric field vector component which...resides in the plane or- thogonal to the specular plane is called the perpendicular field component because it lies perpendicular to the specular plane. If
Parallel Visualization Co-Processing of Overnight CFD Propulsion Applications
NASA Technical Reports Server (NTRS)
Edwards, David E.; Haimes, Robert
1999-01-01
An interactive visualization system pV3 is being developed for the investigation of advanced computational methodologies employing visualization and parallel processing for the extraction of information contained in large-scale transient engineering simulations. Visual techniques for extracting information from the data in terms of cutting planes, iso-surfaces, particle tracing and vector fields are included in this system. This paper discusses improvements to the pV3 system developed under NASA's Affordable High Performance Computing project.
NASA Technical Reports Server (NTRS)
Fijany, A.; Roberts, J. A.; Jain, A.; Man, G. K.
1993-01-01
Part 1 of this paper presented the requirements for the real-time simulation of the Cassini spacecraft along with some discussion of the DARTS algorithm. Here, in Part 2, we discuss the development and implementation of the parallel/vectorized DARTS algorithm and architecture for real-time simulation. Development of fast algorithms and architectures for real-time hardware-in-the-loop simulation of spacecraft dynamics is motivated by the fact that it represents a hard real-time problem, in the sense that the correctness of the simulation depends on both the numerical accuracy and the exact timing of the computation. For a given model fidelity, the computation must be completed within a predefined time period. Further reduction in computation time allows increasing the fidelity of the model (i.e., inclusion of more flexible modes) and of the integration routine.
A hybrid dynamic harmony search algorithm for identical parallel machines scheduling
NASA Astrophysics Data System (ADS)
Chen, Jing; Pan, Quan-Ke; Wang, Ling; Li, Jun-Qing
2012-02-01
In this article, a dynamic harmony search (DHS) algorithm is proposed for the identical parallel machines scheduling problem with the objective of minimizing makespan. First, an encoding scheme based on a list scheduling rule is developed to convert the continuous harmony vectors to discrete job assignments. Second, the whole harmony memory (HM) is divided into multiple small-sized sub-HMs, and each sub-HM performs evolution independently and exchanges information with the others periodically by using a regrouping schedule. Third, a novel improvisation process is applied to generate a new harmony by making use of the information of harmony vectors in each sub-HM. Moreover, a local search strategy is presented and incorporated into the DHS algorithm to find promising solutions. Simulation results show that the hybrid DHS (DHS_LS) is highly competitive with existing algorithms in terms of both mean performance and average computational time.
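The first step above, decoding a continuous harmony vector into discrete job assignments, can be sketched with a simple list scheduling rule: order the jobs by their harmony values, then assign each to the currently least-loaded machine. The exact rule used by the authors may differ; this is a generic illustration.

```python
def decode_harmony(harmony, proc_times, m):
    """Decode a continuous harmony vector into a schedule on m identical
    parallel machines; returns the job-to-machine assignment and makespan."""
    order = sorted(range(len(harmony)), key=lambda j: harmony[j])
    loads = [0.0] * m
    assign = {}
    for j in order:
        k = loads.index(min(loads))   # currently least-loaded machine
        assign[j] = k
        loads[k] += proc_times[j]
    return assign, max(loads)

# Four jobs on two identical machines (illustrative numbers).
assign, makespan = decode_harmony([0.7, 0.1, 0.9, 0.4],
                                  [3.0, 2.0, 2.0, 3.0], m=2)
```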
Implementation of an ADI method on parallel computers
NASA Technical Reports Server (NTRS)
Fatoohi, Raad A.; Grosch, Chester E.
1987-01-01
The implementation of an ADI method for solving the diffusion equation on three parallel/vector computers is discussed. The computers were chosen so as to encompass a variety of architectures. They are: the MPP, an SIMD machine with 16K bit serial processors; FLEX/32, an MIMD machine with 20 processors; and CRAY/2, an MIMD machine with four vector processors. The Gaussian elimination algorithm is used to solve a set of tridiagonal systems on the FLEX/32 and CRAY/2 while the cyclic elimination algorithm is used to solve these systems on the MPP. The implementation of the method is discussed in relation to these architectures and measures of the performance on each machine are given. Simple performance models are used to describe the performance. These models highlight the bottlenecks and limiting factors for this algorithm on these architectures. Finally, conclusions are presented.
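The per-line tridiagonal Gaussian elimination used in such ADI sweeps is the classic Thomas algorithm; a minimal serial sketch (the cyclic elimination variant used on the MPP is not shown):

```python
def thomas_solve(a, b, c, d):
    """Solve a tridiagonal system (sub-diagonal a, diagonal b,
    super-diagonal c, right-hand side d) by Gaussian elimination
    without pivoting."""
    n = len(b)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m if i < n - 1 else 0.0
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

# Diffusion-like system -x[i-1] + 4 x[i] - x[i+1] = rhs; solution is all ones.
x = thomas_solve([0, -1, -1, -1], [4, 4, 4, 4], [-1, -1, -1, 0], [3, 2, 2, 3])
```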
Multi-Modulator for Bandwidth-Efficient Communication
NASA Technical Reports Server (NTRS)
Gray, Andrew; Lee, Dennis; Lay, Norman; Cheetham, Craig; Fong, Wai; Yeh, Pen-Shu; King, Robin; Ghuman, Parminder; Hoy, Scott; Fisher, Dave
2009-01-01
A modulator circuit board has recently been developed to be used in conjunction with a vector modulator to generate any of a large number of modulations for bandwidth-efficient radio transmission of digital data signals at rates that can exceed 100 Mb/s. The modulations include quadrature phase-shift keying (QPSK), offset quadrature phase-shift keying (OQPSK), Gaussian minimum-shift keying (GMSK), and octonary phase-shift keying (8PSK) with square-root raised-cosine pulse shaping. The figure is a greatly simplified block diagram showing the relationship between the modulator board and the rest of the transmitter. The role of the modulator board is to encode the incoming data stream and to shape the resulting pulses, which are fed as inputs to the vector modulator. The combination of encoding and pulse shaping in a given application is chosen to maximize bandwidth efficiency. The modulator board includes gallium arsenide serial-to-parallel converters at its input end. A complementary metal oxide/semiconductor (CMOS) field-programmable gate array (FPGA) performs the coding and modulation computations and utilizes parallel processing in doing so. The results of the parallel computation are combined and converted to pulse waveforms by use of gallium arsenide parallel-to-serial converters integrated with digital-to-analog converters. Without changing the hardware, one can configure the modulator to produce any of the designed combinations of coding and modulation by loading the appropriate bit configuration file into the FPGA.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Hoemmen, Mark
2010-11-01
Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and less data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches for orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.
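The TSQR reduction idea can be sketched serially: factor each row block independently, stack the small R factors, and factor that stack once more. Only the small R factors would move between processors; this sketch runs the "parallel" local factorizations in a loop and is not the Trilinos implementation.

```python
import numpy as np

def tsqr(A, nblocks=4):
    """Tall Skinny QR by one level of blockwise reduction."""
    blocks = np.array_split(A, nblocks, axis=0)
    qs, rs = zip(*[np.linalg.qr(b) for b in blocks])   # local QRs (parallel step)
    Q2, R = np.linalg.qr(np.vstack(rs))                # combine the small R factors
    # Propagate the combining factor Q2 back through the local Q factors.
    rows = np.cumsum([0] + [r.shape[0] for r in rs])
    Q = np.vstack([q @ Q2[rows[i]:rows[i + 1]] for i, q in enumerate(qs)])
    return Q, R

rng = np.random.default_rng(0)
A = rng.standard_normal((64, 5))   # tall and skinny
Q, R = tsqr(A)
```

A full implementation reduces the R factors over a tree rather than in one step, which is what drives the message count down to a logarithmic number of reductions.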
2011-04-01
roll rates are estimates of projectile roll rates with respect to the sun and the local geomagnetic field respectively. The solar aspect angle is the...vector and a vector originating at the CG and parallel to the local geomagnetic field. Methodologies employed to obtain these and other airframe states...and an independent approach (POINTER) and relative magnitude information about the side moments was obtained. VAPP-24 underwent a reversal in coning
Acceleration of Radiance for Lighting Simulation by Using Parallel Computing with OpenCL
DOE Office of Scientific and Technical Information (OSTI.GOV)
Zuo, Wangda; McNeil, Andrew; Wetter, Michael
2011-09-06
We report on the acceleration of annual daylighting simulations for fenestration systems in the Radiance ray-tracing program. The algorithm was optimized to reduce both the redundant data input/output operations and the floating-point operations. To further accelerate the simulation speed, the calculation for matrix multiplications was implemented using parallel computing on a graphics processing unit. We used OpenCL, which is a cross-platform parallel programming language. Numerical experiments show that the combination of the above measures can speed up the annual daylighting simulations 101.7 times or 28.6 times when the sky vector has 146 or 2306 elements, respectively.
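The core operation being accelerated, applying a daylight coefficient matrix to many sky vectors, can be illustrated with hypothetical sizes. Grouping all hours into one dense matrix-matrix product is what makes BLAS or GPU acceleration pay off; the point counts and data layout here are illustrative, not Radiance's.

```python
import numpy as np

# Hypothetical sizes: 500 sensor points, a 146-element sky vector,
# and 8,760 hourly sky conditions for one simulated year.
rng = np.random.default_rng(0)
D = rng.random((500, 146))     # daylight coefficient matrix (illustrative)
S = rng.random((146, 8760))    # one sky vector per hour of the year
# One matrix-matrix product instead of 8,760 matrix-vector products.
E = D @ S                      # illuminance at every point, for every hour

# The same numbers via the naive hourly loop (first three hours shown):
E_loop = np.stack([D @ S[:, h] for h in range(3)], axis=1)
```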
A high performance linear equation solver on the VPP500 parallel supercomputer
DOE Office of Scientific and Technical Information (OSTI.GOV)
Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi
1994-12-31
This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of the VPP500: (1) scalability for an arbitrary number of processors up to 222 processors; (2) flexible data transfer among processors provided by a crossbar interconnection network; (3) vector processing capability on each processor; and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LINPACK Highly Parallel Computing benchmark.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Feo, J.T.
1993-10-01
This report contains papers on: Programmability and performance issues; The case of an iterative partial differential equation solver; Implementing the kernel of the Australian Region Weather Prediction Model in Sisal; Even and quarter-even prime length symmetric FFTs and their Sisal implementations; Top-down thread generation for Sisal; Overlapping communications and computations on NUMA architectures; Compiling technique based on dataflow analysis for the functional programming language Valid; Copy elimination for true multidimensional arrays in Sisal 2.0; Increasing parallelism for an optimization that reduces copying in IF2 graphs; Caching in on Sisal; Cache performance of Sisal vs. FORTRAN; FFT algorithms on a shared-memory multiprocessor; A parallel implementation of nonnumeric search problems in Sisal; Computer vision algorithms in Sisal; Compilation of Sisal for a high-performance data-driven vector processor; Sisal on distributed memory machines; A virtual shared addressing system for distributed memory Sisal; Developing a high-performance FFT algorithm in Sisal for a vector supercomputer; Implementation issues for IF2 on a static data-flow architecture; and Systematic control of parallelism in array-based data-flow computation. Selected papers have been indexed separately for inclusion in the Energy Science and Technology Database.
Liu, Bo; Zhang, Lijia; Xin, Xiangjun
2018-03-19
This paper proposes and demonstrates an enhanced secure 4-D modulation optical generalized filter bank multi-carrier (GFBMC) system based on joint constellation and Stokes vector scrambling. The constellation and Stokes vectors are scrambled using different scrambling parameters. A multi-scroll Chua's circuit map is adopted as the chaotic model. A large secure key space can be obtained due to the multi-scroll attractors and the independent operability of subcarriers. A 40.32 Gb/s encrypted optical GFBMC signal with 128 parallel subcarriers is successfully demonstrated in the experiment. The results show good resistance against illegal receivers and indicate a potential way forward for future optical multi-carrier systems.
High-performance ultra-low power VLSI analog processor for data compression
NASA Technical Reports Server (NTRS)
Tawel, Raoul (Inventor)
1996-01-01
An apparatus for data compression employing a parallel analog processor. The apparatus includes an array of processor cells with N columns and M rows wherein the processor cells have an input device, memory device, and processor device. The input device is used for inputting a series of input vectors. Each input vector is simultaneously input into each column of the array of processor cells in a pre-determined sequential order. An input vector is made up of M components, ones of which are input into ones of M processor cells making up a column of the array. The memory device is used for providing ones of M components of a codebook vector to ones of the processor cells making up a column of the array. A different codebook vector is provided to each of the N columns of the array. The processor device is used for simultaneously comparing the components of each input vector to corresponding components of each codebook vector, and for outputting a signal representative of the closeness between the compared vector components. A combination device is used to combine the signal output from each processor cell in each column of the array and to output a combined signal. A closeness determination device is then used for determining which codebook vector is closest to an input vector from the combined signals, and for outputting a codebook vector index indicating which of the N codebook vectors was the closest to each input vector input into the array.
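The full-search comparison performed by the processor array can be sketched in software: a generic nearest-codevector search under squared Euclidean distortion. The chip itself uses a weighted criterion and analog circuitry, and evaluates all codevectors in parallel; the codebook and inputs below are illustrative.

```python
import numpy as np

def vq_encode(vectors, codebook):
    """Full-search vector quantization: index of the nearest codevector
    (squared Euclidean distortion) for each input vector."""
    # Distances between every input vector and every codevector at once.
    d = ((vectors[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

codebook = np.array([[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]])
idx = vq_encode(np.array([[0.1, -0.2], [3.5, 4.2], [0.9, 1.1]]), codebook)
# idx holds, for each input vector, the index of its closest codevector.
```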
NASA Astrophysics Data System (ADS)
Gershman, D. J.; Figueroa-Vinas, A.; Dorelli, J.; Goldstein, M. L.; Shuster, J. R.; Avanov, L. A.; Boardsen, S. A.; Stawarz, J. E.; Schwartz, S. J.; Schiff, C.; Lavraud, B.; Saito, Y.; Paterson, W. R.; Giles, B. L.; Pollock, C. J.; Strangeway, R. J.; Russell, C. T.; Torbert, R. B.; Moore, T. E.; Burch, J. L.
2017-12-01
Measurements from the Fast Plasma Investigation (FPI) on NASA's Magnetospheric Multiscale (MMS) mission have enabled unprecedented analyses of kinetic-scale plasma physics. FPI regularly provides estimates of current density and pressure gradients of sufficient accuracy to evaluate the relative contribution of terms in plasma equations of motion. In addition, high-resolution three-dimensional velocity distribution functions of both ions and electrons provide new insights into kinetic-scale processes. As an example, for a monochromatic kinetic Alfven wave (KAW) we find non-zero, but out-of-phase parallel current density and electric field fluctuations, providing direct confirmation of the conservative energy exchange between the wave field and particles. In addition, we use fluctuations in current density and magnetic field to calculate the perpendicular and parallel wavelengths of the KAW. Furthermore, examination of the electron velocity distribution inside the KAW reveals a population of electrons non-linearly trapped in the kinetic-scale magnetic mirror formed between successive wave peaks. These electrons not only contribute to the wave's parallel electric field but also account for over half of the density fluctuations within the wave, supplying an unexpected mechanism for maintaining quasi-neutrality in a KAW. Finally, we demonstrate that the employed wave vector determination technique is also applicable to broadband fluctuations found in Earth's turbulent magnetosheath.
Flexbar 3.0 - SIMD and multicore parallelization.
Roehr, Johannes T; Dieterich, Christoph; Reinert, Knut
2017-09-15
High-throughput sequencing machines can process many samples in a single run. For Illumina systems, sequencing reads are barcoded with an additional DNA tag that is contained in the respective sequencing adapters. The recognition of barcode and adapter sequences is hence commonly needed for the analysis of next-generation sequencing data. Flexbar performs demultiplexing based on barcodes and adapter trimming for such data. The massive amounts of data generated on modern sequencing machines demand that this preprocessing be done as efficiently as possible. We present Flexbar 3.0, the successor of the popular program Flexbar. It now employs twofold parallelism: multi-threading and, additionally, SIMD vectorization. Both types of parallelism are used to speed up the computation of pairwise sequence alignments, which are used for the detection of barcodes and adapters. Furthermore, new features were included to cover a wide range of applications. We evaluated the performance of Flexbar on a simulated sequencing dataset. Our program outcompetes other tools in terms of speed and is among the best tools in the presented quality benchmark. https://github.com/seqan/flexbar. johannes.roehr@fu-berlin.de or knut.reinert@fu-berlin.de. © The Author (2017). Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
2014-05-01
fusion, space and astrophysical plasmas, but still the general picture can be presented quite well with the fluid approach [6, 7]. The microscopic...purpose computing CPU for algorithms where processing of large blocks of data is done in parallel. The reason for that is the GPU’s highly effective...parallel structure. Most of the image and video processing computations involve heavy matrix and vector op- erations over large amounts of data and
Vectorial magnetometry with the magneto-optic Kerr effect applied to Co/Cu/Co trilayer structures
NASA Astrophysics Data System (ADS)
Daboo, C.; Bland, J. A. C.; Hicken, R. J.; Ives, A. J. R.; Baird, M. J.; Walker, M. J.
1993-05-01
We describe an arrangement in which the magnetization components parallel and perpendicular to the applied field are both determined from longitudinal magneto-optic Kerr effect measurements. This arrangement differs from the usual procedures in that the same optical geometry is used but the magnet geometry is altered. This leads to two magneto-optic signals that are directly comparable in magnitude, thereby giving the in-plane magnetization vector directly. We show that it is of great value to study both in-plane magnetization vector components when studying coupled structures where significant anisotropies are also present. We discuss simulations which show that it is possible to accurately determine the coupling strength in such structures by examining the behavior of the component of magnetization perpendicular to the applied field in the vicinity of the hard in-plane anisotropy axis. We illustrate this technique by examining the magnetization and magnetic anisotropy behavior of ultrathin Co/Cu(111)/Co (dCu=20 Å and 27 Å) trilayer structures prepared by molecular beam epitaxy, in which coherent rotation of the magnetization vector is observed when the magnetic field B is applied along the hard in-plane anisotropy axis, with the magnitude of the magnetization vector constant and close to its bulk value. Results of micromagnetic calculations closely reproduce the observed parallel and perpendicular magnetization loops, and yield strong uniaxial magnetic anisotropies in both layers, while the interlayer coupling appears to be absent or negligible in comparison with the anisotropy strengths.
Ground states of larger nuclei
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pieper, S.C.; Wiringa, R.B.; Pandharipande, V.R.
1995-08-01
The methods used for the few-body nuclei require operations on the complete spin-isospin vector; the size of this vector makes such methods impractical for nuclei with A > 8. During the last few years we developed cluster expansion methods that do not require operations on the complete vector. We use the same Hamiltonians as for the few-body nuclei and variational wave functions of form similar to the few-body wave functions. The cluster expansions are made for the noncentral parts of the wave functions and for the operators whose expectation values are being evaluated. The central pair correlations in the wave functions are treated exactly and this requires the evaluation of 3A-dimensional integrals which are done with Monte Carlo techniques. Most of our effort was on {sup 16}O, other p-shell nuclei, and {sup 40}Ca. In 1993 the Mathematics and Computer Science Division acquired a 128-processor IBM SP which has a theoretical peak speed of 16 Gigaflops (GFLOPS). We converted our program to run on this machine. Because of the large memory on each node of the SP, it was easy to convert the program to parallel form with very low communication overhead. Considerably more effort was needed to restructure the program from one oriented towards long vectors for the Cray computers at NERSC to one that makes efficient use of the cache of the RS6000 architecture. The SP made possible complete five-body cluster calculations of {sup 16}O for the first time; previously we could only do four-body cluster calculations. These calculations show that the expectation value of the two-body potential is converging less rapidly than we had thought, while that of the three-body potential is more rapidly convergent; the net result is no significant change to our predicted binding energy for {sup 16}O using the new Argonne v{sub 18} potential and the Urbana IX three-nucleon potential. This result is in good agreement with experiment.
Saito, Minoru; Okazaki, Isao
2007-04-30
Molecular dynamics (MD) simulations of human adult hemoglobin (HbA) were carried out for 45 ns in water with all degrees of freedom including bond stretching and without any artificial constraints. To perform such large-scale simulations, one of the authors (M.S.) accelerated his own software COSMOS90 on the Earth Simulator by vectorization and parallelization. The dynamical features of HbA were investigated by evaluating root-mean-square deviations from the initial X-ray structure (an oxy T-state hemoglobin with PDB code: 1GZX) and root-mean-square fluctuations around the average structure from the simulation trajectories. The four subunits (alpha(1), alpha(2), beta(1), and beta(2)) of HbA maintained structures close to their respective X-ray structures during the simulations even though no constraints were applied to HbA in the simulations. Dimers alpha(1)beta(1) and alpha(2)beta(2) also maintained structures close to their respective X-ray structures while they moved relative to each other like two stacks of dumbbells. The distance between the two dimers (alpha(1)beta(1) and alpha(2)beta(2)) increased by 2 A (7.4%) in the initial 15 ns and stably fluctuated at the distance with the standard deviation 0.2 A. The relative orientation of the two dimers fluctuated between the initial X-ray angle -100 degrees and about -105 degrees with intervals of a few tens of nanoseconds.
On the Impact of Widening Vector Registers on Sequence Alignment
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.; Kalyanaraman, Anantharaman; Krishnamoorthy, Sriram
2016-09-22
Vector extensions, such as SSE, have been part of the x86 since the 1990s, with applications in graphics, signal processing, and scientific applications. Although many algorithms and applications can naturally benefit from automatic vectorization techniques, there are still many that are difficult to vectorize due to their dependence on irregular data structures, dense branch operations, or data dependencies. Sequence alignment, one of the most widely used operations in bioinformatics workflows, has a computational footprint that features complex data dependencies. In this paper, we demonstrate that the trend of widening vector registers adversely affects the state-of-the-art sequence alignment algorithm based on striped data layouts. We present a practically efficient SIMD implementation of a parallel scan based sequence alignment algorithm that can better exploit wider SIMD units. We conduct comprehensive workload and use case analyses to characterize the relative behavior of the striped and scan approaches and identify the best choice of algorithm based on input length and SIMD width.
Electric fields and vector potentials of thin cylindrical antennas
NASA Astrophysics Data System (ADS)
King, Ronold W. P.
1990-09-01
The vector potential and electric field generated by the current in a center-driven or parasitic dipole antenna that extends from z = -h to z = h are investigated for each of the several components of the current. These include sin k(h - |z|), sin k|z| - sin kh, cos kz - cos kh, and cos(kz/2) - cos(kh/2). Of special interest are the interactions among the variously spaced elements in parallel nonstaggered arrays. These depend on the mutual vector potentials. It is shown that at a radial distance rho ≈ h and in the range z = -h to h, the vector potentials due to all four components become alike and have an approximately plane-wave form. Simple approximate formulas for the electric fields and vector potentials generated by each of the four distributions are derived and compared with the exact results. The application of the new formulas to large arrays is discussed.
Absolute surface reconstruction by slope metrology and photogrammetry
NASA Astrophysics Data System (ADS)
Dong, Yue
Developing the manufacture of aspheric and freeform optical elements requires an advanced metrology method which is capable of inspecting these elements with arbitrary freeform surfaces. In this dissertation, a new surface measurement scheme is investigated for such a purpose, which is to measure the absolute surface shape of an object under test through its surface slope information obtained by photogrammetric measurement. A laser beam propagating toward the object reflects on its surface while the vectors of the incident and reflected beams are evaluated from the four spots they leave on the two parallel transparent windows in front of the object. The spots' spatial coordinates are determined by photogrammetry. With the knowledge of the incident and reflected beam vectors, the local slope information of the object surface is obtained through vector calculus and finally yields the absolute object surface profile by a reconstruction algorithm. An experimental setup is designed and the proposed measuring principle is experimentally demonstrated by measuring the absolute surface shape of a spherical mirror. The measurement uncertainty is analyzed, and efforts for improvement are made accordingly. In particular, structured windows are designed and fabricated to generate uniform scattering spots left by the transmitted laser beams. Calibration of the fringe reflection instrument, another typical surface slope measurement method, is also reported in the dissertation. Finally, a method for uncertainty analysis of a photogrammetry measurement system by optical simulation is investigated.
Ma, Benjiang; Hang, Changshou; Zhao, Yun; Wang, Shiwen; Xie, Yanxiang
2002-09-01
To construct a novel baculovirus vector capable of promoting high-yield expression of a foreign gene in mammalian cells, and to use this vector to express the nucleoprotein (NP) gene of the Crimean-Congo hemorrhagic fever virus (CCHFV) Chinese isolate (Xinjiang hemorrhagic fever virus, XHFV) BA88166 in insect and Vero cells. The human cytomegalovirus (CMV) immediate early (IE) promoter was ligated into the baculovirus vector pFastBac1 downstream of the polyhedrin promoter to give rise to the novel vector pCB1. The XHFV NP gene was cloned into this vector and was well expressed in COS-7 cells and Vero cells by means of recombinant plasmid transfection and baculovirus infection. The XHFV NP gene in vector pCB1 could be well expressed in mammalian cells. Vero cells infected with recombinant baculovirus harboring the NP gene could be employed as antigens to test XHF serum specimens; the results correlated well with those of ELISA and with clinical diagnoses. This novel baculovirus vector is able to express foreign genes efficiently in both insect and mammalian cells, providing not only convenient diagnostic antigens but also the potential for developing recombinant virus vaccines and gene therapies.
Efficient ICCG on a shared memory multiprocessor
NASA Technical Reports Server (NTRS)
Hammond, Steven W.; Schreiber, Robert
1989-01-01
Different approaches are discussed for exploiting parallelism in the ICCG (Incomplete Cholesky Conjugate Gradient) method for solving large sparse symmetric positive definite systems of equations on a shared memory parallel computer. Techniques for efficiently solving triangular systems and computing sparse matrix-vector products are explored. Three methods for scheduling the tasks in solving triangular systems are implemented on the Sequent Balance 21000. Sample problems that are representative of a large class of problems solved using iterative methods are used. We show that a static analysis to determine data dependences in the triangular solve can greatly improve its parallel efficiency. We also show that ignoring symmetry and storing the whole matrix can reduce solution time substantially.
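A standard way to realize the static dependence analysis mentioned above is level scheduling: rows of the triangular factor whose dependencies all lie in earlier levels can be solved concurrently. A minimal sketch (the row representation and function name are illustrative, not from the paper):

```python
# Level scheduling for a sparse lower-triangular solve: group rows into
# levels so that all rows within one level are mutually independent.
def level_schedule(rows):
    """rows: dict mapping row index i -> list of column indices j < i
    with nonzero L[i][j] (i.e., the rows this row depends on).
    Returns a list of levels, each a list of rows solvable in parallel."""
    level = {}
    for i in sorted(rows):
        # A row's level is one past the deepest level among its dependencies.
        level[i] = 1 + max((level[j] for j in rows[i]), default=0)
    nlev = max(level.values(), default=0)
    sched = [[] for _ in range(nlev)]
    for i, l in level.items():
        sched[l - 1].append(i)
    return sched

# Rows 0 and 1 are independent; row 2 needs both; row 3 needs row 2.
sched = level_schedule({0: [], 1: [], 2: [0, 1], 3: [2]})
# -> [[0, 1], [2], [3]]
```

Each inner list can then be dispatched to the processors of a shared memory machine as one parallel step.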
A parallel finite-difference method for computational aerodynamics
NASA Technical Reports Server (NTRS)
Swisshelm, Julie M.
1989-01-01
A finite-difference scheme for solving complex three-dimensional aerodynamic flow on parallel-processing supercomputers is presented. The method consists of a basic flow solver with multigrid convergence acceleration, embedded grid refinements, and a zonal equation scheme. Multitasking and vectorization have been incorporated into the algorithm. Results obtained include multiprocessed flow simulations from the Cray X-MP and Cray-2. Speedups as high as 3.3 for the two-dimensional case and 3.5 for segments of the three-dimensional case have been achieved on the Cray-2. The entire solver attained a factor of 2.7 improvement over its unitasked version on the Cray-2. The performance of the parallel algorithm on each machine is analyzed.
A VLSI chip set for real time vector quantization of image sequences
NASA Technical Reports Server (NTRS)
Baker, Richard L.
1989-01-01
The architecture and implementation of a VLSI chip set that vector quantizes (VQ) image sequences in real time is described. The chip set forms a programmable Single-Instruction, Multiple-Data (SIMD) machine which can implement various vector quantization encoding structures. Its VQ codebook may contain an unlimited number of codevectors, N, having dimension up to K = 64. Under a weighted least-squared-error criterion, the engine locates at video rates the best code vector in full-searched or large tree-searched VQ codebooks. The ability to manipulate tree-structured codebooks, coupled with parallelism and pipelining, permits searches in as few as O(log N) cycles. A full codebook search results in O(N) performance, compared to O(KN) for a Single-Instruction, Single-Data (SISD) machine. With this VLSI chip set, an entire video coder can be built on a single board that permits real-time experimentation with very large codebooks.
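The O(log N) behavior the chip set exploits can be illustrated in software: a greedy descent through a binary tree-structured codebook compares the input against two test vectors per node and follows the closer one. A hypothetical sketch (the tree layout is illustrative, not the chip's actual data structure):

```python
import numpy as np

def tsvq_search(tree, x):
    """Greedy binary tree-structured VQ search.

    Leaf nodes are dicts {'code': vec}; internal nodes are dicts
    {'lvec': vec, 'rvec': vec, 'left': node, 'right': node}.
    At each internal node, descend toward the closer test vector."""
    node = tree
    while 'code' not in node:
        dl = np.sum((x - node['lvec']) ** 2)
        dr = np.sum((x - node['rvec']) ** 2)
        node = node['left'] if dl <= dr else node['right']
    return node['code']

# Tiny 4-codevector tree over scalars (dimension K = 1).
tree = {
    'lvec': np.array([0.25]), 'rvec': np.array([0.75]),
    'left':  {'lvec': np.array([0.0]), 'rvec': np.array([0.5]),
              'left': {'code': np.array([0.0])},
              'right': {'code': np.array([0.4])}},
    'right': {'lvec': np.array([0.6]), 'rvec': np.array([1.0]),
              'left': {'code': np.array([0.6])},
              'right': {'code': np.array([1.0])}},
}
code = tsvq_search(tree, np.array([0.45]))  # two comparisons instead of four
```

The trade-off is the usual one: the greedy path is not guaranteed to find the globally nearest codevector, which is why the chip set also supports full search.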
Kalman Filter Tracking on Parallel Architectures
NASA Astrophysics Data System (ADS)
Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2015-12-01
Power density constraints are limiting the performance improvements of modern CPUs. To address this we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The need for greater parallelism has driven investigations of very different track finding techniques, including Cellular Automata and a return to the Hough Transform. The most common track finding techniques in use today, however, are those based on the Kalman Filter [2]. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. They are known to provide high physics performance, are robust, and are exactly those being used today for the design of the tracking system for the HL-LHC. Our previous investigations showed that, using optimized data structures, track fitting with the Kalman Filter can achieve large speedups on both Intel Xeon and Xeon Phi. We report here our further progress towards an end-to-end track reconstruction algorithm fully exploiting vectorization and parallelization techniques in a realistic simulation setup.
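The computational kernel being vectorized in this line of work is the standard linear Kalman filter predict/update step. A minimal, generic sketch (not the authors' optimized implementation; the constant-velocity model and all matrices below are illustrative):

```python
import numpy as np

def kf_step(x, P, z, F, Q, H, R):
    """One linear Kalman filter step: predict with model F, Q,
    then update state x and covariance P with measurement z."""
    # Predict
    x = F @ x
    P = F @ P @ F.T + Q
    # Update
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P

# 1D position/velocity state with position-only measurements.
F = np.array([[1.0, 1.0], [0.0, 1.0]])   # constant-velocity transition
H = np.array([[1.0, 0.0]])               # observe position only
Q = 1e-4 * np.eye(2)
R = np.array([[0.01]])
x, P = np.zeros(2), np.eye(2)
for z in [1.0, 2.0, 3.0, 4.0]:           # a track moving at unit velocity
    x, P = kf_step(x, P, np.array([z]), F, Q, H, R)
```

In track fitting, many such independent filters run over candidate tracks, which is exactly what makes the problem amenable to SIMD vectorization and multi-threading.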
Lattice QCD calculation using VPP500
DOE Office of Scientific and Technical Information (OSTI.GOV)
Kim, Seyong; Ohta, Shigemi
1995-02-01
A new vector parallel supercomputer, Fujitsu VPP500, was installed at RIKEN earlier this year. It consists of 30 vector computers, each with 1.6 GFLOPS peak speed and 256 MB memory, connected by a crossbar switch with 400 MB/s peak data transfer rate each way between any pair of nodes. The authors developed a Fortran lattice QCD simulation code for it. It runs at about 1.1 GFLOPS sustained per node for Metropolis pure-gauge update, and about 0.8 GFLOPS sustained per node for conjugate gradient inversion of staggered fermion matrix.
Dual-scale topology optoelectronic processor.
Marsden, G C; Krishnamoorthy, A V; Esener, S C; Lee, S H
1991-12-15
The dual-scale topology optoelectronic processor (D-STOP) is a parallel optoelectronic architecture for matrix algebraic processing. The architecture can be used for matrix-vector multiplication and two types of vector outer product. The computations are performed electronically, which allows multiplication and summation concepts in linear algebra to be generalized to various nonlinear or symbolic operations. This generalization permits the application of D-STOP to many computational problems. The architecture uses a minimum number of optical transmitters, which thereby reduces fabrication requirements while maintaining area-efficient electronics. The necessary optical interconnections are space invariant, minimizing space-bandwidth requirements.
A multi-platform evaluation of the randomized CX low-rank matrix factorization in Spark
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gittens, Alex; Kottalam, Jey; Yang, Jiyan
We investigate the performance and scalability of the randomized CX low-rank matrix factorization and demonstrate its applicability through the analysis of a 1TB mass spectrometry imaging (MSI) dataset, using Apache Spark on an Amazon EC2 cluster, a Cray XC40 system, and an experimental Cray cluster. We implemented this factorization both as a parallelized C implementation with hand-tuned optimizations and in Scala using the Apache Spark high-level cluster computing framework. We obtained consistent performance across the three platforms: using Spark we were able to process the 1TB size dataset in under 30 minutes with 960 cores on all systems, with the fastest times obtained on the experimental Cray cluster. In comparison, the C implementation was 21X faster on the Amazon EC2 system, due to careful cache optimizations, bandwidth-friendly access of matrices and vector computation using SIMD units. We report these results and their implications on the hardware and software issues arising in supporting data-centric workloads in parallel and distributed environments.
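The CX decomposition selects actual columns of the data matrix, typically sampled with probabilities derived from leverage scores, and then solves for a coefficient matrix X so that CX approximates A. A small illustrative sketch (the exact sampling rule is an assumption, not necessarily the paper's randomized variant):

```python
import numpy as np

def cx_decompose(A, k, c, rng=np.random.default_rng(0)):
    """Pick c columns of A with probability proportional to their
    rank-k leverage scores; return C, X with C @ X ~= A."""
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k] ** 2, axis=0)    # rank-k leverage scores
    p = lev / lev.sum()
    cols = rng.choice(A.shape[1], size=c, replace=False, p=p)
    C = A[:, cols]
    X = np.linalg.pinv(C) @ A            # least-squares coefficients
    return C, X, cols

# Rank-2 matrix: any two of its columns reconstruct it exactly.
A = np.array([[1., 0., 1.],
              [0., 1., 1.],
              [1., 1., 2.]])
C, X, cols = cx_decompose(A, k=2, c=2)
err = np.linalg.norm(A - C @ X)
```

Because C consists of real columns of A, the factors remain interpretable in the original feature space, which is the property that makes CX attractive for MSI data.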
DOE Office of Scientific and Technical Information (OSTI.GOV)
Pingenot, J; Rieben, R; White, D
2004-12-06
We present a computational study of signal propagation and attenuation of a 200 MHz dipole antenna in a cave environment. The cave is modeled as a straight tunnel with lossy, randomly rough walls. To simulate a broad frequency band, the full wave Maxwell equations are solved directly in the time domain via a high order vector finite element discretization using the massively parallel CEM code EMSolve. The simulation is performed for a series of random meshes in order to generate statistical data for the propagation and attenuation properties of the cave environment. Results for the power spectral density and phase of the electric field vector components are presented and discussed.
Optimized scalable network switch
Blumrich, Matthias A [Ridgefield, CT; Chen, Dong [Croton On Hudson, NY; Coteus, Paul W [Yorktown Heights, NY; Gara, Alan G [Mount Kisco, NY; Giampapa, Mark E [Irvington, NY; Heidelberger, Philip [Cortlandt Manor, NY; Steinmacher-Burow, Burkhard D [Mount Kisco, NY; Takken, Todd E [Mount Kisco, NY; Vranas, Pavlos M [Bedford Hills, NY
2007-12-04
In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vafin, S.; Schlickeiser, R.; Yoon, P. H.
Recently, the general electromagnetic fluctuation theory for magnetized plasmas has been used to study the steady-state fluctuation spectra and the total intensity of low-frequency collective weakly damped modes for parallel wave vectors in Maxwellian plasmas. Now, we address the same question with respect to an arbitrary direction of the wave vector. Here, we analyze this problem for equal mass plasmas. These plasmas are a very good tool for studying various plasma phenomena, as they considerably simplify the theoretical treatment while retaining a clear physical picture. Finally, we compare our results in the limiting case of parallel wave vectors with the previous study.
Method and apparatus for second-rank tensor generation
NASA Technical Reports Server (NTRS)
Liu, Hua-Kuang (Inventor)
1991-01-01
A method and apparatus are disclosed for generation of second-rank tensors using a photorefractive crystal to perform the outer product between two vectors via four-wave mixing, thereby mapping 2n input data points to n squared output data points. Two orthogonal amplitude-modulated coherent vector beams x and y are expanded and then directed onto parallel sides of the photorefractive crystal in exact opposition. A beamsplitter is used to direct a coherent pumping beam onto the crystal at an appropriate angle so as to produce a conjugate beam, carrying the product of the vector beams, that propagates in the exact opposite direction from the pumping beam. The conjugate beam thus separated is the tensor output xy^T.
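Numerically, the operation the optical system performs is just the vector outer product, taking the 2n entries of x and y to the n squared entries of xy^T:

```python
import numpy as np

# The second-rank tensor formed optically is the outer product:
# T[i, j] = x[i] * y[j], so 2n inputs yield n^2 outputs.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])
T = np.outer(x, y)          # shape (3, 3)
```

Rank-one updates of this form are the building block of matrix-algebra primitives such as the vector outer products mentioned in the D-STOP record above.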
Optimized scalable network switch
Blumrich, Matthias A.; Chen, Dong; Coteus, Paul W.
2010-02-23
In a massively parallel computing system having a plurality of nodes configured in m multi-dimensions, each node including a computing device, a method for routing packets towards their destination nodes is provided which includes generating at least one of a 2m plurality of compact bit vectors containing information derived from downstream nodes. A multilevel arbitration process in which downstream information stored in the compact vectors, such as link status information and fullness of downstream buffers, is used to determine a preferred direction and virtual channel for packet transmission. Preferred direction ranges are encoded and virtual channels are selected by examining the plurality of compact bit vectors. This dynamic routing method eliminates the necessity of routing tables, thus enhancing scalability of the switch.
A bibliography on parallel and vector numerical algorithms
NASA Technical Reports Server (NTRS)
Ortega, James M.; Voigt, Robert G.; Romine, Charles H.
1988-01-01
This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
A bibliography on parallel and vector numerical algorithms
NASA Technical Reports Server (NTRS)
Ortega, J. M.; Voigt, R. G.
1987-01-01
This is a bibliography of numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are listed also.
A bibliography on parallel and vector numerical algorithms
NASA Technical Reports Server (NTRS)
Ortega, James M.; Voigt, Robert G.; Romine, Charles H.
1990-01-01
This is a bibliography on numerical methods. It also includes a number of other references on machine architecture, programming language, and other topics of interest to scientific computing. Certain conference proceedings and anthologies which have been published in book form are also listed.
Vector tomography for reconstructing electric fields with non-zero divergence in bounded domains
DOE Office of Scientific and Technical Information (OSTI.GOV)
Koulouri, Alexandra, E-mail: koulouri@uni-muenster.de; Department of Electrical and Electronic Engineering, Imperial College London, Exhibition Road, London SW7 2BT; Brookes, Mike
In vector tomography (VT), the aim is to reconstruct an unknown multi-dimensional vector field using line integral data. In the case of a 2-dimensional VT, two types of line integral data are usually required. These data correspond to integration of the parallel and perpendicular projection of the vector field along the integration lines and are called the longitudinal and transverse measurements, respectively. In most cases, however, the transverse measurements cannot be physically acquired. Therefore, the VT methods are typically used to reconstruct divergence-free (or source-free) velocity and flow fields that can be reconstructed solely from the longitudinal measurements. In this paper, we show how vector fields with non-zero divergence in a bounded domain can also be reconstructed from the longitudinal measurements without the need of explicitly evaluating the transverse measurements. To the best of our knowledge, VT has not previously been used for this purpose. In particular, we study low-frequency, time-harmonic electric fields generated by dipole sources in convex bounded domains which arise, for example, in electroencephalography (EEG) source imaging. We explain in detail the theoretical background, the derivation of the electric field inverse problem and the numerical approximation of the line integrals. We show that fields with non-zero divergence can be reconstructed from the longitudinal measurements with the help of two sparsity constraints that are constructed from the transverse measurements and the vector Laplace operator. As a comparison to EEG source imaging, we note that VT does not require mathematical modeling of the sources. By numerical simulations, we show that the pattern of the electric field can be correctly estimated using VT and the location of the source activity can be determined accurately from the reconstructed magnitudes of the field.
Highlights: • Vector tomography is used to reconstruct electric fields generated by dipole sources. • Inverse solutions are based on longitudinal and transverse line integral measurements. • Transverse line integral measurements are used as a sparsity constraint. • Numerical procedure to approximate the line integrals is described in detail. • Patterns of the studied electric fields are correctly estimated.
A parallel solver for huge dense linear systems
NASA Astrophysics Data System (ADS)
Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.
2011-11-01
HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) that facilitates the parallel solution of very large dense systems for scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage secondary memory in order to solve huge linear systems of order O(100,000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to users, hiding almost all the technical aspects related to the parallel execution of the code and the use of secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending on the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summary: Program title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system: Linux/Unix Has the code been vectorized or parallelized?: Yes, includes MPI primitives. RAM: Tested for up to 190 GB Classification: 6.5 External routines: MPI (http://www.mpi-forum.org/), BLAS (http://www.netlib.org/blas/), PLAPACK (http://www.cs.utexas.edu/~plapack/), POOCLAPACK (ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution). Catalogue identifier of previous version: AEHU_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533 Does the new version supersede the previous version?: Yes Nature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities. Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient. Reasons for new version: In many applications we need to guarantee high accuracy in the solution of very large linear systems, which can be achieved by using double-precision arithmetic. Summary of revisions: Version 1.1 can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine: the user can choose the kind of arithmetic and the values of several parameters of the environment. Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
Yu, Dongjun; Wu, Xiaowei; Shen, Hongbin; Yang, Jian; Tang, Zhenmin; Qi, Yong; Yang, Jingyu
2012-12-01
Membrane proteins are encoded by ~30% of the genome and play important roles in living organisms. Previous studies have revealed that membrane proteins' structures and functions show obvious cell organelle-specific properties. Hence, it is highly desirable to predict a membrane protein's subcellular location from the primary sequence, considering the extreme difficulty of membrane protein wet-lab studies. Although many models have been developed for predicting protein subcellular locations, only a few are specific to membrane proteins. Existing prediction approaches were constructed based on statistical machine learning algorithms with serial combination of multi-view features, i.e., different feature vectors are simply serially combined to form a super feature vector. However, such simple combination of features simultaneously increases information redundancy, which can in turn deteriorate the final prediction accuracy. This is why prediction success rates in the serial super space were often found to be even lower than those in a single-view space. The purpose of this paper is the investigation of a proper method for fusing multiple multi-view protein sequential features for subcellular location prediction. Instead of the serial strategy, we propose a novel parallel framework for fusing multiple membrane protein multi-view attributes that represents protein samples in complex spaces. We also propose generalized principal component analysis (GPCA) for feature reduction in the complex geometry. All the experimental results through different machine learning algorithms on benchmark membrane protein subcellular localization datasets demonstrate that the newly proposed parallel strategy outperforms the traditional serial approach. We also demonstrate the efficacy of the parallel strategy on a soluble protein subcellular localization dataset, indicating that the parallel technique is flexible enough to suit other computational biology problems.
The software and datasets are available at: http://www.csbio.sjtu.edu.cn/bioinf/mpsp.
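The parallel fusion idea can be illustrated by encoding two feature views of each sample as the real and imaginary parts of one complex vector, and then reducing dimension in the complex space. This is only a sketch of the general strategy (plain complex-domain PCA is used here, not the paper's GPCA; all names and data are illustrative):

```python
import numpy as np

def parallel_fuse(view_a, view_b):
    """Fuse two feature views per sample into one complex vector,
    zero-padding the shorter view so both have equal width."""
    d = max(view_a.shape[1], view_b.shape[1])
    a = np.zeros((view_a.shape[0], d)); a[:, :view_a.shape[1]] = view_a
    b = np.zeros((view_b.shape[0], d)); b[:, :view_b.shape[1]] = view_b
    return a + 1j * b

def complex_pca(Z, k):
    """Project complex samples onto their top-k principal directions."""
    Zc = Z - Z.mean(axis=0)
    cov = Zc.conj().T @ Zc / (len(Z) - 1)   # Hermitian covariance
    _, V = np.linalg.eigh(cov)              # eigenvalues ascending
    return Zc @ V[:, ::-1][:, :k]           # keep the top-k directions

rng = np.random.default_rng(0)
Z = parallel_fuse(rng.normal(size=(10, 4)), rng.normal(size=(10, 3)))
Y = complex_pca(Z, k=2)
```

Unlike serial concatenation, the fused representation keeps the same dimensionality as the wider view, which is the redundancy argument made in the abstract.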
Hybrid simulations of radial transport driven by the Rayleigh-Taylor instability
NASA Astrophysics Data System (ADS)
Delamere, P. A.; Stauffer, B. H.; Ma, X.
2017-12-01
Plasma transport in the rapidly rotating giant magnetospheres is thought to involve a centrifugally-driven flux tube interchange instability, similar to the Rayleigh-Taylor (RT) instability. In three dimensions, the convective flow patterns associated with the RT instability can produce strong guide field reconnection, allowing plasma mass to move radially outward while conserving magnetic flux (Ma et al., 2016). We present a set of hybrid (kinetic ion / fluid electron) plasma simulations of the RT instability using high plasma beta conditions appropriate for Jupiter's inner and middle magnetosphere. A density gradient, combined with a centrifugal force, provide appropriate RT onset conditions. Pressure balance is achieved by initializing two ion populations: one with fixed temperature, but varying density, and the other with fixed density, but a temperature gradient that offsets the density gradient from the first population and the centrifugal force (effective gravity). We first analyze two-dimensional results for the plane perpendicular to the magnetic field by comparing growth rates as a function of wave vector following Huba et al. (1998). Prescribed perpendicular wave modes are seeded with an initial velocity perturbation. We then extend the model to three dimensions, introducing a stabilizing parallel wave vector. Boundary conditions in the parallel direction prohibit motion of the magnetic field line footprints to model the eigenmodes of the magnetodisc's resonant cavity. We again compare growth rates based on perpendicular wave number, but also on the parallel extent of the resonant cavity, which fixes the size of the largest parallel wavelength. Finally, we search for evidence of strong guide field magnetic reconnection within the domain by identifying areas with large parallel electric fields or changes in magnetic field topology.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bemporad, G.A.; Rubin, H.
This manuscript concerns the onset of thermohaline convection in a solar pond subject to field conditions, as well as in a small-scale laboratory test section simulating the solar pond performance. The onset of thermohaline convection is analyzed in this study by means of a linear stability analysis in which the flow field perturbations are expanded in sets of complete orthonormal functions satisfying the boundary conditions of the flow field. The linear stability analysis is first performed with regard to an advanced solar pond (ASP) subject to field conditions, in which thermohaline convection develops in planes perpendicular to the unperturbed flow velocity vector. In the laboratory simulator of the ASP, the width and depth are of the same order of magnitude. In this case it is found that the side walls delay the onset of convection in planes perpendicular to the unperturbed flow velocity vector. The presence of the side walls may cause the planes parallel to the flow velocity to be the most susceptible to the development of convection. Perturbations depending on all three spatial variables are predicted; they may develop in planes parallel or perpendicular to the unperturbed velocity vector according to the value of the Reynolds number of the unperturbed flow and the ratio between the width and depth of the ASP simulator.
Fu, Jun; Huang, Canqin; Xing, Jianguo; Zheng, Junbao
2012-01-01
Biologically-inspired models and algorithms are considered promising sensor array signal processing methods for electronic noses. Feature selection is one of the most important issues for developing robust pattern recognition models in machine learning. This paper describes an investigation into the classification performance of a bionic olfactory model with increasing dimensions of the input feature vector (outer factor) as well as of its parallel channels (inner factor). The principal component analysis technique was applied for feature selection and dimension reduction. Two data sets, of three classes of wine derived from different cultivars and of five classes of green tea derived from five different provinces of China, were used for the experiments. In the former case the results showed that the average correct classification rate increased as more principal components were added to the feature vector. In the latter case the results showed that sufficient parallel channels should be reserved in the model to avoid pattern space crowding. We conclude that 6∼8 channels of the model, with a principal component feature vector retaining at least 90% cumulative variance, are adequate for a classification task of 3∼5 pattern classes, considering the trade-off between time consumption and classification rate. PMID:22736979
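The component-count criterion used above, keeping enough principal components to reach at least 90% cumulative variance, can be sketched as follows (function name and synthetic data are illustrative):

```python
import numpy as np

def n_components_for(X, threshold=0.90):
    """Smallest number of principal components whose cumulative
    variance fraction reaches the given threshold."""
    Xc = X - X.mean(axis=0)
    # Singular values give component variances: var_i = s_i^2 / (n - 1);
    # the (n - 1) factor cancels in the cumulative ratio.
    s = np.linalg.svd(Xc, compute_uv=False)
    cum = np.cumsum(s ** 2) / np.sum(s ** 2)
    return int(np.searchsorted(cum, threshold) + 1)

# One dominant direction plus small noise -> a single component suffices.
rng = np.random.default_rng(1)
t = rng.normal(size=(100, 1))
X = t @ rng.normal(size=(1, 6)) + 0.01 * rng.normal(size=(100, 6))
k = n_components_for(X)
```

For real sensor-array data the returned count would simply replace the fixed feature-vector length fed to the olfactory model.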
NASA Astrophysics Data System (ADS)
Xu, Wen-Sheng; Zhang, Wen-Zheng
2018-01-01
A new orientation relationship (OR) is found between Widmanstätten cementite precipitates and the austenite matrix in a 1.3C-14Mn steel. The associated habit plane (HP) and the dislocations in the HP have been investigated with transmission electron microscopy. The HP is parallel to ? in cementite, and it is parallel to ? in austenite. Three groups of interfacial dislocations are observed in the HP, with limited quantitative experimental data. The line directions, the spacing and the Burgers vectors of two sets of dislocations have been calculated based on a misfit analysis, which combines the CSL/DSC/O-lattice theories, row matching and good matching site (GMS) mappings. The calculated results are in reasonable agreement with the experimental results. The dislocations 'Coarse 1' and 'Fine 1' are in the same direction as the matching rows, i.e. ?. 'Coarse 1' dislocations are secondary dislocations with a Burgers vector of ?, and 'Fine 1' dislocations are pseudo-primary dislocations with a plausible Burgers vector of ?. The reason why the fraction of the new OR is much less than that of the dominant Pitsch OR has been discussed in terms of the degree of matching in the HPs.
Bale, S D; Mozer, F S
2007-05-18
Large parallel (
Tong, Alex W; Nemunaitis, John; Su, Dan; Zhang, Yuan; Cunningham, Casey; Senzer, Neil; Netto, George; Rich, Dawn; Mhashilkar, Abner; Parker, Karen; Coffee, Keith; Ramesh, Rajagopal; Ekmekcioglu, Suhendan; Grimm, Elizabeth A; van Wart Hood, Jill; Merritt, James; Chada, Sunil
2005-01-01
The mda-7 gene (approved gene symbol IL24) is a novel tumor suppressor gene with tumor-apoptotic and immune-activating properties. We completed a Phase I dose-escalation clinical trial, in which a nonreplicating adenoviral construct expressing the mda-7 transgene (INGN 241; Ad-mda7) was administered intratumorally to 22 patients with advanced cancer. Excised tumors were evaluated for vector-specific DNA and RNA, transgenic MDA-7 expression, and biological effects. Successful gene transfer as assessed by DNA- and RT-PCR was demonstrated in 100% of patients evaluated. DNA analyses demonstrated a dose-dependent penetration of INGN 241 (up to 4 × 10^8 copies/μg DNA at the 2 × 10^12 vp dose). A parallel distribution of vector DNA, vector RNA, MDA-7 protein expression, and apoptosis induction was observed in all tumors, with signals decreasing with distance away from the injection site. Additional evidence for bioactivity of INGN 241 was illustrated via regulation of the MDA-7 target genes beta-catenin, iNOS, and CD31. Transient increases (up to 20-fold) of serum IL-6, IL-10, and TNF-alpha were observed. Significantly higher elevations of IL-6 and TNF-alpha were observed in patients who responded clinically to INGN 241. Patients also showed marked increases of CD3+CD8+ T cells posttreatment, suggesting that INGN 241 increased systemic TH1 cytokine production and mobilized CD8+ T cells. Intratumoral delivery of INGN 241 induced apoptosis in a large volume of tumor and elicited tumor-regulatory and immune-activating events that are consistent with the preclinical features of MDA-7/IL-24.
Assessing Density Functionals Using Many Body Theory for Hybrid Perovskites
NASA Astrophysics Data System (ADS)
Bokdam, Menno; Lahnsteiner, Jonathan; Ramberger, Benjamin; Schäfer, Tobias; Kresse, Georg
2017-10-01
Which density functional is the "best" for structure simulations of a particular material? A concise, first principles, approach to answer this question is presented. The random phase approximation (RPA)—an accurate many body theory—is used to evaluate various density functionals. To demonstrate and verify the method, we apply it to the hybrid perovskite MAPbI3 , a promising new solar cell material. The evaluation is done by first creating finite temperature ensembles for small supercells using RPA molecular dynamics, and then evaluating the variance between the RPA and various approximate density functionals for these ensembles. We find that, contrary to recent suggestions, van der Waals functionals do not improve the description of the material, whereas hybrid functionals and the strongly constrained appropriately normed (SCAN) density functional yield very good agreement with the RPA. Finally, our study shows that in the room temperature tetragonal phase of MAPbI3 , the molecules are preferentially parallel to the shorter lattice vectors but reorientation on ps time scales is still possible.
Facial Expression Generation from Speaker's Emotional States in Daily Conversation
NASA Astrophysics Data System (ADS)
Mori, Hiroki; Ohshima, Koh
A framework for generating facial expressions from emotional states in daily conversation is described. It provides a mapping between emotional states and facial expressions, where the former is represented by vectors with psychologically-defined abstract dimensions, and the latter is coded by the Facial Action Coding System. In order to obtain the mapping, parallel data with rated emotional states and facial expressions were collected for utterances of a female speaker, and a neural network was trained with the data. The effectiveness of proposed method is verified by a subjective evaluation test. As the result, the Mean Opinion Score with respect to the suitability of generated facial expression was 3.86 for the speaker, which was close to that of hand-made facial expressions.
NASA Technical Reports Server (NTRS)
Kramer, Williams T. C.; Simon, Horst D.
1994-01-01
This tutorial is intended as a practical guide for the uninitiated to the main topics and themes of high-performance computing (HPC), with particular emphasis on distributed computing. The intent is first to provide some guidance and direction in the rapidly growing field of scientific computing using both massively parallel and traditional supercomputers. Because of their considerable potential computational power, loosely or tightly coupled clusters of workstations are increasingly considered as a third alternative to both the more conventional supercomputers based on a small number of powerful vector processors and massively parallel processors. Even though many research issues concerning the effective use of workstation clusters and their integration into a large-scale production facility are still unresolved, such clusters are already used for production computing. In this tutorial we draw on the unique experience gained at the NAS facility at NASA Ames Research Center. Over the last five years at NAS, massively parallel supercomputers such as the Connection Machines CM-2 and CM-5 from Thinking Machines Corporation and the iPSC/860 (Touchstone Gamma Machine) and Paragon machines from Intel were used in a production supercomputer center alongside traditional vector supercomputers such as the Cray Y-MP and C90.
ms2: A molecular simulation tool for thermodynamic properties
NASA Astrophysics Data System (ADS)
Deublein, Stephan; Eckl, Bernhard; Stoll, Jürgen; Lishchuk, Sergey V.; Guevara-Carrion, Gabriela; Glass, Colin W.; Merker, Thorsten; Bernreuther, Martin; Hasse, Hans; Vrabec, Jadran
2011-11-01
This work presents the molecular simulation program ms2 that is designed for the calculation of thermodynamic properties of bulk fluids in equilibrium consisting of small electro-neutral molecules. ms2 features the two main molecular simulation techniques, molecular dynamics (MD) and Monte-Carlo. It supports the calculation of vapor-liquid equilibria of pure fluids and multi-component mixtures described by rigid molecular models on the basis of the grand equilibrium method. Furthermore, it is capable of sampling various classical ensembles and yields numerous thermodynamic properties. To evaluate the chemical potential, Widom's test molecule method and gradual insertion are implemented. Transport properties are determined by equilibrium MD simulations following the Green-Kubo formalism. ms2 is designed to meet the requirements of academia and industry, particularly achieving short response times and straightforward handling. It is written in Fortran90 and optimized for fast execution on a broad range of computer architectures, spanning from single processor PCs over PC-clusters and vector computers to high-end parallel machines. The standard Message Passing Interface (MPI) is used for parallelization and ms2 is therefore easily portable to different computing platforms. Feature tools facilitate the interaction with the code and the interpretation of input and output files. The accuracy and reliability of ms2 has been shown for a large variety of fluids in preceding work.
Program summary
Program title: ms2
Catalogue identifier: AEJF_v1_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEJF_v1_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Special Licence supplied by the authors
No. of lines in distributed program, including test data, etc.: 82 794
No. of bytes in distributed program, including test data, etc.: 793 705
Distribution format: tar.gz
Programming language: Fortran90
Computer: The simulation tool ms2 is usable on a wide variety of platforms, from single processor machines over PC-clusters and vector computers to vector-parallel architectures. (Tested with Fortran compilers: gfortran, Intel, PathScale, Portland Group and Sun Studio.)
Operating system: Unix/Linux, Windows
Has the code been vectorized or parallelized?: Yes, Message Passing Interface (MPI) protocol.
Scalability: Excellent scalability up to 16 processors for molecular dynamics and >512 processors for Monte-Carlo simulations.
RAM: ms2 runs on single processors with 512 MB RAM. The memory demand rises with increasing number of processors used per node and increasing number of molecules.
Classification: 7.7, 7.9, 12
External routines: Message Passing Interface (MPI)
Nature of problem: Calculation of application-oriented thermodynamic properties for rigid electro-neutral molecules: vapor-liquid equilibria, thermal and caloric data as well as transport properties of pure fluids and multi-component mixtures.
Solution method: Molecular dynamics, Monte-Carlo, various classical ensembles, grand equilibrium method, Green-Kubo formalism.
Restrictions: None. The system size is user-defined. Typical problems addressed by ms2 can be solved by simulating systems containing typically 2000 molecules or less.
Unusual features: Feature tools are available for creating input files, analyzing simulation results and visualizing molecular trajectories.
Additional comments: Sample makefiles for multiple operating platforms are provided. Documentation is provided with the installation package and is available at http://www.ms-2.de.
Running time: The running time of ms2 depends on the problem set, the system size and the number of processes used in the simulation. Running four processes on a "Nehalem" processor, simulations calculating VLE data take between two and twelve hours; calculating transport properties takes between six and 24 hours.
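The Green-Kubo route to transport properties mentioned in the ms2 summary can be illustrated with a minimal self-diffusion sketch. This is a hedged illustration in pure Python (ms2 itself is Fortran90); the function names are hypothetical and the velocity series is a single particle's trajectory:

```python
def vacf(vels):
    """Velocity autocorrelation <v(0).v(t)> averaged over time origins.
    vels: list of (vx, vy, vz) tuples for one particle."""
    n = len(vels)
    out = []
    for lag in range(n):
        acc = 0.0
        for t0 in range(n - lag):
            a, b = vels[t0], vels[t0 + lag]
            acc += a[0] * b[0] + a[1] * b[1] + a[2] * b[2]
        out.append(acc / (n - lag))
    return out

def green_kubo_diffusion(vels, dt):
    """Self-diffusion coefficient D = (1/3) * integral of the VACF,
    integrated here with the trapezoidal rule."""
    c = vacf(vels)
    integral = sum((c[i] + c[i + 1]) * 0.5 * dt for i in range(len(c) - 1))
    return integral / 3.0
```

In a production code the VACF would be averaged over all particles and the integral truncated once the correlation has decayed; this sketch only shows the structure of the formalism.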
Optimization of sparse matrix-vector multiplication on emerging multicore platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard
2007-01-01
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore-specific optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD dual-core and Intel quad-core designs, the heterogeneous STI Cell, as well as the first scientific study of the highly multithreaded Sun Niagara2. We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural tradeoffs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
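For readers unfamiliar with the kernel being optimized, a minimal serial reference for SpMV in compressed sparse row (CSR) storage can be sketched as follows. This is plain Python for clarity, not the paper's optimized implementations:

```python
def spmv_csr(vals, cols, rowptr, x):
    """y = A @ x for a sparse matrix A in CSR format.
    vals:   nonzero values, row by row
    cols:   column index of each nonzero
    rowptr: rowptr[r]..rowptr[r+1] delimits row r's nonzeros"""
    y = []
    for r in range(len(rowptr) - 1):
        s = 0.0
        for k in range(rowptr[r], rowptr[r + 1]):
            # Indirect access x[cols[k]] is what makes SpMV memory-bound:
            # the irregular gather defeats caches and prefetchers.
            s += vals[k] * x[cols[k]]
        y.append(s)
    return y
```

The optimizations studied in the paper (blocking, index compression, software prefetch, thread-level partitioning) all target the irregular memory traffic visible in this inner loop.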
Rigorous vector wave propagation for arbitrary flat media
NASA Astrophysics Data System (ADS)
Bos, Steven P.; Haffert, Sebastiaan Y.; Keller, Christoph U.
2017-08-01
Precise modelling of the (off-axis) point spread function (PSF) to identify geometrical and polarization aberrations is important for many optical systems. In order to characterise the PSF of the system in all Stokes parameters, an end-to-end simulation of the system has to be performed in which Maxwell's equations are rigorously solved. We present the first results of a Python code that we are developing to perform multiscale end-to-end wave propagation simulations that include all relevant physics. Currently we can handle plane-parallel near- and far-field vector diffraction effects of propagating waves in homogeneous isotropic and anisotropic materials, refraction and reflection at flat parallel surfaces, interference effects in thin films and unpolarized light. We show that the code has a numerical precision on the order of 10^-16 for non-absorbing isotropic and anisotropic materials. For absorbing materials the precision is on the order of 10^-8. The capabilities of the code are demonstrated by simulating a converging beam reflecting from a flat aluminium mirror at normal incidence.
A Tensor Product Formulation of Strassen's Matrix Multiplication Algorithm with Memory Reduction
Kumar, B.; Huang, C. -H.; Sadayappan, P.; ...
1995-01-01
In this article, we present a program generation strategy for Strassen's matrix multiplication algorithm using a programming methodology based on tensor product formulas. In this methodology, block recursive programs such as the fast Fourier transform and Strassen's matrix multiplication algorithm are expressed as algebraic formulas involving tensor products and other matrix operations. Such formulas can be systematically translated to high-performance parallel/vector codes for various architectures. In this article, we present a nonrecursive implementation of Strassen's algorithm for shared memory vector processors such as the Cray Y-MP. A previous implementation of Strassen's algorithm synthesized from tensor product formulas required working storage of size O(7^n) for multiplying 2^n × 2^n matrices. We present a modified formulation in which the working storage requirement is reduced to O(4^n). The modified formulation exhibits sufficient parallelism for efficient implementation on a shared memory multiprocessor. Performance results on a Cray Y-MP8/64 are presented.
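For readers unfamiliar with the underlying algorithm, the standard recursive form of Strassen's method for 2^n × 2^n matrices is sketched below. Note this is the textbook recursion, not the paper's nonrecursive tensor-product formulation:

```python
def add(A, B): return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]
def sub(A, B): return [[a - b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def strassen(A, B):
    """Strassen multiply of 2^n x 2^n matrices (plain nested lists):
    7 half-size products instead of 8, at the cost of extra additions."""
    n = len(A)
    if n == 1:
        return [[A[0][0] * B[0][0]]]
    h = n // 2
    A11 = [r[:h] for r in A[:h]]; A12 = [r[h:] for r in A[:h]]
    A21 = [r[:h] for r in A[h:]]; A22 = [r[h:] for r in A[h:]]
    B11 = [r[:h] for r in B[:h]]; B12 = [r[h:] for r in B[:h]]
    B21 = [r[:h] for r in B[h:]]; B22 = [r[h:] for r in B[h:]]
    M1 = strassen(add(A11, A22), add(B11, B22))
    M2 = strassen(add(A21, A22), B11)
    M3 = strassen(A11, sub(B12, B22))
    M4 = strassen(A22, sub(B21, B11))
    M5 = strassen(add(A11, A12), B22)
    M6 = strassen(sub(A21, A11), add(B11, B12))
    M7 = strassen(sub(A12, A22), add(B21, B22))
    C11 = add(sub(add(M1, M4), M5), M7)
    C12 = add(M3, M5)
    C21 = add(M2, M4)
    C22 = add(sub(add(M1, M3), M2), M6)
    return [C11[i] + C12[i] for i in range(h)] + [C21[i] + C22[i] for i in range(h)]
```

The temporaries M1..M7 are the source of the working-storage growth that the paper's O(4^n) formulation reduces.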
The ASC Sequoia Programming Model
DOE Office of Scientific and Technical Information (OSTI.GOV)
Seager, M
2008-08-06
In the late 1980s and early 1990s, Lawrence Livermore National Laboratory was deeply engrossed in determining the next generation programming model for the Integrated Design Codes (IDC) beyond vectorization for the Cray 1s series of computers. The vector model, developed in the mid-1970s, first for the CDC 7600 and later extended from stack-based vector operations to memory-to-memory operations for the Cray 1s, lasted approximately 20 years (see Slide 5). The Cray vector era was deemed an extremely long-lived era as it allowed vector codes to be developed over time (the Cray 1s were faster in scalar mode than the CDC 7600) with vector unit utilization increasing incrementally over time. The other attributes of the Cray vector era at LLNL were that we developed, supported and maintained the operating system (LTSS and later NLTSS), communications protocols (LINCS), compilers (Civic Fortran77 and Model), operating system tools (e.g., batch system, job control scripting, loaders, debuggers, editors, graphics utilities, you name it) and math and highly machine-optimized libraries (e.g., SLATEC and STACKLIB). Although LTSS was adopted by Cray for early system generations, they later developed the COS and UNICOS operating systems and environment on their own. In the late 1970s and early 1980s two trends appeared that made the Cray vector programming model (described above, including both the hardware and system software aspects) seem potentially dated and slated for major revision. These trends were the appearance of low-cost CMOS microprocessors and their attendant departmental and mini-computers and later workstations and personal computers. With the widespread adoption of Unix in the early 1980s, it appeared that LLNL (and the other DOE Labs) would be left out of the mainstream of computing without a rapid transition to these 'Killer Micros' and modern OS and tools environments.
The other interesting advance in the period is that systems were being developed with multiple 'cores' in them, called Symmetric Multi-Processor or Shared Memory Processor (SMP) systems. The parallel revolution had begun. The Laboratory started a small 'parallel processing project' in 1983 to study the new technology and its application to scientific computing with four people: Tim Axelrod, Pete Eltgroth, Paul Dubois and Mark Seager. Two years later, Eugene Brooks joined the team. This team focused on Unix and 'killer micro' SMPs. Indeed, Eugene Brooks was credited with coming up with the 'Killer Micro' term. After several generations of SMP platforms (e.g., the Sequent Balance 8000 with 8 33MHz MC32032s, the Alliant FX/8 with 8 MC68020s and FPGA-based vector units, and finally the BBN Butterfly with 128 cores), it became apparent to us that the killer micro revolution would indeed overtake Crays and that we definitely needed a new programming and systems model. The model developed by Mark Seager and Dale Nielsen focused on both the system aspects (Slide 3) and the code development aspects (Slide 4). Although now succinctly captured in the two attached slides, at the time there was tremendous ferment in the research community as to what parallel programming model would emerge, dominate and survive. In addition, we wanted a model that would provide portability between platforms of a single generation but also longevity over multiple--and hopefully many--generations. Only after we developed the 'Livermore Model' and worked it out in considerable detail did it become obvious that what we came up with was the right approach. In a nutshell, the applications programming model of the Livermore Model posited that SMP parallelism would ultimately not scale indefinitely and one would have to bite the bullet and implement MPI parallelism within the Integrated Design Code (IDC).
We also had a major emphasis on doing everything in a completely standards-based, portable methodology with POSIX/Unix as the target environment. We decided against specialized libraries like STACKLIB for performance, but kept as many general-purpose, portable math libraries as were needed by the codes. Third, we assumed that the SMPs in clusters would evolve in time to become more powerful, feature-rich and, in particular, offer more cores. Thus, we focused on OpenMP and POSIX PThreads for programming SMP parallelism. These code porting efforts were led by Dale Nielsen, A-Division code group leader, and Randy Christensen, B-Division code group leader. Most of the porting effort revolved around removing 'Crayisms' in the codes: artifacts of LTSS/NLTSS, Civic compiler extensions beyond Fortran77, IO libraries and dealing with new code control languages (we switched to Perl and later to Python). Adding MPI to the codes was initially problematic and error prone because the programmers used MPI directly and sprinkled the calls throughout the code.
NASA Astrophysics Data System (ADS)
Vafin, S.; Schlickeiser, R.; Yoon, P. H.
2016-05-01
The general electromagnetic fluctuation theory for magnetized plasmas is used to calculate the steady-state wave number spectra and total electromagnetic field strength of low-frequency collective weakly damped eigenmodes with parallel wavevectors in a Maxwellian electron-proton plasma. These result from the equilibrium of spontaneous emission and collisionless damping, and they represent the minimum electromagnetic fluctuations guaranteed in quiet thermal space plasmas, including the interstellar and interplanetary medium. Depending on the plasma beta, the ratio |δB|/B0 can be as high as 10^-12.
Understanding the Cray X1 System
NASA Technical Reports Server (NTRS)
Cheung, Samson
2004-01-01
This paper helps the reader understand the characteristics of the Cray X1 vector supercomputer system, and provides hints and information to enable the reader to port codes to the system. It provides a comparison between the basic performance of the X1 platform and other platforms that are available at NASA Ames Research Center. A set of codes, solving the Laplacian equation with different parallel paradigms, is used to understand some features of the X1 compiler. An example code from the NAS Parallel Benchmarks is used to demonstrate performance optimization on the X1 platform.
A portable approach for PIC on emerging architectures
NASA Astrophysics Data System (ADS)
Decyk, Viktor
2016-03-01
A portable approach for designing Particle-in-Cell (PIC) algorithms on emerging exascale computers is based on the recognition that 3 distinct programming paradigms are needed. They are: low level vector (SIMD) processing, middle level shared memory parallel programming, and high level distributed memory programming. In addition, there is a memory hierarchy associated with each level. Such algorithms can be initially developed using vectorizing compilers, OpenMP, and MPI. This is the approach recommended by Intel for the Phi processor. These algorithms can then be translated and possibly specialized to other programming models and languages, as needed. For example, the vector processing and shared memory programming might be done with CUDA instead of vectorizing compilers and OpenMP, but generally the algorithm itself is not greatly changed. The UCLA PICKSC web site at http://www.idre.ucla.edu/ contains example open source skeleton codes (mini-apps) illustrating each of these three programming models, individually and in combination. Fortran2003 now supports abstract data types, and design patterns can be used to support a variety of implementations within the same code base. Fortran2003 also supports interoperability with C so that implementations in C languages are also easy to use. Finally, main codes can be translated into dynamic environments such as Python, while still taking advantage of high performing compiled languages. Parallel languages are still evolving with interesting developments in co-Array Fortran, UPC, and OpenACC, among others, and these can also be supported within the same software architecture. Work supported by NSF and DOE Grants.
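The lowest, data-parallel level of a PIC code is typified by the charge-deposition kernel. A minimal 1D cloud-in-cell sketch follows; it is an illustrative toy in pure Python, not taken from the PICKSC skeleton codes:

```python
def deposit_charge(positions, grid_n, dx, q=1.0):
    """1D cloud-in-cell charge deposition on a periodic grid:
    each particle's charge q is split linearly between its two
    nearest grid points. This inner loop is the kind of kernel
    handled at the vector (SIMD) level of a PIC code."""
    rho = [0.0] * grid_n
    for x in positions:
        s = x / dx                    # position in grid units
        i = int(s) % grid_n           # left grid point
        frac = s - int(s)             # fractional distance to it
        rho[i] += q * (1.0 - frac)
        rho[(i + 1) % grid_n] += q * frac
    return rho
```

In the layered scheme the abstract describes, this loop would be vectorized per core, particles would be partitioned over OpenMP threads (with care for the scatter conflicts on rho), and the grid itself decomposed over MPI ranks.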
Gravitropism in Arabidopsis thaliana: violation of the sine- and resultant-law
NASA Astrophysics Data System (ADS)
Galland, Paul
We investigated the gravitropic bending of hypocotyls and roots of seedlings of Arabidopsis thaliana in response to long-term centrifugal accelerations in a range of 5 x 10^-3 x g to 4 x g. The so-called resultant law of gravitropism, a corollary of the so-called sine law, claims that during centrifugation a gravitropic organ aligns itself parallel to the resultant stimulus vector. We show here that neither of the two empirical "laws" is apt to describe the complex gravitropic behaviour of seedlings of Arabidopsis. Hypocotyls obey the resultant law reasonably well, while roots display a complex behaviour that is clearly at variance with it. Horizontally centrifuged seedlings sense minute accelerations acting parallel to the longitudinal axis. If the centrifugal vector points toward the cotyledons, then the bending of hypocotyls and roots is greatly enhanced. If the centrifugal vector points, however, toward the root tip, then only the bending of roots is enhanced by accelerations as low as 5 x 10^-3 x g (positive tonic effect). The absolute gravitropic thresholds were determined for hypocotyls and roots in a clinostat-centrifuge and found to be near 1.5 x 10^-2 x g. A behavioural mutant, ehb1-2 (Knauer et al. 2011), displays a lower gravitropic threshold for roots, but not for hypocotyls. The complex gravitropic behaviour of seedlings of Arabidopsis is at odds with the classical sine law as well as the resultant law and indicates the eminent role played by the acceleration vector operating longitudinally to the seedling axis.
Boisgérault, Florence; Mingozzi, Federico
2015-01-01
Since the early days of gene therapy, muscle has been one of the most studied tissue targets for the correction of enzyme deficiencies and myopathies. Several preclinical and clinical studies have been conducted using adeno-associated virus (AAV) vectors. Exciting progress has been made in the gene delivery technologies, from the identification of novel AAV serotypes to the development of novel vector delivery techniques. In parallel, significant knowledge has been generated on the host immune system and its interaction with both the vector and the transgene at the muscle level. In particular, the role of underlying muscle inflammation, characteristic of several diseases affecting the muscle, has been defined in terms of its potential detrimental impact on gene transfer with AAV vectors. At the same time, feedback immunomodulatory mechanisms peculiar to skeletal muscle involving resident regulatory T cells have been identified, which seem to play an important role in maintaining, at least to some extent, muscle homeostasis during inflammation and regenerative processes. Devising strategies to tip this balance towards unresponsiveness may represent an avenue to improve the safety and efficacy of muscle gene transfer with AAV vectors. PMID:26122097
Beckett, Travis; Bonneau, Laura; Howard, Alan; Blanchard, James; Borda, Juan; Weiner, Daniel J.; Wang, Lili; Gao, Guang Ping; Kolls, Jay K.; Bohm, Rudolf; Liggitt, Denny
2012-01-01
Use of perfluorochemical liquids during intratracheal vector administration enhances recombinant adenovirus and adeno-associated virus (AAV)-mediated lung epithelial gene expression. We hypothesized that inhalation of nebulized perfluorochemical vapor would also enhance epithelial gene expression after subsequent intratracheal vector administration. Freely breathing adult C57BL/6 mice were exposed for selected times to nebulized perflubron or sterile saline in a sealed Plexiglas chamber. Recombinant adenoviral vector was administered by transtracheal puncture at selected times afterward, and mice were killed 3 days after vector administration to assess transgene expression. Mice tolerated the nebulized perflubron without obvious ill effects. Vector administration 6 hr after nebulized perflubron exposure resulted in an average 540% increase in gene expression in airway and alveolar epithelium, compared with vector alone or saline plus vector control (p<0.05). However, vector administration 1 hr, 1 day, or 3 days after perflubron exposure was no different from either nebulized saline with vector or vector alone, and a 60-min exposure to nebulized perflubron was required. In parallel pilot studies in macaques, inhalation of nebulized perflubron enhanced recombinant AAV2/5 vector expression throughout the lung. Serial chest radiographs, bronchoalveolar lavages, and results of complete blood counts and serum biochemistries demonstrated no obvious adverse effects of nebulized perflubron. Further, one macaque receiving nebulized perflubron only was monitored for 1 year with no obvious adverse effects of exposure. These results demonstrate that inhalation of nebulized perflubron, a simple, clinically more feasible technique than intratracheal administration of liquid perflubron, safely enhances lung gene expression. PMID:22568624
NASA Astrophysics Data System (ADS)
Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike
The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. 
Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
A novel double fine guide sensor design on space telescope
NASA Astrophysics Data System (ADS)
Zhang, Xu-xu; Yin, Da-yi
2018-02-01
To obtain high-precision attitude for a space telescope, a double marginal FOV (field of view) FGS (Fine Guide Sensor) is proposed. It is composed of two large-area APS CMOS sensors that share the same lens in the main line of sight. More star vectors can be obtained by the two FGSs and used for high-precision attitude determination. To improve star identification speed, a vector cross-product formulation of inter-star angles for the small marginal FOV, which differs from the traditional approach, is elaborated, and parallel processing is applied to the pyramid algorithm. The star vectors from the two sensors are then fused into an attitude with the traditional QUEST algorithm. The simulation results show that the system can achieve high-accuracy three-axis attitude and that the scheme is feasible.
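The inter-star angle computation underlying such star identification can be sketched as follows. This is a generic formulation using both the cross and dot products of unit star vectors, not necessarily the paper's specific variant:

```python
import math

def inter_star_angle(u, v):
    """Angle between two unit star vectors. Using atan2(|u x v|, u . v)
    rather than acos(u . v) is better conditioned for the very small
    angles typical of stars in a narrow (marginal) field of view."""
    cx = u[1] * v[2] - u[2] * v[1]
    cy = u[2] * v[0] - u[0] * v[2]
    cz = u[0] * v[1] - u[1] * v[0]
    cross = math.sqrt(cx * cx + cy * cy + cz * cz)
    dot = u[0] * v[0] + u[1] * v[1] + u[2] * v[2]
    return math.atan2(cross, dot)
```

Pyramid-style algorithms match sets of such measured angles against a catalog of precomputed inter-star angles; the per-pair independence of this computation is what makes the parallel processing mentioned in the abstract straightforward.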
Plant Virus–Insect Vector Interactions: Current and Potential Future Research Directions
Dietzgen, Ralf G.; Mann, Krin S.; Johnson, Karyn N.
2016-01-01
Acquisition and transmission by an insect vector is central to the infection cycle of the majority of plant pathogenic viruses. Plant viruses can interact with their insect host in a variety of ways including both non-persistent and circulative transmission; in some cases, the latter involves virus replication in cells of the insect host. Replicating viruses can also elicit both innate and specific defense responses in the insect host. A consistent feature is that the interaction of the virus with its insect host/vector requires specific molecular interactions between virus and host, commonly via proteins. Understanding the interactions between plant viruses and their insect host can underpin approaches to protect plants from infection by interfering with virus uptake and transmission. Here, we provide a perspective focused on identifying novel approaches and research directions to facilitate control of plant viruses by better understanding and targeting virus–insect molecular interactions. We also draw parallels with molecular interactions in insect vectors of animal viruses, and consider technical advances for their control that may be more broadly applicable to plant virus vectors. PMID:27834855
Man, Sumche; Maan, Arie C; Schalij, Martin J; Swenne, Cees A
2015-01-01
In the course of time, electrocardiography has assumed several modalities with varying electrode numbers, electrode positions and lead systems. 12-lead electrocardiography and 3-lead vectorcardiography have become particularly popular. These modalities developed in parallel through the mid-twentieth century. In the same time interval, the physical concepts underlying electrocardiography were defined and worked out. In particular, the vector concept (heart vector, lead vector, volume conductor) appeared to be essential to understanding the manifestations of electrical heart activity, both in the 12-lead electrocardiogram (ECG) and in the 3-lead vectorcardiogram (VCG). Not universally appreciated in the clinic, the vectorcardiogram, and with it the vector concept, went out of use. A revival of vectorcardiography started in the 1990s, when VCGs were mathematically synthesized from standard 12-lead ECGs. This facilitated combined electrocardiography and vectorcardiography without the need for a special recording system. This paper gives an overview of these historical developments, elaborates on the vector concept and seeks to define where VCG analysis/interpretation can add diagnostic/prognostic value to conventional 12-lead ECG analysis.
On nonlinear finite element analysis in single-, multi- and parallel-processors
NASA Technical Reports Server (NTRS)
Utku, S.; Melosh, R.; Islam, M.; Salama, M.
1982-01-01
Numerical solution of nonlinear equilibrium problems of structures by means of Newton-Raphson type iterations is reviewed. Each step of the iteration is shown to correspond to the solution of a linear problem; therefore the feasibility of the finite element method for nonlinear analysis is established. Organization and flow of data for various types of digital computers, such as single-processor/single-level memory, single-processor/two-level-memory, vector-processor/two-level-memory, and parallel-processors, with and without substructuring (i.e., partitioning), are given. The effect of the relative costs of computation, memory and data transfer on substructuring is shown. The idea of assigning comparable-size substructures to parallel processors is exploited. Under Cholesky-type factorization schemes, the efficiency of parallel processing is shown to decrease due to occasionally shared data, just as it does due to shared facilities.
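The Newton-Raphson structure described above, in which each iteration reduces to a linear solve, can be sketched for the scalar case. The function names below are illustrative only; in nonlinear FEM the residual and Jacobian would be assembled vectors and matrices, and the division would be a factorization-based linear solve:

```python
def newton_solve(residual, jacobian, u0, tol=1e-10, max_iter=50):
    """Scalar Newton-Raphson: each step solves the linearized problem
    J(u) * du = -R(u), which is the 'linear problem per iteration'
    the abstract refers to."""
    u = u0
    for _ in range(max_iter):
        r = residual(u)
        if abs(r) < tol:
            break
        u -= r / jacobian(u)   # du = -R(u)/J(u) in the scalar case
    return u
```

For example, solving u^2 - 2 = 0 from u0 = 1 converges quadratically to sqrt(2).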
Domain decomposition methods in aerodynamics
NASA Technical Reports Server (NTRS)
Venkatakrishnan, V.; Saltz, Joel
1990-01-01
Compressible Euler equations are solved for two-dimensional problems by a preconditioned conjugate gradient-like technique. An approximate Riemann solver is used to compute the numerical fluxes to second order accuracy in space. Two ways to achieve parallelism are tested, one which makes use of parallelism inherent in triangular solves and the other which employs domain decomposition techniques. The vectorization/parallelism in triangular solves is realized by the use of a reordering technique called wavefront ordering. This process involves the interpretation of the triangular matrix as a directed graph and the analysis of the data dependencies. It is noted that the factorization can also be done in parallel with the wavefront ordering. The performances of two ways of partitioning the domain, strips and slabs, are compared. Results on the Cray Y-MP are reported for an inviscid transonic test case. The performances of linear algebra kernels are also reported.
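The wavefront (level-scheduling) idea for triangular solves can be illustrated with a small dependency-analysis sketch. The input format below is a hypothetical adjacency list, not the paper's data structure:

```python
def wavefront_levels(lower):
    """Level scheduling for a sparse lower-triangular solve.
    lower[i] lists the column indices j < i with a nonzero L[i][j],
    i.e. the unknowns row i depends on. Rows assigned to the same
    level have no mutual dependencies, so each level (wavefront)
    can be solved in parallel or vectorized."""
    level = [0] * len(lower)
    for i, deps in enumerate(lower):
        # A row sits one level below its deepest dependency.
        level[i] = max((level[j] + 1 for j in deps), default=0)
    waves = {}
    for i, l in enumerate(level):
        waves.setdefault(l, []).append(i)
    return [waves[l] for l in sorted(waves)]
```

This is exactly the directed-graph dependency analysis the abstract describes: the number of levels bounds the sequential depth of the solve, and the width of each wavefront is the exploitable parallelism.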
DOE Office of Scientific and Technical Information (OSTI.GOV)
G.A. Pope; K. Sepehrnoori; D.C. McKinney
1996-03-15
This report describes the application of distributed-memory parallel programming techniques to a compositional simulator called UTCHEM. The University of Texas Chemical Flooding reservoir simulator (UTCHEM) is a general-purpose vectorized chemical flooding simulator that models the transport of chemical species in three-dimensional, multiphase flow through permeable media. The parallel version of UTCHEM addresses solving large-scale problems by reducing the amount of time required to obtain the solution as well as providing a flexible and portable programming environment. In this work, the original parallel version of UTCHEM was modified and ported to the CRAY T3D and CRAY T3E, distributed-memory, multiprocessor computers, using CRAY-PVM as the interprocessor communication library. Also, the data communication routines were modified such that portability of the original code across different computer architectures was made possible.
An implementation of a tree code on a SIMD, parallel computer
NASA Technical Reports Server (NTRS)
Olson, Kevin M.; Dorband, John E.
1994-01-01
We describe a fast tree algorithm for gravitational N-body simulation on SIMD parallel computers. The tree construction uses fast, parallel sorts. The sorted lists are recursively divided along their x, y and z coordinates. This data structure is a completely balanced tree (i.e., each particle is paired with exactly one other particle) and maintains good spatial locality. An implementation of this tree-building algorithm on a 16k-processor Maspar MP-1 performs well and constitutes only a small fraction (approximately 15%) of the entire cycle of finding the accelerations. Each node in the tree is treated as a monopole. The tree search and the summation of accelerations also perform well. During the tree search, node data that is needed from another processor is simply fetched. Roughly 55% of the tree search time is spent in communications between processors. We apply the code to two problems of astrophysical interest. The first is a simulation of the close passage of two gravitationally interacting disk galaxies using 65,536 particles. We also simulate the formation of structure in an expanding model universe using 1,048,576 particles. Our code attains speeds comparable to one head of a Cray Y-MP, so single instruction, multiple data (SIMD) computers can be used for these simulations. The cost/performance ratio for SIMD machines like the Maspar MP-1 makes them an extremely attractive alternative to either vector processors or large multiple instruction, multiple data (MIMD) parallel computers. With further optimizations (e.g., more careful load balancing), speeds in excess of today's vector processing computers should be possible.
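The recursive, sort-based construction can be sketched in a few lines. This is a hedged, sequential illustration (the paper's version uses parallel sorts on a SIMD machine), cycling the split axis through x, y, z:

```python
def build_tree(points, depth=0):
    """Recursively split a point list along x, y, z in turn.
    Each split halves the list, so the resulting tree is completely
    balanced: with a power-of-two count, every leaf holds one particle."""
    if len(points) == 1:
        return points[0]                      # leaf: a single particle
    axis = depth % 3                          # cycle x -> y -> z
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return (build_tree(pts[:mid], depth + 1),
            build_tree(pts[mid:], depth + 1))

# Four made-up particles (x, y, z); internal nodes are pairs, leaves are points
tree = build_tree([(0, 0, 0), (1, 1, 1), (2, 0, 2), (3, 3, 0)])
```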
User's and test case manual for FEMATS
NASA Technical Reports Server (NTRS)
Chatterjee, Arindam; Volakis, John; Nurnberger, Mike; Natzke, John
1995-01-01
The FEMATS program incorporates first-order edge-based finite elements and vector absorbing boundary conditions into the scattered field formulation for computation of the scattering from three-dimensional geometries. The code has been validated extensively for a large class of geometries containing inhomogeneities and satisfying transition conditions. For geometries that are too large for the workstation environment, the FEMATS code has been optimized to run on various supercomputers. Currently, FEMATS has been configured to run on the HP 9000 workstation, vectorized for the Cray Y-MP, and parallelized to run on the Kendall Square Research (KSR) architecture and the Intel Paragon.
A set of ligation-independent in vitro translation vectors for eukaryotic protein production.
Bardóczy, Viola; Géczi, Viktória; Sawasaki, Tatsuya; Endo, Yaeta; Mészáros, Tamás
2008-03-27
The last decade has brought the renaissance of protein studies and accelerated the development of high-throughput methods in all aspects of proteomics. Presently, most protein synthesis systems exploit the capacity of living cells to translate proteins, but their application is limited by several factors. A more flexible alternative protein production method is cell-free in vitro protein translation. Currently available in vitro translation systems are suitable for high-throughput robotic protein production, fulfilling the requirements of proteomics studies. The wheat germ extract-based in vitro translation system is likely the most promising method, since numerous eukaryotic proteins can be cost-efficiently synthesized in their native folded form. Although currently available vectors for wheat embryo in vitro translation systems ensure high productivity, they do not meet the requirements of state-of-the-art proteomics: target genes have to be inserted using restriction endonucleases, and the plasmids do not encode cleavable affinity purification tags. We designed four ligation-independent cloning (LIC) vectors for wheat germ extract-based in vitro protein translation. In these constructs, RNA transcription is driven by T7 or SP6 phage polymerase, and two TEV protease-cleavable affinity tags can be added to aid protein purification. To evaluate our improved vectors, a plant mitogen-activated protein kinase was cloned into all four constructs. Purification of this eukaryotic protein kinase demonstrated that all constructs functioned as intended: insertion of the PCR fragment by LIC worked efficiently, affinity purification of translated proteins by GST-Sepharose or MagneHis particles resulted in high-purity kinase, and the affinity tags could be efficiently removed under different reaction conditions. Furthermore, high in vitro kinase activity testified to proper folding of the purified protein.
Four newly designed in vitro translation vectors have been constructed which allow fast and parallel cloning and protein purification, thus representing useful molecular tools for high-throughput production of eukaryotic proteins.
Long-Term Effect of Gene Therapy on Leber’s Congenital Amaurosis
Bainbridge, J.W.B.; Mehat, M.S.; Sundaram, V.; Robbie, S.J.; Barker, S.E.; Ripamonti, C.; Georgiadis, A.; Mowat, F.M.; Beattie, S.G.; Gardner, P.J.; Feathers, K.L.; Luong, V.A.; Yzer, S.; Balaggan, K.; Viswanathan, A.; de Ravel, T.J.L.; Casteels, I.; Holder, G.E.; Tyler, N.; Fitzke, F.W.; Weleber, R.G.; Nardini, M.; Moore, A.T.; Thompson, D.A.; Petersen-Jones, S.M.; Michaelides, M.; van den Born, L.I.; Stockman, A.; Smith, A.J.; Rubin, G.; Ali, R.R.
2015-01-01
BACKGROUND Mutations in RPE65 cause Leber’s congenital amaurosis, a progressive retinal degenerative disease that severely impairs sight in children. Gene therapy can result in modest improvements in night vision, but knowledge of its efficacy in humans is limited. METHODS We performed a phase 1–2 open-label trial involving 12 participants to evaluate the safety and efficacy of gene therapy with a recombinant adeno-associated virus 2/2 (rAAV2/2) vector carrying the RPE65 complementary DNA, and measured visual function over the course of 3 years. Four participants were administered a lower dose of the vector, and 8 were administered a higher dose. In a parallel study in dogs, we investigated the relationship among vector dose, visual function, and electroretinography (ERG) findings. RESULTS Improvements in retinal sensitivity were evident, to varying extents, in six participants for up to 3 years, peaking at 6 to 12 months after treatment and then declining. No associated improvement in retinal function was detected by means of ERG. Three participants had intraocular inflammation, and two had clinically significant deterioration of visual acuity. The reduction in central retinal thickness varied among participants. In dogs, RPE65 gene therapy with the same vector at lower doses improved vision-guided behavior, but only higher doses resulted in improvements in retinal function that were detectable with the use of ERG. CONCLUSIONS Gene therapy with rAAV2/2 RPE65 vector improved retinal sensitivity, albeit modestly and temporarily. Comparison with the results obtained in the dog model indicates that there is a species difference in the amount of RPE65 required to drive the visual cycle and that the demand for RPE65 in affected persons was not met to the extent required for a durable, robust effect. (Funded by the National Institute for Health Research and others; ClinicalTrials.gov number, NCT00643747.) PMID:25938638
Smisc - A collection of miscellaneous functions
DOE Office of Scientific and Technical Information (OSTI.GOV)
Landon Sego, PNNL
2015-08-31
A collection of functions for statistical computing and data manipulation. These include routines for rapidly aggregating heterogeneous matrices, manipulating file names, loading R objects, sourcing multiple R files, formatting datetimes, multi-core parallel computing, stream editing, specialized plotting, etc. The package contents include:
Smisc-package: A collection of miscellaneous functions
allMissing: Identifies missing rows or columns in a data frame or matrix
as.numericSilent: Silent wrapper for coercing a vector to numeric
comboList: Produces all possible combinations of a set of linear model predictors
cumMax: Computes the maximum of the vector up to the current index
cumsumNA: Computes the cumulative sum of a vector without propagating NAs
d2binom, p2binom: Probability functions for the sum of two independent binomials
dataIn: A flexible way to import data into R
dbb, pbb, qbb, rbb: The Beta-Binomial distribution
df2list: Row-wise conversion of a data frame to a list
dfplapply: Parallelized single-row processing of a data frame
dframeEquiv: Examines the equivalence of two data frames or matrices
dkbinom, pkbinom: Probability functions for the sum of k independent binomials
factor2character: Converts all factor variables in a data frame to character variables
findDepMat: Identifies linearly dependent rows or columns in a matrix
formatDT: Converts date or datetime strings into alternate formats
getExtension, getPath, grabLast: Filename manipulations: remove or extract the extension or path
ifelse1: Non-vectorized version of ifelse
integ: Simple numerical integration routine
interactionPlot: Two-way interaction plot with error bars
linearMap: Linear mapping of a numerical vector or scalar
list2df: Converts a list to a data frame
loadObject: Loads and returns the object(s) in an ".Rdata" file
more: Displays the contents of a file to the R terminal
movAvg2: Calculates the moving average using a 2-sided window
openDevice: Opens a graphics device based on the filename extension
padZero: Pads a vector of numbers with zeros
parseJob: Parses a collection of elements into (almost) equal-sized groups
pcbinom: A continuous version of the binomial cdf
plapply: Simple parallelization of lapply
plotFun: Plots one or more functions on a single plot
PowerData: An example of power data
pvar: Prints the name and value of one or more objects
and numerous others (space limits reporting).
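As an illustration of one of the utilities listed, here is a Python analogue of cumsumNA under one plausible reading of its description (an NA/NaN contributes zero rather than poisoning the running total; the actual R implementation may differ in detail):

```python
import math

def cumsum_na(xs):
    """Cumulative sum that skips NaN values instead of propagating them,
    mirroring the behavior described for Smisc's cumsumNA."""
    total, out = 0.0, []
    for x in xs:
        if not math.isnan(x):
            total += x
        out.append(total)
    return out
```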
Cylindrical Vector Beams for Rapid Polarization-Dependent Measurements in Atomic Systems
2011-12-05
Male motion coordination in swarming Anopheles gambiae and Anopheles coluzzii
USDA-ARS?s Scientific Manuscript database
The Anopheles gambiae species complex comprises the primary vectors of malaria in much of sub-Saharan Africa; most of the mating in these species occurs in swarms composed almost entirely of males. Intermittent, parallel flight patterns in such swarms have been observed, but a detailed description o...
Scalable Parallel Methods for Analyzing Metagenomics Data at Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
Daily, Jeffrey A.
2015-05-01
The field of bioinformatics and computational biology is currently experiencing a data revolution. The exciting prospect of making fundamental biological discoveries is fueling the rapid development and deployment of numerous cost-effective, high-throughput next-generation sequencing technologies. The result is that the DNA and protein sequence repositories are being bombarded with new sequence information. Databases are continuing to report a Moore's-law-like growth trajectory in their database sizes, roughly doubling every 18 months. In what seems to be a paradigm shift, individual projects are now capable of generating billions of raw sequences that need to be analyzed in the presence of already annotated sequence information. While it is clear that data-driven methods, such as sequence homology detection, are becoming the mainstay in the field of computational life sciences, the algorithmic advancements essential for implementing complex data analytics at scale have mostly lagged behind. Sequence homology detection is central to a number of bioinformatics applications including genome sequencing and protein family characterization. Given millions of sequences, the goal is to identify all pairs of sequences that are highly similar (or "homologous") on the basis of alignment criteria. While there are optimal alignment algorithms to compute pairwise homology, their deployment at large scale is currently not feasible; instead, heuristic methods are used at the expense of quality. In this dissertation, we present the design and evaluation of a parallel implementation for conducting optimal homology detection on distributed memory supercomputers. Our approach uses a combination of techniques from asynchronous load balancing (viz. work stealing, dynamic task counters), data replication, and exact-matching filters to achieve homology detection at scale.
Results for a collection of 2.56M sequences show parallel efficiencies of ~75-100% on up to 8K cores, representing a time-to-solution of 33 seconds. We extend this work with a detailed analysis of single-node sequence alignment performance using the latest CPU vector instruction set extensions. Preliminary results reveal that current sequence alignment algorithms are unable to fully utilize widening vector registers.
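The exact-matching filter mentioned above can be sketched as a k-mer prefilter; this is a simplified, serial stand-in for the distributed implementation, and the sequences below are invented:

```python
from itertools import combinations

def kmer_set(seq, k=3):
    """All length-k substrings of a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def candidate_pairs(seqs, k=3):
    """Exact-matching prefilter: only sequence pairs sharing at least one
    k-mer are passed on to the expensive optimal alignment step."""
    sets = {name: kmer_set(s, k) for name, s in seqs.items()}
    return [(a, b) for a, b in combinations(seqs, 2)
            if sets[a] & sets[b]]

pairs = candidate_pairs({'a': 'ACGTAC', 'b': 'TTTTTT', 'c': 'GTACGG'})
```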
NASA Astrophysics Data System (ADS)
Krieg, Todd D.; Salinas, Felipe S.; Narayana, Shalini; Fox, Peter T.; Mogul, David J.
2015-08-01
Objective. Transcranial magnetic stimulation (TMS) represents a powerful technique to noninvasively modulate cortical neurophysiology in the brain. However, the relationship between the magnetic fields created by TMS coils and neuronal activation in the cortex is still not well-understood, making predictable cortical activation by TMS difficult to achieve. Our goal in this study was to investigate the relationship between induced electric fields and cortical activation measured by blood flow response. Particularly, we sought to discover the E-field characteristics that lead to cortical activation. Approach. Subject-specific finite element models (FEMs) of the head and brain were constructed for each of six subjects using magnetic resonance image scans. Positron emission tomography (PET) measured each subject’s cortical response to image-guided robotically-positioned TMS to the primary motor cortex. FEM models that employed the given coil position, orientation, and stimulus intensity in experimental applications of TMS were used to calculate the electric field (E-field) vectors within a region of interest for each subject. TMS-induced E-fields were analyzed to better understand what vector components led to regional cerebral blood flow (CBF) responses recorded by PET. Main results. This study found that decomposing the E-field into orthogonal vector components based on the cortical surface geometry (and hence, cortical neuron directions) led to significant differences between the regions of cortex that were active and nonactive. Specifically, active regions had significantly higher E-field components in the normal inward direction (i.e., parallel to pyramidal neurons in the dendrite-to-axon orientation) and in the tangential direction (i.e., parallel to interneurons) at high gradient. In contrast, nonactive regions had higher E-field vectors in the outward normal direction suggesting inhibitory responses. Significance. 
These results provide critical new understanding of the factors by which TMS induces cortical activation necessary for predictive and repeatable use of this noninvasive stimulation modality.
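The decomposition of E-field vectors into surface-normal and tangential components described above amounts to a simple projection; the function and variable names here are ours, not the authors':

```python
import numpy as np

def decompose_efield(E, n_hat):
    """Split an E-field vector into its signed component along the cortical
    surface normal and the tangential remainder. The sign of the normal
    component distinguishes inward from outward fields when n_hat points
    inward (an assumption of this sketch)."""
    n_hat = n_hat / np.linalg.norm(n_hat)
    normal_mag = E @ n_hat                  # signed normal component
    tangential = E - normal_mag * n_hat     # component parallel to the surface
    return normal_mag, tangential

n_mag, tang = decompose_efield(np.array([1.0, 2.0, 3.0]),
                               np.array([0.0, 0.0, 1.0]))
```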
GPU-based Branchless Distance-Driven Projection and Backprojection
Liu, Rui; Fu, Lin; De Man, Bruno; Yu, Hengyong
2017-01-01
Projection and backprojection operations are essential in a variety of image reconstruction and physical correction algorithms in CT. The distance-driven (DD) projection and backprojection are widely used for their highly sequential memory access pattern and low arithmetic cost. However, a typical DD implementation has an inner loop that adjusts the calculation depending on the relative position between voxel and detector cell boundaries. The irregularity of the branch behavior makes it inefficient to implement on massively parallel computing devices such as graphics processing units (GPUs). Such irregular branch behaviors can be eliminated by factorizing the DD operation as three branchless steps: integration, linear interpolation, and differentiation, all of which are highly amenable to massive vectorization. In this paper, we implement and evaluate a highly parallel branchless DD algorithm for 3D cone beam CT. The algorithm utilizes the texture memory and hardware interpolation on GPUs to achieve fast computational speed. The developed branchless DD algorithm achieved 137-fold speedup for forward projection and 188-fold speedup for backprojection relative to a single-thread CPU implementation. Compared with a state-of-the-art 32-thread CPU implementation, the proposed branchless DD achieved 8-fold acceleration for forward projection and 10-fold acceleration for backprojection. The GPU-based branchless DD method was evaluated with iterative reconstruction algorithms on both simulated and real datasets, obtaining images visually identical to those of the CPU reference algorithm. PMID:29333480
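The three branchless steps (integration, linear interpolation, differentiation) can be illustrated in one dimension; this is a simplified stand-in for the full 3D cone-beam operator, with made-up cell boundaries:

```python
import numpy as np

def dd_resample_branchless(values, src_edges, dst_edges):
    """1D sketch of the branchless distance-driven idea:
    1) integrate the signal (cumulative sum over source cells),
    2) linearly interpolate the integral at destination cell boundaries,
    3) differentiate to recover per-cell integrals.
    No per-boundary branching is needed, so every step vectorizes."""
    integral = np.concatenate(([0.0], np.cumsum(values * np.diff(src_edges))))
    at_dst = np.interp(dst_edges, src_edges, integral)   # step 2
    return np.diff(at_dst)                               # step 3

out = dd_resample_branchless(np.array([1.0, 2.0, 3.0]),
                             np.array([0.0, 1.0, 2.0, 3.0]),
                             np.array([0.0, 1.5, 3.0]))
```

Because the destination values are differences of an interpolated integral, the total mass is conserved when the destination grid spans the source grid.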
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.
The NAS Parallel Benchmarks (NPB) are a suite of parallel computer performance benchmarks. They were originally developed at the NASA Ames Research Center in 1991 to assess high-end parallel supercomputers. Although they are no longer used as widely as they once were for comparing high-end system performance, they continue to be studied and analyzed a great deal in the high-performance computing community. The acronym 'NAS' originally stood for the Numerical Aerodynamic Simulation Program at NASA Ames. The name of this organization was subsequently changed to the Numerical Aerospace Simulation Program, and more recently to the NASA Advanced Supercomputing Center, although the acronym remains 'NAS.' The developers of the original NPB suite were David H. Bailey, Eric Barszcz, John Barton, David Browning, Russell Carter, Leo Dagum, Rod Fatoohi, Samuel Fineberg, Paul Frederickson, Thomas Lasinski, Rob Schreiber, Horst Simon, V. Venkatakrishnan and Sisira Weeratunga. The original NAS Parallel Benchmarks consisted of eight individual benchmark problems, each of which focused on some aspect of scientific computing. The principal focus was in computational aerophysics, although most of these benchmarks have much broader relevance, since in a much larger sense they are typical of many real-world scientific computing applications. The NPB suite grew out of the need for a more rational procedure to select new supercomputers for acquisition by NASA. The emergence of commercially available highly parallel computer systems in the late 1980s offered an attractive alternative to parallel vector supercomputers that had been the mainstay of high-end scientific computing. However, the introduction of highly parallel systems was accompanied by a regrettable level of hype, not only on the part of the commercial vendors but even, in some cases, by scientists using the systems.
As a result, it was difficult to discern whether the new systems offered any fundamental performance advantage over vector supercomputers, and, if so, which of the parallel offerings would be most useful in real-world scientific computation. In part to draw attention to some of the performance reporting abuses prevalent at the time, the present author wrote a humorous essay 'Twelve Ways to Fool the Masses,' which described in a light-hearted way a number of the questionable ways in which both vendor marketing people and scientists were inflating and distorting their performance results. All of this underscored the need for an objective and scientifically defensible measure to compare performance on these systems.
A diagram for evaluating multiple aspects of model performance in simulating vector fields
NASA Astrophysics Data System (ADS)
Xu, Zhongfeng; Hou, Zhaolu; Han, Ying; Guo, Weidong
2016-12-01
Vector quantities, e.g., vector winds, play an extremely important role in climate systems. The energy and water exchanges between different regions are strongly dominated by wind, which in turn shapes the regional climate. Thus, how well climate models can simulate vector fields directly affects model performance in reproducing the nature of a regional climate. This paper devises a new diagram, termed the vector field evaluation (VFE) diagram, which is a generalized Taylor diagram and able to provide a concise evaluation of model performance in simulating vector fields. The diagram can measure how well two vector fields match each other in terms of three statistical variables, i.e., the vector similarity coefficient, root mean square length (RMSL), and root mean square vector difference (RMSVD). Similar to the Taylor diagram, the VFE diagram is especially useful for evaluating climate models. The pattern similarity of two vector fields is measured by a vector similarity coefficient (VSC) that is defined by the arithmetic mean of the inner product of normalized vector pairs. Examples are provided, showing that VSC can identify how close one vector field resembles another. Note that VSC can only describe the pattern similarity, and it does not reflect the systematic difference in the mean vector length between two vector fields. To measure the vector length, RMSL is included in the diagram. The third variable, RMSVD, is used to identify the magnitude of the overall difference between two vector fields. Examples show that the VFE diagram can clearly illustrate the extent to which the overall RMSVD is attributed to the systematic difference in RMSL and how much is due to the poor pattern similarity.
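Under the definitions quoted above, the three statistics can be computed as below; the exact normalization of VSC is our assumption based on the description, so treat this as a sketch rather than the paper's reference implementation:

```python
import numpy as np

def vfe_stats(A, B):
    """VFE-diagram statistics for two vector fields A, B of shape (n, 2):
    vector similarity coefficient (VSC), root mean square lengths (RMSL)
    of each field, and root mean square vector difference (RMSVD)."""
    len_a = np.linalg.norm(A, axis=1)
    len_b = np.linalg.norm(B, axis=1)
    rmsl_a = np.sqrt(np.mean(len_a**2))
    rmsl_b = np.sqrt(np.mean(len_b**2))
    # VSC: mean inner product of the vector pairs, normalized so that
    # identical fields score exactly 1 (our reading of the definition)
    vsc = np.mean(np.sum(A * B, axis=1)) / (rmsl_a * rmsl_b)
    rmsvd = np.sqrt(np.mean(np.sum((A - B)**2, axis=1)))
    return vsc, rmsl_a, rmsl_b, rmsvd

# Identical made-up fields: perfect similarity, zero difference
A = np.array([[1.0, 0.0], [0.0, 2.0]])
vsc, ra, rb, rmsvd = vfe_stats(A, A)
```

Note that VSC, like the description says, captures only pattern similarity; a uniform rescaling of one field changes RMSL and RMSVD but not VSC.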
Vector Beam Polarization State Spectrum Analyzer.
Moreno, Ignacio; Davis, Jeffrey A; Badham, Katherine; Sánchez-López, María M; Holland, Joseph E; Cottrell, Don M
2017-05-22
We present a proof of concept for a vector beam polarization state spectrum analyzer based on the combination of a polarization diffraction grating (PDG) and an encoded harmonic q-plate grating (QPG). As a result, a two-dimensional polarization diffraction grating is formed that generates six different q-plate channels with topological charges from -3 to +3 in the horizontal direction, and each is split in the vertical direction into the six polarization channels at the cardinal points of the corresponding higher-order Poincaré sphere. Consequently, 36 different channels are generated in parallel. This special polarization diffractive element is experimentally demonstrated using a single phase-only spatial light modulator in a reflective optical architecture. Finally, we show that this system can be used as a vector beam polarization state spectrum analyzer, where both the topological charge and the state of polarization of an input vector beam can be simultaneously determined in a single experiment. We expect that these results would be useful for applications in optical communications.
NASA Technical Reports Server (NTRS)
Charlesworth, Arthur
1990-01-01
The nondeterministic divide partitions a vector into two non-empty slices by allowing the point of division to be chosen nondeterministically. Support for high-level divide-and-conquer programming provided by the nondeterministic divide is investigated. A diva algorithm is a recursive divide-and-conquer sequential algorithm on one or more vectors of the same range, whose division point for a new pair of recursive calls is chosen nondeterministically before any computation is performed and whose recursive calls are made immediately after the choice of division point; also, access to vector components is only permitted during activations in which the vector parameters have unit length. The notion of diva algorithm is formulated precisely as a diva call, a restricted call on a sequential procedure. Diva calls are proven to be intimately related to associativity. Numerous applications of diva calls are given and strategies are described for translating a diva call into code for a variety of parallel computers. Thus diva algorithms separate logical correctness concerns from implementation concerns.
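A minimal diva-style algorithm is a reduction whose division point is chosen arbitrarily; the result is well-defined precisely because the combining operation is associative. This sketch models nondeterminism with a random choice, and touches vector components only on unit-length slices:

```python
import random

def diva_sum(v, lo=0, hi=None):
    """Divide-and-conquer reduction in the style of a diva algorithm:
    the division point is chosen nondeterministically (here, randomly)
    before any computation, and the two recursive calls follow at once.
    Associativity of + guarantees the same answer for every choice."""
    if hi is None:
        hi = len(v)
    if hi - lo == 1:
        return v[lo]                      # unit-length slice: access allowed
    mid = random.randint(lo + 1, hi - 1)  # nondeterministic divide point
    return diva_sum(v, lo, mid) + diva_sum(v, mid, hi)
```

Because any split point is valid, an implementation is free to pick the split that best balances work across processors.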
Streamlining workflow and automation to accelerate laboratory scale protein production.
Konczal, Jennifer; Gray, Christopher H
2017-05-01
Protein production facilities are often required to produce diverse arrays of proteins for demanding methodologies including crystallography, NMR, ITC and other reagent intensive techniques. It is common for these teams to find themselves a bottleneck in the pipeline of ambitious projects. This pressure to deliver has resulted in the evolution of many novel methods to increase capacity and throughput at all stages in the pipeline for generation of recombinant proteins. This review aims to describe current and emerging options to accelerate the success of protein production in Escherichia coli. We emphasize technologies that have been evaluated and implemented in our laboratory, including innovative molecular biology and expression vectors, small-scale expression screening strategies and the automation of parallel and multidimensional chromatography. Copyright © 2017 The Authors. Published by Elsevier Inc. All rights reserved.
A fast pulse design for parallel excitation with gridding conjugate gradient.
Feng, Shuo; Ji, Jim
2013-01-01
Parallel excitation (pTx) is recognized as a crucial technique in high-field MRI for addressing the transmit field inhomogeneity problem. However, designing pTx pulses can be undesirably time-consuming. In this work, we propose a pulse design with gridding conjugate gradient (CG) based on the small-tip-angle approximation. The two major time-consuming matrix-vector multiplications are replaced by two operators that involve only FFT and gridding. Simulation results show that the proposed method is 3 times faster than the conventional method and that the memory cost is reduced by a factor of 1000.
Three-dimensional magnetic bubble memory system
NASA Technical Reports Server (NTRS)
Stadler, Henry L. (Inventor); Katti, Romney R. (Inventor); Wu, Jiin-Chuan (Inventor)
1994-01-01
A compact memory uses magnetic bubble technology for providing data storage. A three-dimensional arrangement, in the form of stacks of magnetic bubble layers, is used to achieve high volumetric storage density. Output tracks are used within each layer to allow data to be accessed uniquely and unambiguously. Storage can be achieved using either current access or field access magnetic bubble technology. Optical sensing via the Faraday effect is used to detect data. Optical sensing facilitates the accessing of data from within the three-dimensional package and lends itself to parallel operation for supporting high data rates and vector and parallel processing.
NASA Astrophysics Data System (ADS)
Farrugia, C. J.; Erkaev, N. V.; Torbert, R. B.; Biernat, H. K.; Gratton, F. T.; Szabo, A.; Kucharek, H.; Matsui, H.; Lin, R. P.; Ogilvie, K. W.; Lepping, R. P.; Smith, C. W.
2010-08-01
While there are many approximations describing the flow of the solar wind past the magnetosphere in the magnetosheath, the case of perfectly aligned (parallel or anti-parallel) interplanetary magnetic field (IMF) and solar wind flow vectors can be treated exactly in a magnetohydrodynamic (MHD) approach. In this work we examine a case of nearly-opposed (to within 15°) interplanetary field and flow vectors, which occurred on October 24-25, 2001 during passage of the last interplanetary coronal mass ejection in an ejecta merger. Interplanetary data are from the ACE spacecraft. Simultaneously Wind was crossing the near-Earth (X ˜ -13 Re) geomagnetic tail and subsequently made an approximately 5-hour-long magnetosheath crossing close to the ecliptic plane (Z = -0.7 Re). Geomagnetic activity was returning steadily to quiet, “ground” conditions. We first compare the predictions of the Spreiter and Rizzi theory with the Wind magnetosheath observations and find fair agreement, in particular as regards the proportionality of the magnetic field strength and the product of the plasma density and bulk speed. We then carry out a small-perturbation analysis of the Spreiter and Rizzi solution to account for the small IMF components perpendicular to the flow vector. The resulting expression is compared to the time series of the observations and satisfactory agreement is obtained. We also present and discuss observations in the dawnside boundary layer of pulsed, high-speed (v ˜ 600 km/s) flows exceeding the solar wind flow speeds. We examine various generating mechanisms and suggest that the most likely cause is a wave of frequency 3.2 mHz excited at the inner edge of the boundary layer by the Kelvin-Helmholtz instability.
Efficient Iterative Methods Applied to the Solution of Transonic Flows
NASA Astrophysics Data System (ADS)
Wissink, Andrew M.; Lyrintzis, Anastasios S.; Chronopoulos, Anthony T.
1996-02-01
We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian with preconditioned conjugate gradient-like iterative solvers for solution of the linear systems in each Newton iteration. Two iterative solvers are tested: a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin) and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. Efficiency of the Newton-iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel Thinking Machines CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems.
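The outer Newton loop described above can be sketched as follows. Here the inner preconditioned Krylov solve (OSOmin or GMRES with ILU) is replaced by a direct dense solve, and the test system is a hypothetical 2x2 example, not the transonic small disturbance equation:

```python
import numpy as np

def newton(F, J, u0, tol=1e-10, max_iter=50):
    """Basic Newton iteration: solve J(u) du = -F(u), then update u.

    The paper pairs this outer loop with preconditioned Krylov solvers
    for the inner linear solve; np.linalg.solve stands in for that
    inner solver in this sketch.
    """
    u = np.asarray(u0, dtype=float)
    for _ in range(max_iter):
        r = F(u)
        if np.linalg.norm(r) < tol:
            break
        u = u + np.linalg.solve(J(u), -r)
    return u

# Hypothetical 2x2 nonlinear test system with analytical Jacobian:
F = lambda u: np.array([u[0]**2 + u[1] - 3.0, u[0] + u[1]**2 - 5.0])
J = lambda u: np.array([[2.0 * u[0], 1.0], [1.0, 2.0 * u[1]]])
root = newton(F, J, [1.0, 1.0])
```

The analytical Jacobian is what distinguishes this from matrix-free inexact Newton variants; each outer iteration reuses the same linear-solve machinery.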
Improving Vector Evaluated Particle Swarm Optimisation by Incorporating Nondominated Solutions
Lim, Kian Sheng; Ibrahim, Zuwairie; Buyamin, Salinda; Ahmad, Anita; Naim, Faradila; Ghazali, Kamarul Hawari; Mokhtar, Norrima
2013-01-01
The Vector Evaluated Particle Swarm Optimisation algorithm is widely used to solve multiobjective optimisation problems. This algorithm optimises one objective using a swarm of particles where their movements are guided by the best solution found by another swarm. However, the best solution of a swarm is only updated when a newly generated solution has better fitness than the best solution at the objective function optimised by that swarm, yielding poor solutions for the multiobjective optimisation problems. Thus, an improved Vector Evaluated Particle Swarm Optimisation algorithm is introduced by incorporating the nondominated solutions as the guidance for a swarm rather than using the best solution from another swarm. In this paper, the performance of the improved Vector Evaluated Particle Swarm Optimisation algorithm is investigated using performance measures such as the number of nondominated solutions found, the generational distance, the spread, and the hypervolume. The results suggest that the improved Vector Evaluated Particle Swarm Optimisation algorithm has impressive performance compared with the conventional Vector Evaluated Particle Swarm Optimisation algorithm. PMID:23737718
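The improvement hinges on maintaining the nondominated solutions as guidance. A minimal sketch of the Pareto-dominance test and nondominated filtering behind such an archive (function names are illustrative, not taken from the paper):

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (minimisation):
    a is no worse in every objective and strictly better in at least one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def nondominated(points):
    """Return the nondominated subset of a list of objective vectors."""
    return [p for p in points if not any(dominates(q, p) for q in points)]

pts = [(1.0, 4.0), (2.0, 2.0), (3.0, 3.0), (4.0, 1.0)]
front = nondominated(pts)   # (3, 3) is dominated by (2, 2)
```

In the improved algorithm, particles would be guided by members of `front` rather than by the single best solution of a neighbouring swarm.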
A Systolic Architecture for Singular Value Decomposition,
1983-01-01
Presented at the 1st International Colloquium on Vector and Parallel Computing in Scientific Applications, Paris, March 1983. Contract N00014-82-K-0703. References: G. H. Golub and F. T. Luk, "Singular Value Decomposition"; Gene Golub, private communication.
NAS Applications and Advanced Algorithms
NASA Technical Reports Server (NTRS)
Bailey, David H.; Biswas, Rupak; VanDerWijngaart, Rob; Kutler, Paul (Technical Monitor)
1997-01-01
This paper examines the applications most commonly run on the supercomputers at the Numerical Aerospace Simulation (NAS) facility. It analyzes the extent to which such applications are fundamentally oriented to vector computers, and whether or not they can be efficiently implemented on hierarchical memory machines, such as systems with cache memories and highly parallel, distributed memory systems.
Molecular Symmetry in Ab Initio Calculations
NASA Astrophysics Data System (ADS)
Madhavan, P. V.; Whitten, J. L.
1987-05-01
A scheme is presented for the construction of the Fock matrix in LCAO-SCF calculations and for the transformation of basis integrals to LCAO-MO integrals that can utilize several symmetry unique lists of integrals corresponding to different symmetry groups. The algorithm is fully compatible with vector processing machines and is especially suited for parallel processing machines.
Electromagnetic banana kinetic equation and its applications in tokamaks
NASA Astrophysics Data System (ADS)
Shaing, K. C.; Chu, M. S.; Sabbagh, S. A.; Seol, J.
2018-03-01
A banana kinetic equation in tokamaks that includes effects of the finite banana width is derived for the electromagnetic waves with frequencies lower than the gyro-frequency and the bounce frequency of the trapped particles. The radial wavelengths are assumed to be either comparable to or shorter than the banana width, but much wider than the gyro-radius. One of the consequences of the banana kinetics is that the parallel component of the vector potential is not annihilated by the orbit averaging process and appears in the banana kinetic equation. The equation is solved to calculate the neoclassical quasilinear transport fluxes in the superbanana plateau regime caused by electromagnetic waves. The transport fluxes can be used to model electromagnetic wave and the chaotic magnetic field induced thermal particle or energetic alpha particle losses in tokamaks. It is shown that the parallel component of the vector potential enhances losses when it is the sole transport mechanism. In particular, the fact that the drift resonance can cause significant transport losses in the chaotic magnetic field in the hitherto unknown low collisionality regimes is emphasized.
Optimization of Sparse Matrix-Vector Multiplication on Emerging Multicore Platforms
DOE Office of Scientific and Technical Information (OSTI.GOV)
Williams, Samuel; Oliker, Leonid; Vuduc, Richard
2008-10-16
We are witnessing a dramatic change in computer architecture due to the multicore paradigm shift, as every electronic device from cell phones to supercomputers confronts parallelism of unprecedented scale. To fully unleash the potential of these systems, the HPC community must develop multicore specific-optimization methodologies for important scientific computations. In this work, we examine sparse matrix-vector multiply (SpMV) - one of the most heavily used kernels in scientific computing - across a broad spectrum of multicore designs. Our experimental platform includes the homogeneous AMD quad-core, AMD dual-core, and Intel quad-core designs, the heterogeneous STI Cell, as well as one of the first scientific studies of the highly multithreaded Sun Victoria Falls (a Niagara2 SMP). We present several optimization strategies especially effective for the multicore environment, and demonstrate significant performance improvements compared to existing state-of-the-art serial and parallel SpMV implementations. Additionally, we present key insights into the architectural trade-offs of leading multicore design strategies, in the context of demanding memory-bound numerical algorithms.
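The SpMV kernel studied above is typically stored in compressed sparse row (CSR) form. A minimal reference sketch (not the authors' optimized implementation) shows the memory-access pattern their optimizations target:

```python
import numpy as np

def spmv_csr(data, indices, indptr, x):
    """y = A @ x for A in compressed sparse row (CSR) form.

    One inner loop per row: in multicore implementations, rows or row
    blocks are distributed across threads, and the irregular indexed
    loads of x are what makes the kernel memory-bound.
    """
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# CSR encoding of A = [[4, 0, 1],
#                      [0, 3, 0],
#                      [2, 0, 5]]
data    = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr  = np.array([0, 2, 3, 5])
y = spmv_csr(data, indices, indptr, np.array([1.0, 1.0, 1.0]))
```

Blocking, prefetching, and format variants (e.g. blocked CSR) all aim at improving the locality of the `x[indices[k]]` accesses in this loop.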
NASA Astrophysics Data System (ADS)
Sylwestrzak, Marcin; Szlag, Daniel; Marchand, Paul J.; Kumar, Ashwin S.; Lasser, Theo
2017-08-01
We present an application of massively parallel processing of quantitative flow measurement data acquired using spectral optical coherence microscopy (SOCM). The need for massive signal processing of these particular datasets has been a major hurdle for many applications based on SOCM. In view of this difficulty, we implemented and adapted quantitative total flow estimation algorithms on graphics processing units (GPU) and achieved a 150-fold reduction in processing time when compared to a former CPU implementation. As SOCM constitutes the microscopy counterpart to spectral optical coherence tomography (SOCT), the developed processing procedure can be applied to both imaging modalities. We present the developed DLL library integrated in MATLAB (with an example) and have included the source code for adaptations and future improvements. Catalogue identifier: AFBT_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AFBT_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: GNU GPLv3 No. of lines in distributed program, including test data, etc.: 913552 No. of bytes in distributed program, including test data, etc.: 270876249 Distribution format: tar.gz Programming language: CUDA/C, MATLAB. Computer: Intel x64 CPU, GPU supporting CUDA technology. Operating system: 64-bit Windows 7 Professional. Has the code been vectorized or parallelized?: Yes, CPU code has been vectorized in MATLAB, CUDA code has been parallelized. RAM: Dependent on user's parameters, typically between several gigabytes and several tens of gigabytes Classification: 6.5, 18.
Nature of problem: Speed-up of data processing in optical coherence microscopy Solution method: Utilization of GPU for massively parallel data processing Additional comments: Compiled DLL library with source code and documentation, example of utilization (MATLAB script with raw data) Running time: 1.8 s for one B-scan (150 × faster in comparison to the CPU data processing time)
[Orthogonal Vector Projection Algorithm for Spectral Unmixing].
Song, Mei-ping; Xu, Xing-wei; Chang, Chein-I; An, Ju-bai; Yao, Li
2015-12-01
Spectral unmixing is an important part of hyperspectral technology and is essential for material quantity analysis in hyperspectral imagery. Most linear unmixing algorithms require matrix multiplication and matrix inversion or determinant computations, which are difficult to program, especially hard to realize in hardware, and whose computational cost grows significantly as the number of endmembers increases. Here, based on the traditional Orthogonal Subspace Projection algorithm, a new method called Orthogonal Vector Projection is proposed using the orthogonality principle. It simplifies the process by avoiding matrix multiplication and inversion. The method first computes the final orthogonal vector via the Gram-Schmidt process for each endmember spectrum. These orthogonal vectors are then used as projection vectors for the pixel signature. The unconstrained abundance is obtained directly by projecting the signature onto the projection vectors and computing the ratio of the projected vector length to the orthogonal vector length. Compared with the Orthogonal Subspace Projection and Least Squares Error algorithms, this method needs no matrix inversion, which is computationally costly and hard to implement in hardware. It completes the orthogonalization by repeated vector operations, making it well suited to both parallel computation and hardware implementation. The reasonableness of the algorithm is established through its relationship with the Orthogonal Subspace Projection and Least Squares Error algorithms, and its computational complexity, the lowest of the three, is compared with theirs. Finally, experimental results on synthetic and real images are provided, giving further evidence of the method's effectiveness.
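A small sketch of the projection idea as described above, written from the abstract alone (NumPy's QR is used here in place of an explicit Gram-Schmidt loop; the function name and interface are illustrative):

```python
import numpy as np

def ovp_abundances(E, x):
    """Unconstrained abundances by orthogonal vector projection.

    For each endmember e_k (a column of E), remove the components lying
    in the span of the other endmembers; projecting the pixel x onto the
    resulting orthogonal vector u_k and normalising by the endmember's
    own projection gives the abundance, with no matrix inversion.
    """
    m = E.shape[1]
    a = np.zeros(m)
    for k in range(m):
        others = np.delete(E, k, axis=1)
        # Orthonormal basis for the other endmembers (Gram-Schmidt role)
        q, _ = np.linalg.qr(others)
        u = E[:, k] - q @ (q.T @ E[:, k])
        # Since u is orthogonal to every other endmember,
        # x . u = a_k * (e_k . u) for x = sum_i a_i e_i
        a[k] = (x @ u) / (E[:, k] @ u)
    return a
```

For a pixel that is an exact linear mixture, this reproduces the unconstrained least-squares abundances, consistent with the stated relationship to the Least Squares Error algorithm.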
An efficient implementation of a high-order filter for a cubed-sphere spectral element model
NASA Astrophysics Data System (ADS)
Kang, Hyun-Gyu; Cheong, Hyeong-Bin
2017-03-01
A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yielded a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of GLLIP (order of the filter), and the linear system is solved in only O (Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: The filter equation is applied to multiple cells, and then the central cell was only used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filter was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
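The actual filter is a high-order Helmholtz equation on the cubed sphere discretized with spectral elements. A one-dimensional, second-order, finite-difference analogue, (I - aL)f = g with periodic boundaries, illustrates how solving such an elliptic equation damps small scales while passing large ones (grid size and filter coefficient here are arbitrary choices, not the paper's):

```python
import numpy as np

n, a = 64, 0.1                 # grid points, filter coefficient (arbitrary)
dx = 2 * np.pi / n
x = np.arange(n) * dx

# Discrete periodic Laplacian and Helmholtz filter operator H = I - a*L
Lap = np.zeros((n, n))
for i in range(n):
    Lap[i, i] = -2.0 / dx**2
    Lap[i, (i - 1) % n] = 1.0 / dx**2
    Lap[i, (i + 1) % n] = 1.0 / dx**2
H = np.eye(n) - a * Lap

# Smooth signal plus small-scale noise; solving H f = g filters g
g = np.sin(x) + 0.5 * np.sin(20 * x)
f = np.linalg.solve(H, g)
```

Each Fourier mode is attenuated by 1/(1 + a*lambda_k), so the wavenumber-20 component is strongly damped while the wavenumber-1 component passes nearly unchanged; the paper's inverse matrix method amounts to precomputing and sparsifying H^{-1} so this solve becomes a sparse matrix-vector multiplication.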
A portable MPI-based parallel vector template library
NASA Technical Reports Server (NTRS)
Sheffler, Thomas J.
1995-01-01
This paper discusses the design and implementation of a polymorphic collection library for distributed address-space parallel computers. The library provides a data-parallel programming model for C++ by providing three main components: a single generic collection class, generic algorithms over collections, and generic algebraic combining functions. Collection elements are the fourth component of a program written using the library and may be either of the built-in types of C or of user-defined types. Many ideas are borrowed from the Standard Template Library (STL) of C++, although a restricted programming model is proposed because of the distributed address-space memory model assumed. Whereas the STL provides standard collections and implementations of algorithms for uniprocessors, this paper advocates standardizing interfaces that may be customized for different parallel computers. Just as the STL attempts to increase programmer productivity through code reuse, a similar standard for parallel computers could provide programmers with a standard set of algorithms portable across many different architectures. The efficacy of this approach is verified by examining performance data collected from an initial implementation of the library running on an IBM SP-2 and an Intel Paragon.
MLP: A Parallel Programming Alternative to MPI for New Shared Memory Parallel Systems
NASA Technical Reports Server (NTRS)
Taft, James R.
1999-01-01
Recent developments at the NASA AMES Research Center's NAS Division have demonstrated that the new generation of NUMA based Symmetric Multi-Processing systems (SMPs), such as the Silicon Graphics Origin 2000, can successfully execute legacy vector oriented CFD production codes at sustained rates far exceeding processing rates possible on dedicated 16 CPU Cray C90 systems. This high level of performance is achieved via shared memory based Multi-Level Parallelism (MLP). This programming approach, developed at NAS and outlined below, is distinct from the message passing paradigm of MPI. It offers parallelism at both the fine and coarse grained level, with communication latencies that are approximately 50-100 times lower than typical MPI implementations on the same platform. Such latency reductions offer the promise of performance scaling to very large CPU counts. The method draws on, but is also distinct from, the newly defined OpenMP specification, which uses compiler directives to support a limited subset of multi-level parallel operations. The NAS MLP method is general, and applicable to a large class of NASA CFD codes.
Dynamic current-current susceptibility in three-dimensional Dirac and Weyl semimetals
NASA Astrophysics Data System (ADS)
Thakur, Anmol; Sadhukhan, Krishanu; Agarwal, Amit
2018-01-01
We study the linear response of doped three-dimensional Dirac and Weyl semimetals to vector potentials, by calculating the wave-vector- and frequency-dependent current-current response function analytically. The longitudinal part of the dynamic current-current response function is then used to study the plasmon dispersion and the optical conductivity. The transverse response in the static limit yields the orbital magnetic susceptibility. In a Weyl semimetal, along with the current-current response function, all these quantities are significantly impacted by the presence of parallel electric and magnetic fields (a finite E·B term) and can be used to experimentally explore the chiral anomaly.
Zeier, Zane; Aguilar, J Santiago; Lopez, Cecilia M; Devi-Rao, G B; Watson, Zachary L; Baker, Henry V; Wagner, Edward K; Bloom, David C
2010-01-01
Herpes simplex virus type 1 (HSV-1)–based vectors readily transduce neurons and have a large payload capacity, making them particularly amenable to gene therapy applications within the central nervous system (CNS). Because aspects of the host responses to HSV-1 vectors in the CNS are largely unknown, we compared the host response of a nonreplicating HSV-1 vector to that of a replication-competent HSV-1 virus using microarray analysis. In parallel, HSV-1 gene expression was tracked using HSV-specific oligonucleotide-based arrays in order to correlate viral gene expression with observed changes in host response. Microarray analysis was performed following stereotactic injection into the right hippocampal formation of mice with either a replication-competent HSV-1 or a nonreplicating recombinant of HSV-1, lacking the ICP4 gene (ICP4−). Genes that demonstrated a significant change (P < .001) in expression in response to the replicating HSV-1 outnumbered those that changed in response to mock or nonreplicating vector by approximately 3-fold. Pathway analysis revealed that both the replicating and nonreplicating vectors induced robust antigen presentation but only mild interferon, chemokine, and cytokine signaling responses. The ICP4− vector was restricted in several of the Toll-like receptor-signaling pathways, indicating reduced stimulation of the innate immune response. These array analyses suggest that although the nonreplicating vector induces detectable activation of immune response pathways, the number and magnitude of the induced response is dramatically restricted compared to the replicating vector, and with the exception of antigen presentation, host gene expression induced by the non-replicating vector largely resembles mock infection. PMID:20095947
Highly scalable parallel processing of extracellular recordings of Multielectrode Arrays.
Gehring, Tiago V; Vasilaki, Eleni; Giugliano, Michele
2015-01-01
Technological advances in Multielectrode Arrays (MEAs) used for multisite, parallel electrophysiological recordings lead to an ever-increasing amount of raw data being generated. Arrays with hundreds to a few thousand electrodes are slowly seeing widespread use, and the expectation is that more sophisticated arrays will become available in the near future. In order to process the large data volumes resulting from MEA recordings, there is a pressing need for new software tools able to process many data channels in parallel. Here we present a new tool for processing MEA data recordings that makes use of new programming paradigms and recent technology developments to unleash the power of modern highly parallel hardware, such as multi-core CPUs with vector instruction sets or GPGPUs. Our tool builds on and complements existing MEA data analysis packages. It shows high scalability and can be used to speed up some performance-critical pre-processing steps such as data filtering and spike detection, helping to make the analysis of larger data sets tractable.
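The paper's pipeline is not reproduced here, but a common MAD-threshold spike-detection step, vectorised across channels with NumPy, illustrates the per-channel parallelism such tools exploit (the threshold rule and constants are conventional defaults, not taken from the paper):

```python
import numpy as np

def detect_spikes(data, k=5.0):
    """Per-channel threshold crossings on a (channels x samples) array.

    The threshold is a multiple of each channel's robust noise estimate
    (median absolute deviation), computed for all channels at once.
    Channels are independent, so the work parallelises trivially across
    cores or GPU threads.
    """
    med = np.median(data, axis=1, keepdims=True)
    mad = np.median(np.abs(data - med), axis=1, keepdims=True)
    thresh = k * mad / 0.6745          # MAD -> std for Gaussian noise
    crossings = np.abs(data) > thresh
    # spike onset = crossing whose previous sample was below threshold
    onset = crossings & ~np.roll(crossings, 1, axis=1)
    onset[:, 0] = crossings[:, 0]
    return [np.flatnonzero(row) for row in onset]
```

Filtering and detection of this form are exactly the "performance critical pre-processing steps" where vector instruction sets and GPGPUs pay off.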
Pushing configuration-interaction to the limit: Towards massively parallel MCSCF calculations
DOE Office of Scientific and Technical Information (OSTI.GOV)
Vogiatzis, Konstantinos D.; Ma, Dongxia; Olsen, Jeppe
A new large-scale parallel multiconfigurational self-consistent field (MCSCF) implementation in the open-source NWChem computational chemistry code is presented. The generalized active space approach is used to partition large configuration interaction (CI) vectors and generate a sufficient number of batches that can be distributed to the available cores. Massively parallel CI calculations with large active spaces can be performed. The new parallel MCSCF implementation is tested for the chromium trimer and for an active space of 20 electrons in 20 orbitals, which can now routinely be performed. Unprecedented CI calculations with an active space of 22 electrons in 22 orbitals for the pentacene systems were performed, and a single CI iteration calculation with an active space of 24 electrons in 24 orbitals for the chromium tetramer was possible. The chromium tetramer corresponds to a CI expansion of one trillion Slater determinants (914 058 513 424) and is the largest conventional CI calculation attempted to date.
The design and implementation of a parallel unstructured Euler solver using software primitives
NASA Technical Reports Server (NTRS)
Das, R.; Mavriplis, D. J.; Saltz, J.; Gupta, S.; Ponnusamy, R.
1992-01-01
This paper is concerned with the implementation of a three-dimensional unstructured grid Euler-solver on massively parallel distributed-memory computer architectures. The goal is to minimize solution time by achieving high computational rates with a numerically efficient algorithm. An unstructured multigrid algorithm with an edge-based data structure has been adopted, and a number of optimizations have been devised and implemented in order to accelerate the parallel communication rates. The implementation is carried out by creating a set of software tools, which provide an interface between the parallelization issues and the sequential code, while providing a basis for future automatic run-time compilation support. Large practical unstructured grid problems are solved on the Intel iPSC/860 hypercube and Intel Touchstone Delta machine. The quantitative effects of the various optimizations are demonstrated, and we show that the combined effect of these optimizations leads to roughly a factor of three performance improvement. The overall solution efficiency is compared with that obtained on the CRAY Y-MP vector supercomputer.
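The edge-based data structure mentioned above can be sketched as a loop over edges that computes one flux per edge and scatters it to both endpoint nodes (the flux function here is a placeholder, not the Euler flux):

```python
import numpy as np

def edge_residuals(edges, u, n_nodes):
    """Accumulate an edge-based residual: each edge computes one flux
    and scatters it to its two endpoint nodes with opposite signs, so
    the scheme is conservative by construction.

    On distributed-memory machines the edge list is partitioned and
    contributions to off-processor endpoints are communicated; the
    scatter-add is also the source of write conflicts that parallel
    implementations must resolve (colouring, atomics, or ghost nodes).
    """
    res = np.zeros(n_nodes)
    for i, j in edges:
        flux = u[j] - u[i]          # stand-in for a numerical flux
        res[i] += flux
        res[j] -= flux
    return res
```

Because each flux appears once with each sign, the residuals sum to zero over the mesh, mirroring discrete conservation in the real solver.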
Ohoyama, H
2013-12-21
The vector correlation between the alignment of reactant N2 (A (3)Σu(+)) and the alignment of product NO (A (2)Σ(+)) rotation has been studied in the energy transfer reaction of aligned N2 (A (3)Σu(+)) + NO (X (2)Π) → NO (A (2)Σ(+)) + N2 (X (1)Σg(+)) under the crossed beam condition at a collision energy of ~0.07 eV. NO (A (2)Σ(+)) emission in the two linear polarization directions (i.e., parallel and perpendicular with respect to the relative velocity vector v(R)) has been measured as a function of the alignment of N2 (A (3)Σu(+)) along its molecular axis in the collision frame. The degree of polarization of NO (A (2)Σ(+)) emission is found to depend on the alignment angle (θ(v(R))) of N2 (A (3)Σu(+)) in the collision frame. The shape of the steric opacity function at the two polarization conditions turns out to be extremely different from each other: The steric opacity function at the parallel polarization condition is more favorable for the oblique configuration of N2 (A (3)Σu(+)) at an alignment angle of θ(v(R)) ~ 45° as compared with that at the perpendicular polarization condition. The alignment of N2 (A (3)Σu(+)) is found to give a significant effect on the alignment of NO (A (2)Σ(+)) rotation in the collision frame: The N2 (A (3)Σu(+)) configuration at an oblique alignment angle θ(v(R)) ~ 45° leads to a parallel alignment of NO (A (2)Σ(+)) rotation (J-vector) with respect to v(R), while the axial and sideways configurations of N2 (A (3)Σu(+)) lead to a perpendicular alignment of NO (A (2)Σ(+)) rotation with respect to vR. These stereocorrelated alignments of the product rotation have a good correlation with the stereocorrelated reactivity observed in the multi-dimensional steric opacity function [H. Ohoyama and S. Maruyama, J. Chem. Phys. 137, 064311 (2012)].
NASA Astrophysics Data System (ADS)
Yasuzuka, Syuma; Koga, Hiroaki; Yamamura, Yasuhisa; Saito, Kazuya; Uji, Shinya; Terashima, Taichi; Akutsu, Hiroki; Yamada, Jun-ichi
2017-08-01
Resistance measurements have been performed to investigate the dimensionality and the in-plane anisotropy of the upper critical field (Hc2) for β-(BDA-TTP)2SbF6 in fields H up to 15 T and at temperatures T from 1.5 to 7.5 K, where BDA-TTP stands for 2,5-bis(1,3-dithian-2-ylidene)-1,3,4,6-tetrathiapentalene. The upper critical fields parallel and perpendicular to the conduction layer are determined and dimensional crossover from anisotropic three-dimensional behavior to two-dimensional behavior is found at around 6 K. When the direction of H is varied within the conducting layer at 6.0 K, Hc2 shows twofold symmetry: Hc2 along the minimum Fermi wave vector (maximum Fermi velocity) is larger than that along the maximum Fermi wave vector (minimum Fermi velocity). The normal-state magnetoresistance has twofold symmetry similar to Hc2 and shows a maximum when the magnetic field is nearly parallel to the maximum Fermi wave vector. This tendency is consistent with the Fermi surface anisotropy. At 3.5 K, we found clear fourfold symmetry of Hc2 despite the fact that the normal-state magnetoresistance shows twofold symmetry arising from the Fermi surface anisotropy. The origin of the fourfold symmetry of Hc2 is discussed in terms of the superconducting gap structure in β-(BDA-TTP)2SbF6.
Job Management Requirements for NAS Parallel Systems and Clusters
NASA Technical Reports Server (NTRS)
Saphir, William; Tanner, Leigh Ann; Traversat, Bernard
1995-01-01
A job management system is a critical component of a production supercomputing environment, permitting oversubscribed resources to be shared fairly and efficiently. Job management systems that were originally designed for traditional vector supercomputers are not appropriate for the distributed-memory parallel supercomputers that are becoming increasingly important in the high performance computing industry. Newer job management systems offer new functionality but do not solve fundamental problems. We address some of the main issues in resource allocation and job scheduling we have encountered on two parallel computers - a 160-node IBM SP2 and a cluster of 20 high performance workstations located at the Numerical Aerodynamic Simulation facility. We describe the requirements for resource allocation and job management that are necessary to provide a production supercomputing environment on these machines, prioritizing according to difficulty and importance, and advocating a return to fundamental issues.
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers
NASA Technical Reports Server (NTRS)
Overman, Andrea L.; Poole, Eugene L.
1991-01-01
A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
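The variable-band storage scheme itself is not reproduced in the abstract; as a minimal sketch of the column-oriented Cholesky kernel that such solvers vectorize (a generic illustration in Python, with all names our own, not the paper's Fortran implementation):

```python
import numpy as np

def cholesky_columns(A):
    """Column-oriented Cholesky factorization A = L @ L.T.

    A must be symmetric positive definite. Each column update below
    the diagonal is a long vector operation, which is the property
    band and variable-band schemes exploit for vectorization.
    """
    n = A.shape[0]
    L = np.zeros_like(A, dtype=float)
    for j in range(n):
        # Diagonal entry: subtract the squared entries of row j of L.
        d = A[j, j] - L[j, :j] @ L[j, :j]
        L[j, j] = np.sqrt(d)
        # Column j below the diagonal: a vectorizable update.
        L[j + 1:, j] = (A[j + 1:, j] - L[j + 1:, :j] @ L[j, :j]) / L[j, j]
    return L
```

The factor can be checked against a library routine on any small SPD matrix.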
NASA Astrophysics Data System (ADS)
Galiatsatos, P. G.; Tennyson, J.
2012-11-01
The most time-consuming step within the framework of the UK R-matrix molecular codes is the diagonalization of the inner-region Hamiltonian matrix (IRHM). Here we present the method we follow to speed up this step. We use shared-memory machines (SMM), distributed-memory machines (DMM), the directive-based OpenMP parallel language, the function-based MPI parallel language, the sparse-matrix diagonalizers ARPACK and PARPACK, a variant for real symmetric matrices of the standard coordinate sparse-matrix format and, finally, a parallel sparse matrix-vector product (PSMV). The efficient application of these techniques relies on two facts: the sparsity of the matrix is large (more than 98%), and only a small part of the matrix spectrum is needed to obtain converged results.
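The computational pattern described, an iterative sparse diagonalizer that touches the matrix only through matrix-vector products and returns a few eigenpairs, can be illustrated with SciPy's ARPACK wrapper; the stand-in matrix and parameters below are invented for illustration and are not taken from the R-matrix codes:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

# Build a sparse symmetric test matrix (well over 98% zeros),
# standing in for an inner-region Hamiltonian.
n = 500
rng = np.random.default_rng(1)
diag = rng.normal(size=n)
off = 0.1 * rng.normal(size=n - 1)
H = sp.diags([off, diag, off], [-1, 0, 1], format="csr")

# ARPACK (via eigsh) needs only matrix-vector products, so only the
# lowest few eigenpairs are computed -- the regime such codes target.
vals, vecs = eigsh(H, k=5, which="SA")  # 5 smallest algebraic eigenvalues
```

Only the matvec needs to be parallelized to speed up the whole iteration, which is exactly the role of the PSMV mentioned in the abstract.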
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wolfe, A.
1986-03-10
Supercomputing software is moving into high gear, spurred by the rapid spread of supercomputers into new applications. The critical challenge is how to develop tools that will make it easier for programmers to write applications that take advantage of vectorizing in the classical supercomputer and the parallelism that is emerging in supercomputers and minisupercomputers. Writing parallel software is a challenge that every programmer must face because parallel architectures are springing up across the range of computing. Cray is developing a host of tools for programmers. Tools to support multitasking (in supercomputer parlance, multitasking means dividing up a single program to run on multiple processors) are high on Cray's agenda. On tap for multitasking is Premult, dubbed a microtasking tool. As a preprocessor for Cray's CFT77 FORTRAN compiler, Premult will provide fine-grain multitasking.
NASA Astrophysics Data System (ADS)
Fiandrotti, Attilio; Fosson, Sophie M.; Ravazzi, Chiara; Magli, Enrico
2018-04-01
Compressive sensing promises to enable bandwidth-efficient on-board compression of astronomical data by lifting the encoding complexity from the source to the receiver. The signal is recovered off-line, exploiting the parallel computation capabilities of GPUs to speed up the reconstruction process. However, inherent GPU hardware constraints limit the size of the recoverable signal and the speedup practically achievable. In this work, we design parallel algorithms that exploit the properties of circulant matrices for efficient GPU-accelerated sparse signal recovery. Our approach reduces the memory requirements, allowing us to recover very large signals with limited memory. In addition, it achieves a tenfold signal recovery speedup thanks to ad-hoc parallelization of matrix-vector multiplications and matrix inversions. Finally, we practically demonstrate our algorithms in a typical application of circulant matrices: deblurring a sparse astronomical image in the compressed domain.
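The key property of circulant matrices that such GPU algorithms exploit is that a circulant matrix-vector product is a circular convolution, computable with FFTs from the first column alone. A minimal NumPy sketch of this idea (our own illustration, not the authors' code):

```python
import numpy as np

def circulant_matvec(c, x):
    """Multiply the circulant matrix C (first column c) by x using FFTs.

    Storing only the first column reduces memory from O(n^2) to O(n),
    and the product costs O(n log n) instead of O(n^2) -- the property
    exploited for recovering very large signals with limited memory.
    """
    return np.real(np.fft.ifft(np.fft.fft(c) * np.fft.fft(x)))
```

The same diagonalization-by-FFT trick also makes the matrix inversions mentioned in the abstract cheap, since a circulant system solve divides by the eigenvalues in the Fourier domain.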
Dynamics modeling for parallel haptic interfaces with force sensing and control.
Bernstein, Nicholas; Lawrence, Dale; Pao, Lucy
2013-01-01
Closed-loop force control can be used on haptic interfaces (HIs) to mitigate the effects of mechanism dynamics. A single multidimensional force-torque sensor is often employed to measure the interaction force between the haptic device and the user's hand. The parallel haptic interface at the University of Colorado (CU) instead employs smaller 1D force sensors oriented along each of the five actuating rods to build up a 5D force vector. This paper shows that a particular manipulandum/hand partition in the system dynamics is induced by the placement and type of force sensing, and discusses the implications on force and impedance control for parallel haptic interfaces. The details of a "squaring down" process are also discussed, showing how to obtain reduced degree-of-freedom models from the general six degree-of-freedom dynamics formulation.
High Performance Compression of Science Data
NASA Technical Reports Server (NTRS)
Storer, James A.; Carpentieri, Bruno; Cohn, Martin
1994-01-01
Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that with no training or prior knowledge of the data, for a given fidelity, the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.
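As background to the second paper's topic, a full-search block-matching kernel that minimizes the sum of absolute differences (SAD) can be sketched as follows; this is a generic illustration of the technique, not the paper's parallel algorithm:

```python
import numpy as np

def block_match(ref, cur, top, left, bsize, search):
    """Full-search block matching: find the displacement of the block
    at (top, left) in `cur` that minimizes the sum of absolute
    differences (SAD) against `ref` within +/- `search` pixels.

    Each candidate displacement is evaluated independently, which is
    what makes the search amenable to a simple parallel architecture.
    """
    block = cur[top:top + bsize, left:left + bsize]
    best, best_dv = np.inf, (0, 0)
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate block falls outside the reference frame
            sad = np.abs(ref[y:y + bsize, x:x + bsize] - block).sum()
            if sad < best:
                best, best_dv = sad, (dy, dx)
    return best_dv
```

For a block copied verbatim from the reference frame, the minimum-SAD displacement recovers the true interframe motion exactly.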
Rapid Parallel Screening for Strain Optimization
2013-08-16
fermentation yields of industrially relevant biological compounds. Screening of the desired chemicals was completed previously. Microbes that can ... reporter, and, 2) a yeast TAR cloning shuttle vector for transferring catabolic clusters to E. coli.
The numerical simulation of a high-speed axial flow compressor
NASA Technical Reports Server (NTRS)
Mulac, Richard A.; Adamczyk, John J.
1991-01-01
The advancement of high-speed axial-flow multistage compressors is impeded by a lack of detailed flow-field information. Recent developments in compressor flow modeling and numerical simulation have the potential to provide the needed information in a timely manner. The development of a computer program to solve the viscous form of the average-passage equation system for multistage turbomachinery is described. Programming issues such as in-core versus out-of-core data storage and CPU utilization (parallelization, vectorization, and chaining) are addressed. Code performance is evaluated through the simulation of the first four stages of a five-stage, high-speed, axial-flow compressor. The second part addresses the flow physics that can be obtained from the numerical simulation. In particular, the endwall flow structure is examined, and its impact on blockage distribution is assessed.
Gene therapy decreases seizures in a model of Incontinentia pigmenti.
Dogbevia, Godwin K; Töllner, Kathrin; Körbelin, Jakob; Bröer, Sonja; Ridder, Dirk A; Grasshoff, Hanna; Brandt, Claudia; Wenzel, Jan; Straub, Beate K; Trepel, Martin; Löscher, Wolfgang; Schwaninger, Markus
2017-07-01
Incontinentia pigmenti (IP) is a genetic disease leading to severe neurological symptoms, such as epileptic seizures, but no specific treatment is available. IP is caused by pathogenic variants that inactivate the Nemo gene. Replacing Nemo through gene therapy might provide therapeutic benefits. In a mouse model of IP, we administered a single intravenous dose of the adeno-associated virus (AAV) vector, AAV-BR1-CAG-NEMO, delivering the Nemo gene to the brain endothelium. Spontaneous epileptic seizures and the integrity of the blood-brain barrier (BBB) were monitored. The endothelium-targeted gene therapy improved the integrity of the BBB. In parallel, it reduced the incidence of seizures and delayed their occurrence. Neonate mice intravenously injected with the AAV-BR1-CAG-NEMO vector developed no hepatocellular carcinoma or other major adverse effects 11 months after vector injection, demonstrating that the vector has a favorable safety profile. The data show that the BBB is a target of antiepileptic treatment and, more specifically, provide evidence for the therapeutic benefit of a brain endothelial-targeted gene therapy in IP. Ann Neurol 2017;82:93-104. © 2017 American Neurological Association.
NASA Astrophysics Data System (ADS)
An, Fengwei; Akazawa, Toshinobu; Yamasaki, Shogo; Chen, Lei; Jürgen Mattausch, Hans
2015-04-01
This paper reports a VLSI realization of learning vector quantization (LVQ) with high flexibility for different applications. It is based on a hardware/software (HW/SW) co-design concept for on-chip learning and recognition and is designed as a SoC in 180 nm CMOS. The time-consuming nearest-Euclidean-distance search in the LVQ algorithm's competition layer is efficiently implemented as a pipeline with parallel p-word input. Since the neuron number in the competition layer, the weight values, and the numbers of inputs and outputs are scalable, the requirements of many different applications can be satisfied without hardware changes. Classification of a d-dimensional input vector is completed in n × ⌈d/p⌉ + R clock cycles, where R is the pipeline depth and n is the number of reference feature vectors (FVs). Adjustment of stored reference FVs during learning is done by the embedded 32-bit RISC CPU, because this operation is not time-critical. The high flexibility is verified by the application of human detection with different numbers for the dimensionality of the FVs.
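The competition-layer search and the stated cycle-count model can be sketched in a few lines; this is an illustrative software analogue (names ours), not the VLSI pipeline itself:

```python
import math
import numpy as np

def lvq_classify(x, refs, labels):
    """Nearest-Euclidean-distance search of the LVQ competition layer:
    return the label of the reference feature vector closest to x."""
    d2 = ((refs - x) ** 2).sum(axis=1)  # squared distances suffice for argmin
    return labels[int(np.argmin(d2))]

def pipeline_cycles(n, d, p, R):
    """Clock cycles for one classification on the reported architecture:
    n reference vectors, a d-dimensional input fed p words per cycle,
    and pipeline depth R -- the abstract's n * ceil(d/p) + R model."""
    return n * math.ceil(d / p) + R
```

For example, with n = 100 reference vectors, d = 64 dimensions, p = 8 words per cycle, and pipeline depth R = 10, the model gives 100 × 8 + 10 = 810 cycles per classification.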
T-cell receptor transfer into human T cells with ecotropic retroviral vectors.
Koste, L; Beissert, T; Hoff, H; Pretsch, L; Türeci, Ö; Sahin, U
2014-05-01
Adoptive T-cell transfer for cancer immunotherapy requires genetic modification of T cells with recombinant T-cell receptors (TCRs). Amphotropic retroviral vectors (RVs) used for TCR transduction for this purpose are considered safe in principle. Despite this, TCR-coding and packaging vectors could theoretically recombine to produce replication competent vectors (RCVs), and transduced T-cell preparations must be proven free of RCV. To eliminate the need for RCV testing, we transduced human T cells with ecotropic RVs so potential RCV would be non-infectious for human cells. We show that transfection of synthetic messenger RNA encoding murine cationic amino-acid transporter 1 (mCAT-1), the receptor for murine retroviruses, enables efficient transient ecotropic transduction of human T cells. mCAT-1-dependent transduction was more efficient than amphotropic transduction performed in parallel, and preferentially targeted naive T cells. Moreover, we demonstrate that ecotropic TCR transduction results in antigen-specific restimulation of primary human T cells. Thus, ecotropic RVs represent a versatile, safe and potent tool to prepare T cells for the adoptive transfer.
NASA Astrophysics Data System (ADS)
Tanaka, Kenta K.; Ichioka, Masanori; Onari, Seiichiro
2018-04-01
Local NMR relaxation rates in the vortex state of chiral and helical p -wave superconductors are investigated by the quasiclassical Eilenberger theory. We calculate the spatial and resonance frequency dependences of the local NMR spin-lattice relaxation rate T1-1 and spin-spin relaxation rate T2-1. Depending on the relation between the NMR relaxation direction and the d -vector symmetry, the local T1-1 and T2-1 in the vortex core region show different behaviors. When the NMR relaxation direction is parallel to the d -vector component, the local NMR relaxation rate is anomalously suppressed by the negative coherence effect due to the spin dependence of the odd-frequency s -wave spin-triplet Cooper pairs. The difference between the local T1-1 and T2-1 in the site-selective NMR measurement is expected to be a method to examine the d -vector symmetry of candidate materials for spin-triplet superconductors.
Exact simulation of polarized light reflectance by particle deposits
NASA Astrophysics Data System (ADS)
Ramezan Pour, B.; Mackowski, D. W.
2015-12-01
The use of polarimetric light reflection measurements as a means of identifying the physical and chemical characteristics of particulate materials relies on an accurate model of predicting the effects of particle size, shape, concentration, and refractive index on polarized reflection. The research examines two methods for prediction of reflection from plane-parallel layers of wavelength-sized particles. The first method is based on an exact superposition solution to Maxwell's time-harmonic wave equations for a deposit of spherical particles that are exposed to a plane incident wave. We use a FORTRAN-90 implementation of this solution (the Multiple Sphere T Matrix (MSTM) code), coupled with parallel computational platforms, to directly simulate the reflection from particle layers. The second method examined is based upon the vector radiative transport equation (RTE). Mie theory is used in our RTE model to predict the extinction coefficient, albedo, and scattering phase function of the particles, and the solution of the RTE is obtained from the adding-doubling method applied to a plane-parallel configuration. Our results show that the MSTM and RTE predictions of the Mueller matrix elements converge when the particle volume fraction in the particle layer decreases below around five percent. At higher volume fractions the RTE can yield results that, depending on the particle size and refractive index, significantly depart from the exact predictions. The particle regimes which lead to dependent scattering effects, and the application of methods to correct the vector RTE for particle interaction, will be discussed.
NASA Technical Reports Server (NTRS)
Simpkin, W. E.
1982-01-01
An approximately 0.25-scale model of the transition section of a tandem fan variable cycle engine nacelle was tested in the NASA Lewis Research Center 10- by 10-foot wind tunnel. Two 12-inch, tip-turbine-driven fans were used to simulate a tandem fan engine. Three testing modes simulated a V/STOL tandem fan airplane. Parallel mode has two separate propulsion streams for maximum low-speed performance. A front inlet, fan, and downward-vectorable nozzle form one stream. An auxiliary top inlet provides air to the aft fan, supplying the core engine and aft vectorable nozzle. Front nozzle and top inlet closure, and removal of a blocker door separating the two streams, configures the tandem fan for series mode operation as a typical aircraft propulsion system. Transition mode operation is formed by intermediate settings of the front nozzle, blocker door, and top inlet. Emphasis was on the total pressure recovery and flow distortion at the aft fan face. A range of fan flow rates was tested at tunnel airspeeds from 0 to 240 knots and angles-of-attack from -10 to 40 deg for all three modes. In addition to the model variables for the three modes, variants of the top inlet were tested in the parallel mode only. These lip variables were aft-lip boundary-layer bleed holes and a three-position turning vane. A bellmouth extension of the top inlet side lips was also tested in parallel mode.
Efficient iterative methods applied to the solution of transonic flows
DOE Office of Scientific and Technical Information (OSTI.GOV)
Wissink, A.M.; Lyrintzis, A.S.; Chronopoulos, A.T.
1996-02-01
We investigate the use of an inexact Newton's method to solve the potential equations in the transonic regime. As a test case, we solve the two-dimensional steady transonic small disturbance equation. Approximate factorization/ADI techniques have traditionally been employed for implicit solutions of this nonlinear equation. Instead, we apply Newton's method using an exact analytical determination of the Jacobian, with preconditioned conjugate-gradient-like iterative solvers for the solution of the linear systems in each Newton iteration. Two iterative solvers are tested: a block s-step version of the classical Orthomin(k) algorithm called orthogonal s-step Orthomin (OSOmin), and the well-known GMRES method. The preconditioner is a vectorizable and parallelizable version of incomplete LU (ILU) factorization. The efficiency of the Newton-iterative method on vector and parallel computer architectures is the main issue addressed. In vectorized tests on a single processor of the Cray C-90, the performance of Newton-OSOmin is superior to Newton-GMRES and to a more traditional monotone AF/ADI method (MAF) for a variety of transonic Mach numbers and mesh sizes. Newton-GMRES is superior to MAF for some cases. The parallel performance of the Newton method is also found to be very good on multiple processors of the Cray C-90 and on the massively parallel Thinking Machines CM-5, where very fast execution rates (up to 9 Gflops) are found for large problems. 38 refs., 14 figs., 7 tabs.
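The Newton-iterative structure described (exact analytical Jacobian, ILU-preconditioned Krylov inner solves) can be illustrated with SciPy on a small stand-in nonlinear system; the model problem and all parameters below are invented for illustration and are unrelated to the transonic small disturbance equation:

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import spilu, gmres, LinearOperator

# Model nonlinear system F(u) = A u + u**3 - b = 0 with a sparse,
# diagonally dominant tridiagonal A (a toy stand-in problem).
n = 50
A = sp.diags([-1.0, 4.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

def F(u):
    return A @ u + u**3 - b

def jacobian(u):
    # Exact analytical Jacobian, as in the abstract's Newton method.
    return (A + sp.diags(3.0 * u**2)).tocsc()

u = np.zeros(n)
for _ in range(20):                           # outer Newton iterations
    J = jacobian(u)
    ilu = spilu(J)                            # ILU factorization as preconditioner
    M = LinearOperator(J.shape, matvec=ilu.solve)
    du, info = gmres(J, -F(u), M=M)           # inner Krylov (GMRES) solve
    u += du
    if np.linalg.norm(F(u)) < 1e-10:
        break
```

Because GMRES is solved only to a loose inner tolerance, each Newton step is inexact, yet the outer iteration still converges rapidly; the ILU preconditioner plays the role of the vectorizable/parallelizable factorization discussed in the abstract.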
Efficiently modeling neural networks on massively parallel computers
NASA Technical Reports Server (NTRS)
Farber, Robert M.
1993-01-01
Neural networks are a very useful tool for analyzing and modeling complex real-world systems. Applying neural network simulations to real-world problems generally involves large amounts of data and massive amounts of computation. To efficiently handle the computational requirements of large problems, we have implemented at Los Alamos a highly efficient neural network compiler for serial computers, vector computers, vector parallel computers, and fine-grain SIMD computers such as the CM-2 Connection Machine. This paper describes the mapping used by the compiler to implement feed-forward backpropagation neural networks for a SIMD (Single Instruction Multiple Data) architecture parallel computer. Thinking Machines Corporation has benchmarked our code at 1.3 billion interconnects per second (approximately 3 gigaflops) on a 64,000-processor CM-2 Connection Machine (Singer 1990). This mapping is applicable to other SIMD computers and can be implemented on MIMD computers such as the CM-5 Connection Machine. Our mapping has virtually no communications overhead, with the exception of the communications required for a global summation across the processors (which has sub-linear runtime growth, on the order of O(log p) in the number of processors p). We can efficiently model very large neural networks which have many neurons and interconnects, and our mapping can extend to arbitrarily large networks (within memory limitations) by merging the memory space of separate processors with fast adjacent-processor interprocessor communications. This paper considers the simulation of feed-forward neural networks only, although the method is extendable to recurrent networks.
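The O(log p) global summation is a pairwise tree reduction; a plain-Python sketch of the idea, simulating the per-processor partial sums as list entries, might be:

```python
def tree_sum(values):
    """Pairwise (tree) reduction of per-processor partial sums.

    Each pass halves the number of active 'processors', so a global
    summation over p processors completes in O(log p) combining steps --
    the only communication the described SIMD mapping requires.
    """
    vals = list(values)
    steps = 0
    while len(vals) > 1:
        paired = [vals[i] + vals[i + 1] for i in range(0, len(vals) - 1, 2)]
        if len(vals) % 2:            # an odd leftover element rides along
            paired.append(vals[-1])
        vals = paired
        steps += 1
    return vals[0], steps
```

Summing 64 partial results this way takes 6 combining steps rather than 63 sequential additions, which is why the summation cost grows only logarithmically with the processor count.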
Kutner, Robert H; Puthli, Sharon; Marino, Michael P; Reiser, Jakob
2009-01-01
Background: During the past twelve years, lentiviral (LV) vectors have emerged as valuable tools for transgene delivery because of their ability to transduce nondividing cells and their capacity to sustain long-term transgene expression in target cells in vitro and in vivo. However, despite significant progress, the production and concentration of high-titer, high-quality LV vector stocks is still cumbersome and costly. Methods: Here we present a simplified protocol for LV vector production on a laboratory scale using HYPERFlask vessels. HYPERFlask vessels are high-yield, high-performance flasks that utilize a multilayered gas-permeable growth surface for efficient gas exchange, allowing convenient production of high-titer LV vectors. For subsequent concentration of LV vector stocks produced in this way, we describe a facile protocol involving Mustang Q anion exchange membrane chromatography. Results: Our results show that unconcentrated LV vector stocks with titers in excess of 10^8 transduction units (TU) per ml were obtained using HYPERFlasks and that these titers were higher than those produced in parallel using regular 150-cm^2 tissue culture dishes. We also show that up to 500 ml of an unconcentrated LV vector stock prepared using a HYPERFlask vessel could be concentrated using a single Mustang Q Acrodisc with a membrane volume of 0.18 ml. Up to 5.3 × 10^10 TU were recovered from a single HYPERFlask vessel. Conclusion: The protocol described here is easy to implement and should facilitate high-titer LV vector production for preclinical studies in animal models without the need for multiple tissue culture dishes and ultracentrifugation-based concentration protocols. PMID:19220915
V/STOL Systems Research Aircraft: A Tool for Cockpit Integration
NASA Technical Reports Server (NTRS)
Stortz, Michael W.; ODonoghue, Dennis P.; Tiffany, Geary (Technical Monitor)
1995-01-01
The next generation ASTOVL aircraft will have a complicated propulsion system. The configuration choices include Direct Lift, Lift-Fan and Lift + Lift/Cruise, but the aircraft must also have supersonic performance and low-observable characteristics. The propulsion system may have features such as flow blockers, vectoring nozzles and flow transfer schemes. The flight control system will necessarily fully integrate the aerodynamic surfaces and the propulsive elements. With a fully integrated, fly-by-wire flight/propulsion control system, the options for cockpit integration are interesting and varied. It is possible to decouple longitudinal and vertical responses, allowing the pilot to close the loop on flight path and flight path acceleration directly. In the hover, the pilot can control the translational rate directly without having to stabilize the inner rate and attitude loops. The benefit of this approach, reduced workload and increased precision, has previously been demonstrated through several motion-based simulations. In order to prove the results in flight, the V/STOL Systems Research Aircraft (VSRA) was developed at the NASA Ames Research Center. The VSRA is the YAV-8B Prototype modified with a research flight control system using a series-parallel servo configuration in all the longitudinal degrees of freedom (including thrust and thrust vector angle) to provide an integrated flight and propulsion control system in a limited envelope. Development of the system has been completed and flight evaluations of the response types have been performed. In this paper we discuss the development of the VSRA, the evolution of the flight path command and translational rate command response types, and the guest pilot evaluations of the system. Pilot evaluation results are used to draw conclusions regarding the suitability of the system to satisfy V/STOL requirements.
Westerdale, John; Belohlavek, Marek; McMahon, Eileen M; Jiamsripong, Panupong; Heys, Jeffrey J; Milano, Michele
2011-02-01
We performed an in vitro study to assess the precision and accuracy of particle imaging velocimetry (PIV) data acquired using a clinically available portable ultrasound system via comparison with stereo optical PIV. The performance of ultrasound PIV was compared with optical PIV on a benchmark problem involving vortical flow with a substantial out-of-plane velocity component. Optical PIV is capable of stereo image acquisition, thus measuring out-of-plane velocity components. This allowed us to quantify the accuracy of ultrasound PIV, which is limited to in-plane acquisition. The system performance was assessed by considering the instantaneous velocity fields without extracting velocity profiles by spatial averaging. Within the 2-dimensional correlation window, using 7 time-averaged frames, the vector fields were found to have correlations of 0.867 in the direction along the ultrasound beam and 0.738 in the perpendicular direction. Out-of-plane motion of greater than 20% of the in-plane vector magnitude was found to increase the SD by 11% for the vectors parallel to the ultrasound beam direction and 8.6% for the vectors perpendicular to the beam. The results show a close correlation and agreement of individual velocity vectors generated by ultrasound PIV compared with optical PIV. Most of the measurement distortions were caused by out-of-plane velocity components.
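The core PIV operation, estimating an in-plane displacement from the peak of the cross-correlation between two interrogation windows, can be sketched with NumPy as follows; this is a generic illustration of the technique, not the cited systems' processing chain:

```python
import numpy as np

def piv_displacement(frame_a, frame_b):
    """Estimate the in-plane displacement between two interrogation
    windows by locating the peak of their FFT-based cross-correlation,
    the basic operation behind PIV velocity-vector estimation."""
    fa = frame_a - frame_a.mean()
    fb = frame_b - frame_b.mean()
    corr = np.real(np.fft.ifft2(np.fft.fft2(fb) * np.conj(np.fft.fft2(fa))))
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap indices above the Nyquist point to negative displacements.
    if dy > frame_a.shape[0] // 2:
        dy -= frame_a.shape[0]
    if dx > frame_a.shape[1] // 2:
        dx -= frame_a.shape[1]
    return int(dy), int(dx)
```

Dividing the displacement by the interframe time then yields the in-plane velocity vector for that window; out-of-plane motion, as the abstract notes, decorrelates the windows and degrades the estimate.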
Davidsson, Marcus; Diaz-Fernandez, Paula; Schwich, Oliver D.; Torroba, Marcos; Wang, Gang; Björklund, Tomas
2016-01-01
Detailed characterization and mapping of oligonucleotide function in vivo is generally a very time-consuming effort that only allows for hypothesis-driven subsampling of the full sequence to be analysed. Recent advances in deep sequencing, together with highly efficient parallel oligonucleotide synthesis and cloning techniques, have, however, opened up entirely new ways to map genetic function in vivo. Here we present a novel, optimized protocol for the generation of universally applicable, barcode-labelled plasmid libraries. The libraries are designed to enable the production of viral vector preparations for assessing coding or non-coding RNA function in vivo. When generating high-diversity libraries, it is a challenge to achieve efficient cloning, unambiguous barcoding and detailed characterization using low-cost sequencing technologies. With the presented protocol, a diversity of above 3 million uniquely barcoded adeno-associated viral (AAV) plasmids can be achieved in a single reaction, through a process achievable in any molecular biology laboratory. This approach opens up a multitude of in vivo assessments, from the evaluation of enhancer and promoter regions to the optimization of genome editing. The generated plasmid libraries are also useful for validation of sequencing clustering algorithms, and we here validate the newly presented message-passing clustering process named Starcode. PMID:27874090
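As a toy analogue of the barcode-clustering task (Starcode itself uses a more sophisticated message-passing scheme over edit distances; the simplified Hamming-distance variant below is our own illustration), a greedy abundance-ordered clustering might be sketched as:

```python
from collections import Counter

def hamming(a, b):
    """Number of mismatched positions between two equal-length barcodes."""
    return sum(x != y for x, y in zip(a, b))

def cluster_barcodes(reads, max_dist=1):
    """Greedy clustering (simplified): assign each barcode to the most
    abundant previously seen barcode within `max_dist` mismatches, so
    likely sequencing errors collapse onto their true parent sequence."""
    counts = Counter(reads)
    centroids = []            # kept in decreasing-abundance order
    assignment = {}
    for bc, _ in counts.most_common():
        parent = next((c for c in centroids if hamming(bc, c) <= max_dist), None)
        if parent is None:
            centroids.append(bc)  # abundant barcode starts its own cluster
            assignment[bc] = bc
        else:
            assignment[bc] = parent
    return assignment
```

A rare one-mismatch variant of an abundant barcode is absorbed into that barcode's cluster, while genuinely distinct barcodes remain separate centroids.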
Actin cytoskeleton rearrangements in Arabidopsis roots under stress and during gravitropic response
NASA Astrophysics Data System (ADS)
Pozhvanov, Gregory; Medvedev, Sergei; Suslov, Dmitry; Demidchik, Vadim
Among environmental factors, the gravity vector is the only one that is constant in direction and accompanies the whole of plant ontogenesis. It can therefore be considered an essential factor for the correct development of plants. Gravitropism is a plant growth response to a change of the plant's position relative to the gravity vector. It is well established that gravitropism is directed by auxin redistribution across the gravistimulated organ. In addition to auxin, the actin cytoskeleton has been shown to be involved in gravitropism at different stages: gravity perception, signal transduction and gravitropic bending formation. However, the relationship between IAA and actin is still under discussion. In this work we studied rearrangements of the actin cytoskeleton during the root gravitropic response. Actin microfilaments were visualized in vivo in GFP-fABD2 transgenic Arabidopsis plants, and their angle distribution was acquired with the MicroFilament Analyzer software. The curvature of actin microfilaments in the root elongation zone was shown to increase within 30-60 min of gravistimulation; the fraction of axially oriented microfilaments decreased, with a concomitant increase in the fraction of obliquely and transversally oriented microfilaments. In particular, the fraction of transversally oriented microfilaments (i.e., parallel to the gravity vector) increased 3-5 times. Under 10 min of sub-lethal salt stress, actin microfilament orientations widened from an initial axial orientation to a set of peaks at 15°, 45° and 90°. We conclude that the actin cytoskeleton rearrangements observed are associated with the regulation of the basic mechanisms of cell extension growth by which the gravitropic bending is formed. While sharing common stress-related features, the gravity-induced actin cytoskeleton rearrangement is slower but results in a higher number of microfilaments parallel to the g-vector than the salt stress-induced rearrangement.
Also, differences in gravistimulated root growth between wild-type and GFP-fABD2 plants are discussed. The project was supported by the OPTEC / Carl Zeiss Personal grant to G.P. (2012), by grants of the Russian Foundation for Basic Research (11-04-00701a, 14-04-01624a) and by a grant of St. Petersburg State University (1.38.233.2014).
Spinning particles in vacuum spacetimes of different curvature types
NASA Astrophysics Data System (ADS)
Semerák, O.; Šrámek, M.
2015-09-01
We consider the motion of spinning test particles with nonzero rest mass in the "pole-dipole" approximation, as described by the Mathisson-Papapetrou-Dixon (MPD) equations, and examine its properties in dependence on the spin supplementary condition added to close the system. In order to better understand the spin-curvature interaction, the MPD equation of motion is decomposed in the orthonormal tetrad whose time vector is given by the four-velocity Vμ chosen to fix the spin condition (the "reference observer") and whose first spatial vector is given by the corresponding spin sμ; such projections do not contain the Weyl scalars Ψ0 and Ψ4 obtained in the associated Newman-Penrose (NP) null tetrad. One natural option for choosing the remaining two spatial basis vectors is shown to follow "intrinsically" whenever Vμ has been chosen; it is realizable if the particle's four-velocity and four-momentum are not parallel. In order to see how the problem depends on the algebraic type of curvature, one first identifies the first vector of the NP tetrad kμ with the highest-multiplicity principal null direction of the Weyl tensor, and then sets Vμ so that kμ belongs to the spin-bivector eigenplane. In spacetimes of any algebraic type but III, it is known to be possible to rotate the tetrads so as to become "transverse," namely so that Ψ1 and Ψ3 vanish. If the spin-bivector eigenplane could be made to coincide with the real-vector plane of any such transverse frame, the spinning particle motion would consequently be fully determined by Ψ2 and the cosmological constant; however, this can be managed in exceptional cases only. Besides focusing on specific Petrov types, we derive several sets of useful relations that are valid generally and check whether/how the exercise simplifies for some specific types of motion.
The particular option of having four-velocity parallel to four-momentum is advocated, and a natural resolution of nonuniqueness of the corresponding reference observer Vμ is suggested.
Lafuente, M J; Petit, T; Gancedo, C
1997-12-22
We have constructed a series of plasmids to facilitate the fusion of promoters, with or without coding regions of genes of Schizosaccharomyces pombe, to the lacZ gene of Escherichia coli. These vectors carry a multiple cloning region in which fission yeast DNA may be inserted in three different reading frames with respect to the coding region of lacZ. The plasmids were constructed with the ura4+ or the his3+ marker of S. pombe. Functionality of the plasmids was tested by measuring in parallel the expression of fructose 1,6-bisphosphatase and beta-galactosidase under the control of the fbp1+ promoter under different conditions.
Beyond core count: a look at new mainstream computing platforms for HEP workloads
NASA Astrophysics Data System (ADS)
Szostek, P.; Nowak, A.; Bitzes, G.; Valsan, L.; Jarp, S.; Dotti, A.
2014-06-01
As Moore's Law continues to deliver more and more transistors, the mainstream processor industry is preparing to expand its investments in areas other than simple core count. These new interests include deep integration of on-chip components, advanced vector units, and memory, cache and interconnect technologies. We examine these trends with parallelized and vectorized High Energy Physics workloads in mind. In particular, we report on practical experience resulting from experiments with scalable HEP benchmarks on the Intel "Ivy Bridge-EP" and "Haswell" processor families. In addition, we examine the benefits of the new "Haswell" microarchitecture and its impact on multiple facets of HEP software. Finally, we report on the power efficiency of the new systems.
NASA Technical Reports Server (NTRS)
Steinmetz, G. G.
1986-01-01
The development of an electronic primary flight display format aligned with the aircraft velocity vector, a simulation evaluation comparing this format with an electronic attitude-aligned primary flight display format, and a flight evaluation of the velocity-vector-aligned display format are described. Earlier tests in turbulent conditions with the electronic attitude-aligned display format had exhibited unsteadiness. A primary objective of aligning the display format with the velocity vector was to take advantage of a velocity-vector control-wheel steering system to provide steadiness of display during turbulent conditions. Better situational awareness under crosswind conditions was also achieved. The evaluation task was a curved, descending approach with turbulent and crosswind conditions. Primary flight display formats contained computer-drawn perspective runway images and flight-path angle information. The flight tests were conducted aboard the NASA Transport Systems Research Vehicle (TSRV). Comparative results of the simulation and flight tests were principally obtained from subjective commentary. Overall, the pilots preferred the display format aligned with the velocity vector.
Parallel Preconditioning for CFD Problems on the CM-5
NASA Technical Reports Server (NTRS)
Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)
1994-01-01
To date, preconditioning methods on massively parallel systems have faced a major difficulty. The preconditioning methods that are most successful at accelerating the convergence of the iterative solver, such as incomplete LU factorizations, are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) is difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse of our original matrix. This new preconditioning matrix can be applied most efficiently in iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, possibly with a dense matrix. Furthermore, the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
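The column independence described above can be sketched in a few lines of NumPy. This is a dense toy illustration of the idea, not the authors' CM-5 implementation: each column of the approximate inverse M is the solution of its own least squares problem, so all n problems can run in parallel. (In a practical sparse-approximate-inverse method, each column is restricted to a sparsity pattern, keeping the individual problems small.)

```python
import numpy as np

def approx_inverse(A):
    """Toy approximate-inverse preconditioner: column j of M solves the
    independent least-squares problem min ||A m_j - e_j||_2, so all n
    columns could be computed in parallel. A real implementation would
    restrict each m_j to a sparsity pattern to keep each problem small."""
    n = A.shape[0]
    M = np.empty((n, n))
    I = np.eye(n)
    for j in range(n):  # each iteration is independent of the others
        M[:, j], *_ = np.linalg.lstsq(A, I[:, j], rcond=None)
    return M
```

Applying this M as a preconditioner then costs only one matrix-vector product per iteration, which is exactly the operation that maps well onto massively parallel hardware.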
NASA Astrophysics Data System (ADS)
Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.
2010-12-01
Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization.
New version program summary
Program Title: PROFESS
Catalogue identifier: AEBN_v2_0
Program summary URL: http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html
Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland
Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html
No. of lines in distributed program, including test data, etc.: 68 721
No. of bytes in distributed program, including test data, etc.: 1 708 547
Distribution format: tar.gz
Programming language: Fortran 90
Computer: Intel with ifort; AMD Opteron with pathf90
Operating system: Linux
Has the code been vectorized or parallelized?: Yes. Parallelization is implemented through domain decomposition using MPI.
RAM: Problem dependent, but 2 GB is sufficient for up to 10,000 ions.
Classification: 7.3
External routines: FFTW 2.1.5 (http://www.fftw.org)
Catalogue identifier of previous version: AEBN_v1_0
Journal reference of previous version: Comput. Phys. Comm. 179 (2008) 839
Does the new version supersede the previous version?: Yes
Nature of problem: Given a set of coordinates describing the initial ion positions under periodic boundary conditions, recovers the ground state energy, electron density, ion positions, and cell lattice vectors predicted by orbital-free density functional theory. The computation of all terms is effectively linear scaling. Parallelization is implemented through domain decomposition, and up to ~10,000 ions may be included in the calculation on just a single processor, limited by RAM. For example, when optimizing the geometry of ~50,000 aluminum ions (plus vacuum) on 48 cores, a single iteration of conjugate gradient ion geometry optimization takes ~40 minutes wall time. However, each CG geometry step requires two or more electron density optimizations, so step times will vary.
Solution method: Computes energies as described in text; minimizes this energy with respect to the electron density, ion positions, and cell lattice vectors.
Reasons for new version: To allow much larger systems to be simulated using PROFESS.
Restrictions: PROFESS cannot use nonlocal (such as ultrasoft) pseudopotentials. A variety of local pseudopotential files are available at the Carter group website (http://www.princeton.edu/mae/people/faculty/carter/homepage/research/localpseudopotentials/). Also, due to the current state of the kinetic energy functionals, PROFESS is only reliable for main group metals and some properties of semiconductors.
Running time: Problem dependent: the test example provided with the code takes less than a second to run. Timing results for large scale problems are given in the PROFESS paper and Ref. [1].
Scaling Optimization of the SIESTA MHD Code
NASA Astrophysics Data System (ADS)
Seal, Sudip; Hirshman, Steven; Perumalla, Kalyan
2013-10-01
SIESTA is a parallel three-dimensional plasma equilibrium code capable of resolving magnetic islands at high spatial resolutions for toroidal plasmas. Originally designed to exploit small-scale parallelism, SIESTA has now been scaled to execute efficiently over several thousands of processors P. This scaling improvement was accomplished with minimal intrusion to the execution flow of the original version. First, the efficiency of the iterative solutions was improved by integrating the parallel tridiagonal block solver code BCYCLIC. Krylov-space generation in GMRES was then accelerated using a customized parallel matrix-vector multiplication algorithm. Novel parallel Hessian generation algorithms were integrated and memory access latencies were dramatically reduced through loop nest optimizations and data layout rearrangement. These optimizations sped up equilibria calculations by factors of 30-50. It is possible to compute solutions with granularity N/P near unity on extremely fine radial meshes (N > 1024 points). Grid separation in SIESTA, which manifests itself primarily in the resonant components of the pressure far from rational surfaces, is strongly suppressed by finer meshes. Large problem sizes of up to 300 K simultaneous non-linear coupled equations have been solved on the NERSC supercomputers. Work supported by U.S. DOE under Contract DE-AC05-00OR22725 with UT-Battelle, LLC.
Spiking Neural P Systems With Rules on Synapses Working in Maximum Spiking Strategy.
Tao Song; Linqiang Pan
2015-06-01
Spiking neural P systems (called SN P systems for short) are a class of parallel and distributed neural-like computation models inspired by the way neurons process information and communicate with each other by means of impulses or spikes. In this work, we introduce a new variant of SN P systems, called SN P systems with rules on synapses working in maximum spiking strategy, and investigate the computational power of these systems as both number and vector generators. Specifically, we prove that (i) if no limit is imposed on the number of spikes in any neuron during any computation, such systems can generate the sets of Turing computable natural numbers and the sets of vectors of positive integers computed by k-output register machines; (ii) if an upper bound is imposed on the number of spikes in each neuron during any computation, such systems characterize the semi-linear sets of natural numbers as number generating devices; as vector generating devices, such systems can only characterize the family of sets of vectors computed by sequential monotonic counter machines, which is strictly included in the family of semi-linear sets of vectors. This gives a positive answer to the problem formulated in Song et al., Theor. Comput. Sci., vol. 529, pp. 82-95, 2014.
Passing in Command Line Arguments and Parallel Cluster/Multicore Batching in R with batch.
Hoffmann, Thomas J
2011-03-01
It is often useful to rerun a command line R script with some slight change in the parameters used to run it: a new set of parameters for a simulation, a different dataset to process, etc. The R package batch provides a means to easily pass multiple command line options, including vectors of values in the usual R format, into R. The same script can be set up to run things in parallel via different command line arguments. The R package batch also simplifies this parallel batching by allowing one to use R and an R-like syntax for arguments to spread a script across a cluster or a local multicore/multiprocessor computer, with automated syntax for several popular cluster types. Finally, it provides a means to aggregate the results of multiple processes run on a cluster.
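By way of illustration only (in Python rather than R, with made-up helper names rather than the batch package's actual API), the core idea of passing vector-valued command line arguments into a script and fanning the values out over parallel runs looks roughly like this:

```python
def parse_args(argv):
    """Collect '--name v1 v2 ...' tokens into {name: [values]}, loosely
    mimicking how vector-valued options can be passed on the command line.
    This parser and the names below are illustrative, not batch's API."""
    args, key = {}, None
    for tok in argv:
        if tok.startswith("--"):
            key = tok[2:]
            args[key] = []
        elif key is not None:
            args[key].append(tok)
    return args

def run_one(seed):
    # stand-in for one simulation replicate; a batching tool would dispatch
    # each parameter setting as its own cluster job or local process
    return int(seed) ** 2

params = parse_args(["--seed", "1", "2", "3", "--data", "sim.csv"])
results = [run_one(s) for s in params["seed"]]  # run in parallel in practice
```

The final list comprehension is the piece a cluster scheduler or multicore pool would replace, one job per element of the argument vector.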
Potential Application of a Graphical Processing Unit to Parallel Computations in the NUBEAM Code
NASA Astrophysics Data System (ADS)
Payne, J.; McCune, D.; Prater, R.
2010-11-01
NUBEAM is a comprehensive computational Monte Carlo based model for neutral beam injection (NBI) in tokamaks. NUBEAM computes NBI-relevant profiles in tokamak plasmas by tracking the deposition and the slowing of fast ions. At the core of NUBEAM are vector calculations used to track fast ions. These calculations have recently been parallelized to run on MPI clusters. However, cost and interlink bandwidth limit the ability to fully parallelize NUBEAM on an MPI cluster. Recent implementation of double precision capabilities for Graphical Processing Units (GPUs) presents a cost effective and high performance alternative or complement to MPI computation. Commercially available graphics cards can achieve up to 672 GFLOPS double precision and can handle hundreds of thousands of threads. The ability to execute at least one thread per particle simultaneously could significantly reduce the execution time and the statistical noise of NUBEAM. Progress on implementation on a GPU will be presented.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Deslippe, Jack; da Jornada, Felipe H.; Vigil-Fowler, Derek
2016-10-06
We profile and optimize calculations performed with the BerkeleyGW code on the Xeon-Phi architecture. BerkeleyGW depends both on hand-tuned critical kernels as well as on BLAS and FFT libraries. We describe the optimization process and performance improvements achieved. We discuss a layered parallelization strategy to take advantage of vector, thread and node-level parallelism. We discuss locality changes (including the consequence of the lack of L3 cache) and effective use of the on-package high-bandwidth memory. We show preliminary results on Knights-Landing including a roofline study of code performance before and after a number of optimizations. We find that the GW method is particularly well-suited for many-core architectures due to the ability to exploit a large amount of parallelism over plane-wave components, band-pairs, and frequencies.
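The roofline study mentioned above rests on a simple model: attainable performance is the minimum of the machine's peak compute throughput and its memory bandwidth times the kernel's arithmetic intensity. A minimal sketch (the numbers below are illustrative, not measurements from this work):

```python
def roofline(peak_gflops, bandwidth_gbs, intensity_flops_per_byte):
    """Attainable GFLOP/s under the roofline model: a kernel is either
    memory-bound (on the sloped bandwidth roof) or compute-bound (on the
    flat peak-throughput roof)."""
    return min(peak_gflops, bandwidth_gbs * intensity_flops_per_byte)

# memory-bound kernel: 0.25 flop/byte on a 100 GB/s machine -> 25 GFLOP/s
low = roofline(1000.0, 100.0, 0.25)
# compute-bound kernel: high intensity hits the 1000 GFLOP/s ceiling
high = roofline(1000.0, 100.0, 50.0)
```

Plotting this bound against measured kernel performance, before and after each optimization, is what such a roofline study visualizes.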
Acoustooptic linear algebra processors - Architectures, algorithms, and applications
NASA Technical Reports Server (NTRS)
Casasent, D.
1984-01-01
Architectures, algorithms, and applications for systolic processors are described, with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products on such architectures, are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed, with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
Summary of research in applied mathematics, numerical analysis, and computer sciences
NASA Technical Reports Server (NTRS)
1986-01-01
The major categories of current ICASE research programs addressed include: numerical methods, with particular emphasis on the development and analysis of basic numerical algorithms; control and parameter identification problems, with emphasis on effective numerical methods; computational problems in engineering and physical sciences, particularly fluid dynamics, acoustics, and structural analysis; and computer systems and software, especially vector and parallel computers.
NASA Astrophysics Data System (ADS)
Sepehri Javan, N.; Rouhi Erdi, F.
2017-12-01
In this theoretical study, we investigate the generation of terahertz radiation by considering the beating of two similar Gaussian laser beams with different frequencies of ω1 and ω2 in a spatially modulated medium of graphite nanoparticles. The medium is assumed to contain spherical graphite nanoparticles of two different configurations: in the first configuration, the electric fields of the laser beams are parallel to the normal vector of the basal plane of the graphite structure, whereas in the second configuration, the electric fields are perpendicular to the normal vector of the basal plane. The interaction of the electric fields of lasers with the electronic clouds of the nanoparticles generates a ponderomotive force that in turn leads to the creation of a macroscopic electron current in the direction of laser polarizations and at the beat frequency ω1-ω2 , which can generate terahertz radiation. We show that, when the beat frequency lies near the effective plasmon frequency of the nanoparticles and the electric fields are parallel to the basal-plane normal, a resonant interaction of the laser beams causes intense terahertz radiation.
Effects of mechanostimulation on gravitropism and signal persistence in flax roots.
John, Susan P; Hasenstein, Karl H
2011-09-01
Gravitropism describes curvature of plants in response to gravity or differential acceleration, and clinorotation is commonly used to compensate for the unilateral effect of gravity. We report on experiments that examine the persistence of the gravity signal and separate mechanostimulation from gravistimulation. Flax roots were reoriented (placed horizontally for 5, 10 or 15 min) and clinorotated at a rate of 0.5 to 5 rpm either vertically (parallel to the gravity vector and root axis) or horizontally (perpendicular to the gravity vector and parallel to the root axis). Image sequences showed that horizontal clinorotation did not affect root growth rate (0.81 ± 0.03 mm h-1) but vertical clinorotation reduced root growth by about 7%. The angular velocity (speed of clinorotation) did not affect growth for either direction. However, maximal curvature for vertical clinorotation decreased with increasing rate of rotation and produced straight roots at 5 rpm. In contrast, horizontal clinorotation increased curvature with increasing angular velocity. The point of maximal curvature was used to determine the longevity (memory) of the gravity signal, which lasted about 120 min. The data indicate that mechanostimulation modifies the magnitude of the graviresponse but does not affect memory persistence.
Anisotropic surface-state-mediated RKKY interaction between adatoms on a hexagonal lattice
NASA Astrophysics Data System (ADS)
Patrone, Paul N.; Einstein, T. L.
2012-01-01
Motivated by recent numerical studies of Ag on Pt(111), we derive an expression for the RKKY interaction mediated by surface states, considering the effect of anisotropy in the Fermi edge. Our analysis is based on a stationary phase approximation. The main contribution to the interaction comes from electrons whose Fermi velocity vF is parallel to the vector R connecting the interacting adatoms; we show that, in general, the corresponding Fermi wave vector kF is not parallel to R. The interaction is oscillatory; the amplitude and wavelength of oscillations have angular dependence arising from the anisotropy of the surface-state band structure. The wavelength, in particular, is determined by the projection of this kF (corresponding to vF) onto the direction of R. Our analysis is easily generalized to other systems. For Ag on Pt(111), our results indicate that the RKKY interaction between pairs of adatoms should be nearly isotropic and so cannot account for the anisotropy found in the studies motivating our work. However, for metals with surface-state dispersions similar to Be(101¯0), we show that the RKKY interaction should have considerable anisotropy.
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Hixon, Duane; Sankar, L. N.
1993-01-01
During the past two decades, there has been significant progress in the field of numerical simulation of unsteady compressible viscous flows. At present, a variety of solution techniques exist, such as transonic small disturbance analyses (TSD), transonic full potential equation-based methods, unsteady Euler solvers, and unsteady Navier-Stokes solvers. These advances have been made possible by developments in three areas: (1) improved numerical algorithms; (2) automation of body-fitted grid generation schemes; and (3) advanced computer architectures with vector processing and massively parallel processing features. In this work, the GMRES scheme has been considered as a candidate for acceleration of a Newton iteration time marching scheme for unsteady 2-D and 3-D compressible viscous flow calculations; preliminary calculations indicate that this will provide up to a 65 percent reduction in the computer time requirements over the existing class of explicit and implicit time marching schemes. The proposed method has been tested on structured grids, but is flexible enough for extension to unstructured grids. The described scheme has been tested only on the current generation of vector processor architectures of the Cray Y/MP class, but should be suitable for adaptation to massively parallel machines.
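As a rough illustration of the GMRES building block discussed above (a generic NumPy sketch, not the authors' implementation), the following runs the Arnoldi process against a user-supplied matrix-vector product and solves the small least-squares problem for the Krylov-subspace solution. In a Newton time marching scheme, matvec would apply the (possibly approximate) Jacobian, which is why GMRES pairs naturally with such iterations:

```python
import numpy as np

def gmres(matvec, b, m=20, tol=1e-12):
    """Minimal GMRES(m) with zero initial guess: build an orthonormal
    Krylov basis Q via Arnoldi, then minimize ||beta*e1 - H y|| for the
    coefficients y of the approximate solution in that basis."""
    n = b.size
    Q = np.zeros((n, m + 1))
    H = np.zeros((m + 1, m))
    beta = np.linalg.norm(b)
    Q[:, 0] = b / beta
    for j in range(m):
        w = matvec(Q[:, j])
        for i in range(j + 1):          # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ w
            w = w - H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < tol:           # happy breakdown: subspace is invariant
            m = j + 1
            break
        Q[:, j + 1] = w / H[j + 1, j]
    e1 = np.zeros(m + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H[:m + 1, :m], e1, rcond=None)
    return Q[:, :m] @ y
```

Because the operator enters only through matvec, the same routine works whether the Jacobian is stored explicitly or applied matrix-free.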
Relativistic Transverse Gravitational Redshift
NASA Astrophysics Data System (ADS)
Mayer, A. F.
2012-12-01
The parametrized post-Newtonian (PPN) formalism is a tool for quantitative analysis of the weak gravitational field based on the field equations of general relativity. This formalism and its ten parameters provide the practical theoretical foundation for the evaluation of empirical data produced by space-based missions designed to map and better understand the gravitational field (e.g., GRAIL, GRACE, GOCE). Accordingly, mission data is interpreted in the context of the canonical PPN formalism; unexpected, anomalous data are explained as similarly unexpected but apparently real physical phenomena, which may be characterized as "gravitational anomalies," or by various sources contributing to the total error budget. Another possibility, which is typically not considered, is a small modeling error in canonical general relativity. The concept of the idealized point-mass spherical equipotential surface, which originates with Newton's law of gravity, is preserved in Einstein's synthesis of special relativity with accelerated reference frames in the form of the field equations. It was not previously realized that the fundamental principles of relativity invalidate this concept and with it the idea that the gravitational field is conservative (i.e., zero net work is done on any closed path). The ideal radial free fall of a material body from arbitrarily-large range to a point on such an equipotential surface (S) determines a unique escape-velocity vector of magnitude v collinear to the acceleration vector of magnitude g at this point. For two such points on S separated by angle dφ, the Equivalence Principle implies distinct reference frames experiencing inertial acceleration of identical magnitude g in different directions in space. The complete equivalence of these inertially-accelerated frames to their analogous frames at rest on S requires evaluation at instantaneous velocity v relative to a local inertial observer.
Because these velocity vectors are not parallel, a symmetric energy potential exists between the frames that is quantified by the instantaneous Δv = v·dφ between them; in order for either frame to become indistinguishable from the other, such that their respective velocity and acceleration vectors are parallel, a change in velocity is required. While the qualitative features of general relativity imply this phenomenon (i.e., a symmetric potential difference between two points on a Newtonian 'equipotential surface' that is similar to a friction effect), it is not predicted by the field equations due to a modeling error concerning time. This is an error of omission; time has fundamental geometric properties implied by the principles of relativity that are not reflected in the field equations. Where b is the radius and g is the gravitational acceleration characterizing a spherical geoid S of an ideal point-source gravitational field, an elegant derivation that rests on first principles shows that for two points at rest on S separated by a distance d << b, a symmetric relativistic redshift exists between these points of magnitude z = gd^2/bc^2, which over 1 km at Earth sea level yields z ~ 10^-17. It can be tested with a variety of methods, in particular laser interferometry. A more sophisticated derivation yields a considerably more complex predictive formula for any two points in a gravitational field.
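The quoted order of magnitude can be checked by direct arithmetic in the stated formula z = gd^2/bc^2, using round Earth values (approximate constants, chosen here for the check, not taken from the abstract):

```python
g = 9.81     # m/s^2, surface gravitational acceleration
d = 1.0e3    # m, separation along the geoid (1 km)
b = 6.371e6  # m, Earth radius
c = 2.998e8  # m/s, speed of light

# z = g d^2 / (b c^2); works out to roughly 1.7e-17 for these values,
# consistent with the ~10^-17 magnitude quoted over 1 km at sea level
z = g * d**2 / (b * c**2)
```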
Cheng, Jerome; Hipp, Jason; Monaco, James; Lucas, David R; Madabhushi, Anant; Balis, Ulysses J
2011-01-01
Spatially invariant vector quantization (SIVQ) is a texture- and color-based image matching algorithm that queries the image space through the use of ring vectors. In prior studies, the selection of one or more optimal vectors for a particular feature of interest required a manual process, with the user initially stochastically selecting candidate vectors and subsequently testing them upon other regions of the image to verify the vector's sensitivity and specificity properties (typically by reviewing a resultant heat map). In carrying out the prior efforts, the SIVQ algorithm was noted to exhibit highly scalable computational properties, where each region of analysis can take place independently of others, making a compelling case for the exploration of its deployment on high-throughput computing platforms, with the hypothesis that such an exercise will result in performance gains that scale linearly with increasing processor count. An automated process was developed for the selection of optimal ring vectors to serve as the predicate matching operator in defining histopathological features of interest. Briefly, candidate vectors were generated from every possible coordinate origin within a user-defined vector selection area (VSA) and subsequently compared against user-identified positive and negative "ground truth" regions on the same image. Each vector from the VSA was assessed for its goodness-of-fit to both the positive and negative areas via the receiver operating characteristic (ROC) transfer function, with each assessment resulting in an associated area-under-the-curve (AUC) figure of merit. Use of the above-mentioned automated vector selection process was demonstrated in two use cases: first, to identify malignant colonic epithelium, and second, to identify soft tissue sarcoma. For both examples, a very satisfactory optimized vector was identified, as defined by the AUC metric.
Finally, as an additional effort directed towards attaining high-throughput capability for the SIVQ algorithm, we demonstrated its successful integration with the MATrix LABoratory (MATLAB™) application interface. The SIVQ algorithm is suitable for automated vector selection settings and high-throughput computation.
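The goodness-of-fit step described above, scoring each candidate vector by ROC AUC against positive and negative ground-truth regions and keeping the best, can be sketched generically as follows. This is an illustration only: the match_score callback and the data stand in for SIVQ's actual ring-vector matcher.

```python
def auc(pos_scores, neg_scores):
    """ROC area under the curve via the Mann-Whitney statistic: the
    probability that a randomly chosen positive region scores higher
    than a randomly chosen negative one (ties count as half)."""
    wins = sum((p > n) + 0.5 * (p == n)
               for p in pos_scores for n in neg_scores)
    return wins / (len(pos_scores) * len(neg_scores))

def best_candidate(candidates, pos_regions, neg_regions, match_score):
    """Pick the candidate vector whose match scores best separate the
    user-marked positive regions from the negative ones; each candidate's
    assessment is independent, hence the linear-scaling parallelism."""
    return max(candidates,
               key=lambda v: auc([match_score(v, r) for r in pos_regions],
                                 [match_score(v, r) for r in neg_regions]))
```

Because each candidate's AUC is computed independently, the outer loop over candidates is exactly the part that distributes across processors.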
Gharib, Ahmed M.; Ho, Vincent B.; Rosing, Douglas R.; Herzka, Daniel A.; Stuber, Matthias; Arai, Andrew E.; Pettigrew, Roderic I.
2008-01-01
The purpose of this study was to prospectively use a whole-heart three-dimensional (3D) coronary magnetic resonance (MR) angiography technique specifically adapted for use at 3 T and a parallel imaging technique (sensitivity encoding) to evaluate coronary arterial anomalies and variants (CAAV). This HIPAA-compliant study was approved by the local institutional review board, and informed consent was obtained from all participants. Twenty-two participants (11 men, 11 women; age range, 18–62 years) were included. Ten participants were healthy volunteers, whereas 12 participants were patients suspected of having CAAV. Coronary MR angiography was performed with a 3-T MR imager. A 3D free-breathing navigator-gated and vector electrocardiographically–gated segmented k-space gradient-echo sequence with adiabatic T2 preparation pulse and parallel imaging (sensitivity encoding) was used. Whole-heart acquisitions (repetition time msec/echo time msec, 4/1.35; 20° flip angle; 1 × 1 × 2-mm acquired voxel size) lasted 10–12 minutes. Mean examination time was 41 minutes ± 14 (standard deviation). Findings included aneurysms, ectasia, arteriovenous fistulas, and anomalous origins. The 3D whole-heart acquisitions developed for use with 3 T are feasible for use in the assessment of CAAV. © RSNA, 2008 PMID:18372470
NASA Astrophysics Data System (ADS)
Weiss, Chester J.
2013-08-01
An essential element for computational hypothesis testing, data inversion and experiment design for electromagnetic geophysics is a robust forward solver, capable of easily and quickly evaluating the electromagnetic response of arbitrary geologic structure. The usefulness of such a solver hinges on the balance among competing desires like ease of use, speed of forward calculation, scalability to large problems or compute clusters, parsimonious use of memory access, accuracy and by necessity, the ability to faithfully accommodate a broad range of geologic scenarios over extremes in length scale and frequency content. This is indeed a tall order. The present study addresses recent progress toward the development of a forward solver with these properties. Based on the Lorenz-gauged Helmholtz decomposition, a new finite volume solution over Cartesian model domains endowed with complex-valued electrical properties is shown to be stable over the frequency range 10^-2 to 10^10 Hz and length scales from 10^-3 to 10^5 m. Benchmark examples are drawn from magnetotellurics, exploration geophysics, geotechnical mapping and laboratory-scale analysis, showing excellent agreement with reference analytic solutions. Computational efficiency is achieved through use of a matrix-free implementation of the quasi-minimal-residual (QMR) iterative solver, which eliminates explicit storage of finite volume matrix elements in favor of "on the fly" computation as needed by the iterative Krylov sequence. Further efficiency is achieved through sparse coupling matrices between the vector and scalar potentials whose non-zero elements arise only in those parts of the model domain where the conductivity gradient is non-zero. Multi-thread parallelization in the QMR solver through OpenMP pragmas is used to reduce the computational cost of its most expensive step: the single matrix-vector product at each iteration.
High-level MPI communicators farm independent processes to available compute nodes for simultaneous computation of multi-frequency or multi-transmitter responses.
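The matrix-free idea, computing the operator's action "on the fly" rather than storing matrix elements, can be illustrated with SciPy's QMR solver driven by a LinearOperator. The tridiagonal stencil and coefficients below are stand-ins, not the paper's finite-volume operator:

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, qmr

n = 100
coeff = np.linspace(2.0, 3.0, n)  # stand-in for finite-volume coefficients

def matvec(v):
    """Apply a symmetric tridiagonal stencil without ever forming the
    matrix: each entry of the result is computed on the fly."""
    v = np.ravel(v)
    out = coeff * v
    out[:-1] -= 0.5 * v[1:]
    out[1:] -= 0.5 * v[:-1]
    return out

# QMR needs both A and A^T; the operator is symmetric here, so rmatvec = matvec.
A = LinearOperator((n, n), matvec=matvec, rmatvec=matvec, dtype=np.float64)
b = np.ones(n)
x, info = qmr(A, b)  # info == 0 signals convergence
```

Only the stencil routine and a few work vectors live in memory, which is the point of the matrix-free formulation; in a C/OpenMP setting, the loop inside `matvec` is the natural place for a parallel pragma.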
Aprà, E; Kowalski, K
2016-03-08
In this paper we discuss the implementation of multireference coupled-cluster formalism with singles, doubles, and noniterative triples (MRCCSD(T)), which is capable of taking advantage of the processing power of the Intel Xeon Phi coprocessor. We discuss the integration of two levels of parallelism underlying the MRCCSD(T) implementation with computational kernels designed to offload the computationally intensive parts of the MRCCSD(T) formalism to Intel Xeon Phi coprocessors. Special attention is given to the enhancement of the parallel performance by task reordering that has improved load balancing in the noniterative part of the MRCCSD(T) calculations. We also discuss aspects regarding efficient optimization and vectorization strategies.
Antiparallel spin does not always contain more information
NASA Astrophysics Data System (ADS)
Ghosh, Sibasish; Roy, Anirban; Sen, Ujjwal
2001-01-01
We show that the Bloch vectors lying on any great circle comprise the largest set S_L for which the parallel states |n, n⟩ can always be exactly transformed into the antiparallel states |n, -n⟩. Thus, for n ∈ S_L, no more information about n is extractable from |n, -n⟩ than from |n, n⟩ by any measuring strategy. Surprisingly, this most general transformation reduces to just a flip operation on the second particle. We also show here that a probabilistic exact parallel-to-antiparallel transformation is not possible if the corresponding antiparallel states span the whole Hilbert space of the two qubits. These considerations allow us to generalize a conjecture of Gisin and Popescu [Phys. Rev. Lett. 83, 432 (1999)].
Matrix Product Operator Simulations of Quantum Algorithms
2015-02-01
parallel to the Grover subspace parametrically: (Z_i|φ⟩)_∥ = s cos γ |α⟩ + s sin γ |β⟩, s = √(a(k)²/(N-1)² + b(k)²), γ = tan⁻¹(b(k)(N-1)/a(k)) (6.32) ... of this vector parallel to the Grover subspace in parametric form: (X_i Z_i|φ⟩)_∥ = s cos(γ)|α⟩ + s sin(γ)|β⟩, s = 1/√(N-1), γ = tan⁻¹(cot((k + 1/2)θ ...
High performance compression of science data
NASA Technical Reports Server (NTRS)
Storer, James A.; Cohn, Martin
1994-01-01
Two papers make up the body of this report. One presents a single-pass adaptive vector quantization algorithm that learns a codebook of variable size and shape entries; the authors present experiments on a set of test images showing that, with no training or prior knowledge of the data, for a given fidelity the compression achieved typically equals or exceeds that of the JPEG standard. The second paper addresses motion compensation, one of the most effective techniques used in interframe data compression. A parallel block-matching algorithm for estimating interframe displacement of blocks with minimum error is presented. The algorithm is designed for a simple parallel architecture to process video in real time.
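A minimal sketch of the exhaustive block matching that the second paper parallelizes (block size, search radius, and the SAD error measure are illustrative choices, not taken from the report):

```python
import numpy as np

def match_block(ref, cur, top, left, bsize=8, radius=4):
    """Exhaustive-search block matching: find the displacement (dy, dx) of
    the block at (top, left) in `cur` that minimizes the sum of absolute
    differences (SAD) against the reference frame `ref`. Every block of the
    grid is independent, which is what makes the search easy to distribute
    over a parallel architecture."""
    block = cur[top:top + bsize, left:left + bsize]
    best = (0, 0)
    best_err = np.inf
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = top + dy, left + dx
            if y < 0 or x < 0 or y + bsize > ref.shape[0] or x + bsize > ref.shape[1]:
                continue  # candidate window falls outside the frame
            err = np.abs(ref[y:y + bsize, x:x + bsize] - block).sum()
            if err < best_err:
                best_err, best = err, (dy, dx)
    return best, best_err
```

For a pure translation between frames, the minimizing displacement recovers the shift exactly; real encoders run this (or a pruned variant) over every block of the frame.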
Brief announcement: Hypergraph partitioning for parallel sparse matrix-matrix multiplication
Ballard, Grey; Druinsky, Alex; Knight, Nicholas; ...
2015-01-01
The performance of parallel algorithms for sparse matrix-matrix multiplication is typically determined by the amount of interprocessor communication performed, which in turn depends on the nonzero structure of the input matrices. In this paper, we characterize the communication cost of a sparse matrix-matrix multiplication algorithm in terms of the size of a cut of an associated hypergraph that encodes the computation for a given input nonzero structure. Obtaining an optimal algorithm corresponds to solving a hypergraph partitioning problem. Furthermore, our hypergraph model generalizes several existing models for sparse matrix-vector multiplication, and we can leverage hypergraph partitioners developed for that computation to improve application-specific algorithms for multiplying sparse matrices.
Federici, Valentina; Ippoliti, Carla; Catalani, Monica; Di Provvido, Andrea; Santilli, Adriana; Quaglia, Michela; Mancini, Giuseppe; Di Nicola, Francesca; Di Gennaro, Annapia; Leone, Alessandra; Teodori, Liana; Conte, Annamaria; Savini, Giovanni
2016-09-30
Epizootic haemorrhagic disease (EHD) is an infectious, non-contagious viral disease transmitted by Culicoides, which affects wild and domestic ruminants. The disease has never been reported in Europe; recently, however, outbreaks of EHD occurred in the Mediterranean Basin. Consequently, the risk that Epizootic haemorrhagic disease virus (EHDV) might spread to Italy cannot be ignored. The aim of this study was to evaluate the risk of EHDV transmission in Italy, in case of introduction, through indigenous potential vectors. In Italy, the most widespread and abundant Culicoides species associated with livestock are Culicoides imicola and the members of the Obsoletus complex. Culicoides imicola is a competent vector of EHDV, whereas the vector status of the Obsoletus complex has not yet been assessed; thus, its oral susceptibility to EHDV was preliminarily evaluated here. To evaluate the risk of EHDV transmission, a geographical information system-based Multi-Criteria Evaluation approach was adopted. Distribution of vector species and host density were used as predictors of areas potentially suitable for EHDV transmission in case of introduction into Italy. This study demonstrates that the whole peninsula is suitable for the disease, given the distribution and abundance of hosts and the competence of possible indigenous vectors.
Amplitude and dynamics of polarization-plane signaling in the central complex of the locust brain
Bockhorst, Tobias
2015-01-01
The polarization pattern of skylight provides a compass cue that various insect species use for allocentric orientation. In the desert locust, Schistocerca gregaria, a network of neurons tuned to the electric field vector (E-vector) angle of polarized light is present in the central complex of the brain. Preferred E-vector angles vary along slices of neuropils in a compasslike fashion (polarotopy). We studied how the activity in this polarotopic population is modulated in ways suited to control compass-guided locomotion. To this end, we analyzed tuning profiles using measures of correlation between spike rate and E-vector angle and, furthermore, tested for adaptation to stationary angles. The results suggest that the polarotopy is stabilized by antagonistic integration across neurons with opponent tuning. Downstream to the input stage of the network, responses to stationary E-vector angles adapted quickly, which may correlate with a tendency to steer a steady course previously observed in tethered flying locusts. By contrast, rotating E-vectors corresponding to changes in heading direction under a natural sky elicited nonadapting responses. However, response amplitudes were particularly variable at the output stage, covarying with the level of ongoing activity. Moreover, the responses to rotating E-vector angles depended on the direction of rotation in an anticipatory manner. Our observations support a view of the central complex as a substrate of higher-stage processing that could assign contextual meaning to sensory input for motor control in goal-driven behaviors. Parallels to higher-stage processing of sensory information in vertebrates are discussed. PMID:25609107
Segmentation of magnetic resonance images using fuzzy algorithms for learning vector quantization.
Karayiannis, N B; Pai, P I
1999-02-01
This paper evaluates a segmentation technique for magnetic resonance (MR) images of the brain based on fuzzy algorithms for learning vector quantization (FALVQ). These algorithms perform vector quantization by updating all prototypes of a competitive network through an unsupervised learning process. Segmentation of MR images is formulated as an unsupervised vector quantization process, where the local values of different relaxation parameters form the feature vectors which are represented by a relatively small set of prototypes. The experiments evaluate a variety of FALVQ algorithms in terms of their ability to identify different tissues and discriminate between normal tissues and abnormalities.
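In the spirit of the FALVQ family, though not the paper's exact update rules, an unsupervised fuzzy vector quantizer that updates all prototypes for every presented sample can be sketched as follows (all names and parameter values are illustrative):

```python
import numpy as np

def fuzzy_vq(data, n_protos=2, m=2.0, lr=0.05, epochs=50, seed=0):
    """Unsupervised fuzzy LVQ sketch: every prototype moves toward each
    presented sample, weighted by an inverse-distance fuzzy membership
    with fuzzifier m. A generic stand-in for the FALVQ algorithms, which
    likewise update all prototypes of the competitive network."""
    rng = np.random.default_rng(seed)
    # Deterministic spread initialization: prototypes from evenly spaced samples.
    idx = np.linspace(0, len(data) - 1, n_protos).astype(int)
    protos = data[idx].astype(float).copy()
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            d = np.linalg.norm(protos - x, axis=1) + 1e-12
            u = d ** (-2.0 / (m - 1.0))
            u /= u.sum()  # fuzzy memberships sum to one
            protos += lr * u[:, None] * (x - protos)
    return protos
```

For MR segmentation as described above, `data` would hold the per-voxel feature vectors of relaxation parameters, and the learned prototypes would represent tissue classes.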
Parallel seed-based approach to multiple protein structure similarities detection
Chapuis, Guillaume; Le Boudic-Jamin, Mathilde; Andonov, Rumen; ...
2015-01-01
Finding similarities between protein structures is a crucial task in molecular biology. Most of the existing tools require proteins to be aligned in order-preserving way and only find single alignments even when multiple similar regions exist. We propose a new seed-based approach that discovers multiple pairs of similar regions. Its computational complexity is polynomial and it comes with a quality guarantee—the returned alignments have both root mean squared deviations (coordinate-based as well as internal-distances based) lower than a given threshold, if such exist. We do not require the alignments to be order preserving (i.e., we consider nonsequential alignments), which makes our algorithm suitable for detecting similar domains when comparing multidomain proteins as well as for detecting structural repetitions within a single protein. Because the search space for nonsequential alignments is much larger than for sequential ones, the computational burden is addressed by extensive use of parallel computing techniques: a coarse-grain level of parallelism making use of available CPU cores for computation, and a fine-grain level of parallelism exploiting bit-level concurrency as well as vector instructions.
SAPNEW: Parallel finite element code for thin shell structures on the Alliant FX-80
NASA Astrophysics Data System (ADS)
Kamat, Manohar P.; Watson, Brian C.
1992-11-01
The finite element method has proven to be an invaluable tool for analysis and design of complex, high-performance systems, such as bladed-disk assemblies in aircraft turbofan engines. However, as the problem size increases, the computation time required by conventional computers can be prohibitively high. Parallel processing computers provide the means to overcome these computation time limits. This report summarizes the results of a research activity aimed at providing a finite element capability for analyzing turbomachinery bladed-disk assemblies in a vector/parallel processing environment. A special-purpose code, named with the acronym SAPNEW, has been developed to perform static and eigen analysis of multi-degree-of-freedom blade models built up from flat thin shell elements. SAPNEW provides a stand-alone capability for static and eigen analysis on the Alliant FX/80, a parallel processing computer. A preprocessor, named with the acronym NTOS, has been developed to accept NASTRAN input decks and convert them to the SAPNEW format to make SAPNEW more readily usable by researchers at NASA Lewis Research Center.
Misra, Sanchit; Pamnany, Kiran; Aluru, Srinivas
2015-01-01
Construction of whole-genome networks from large-scale gene expression data is an important problem in systems biology. While several techniques have been developed, most cannot handle network reconstruction at the whole-genome scale, and the few that can, require large clusters. In this paper, we present a solution on the Intel Xeon Phi coprocessor, taking advantage of its multi-level parallelism including many x86-based cores, multiple threads per core, and vector processing units. We also present a solution on the Intel® Xeon® processor. Our solution is based on TINGe, a fast parallel network reconstruction technique that uses mutual information and permutation testing for assessing statistical significance. We demonstrate the first ever inference of a plant whole genome regulatory network on a single chip by constructing a 15,575 gene network of the plant Arabidopsis thaliana from 3,137 microarray experiments in only 22 minutes. In addition, our optimization for parallelizing mutual information computation on the Intel Xeon Phi coprocessor holds out lessons that are applicable to other domains.
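The core step that TINGe-style reconstruction parallelizes, mutual information between expression profiles with permutation testing for significance, can be sketched as follows (the histogram MI estimator, bin count, and permutation count are illustrative assumptions):

```python
import numpy as np

def mutual_info(x, y, bins=8):
    """Histogram estimate of the mutual information (in nats) between two
    expression profiles."""
    pxy, _, _ = np.histogram2d(x, y, bins=bins)
    pxy /= pxy.sum()
    px = pxy.sum(axis=1)
    py = pxy.sum(axis=0)
    nz = pxy > 0
    return float((pxy[nz] * np.log(pxy[nz] / np.outer(px, py)[nz])).sum())

def permutation_pvalue(x, y, n_perm=99, seed=0):
    """Significance via permutation testing: compare the observed MI with
    the MI of shuffled profiles. Each permutation is independent, so this
    inner loop is what gets spread across cores and vector units."""
    rng = np.random.default_rng(seed)
    observed = mutual_info(x, y)
    null = [mutual_info(rng.permutation(x), y) for _ in range(n_perm)]
    p = (1 + sum(m >= observed for m in null)) / (n_perm + 1)
    return observed, p
```

A whole-genome run evaluates this test for every gene pair, which is why the per-pair independence matters so much for scaling.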
Algorithms and programming tools for image processing on the MPP, part 2
NASA Technical Reports Server (NTRS)
Reeves, Anthony P.
1986-01-01
A number of algorithms were developed for image warping and pyramid image filtering. Techniques were investigated for the parallel processing of a large number of independent irregular shaped regions on the MPP. In addition some utilities for dealing with very long vectors and for sorting were developed. Documentation pages for the algorithms which are available for distribution are given. The performance of the MPP for a number of basic data manipulations was determined. From these results it is possible to predict the efficiency of the MPP for a number of algorithms and applications. The Parallel Pascal development system, which is a portable programming environment for the MPP, was improved and better documentation including a tutorial was written. This environment allows programs for the MPP to be developed on any conventional computer system; it consists of a set of system programs and a library of general purpose Parallel Pascal functions. The algorithms were tested on the MPP and a presentation on the development system was made to the MPP users group. The UNIX version of the Parallel Pascal System was distributed to a number of new sites.
Giotto magnetic field observations at the outbound quasi-parallel bow shock of Comet Halley
NASA Technical Reports Server (NTRS)
Neubauer, F. M.; Glassmeier, K. H.; Acuna, M. H.; Mariani, F.; Musmann, G.
1990-01-01
The investigation of the outbound bow shock of Comet Halley using Giotto magnetometer data leads to the following results: the shock is characterized by strong magnetic turbulence associated with an increasing background magnetic field and a change in direction by 60 deg as one goes inward. In HSE-coordinates, the observed normal turned out to be (0.544, - 0.801, 0.249). The thickness of the quasi-parallel shock was 120,000 km. The shock is shown to be a new type of shock transition called a 'draping shock'. In a draping shock with high beta in the transonic transition region, the transonic region is characterized by strong directional variations of the magnetic field. The magnetic turbulence ahead of the shock is characterized by k-vectors parallel or antiparallel to the average field (and, therefore, also to the normal of the quasi-parallel shock) and almost isotropic magnetic turbulence in the shock transition region. A model of the draping shock is proposed which also includes a hypothetical subshock in which the supersonic-subsonic transition is accomplished.
Dropulic, Boro
2005-07-01
The recent development of leukemia in three patients following retroviral vector gene transfer in hematopoietic stem cells, resulting in the death of one patient, has raised safety concerns for the use of integrating gene transfer vectors for human gene therapy. This review discusses these serious adverse events from the perspective of whether restrictions on vector design and vector-modified target cells are warranted at this time. A case is made against presently establishing specific restrictions for vector design and transduced cells; rather, their safety should be ascertained by empiric evaluation in appropriate preclinical models on a case-by-case basis. Such preclinical data, coupled with proper informed patient consent and a risk-benefit ratio analysis, provide the best available prospective evaluation of gene transfer vectors prior to their translation into the clinic.
Machine Learning Toolkit for Extreme Scale
DOE Office of Scientific and Technical Information (OSTI.GOV)
2014-03-31
Support Vector Machines (SVM) is a popular machine learning technique, which has been applied to a wide range of domains such as science, finance, and social networks for supervised learning. MaTEx undertakes the challenge of designing a scalable parallel SVM training algorithm for large-scale systems, including commodity multi-core machines, tightly connected supercomputers, and cloud computing systems. Several techniques are proposed for improved speed and memory usage, including adaptive and aggressive elimination of samples for faster convergence, and sparse-format representation of data samples. Heuristics ranging from earliest-possible to lazy elimination of non-contributing samples are considered in MaTEx. In many cases, where an early sample elimination might result in a false positive, low-overhead mechanisms for reconstruction of key data structures are proposed. The proposed algorithm and heuristics are implemented and evaluated on various publicly available datasets.
Pandey, Anuja; Zodpey, Sanjay; Kumar, Raj
2015-01-01
Vector-borne diseases account for a significant proportion of the global burden of infectious disease. They are one of the greatest contributors to human mortality and morbidity in tropical settings, including India. The World Health Organization declared vector-borne diseases as theme for the year 2014, and thus called for renewed commitment to their prevention and control. Human resources are critical to support public health systems, and medical entomologists play a crucial role in public health efforts to combat vector-borne diseases. This paper aims to review the capacity-building initiatives in medical entomology in India, to understand the demand and supply of medical entomologists, and to give future direction for the initiation of need-based training in the country. A systematic, predefined approach, with three parallel strategies, was used to collect and assemble the data regarding medical entomology training in India and assess the demand-supply gap in medical entomologists in the country. The findings suggest that, considering the high burden of vector-borne diseases in the country and the growing need of health manpower specialized in medical entomology, the availability of specialized training in medical entomology is insufficient in terms of number and intake capacity. The demand analysis of medical entomologists in India suggests a wide gap in demand and supply, which needs to be addressed to cater for the burden of vector-borne diseases in the country.
Pinning, rotation, and metastability of BiFeO 3 cycloidal domains in a magnetic field
Fishman, Randy S.
2018-01-03
Earlier models for the room-temperature multiferroic BiFeO3 implicitly assumed that a very strong anisotropy restricts the domain wave vectors q to the threefold-symmetric axis normal to the static polarization P. However, recent measurements demonstrate that the domain wave vectors q rotate within the hexagonal plane normal to P away from the magnetic field orientation m. In this paper, we show that the previously neglected threefold anisotropy K3 restricts the wave vectors to lie along the threefold axis in zero field. Taking m to lie along a threefold axis, the domain with q parallel to m remains metastable below Bc1 ≈ 7 T. Due to the pinning of domains by nonmagnetic impurities, the wave vectors of the other two domains start to rotate away from m above 5.6 T, when the component of the torque τ = M × B along P exceeds a threshold value τpin. Since τ = 0 when m ⊥ q, the wave vectors of those domains never become completely perpendicular to the magnetic field. Our results explain recent measurements of the critical field as a function of field orientation, small-angle neutron scattering measurements of the wave vectors, as well as spectroscopic measurements with m along a threefold axis. Finally, the model developed in this paper also explains how the three multiferroic domains of BiFeO3 for a fixed P can be manipulated by a magnetic field.
Adsorption and dissociation of molecular oxygen on α-Pu (0 2 0) surface: A density functional study
NASA Astrophysics Data System (ADS)
Wang, Jianguang; Ray, Asok K.
2011-09-01
Molecular and dissociative oxygen adsorptions on the α-Pu (0 2 0) surface have been systematically studied using the full-potential linearized augmented-plane-wave plus local orbitals (FP-LAPW+lo) basis method and the Perdew-Burke-Ernzerhof (PBE) exchange-correlation functional. Chemisorption energies have been optimized with respect to the distance of the admolecule from the Pu surface and the O-O bond length for four adsorption sites and three approaches of the O2 admolecule to the (0 2 0) surface. Chemisorption energies have been calculated at the scalar relativistic level with no spin-orbit coupling (NSOC) and at the fully relativistic level with spin-orbit coupling (SOC). Dissociative adsorptions are found for the two horizontal approaches (O2 parallel to the surface and perpendicular/parallel to a lattice vector). The Hor2 approach (O2 parallel to the surface and perpendicular to a lattice vector) at the one-fold top site is the most stable adsorption site, with chemisorption energies of 8.048 and 8.415 eV for the NSOC and SOC cases, respectively, and an O-O separation of 3.70 Å. Molecular adsorption occurs for the Vert approach (O2 vertical to the surface) at each adsorption site. The calculated work functions and net spin magnetic moments, respectively, increase and decrease in all cases upon chemisorption compared to the clean surface. The partial charges inside the muffin-tins, the difference charge density distributions, and the local density of states have been used to investigate the Pu-admolecule electronic structures and bonding mechanisms.
Accurate treatment of total photoabsorption cross sections by an ab initio time-dependent method
NASA Astrophysics Data System (ADS)
Daud, Mohammad Noh
2014-09-01
A detailed discussion of the parallel and perpendicular transitions required for the photoabsorption of a molecule is presented within a time-dependent view. Total photoabsorption cross sections for the first two ultraviolet absorption bands of the N2O molecule, corresponding to transitions from the X 1A' state to the 2 1A' and 1 1A'' states, are calculated to test the reliability of the method. By fully considering the electric field polarization vector of the incident light, the method treats the coupling of angular momentum and parity differently for the two kinds of transitions, depending on whether the vector is: (a) parallel to the molecular plane, for an electronic transition between states of the same symmetry; or (b) perpendicular to the molecular plane, for an electronic transition between states of different symmetry. Through this, for those transitions, we are able to offer an insightful picture of the dynamics involved and to characterize some new aspects of the photoabsorption process of N2O. Our calculations predict that the parallel transition to the 2 1A' state is the major dissociation pathway, in qualitative agreement with the experimental observations. Most importantly, a significant improvement in the absolute value of the total cross section over previous theoretical results [R. Schinke, J. Chem. Phys. 134, 064313 (2011); M.N. Daud, G.G. Balint-Kurti, A. Brown, J. Chem. Phys. 122, 054305 (2005); S. Nanbu, M.S. Johnson, J. Phys. Chem. A 108, 8905 (2004)] was obtained.
NASA Astrophysics Data System (ADS)
Jacobs, Shane Earl
This dissertation presents the concept of a Morphing Upper Torso, an innovative pressure suit design that incorporates robotic elements to enable a resizable, highly mobile, and easy-to-don/doff spacesuit. The torso is modeled as a system of interconnected, pressure-constrained, reduced-DOF, wire-actuated parallel manipulators that enable the dimensions of the suit to be reconfigured to match the wearer. The kinematics, dynamics, and control of wire-actuated manipulators are derived and simulated, along with the Jacobian transforms, which relate the total twist vector of the system to the vector of actuator velocities. Tools are developed that allow calculation of the workspace for both single and interconnected reduced-DOF robots of this type, using knowledge of the link lengths. The forward kinematics and statics equations are combined and solved to produce the pose of the platforms along with the link tensions. These tools allow analysis of the full Morphing Upper Torso design, in which the back hatch of a rear-entry torso is interconnected with the waist ring, helmet ring, and two scye bearings. Half-scale and full-scale experimental models are used along with analytical models to examine the feasibility of this novel space suit concept. The analytical and experimental results demonstrate that the torso could be expanded to facilitate donning and doffing, and then contracted to match different wearers' body dimensions. Using the system of interconnected parallel manipulators, suit components can be accurately repositioned to different desired configurations. The demonstrated feasibility of the Morphing Upper Torso concept makes it an exciting candidate for inclusion in a future planetary suit architecture.
Martínez, Alejandro; Míguez, Hernán; Sánchez-Dehesa, José; Martí, Javier
2005-05-30
This work presents a comprehensive analysis of electromagnetic wave propagation inside a two-dimensional photonic crystal in a spectral region in which the crystal behaves as an effective medium with an associated negative effective index of refraction. It is found that the main plane wave component of the Bloch mode that propagates inside the photonic crystal has its wave vector k' out of the first Brillouin zone, parallel to the Poynting vector (S' · k' > 0), so light propagation in these composites is different from that reported for left-handed materials, despite the fact that negative refraction can take place at the interface between air and both kinds of composites. However, wave coupling at the interfaces is well explained using the reduced wave vector (k') in the first Brillouin zone, which is opposed to the energy flow, and agrees well with previous works dealing with negative refraction in photonic crystals.
Subatomic-scale force vector mapping above a Ge(001) dimer using bimodal atomic force microscopy
NASA Astrophysics Data System (ADS)
Naitoh, Yoshitaka; Turanský, Robert; Brndiar, Ján; Li, Yan Jun; Štich, Ivan; Sugawara, Yasuhiro
2017-07-01
Probing physical quantities on the nanoscale that have directionality, such as magnetic moments, electric dipoles, or the force response of a surface, is essential for characterizing functionalized materials for nanotechnological device applications. Currently, such physical quantities are usually experimentally obtained as scalars. To investigate the physical properties of a surface on the nanoscale in depth, these properties must be measured as vectors. Here we demonstrate a three-force-component detection method, based on multi-frequency atomic force microscopy on the subatomic scale and apply it to a Ge(001)-c(4 × 2) surface. We probed the surface-normal and surface-parallel force components above the surface and their direction-dependent anisotropy and expressed them as a three-dimensional force vector distribution. Access to the atomic-scale force distribution on the surface will enable better understanding of nanoscale surface morphologies, chemical composition and reactions, probing nanostructures via atomic or molecular manipulation, and provide insights into the behaviour of nano-machines on substrates.
Current harmonics elimination control method for six-phase PM synchronous motor drives.
Yuan, Lei; Chen, Ming-liang; Shen, Jian-qing; Xiao, Fei
2015-11-01
To reduce the undesired 5th and 7th stator harmonic currents in the six-phase permanent magnet synchronous motor (PMSM), an improved vector control algorithm is proposed based on the vector space decomposition (VSD) transformation, which controls the fundamental and harmonic subspaces separately. To improve the traditional VSD technique, a novel synchronous rotating coordinate transformation matrix is presented, so that a conventional PI controller in the d-q subspace suffices for zero steady-state error; the controller parameters are designed using the internal model principle. Moreover, a current PI controller in parallel with a resonant controller is employed in the x-y subspace to compensate the specific 5th and 7th harmonic components. In addition, a new six-phase SVPWM algorithm based on VSD transformation theory is proposed. Simulation and experimental results verify the effectiveness of the current-decoupling vector controller.
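The x-y subspace controller described above pairs a PI term with a resonant term tuned to the harmonic frequency to be suppressed. A minimal discrete-time sketch of that parallel structure, with illustrative gains, frequency and step size (not values from the paper), might look like:

```python
import numpy as np

def pi_resonant(err_seq, kp=1.0, ki=20.0, kr=200.0, w0=2 * np.pi * 250, dt=1e-4):
    """Sketch of a PI controller in parallel with a resonant controller
    G_r(s) = kr*s/(s^2 + w0^2) tuned to the harmonic angular frequency w0,
    integrated by forward Euler. All gains, w0 and dt are illustrative
    assumptions, not values from the paper."""
    integ = x1 = x2 = 0.0
    out = []
    for e in err_seq:
        integ += ki * e * dt              # PI integral path
        x1 += x2 * dt                     # resonant state: x1' = x2
        x2 += (e - w0**2 * x1) * dt       # resonant state: x2' = e - w0^2*x1
        out.append(kp * e + integ + kr * x2)
    return out
```

For a constant error the PI path ramps the output while the resonant path contributes an oscillation at w0; in a real drive the error would be the measured x-y harmonic current (the 5th and 7th stationary-frame harmonics appear as a 6th-harmonic component in a synchronous frame).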
Lanczos eigensolution method for high-performance computers
NASA Technical Reports Server (NTRS)
Bostic, Susan W.
1991-01-01
The theory, computational analysis, and applications of a Lanczos algorithm on high-performance computers are presented. The computationally intensive steps of the algorithm are identified as the matrix factorization, the forward/backward equation solution, and the matrix-vector multiplications. These steps are optimized to exploit the vector and parallel capabilities of high-performance computers. The savings in computational time from applying optimization techniques such as variable-band and sparse data storage and access, loop unrolling, use of local memory, and compiler directives are presented. Two large-scale structural analysis applications are described: the buckling of a composite blade-stiffened panel with a cutout, and the vibration analysis of a high-speed civil transport. The sequential computational time for the panel problem on a CONVEX computer was decreased from 181.6 seconds to 14.1 seconds with the optimized vector algorithm. The best computational time for the transport problem, with 17,000 degrees of freedom, was 23 seconds on the Cray Y-MP using an average of 3.63 processors.
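The matrix-vector products named above dominate the cost because each Lanczos step applies the operator once to build a tridiagonal projection of the matrix. A minimal NumPy sketch of the iteration (without the reorthogonalization and shifted factorizations a production structural eigensolver would need):

```python
import numpy as np

def lanczos(A, v0, m):
    """Minimal Lanczos tridiagonalization sketch for a symmetric matrix A.
    Returns the diagonal (alpha) and off-diagonal (beta) of the m x m
    tridiagonal projection T; eigenvalues of T approximate those of A."""
    n = len(v0)
    V = np.zeros((n, m))
    alpha = np.zeros(m)
    beta = np.zeros(m - 1)
    v = v0 / np.linalg.norm(v0)
    V[:, 0] = v
    w = A @ v                       # the dominant matrix-vector product
    alpha[0] = v @ w
    w = w - alpha[0] * v
    for j in range(1, m):
        beta[j - 1] = np.linalg.norm(w)
        v = w / beta[j - 1]
        V[:, j] = v
        w = A @ v - beta[j - 1] * V[:, j - 1]
        alpha[j] = v @ w
        w = w - alpha[j] * v
    return alpha, beta
```

With m equal to the matrix dimension and exact arithmetic, the eigenvalues of T reproduce those of A; in practice a small m already approximates the extreme eigenvalues, which is what buckling and vibration analyses need.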
Phase space analysis in anisotropic optical systems
NASA Technical Reports Server (NTRS)
Rivera, Ana Leonor; Chumakov, Sergey M.; Wolf, Kurt Bernardo
1995-01-01
From the minimal action principle follow the Hamilton equations of evolution for geometric optical rays in anisotropic media. As in classical mechanics with velocity-dependent potentials, the velocity and the canonical momentum are not parallel, but differ by an anisotropy vector potential, similar to that of linear electromagnetism. Descartes' well-known diagram for refraction is generalized, and a factorization theorem holds for interfaces between two anisotropic media.
Geometrization of the Dirac theory of the electron
NASA Technical Reports Server (NTRS)
Fock, V.
1977-01-01
Using the concept of parallel displacement of a half vector, the Dirac equations are written in generally invariant form. The energy tensor is formed, and both the macroscopic and quantum mechanical equations of motion are set up. The former have the usual form, the divergence of the energy tensor equals the Lorentz force, and the latter are essentially identical with those of the geodesic line.
Generalizing on Multiple Grounds: Performance Learning in Model-Based Troubleshooting
1989-02-01
Artificial Intelligence, 24, 1984. [Ble88] Guy E. Blelloch. Scan Primitives and Parallel Vector Models. PhD thesis, Artificial Intelligence Laboratory... Diagnostic reasoning based on structure and behavior. Artificial Intelligence, 24, 1984. [dK86] J. de Kleer. An assumption-based truth maintenance system... diagnosis. Artificial Intelligence, 24. [Ham87] Kristian J. Hammond. Learning to anticipate and avoid planning problems
DOE Office of Scientific and Technical Information (OSTI.GOV)
Moreland, Kenneth D.
2017-07-01
The FY17Q3 milestone of the ECP/VTK-m project includes the completion of a VTK-m filter that computes normal vectors for surfaces. Normal vectors point perpendicular to the surface and are an important direction when rendering it. The implementation includes the parallel algorithm itself, a filter module to simplify integrating it into other software, and documentation in the VTK-m Users' Guide. With the completion of this milestone, we are able to provide the necessary information to rendering systems for appropriate shading of surfaces. This milestone also feeds into subsequent milestones that progressively improve the approximation of surface direction.
Optical vector network analyzer based on double-sideband modulation.
Jun, Wen; Wang, Ling; Yang, Chengwu; Li, Ming; Zhu, Ning Hua; Guo, Jinjin; Xiong, Liangming; Li, Wei
2017-11-01
We report an optical vector network analyzer (OVNA) based on double-sideband (DSB) modulation using a dual-parallel Mach-Zehnder modulator. The device under test (DUT) is measured twice with different modulation schemes. By post-processing the measurement results, the response of the DUT can be obtained accurately. Since DSB modulation is used in our approach, the measurement range is doubled compared with conventional single-sideband (SSB) modulation-based OVNA. Moreover, the measurement accuracy is improved by eliminating the even-order sidebands. The key advantage of the proposed scheme is that the measurement of a DUT with bandpass response can also be simply realized, which is a big challenge for the SSB-based OVNA. The proposed method is theoretically and experimentally demonstrated.
Electromagnetic Whistler Precursors at Supercritical Interplanetary Shocks
NASA Technical Reports Server (NTRS)
Wilson, L. B., III
2012-01-01
We present observations of electromagnetic precursor waves, identified as whistler mode waves, at supercritical interplanetary shocks using the Wind search coil magnetometer. The precursors propagate obliquely with respect to the local magnetic field, shock normal vector, and solar wind velocity, and they are not phase-standing structures. All are right-hand polarized with respect to the magnetic field (spacecraft frame), and all but one are right-hand polarized with respect to the shock normal vector in the normal incidence frame. Particle distributions show signatures of specularly reflected gyrating ions, which may be a source of free energy for the observed modes. In one event, we simultaneously observe perpendicular ion heating and parallel electron acceleration, consistent with wave heating/acceleration due to these waves.
Flies dynamically anti-track, rather than ballistically escape, aversive odor during flight.
Wasserman, Sara; Lu, Patrick; Aptekar, Jacob W; Frye, Mark A
2012-08-15
Tracking distant odor sources is crucial to foraging, courtship and reproductive success for many animals including fish, flies and birds. Upon encountering a chemical plume in flight, Drosophila melanogaster integrates the spatial intensity gradient and temporal fluctuations over the two antennae, while simultaneously reducing the amplitude and frequency of rapid steering maneuvers, stabilizing the flight vector. There are infinite escape vectors away from a noxious source, in contrast to a single best tracking vector towards an attractive source. Attractive and aversive odors are segregated into parallel neuronal pathways in flies; therefore, the behavioral algorithms for avoidance may be categorically different from tracking. Do flies plot random ballistic or otherwise variable escape vectors? Or do they instead make use of temporally dynamic mechanisms for continuously and directly avoiding noxious odors in a manner similar to tracking appetitive ones? We examine this question using a magnetic tether flight simulator that permits free yaw movements, such that flies can actively orient within spatially defined odor plumes. We show that in-flight aversive flight behavior shares all of the key features of attraction such that flies continuously 'anti-track' the noxious source.
Molla, Mijanur R; Böser, Alexander; Rana, Akshita; Schwarz, Karina; Levkin, Pavel A
2018-04-18
Efficient delivery of nucleic acids into cells is of great interest in the field of cell biology and gene therapy. Despite a lot of research, transfection efficiency and structural diversity of gene-delivery vectors are still limited. A better understanding of the structure-function relationship of gene delivery vectors is also essential for the design of novel and intelligent delivery vectors, efficient in "difficult-to-transfect" cells and in vivo clinical applications. Most of the existing strategies for the synthesis of gene-delivery vectors require multiple steps and lengthy procedures. Here, we demonstrate a facile, three-component one-pot synthesis of a combinatorial library of 288 structurally diverse lipid-like molecules termed "lipidoids" via a thiolactone ring opening reaction. This strategy introduces the possibility to synthesize lipidoids with hydrophobic tails containing both unsaturated bonds and reducible disulfide groups. The whole synthesis and purification are convenient, extremely fast, and can be accomplished within a few hours. Screening of the produced lipidoids using HEK293T cells without addition of helper lipids resulted in identification of highly stable liposomes demonstrating ∼95% transfection efficiency with low toxicity.
Andréasson, Claes; Schick, Anna J; Pfeiffer, Susanne M; Sarov, Mihail; Stewart, Francis; Wurst, Wolfgang; Schick, Joel A
2013-01-01
Efficient gene targeting in embryonic stem cells requires that modifying DNA sequences are identical to those in the targeted chromosomal locus. Yet, there is a paucity of isogenic genomic clones for human cell lines and PCR amplification cannot be used in many mutation-sensitive applications. Here, we describe a novel method for the direct cloning of genomic DNA into a targeting vector, pRTVIR, using oligonucleotide-directed homologous recombination in yeast. We demonstrate the applicability of the method by constructing functional targeting vectors for mammalian genes Uhrf1 and Gfap. Whereas the isogenic targeting of the gene Uhrf1 showed a substantial increase in targeting efficiency compared to non-isogenic DNA in mouse E14 cells, E14-derived DNA performed better than the isogenic DNA in JM8 cells for both Uhrf1 and Gfap. Analysis of 70 C57BL/6-derived targeting vectors electroporated in JM8 and E14 cell lines in parallel showed a clear dependence on isogenicity for targeting, but for three genes isogenic DNA was found to be inhibitory. In summary, this study provides a straightforward methodological approach for the direct generation of isogenic gene targeting vectors.
Design of a universal two-layered neural network derived from the PLI theory
NASA Astrophysics Data System (ADS)
Hu, Chia-Lun J.
2004-05-01
The if-and-only-if (IFF) condition that a set of M analog-to-digital vector-mapping relations can be learned by a one-layered feed-forward neural network (OLNN) is that all the input analog vectors dichotomized by the i-th output bit must be positively linearly independent, or PLI. If they are not PLI, then the OLNN simply cannot learn, no matter what learning rule is employed, because the solution of the connection matrix does not exist mathematically. However, in this case, one can still design a parallel-cascaded, two-layered perceptron (PCTLP) to achieve this general mapping goal. The design principle of this "universal" neural network is derived from the major mathematical properties of the PLI theory: changing the output bits of the dependent relations existing among the dichotomized input vectors so as to make the positively linearly dependent (PLD) relations PLI. Then, with a vector concatenation technique, the required mapping can still be learned by this PCTLP system with very high efficiency. This paper reports in detail the mathematical derivation of the general design principle and the design procedures of the PCTLP neural network system, which are then verified with a practical numerical example.
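The learnability limit stated here can be illustrated with the classic separable vs. non-separable contrast: a single-layer perceptron drives the error to zero on AND, whose dichotomized inputs satisfy the independence condition, but never on XOR, where no connection matrix exists. A small sketch (the epoch count and learning rate are illustrative choices, not from the paper):

```python
import numpy as np

def train_perceptron(X, y, epochs=100, lr=0.1):
    """Train a single-layer perceptron (one output bit) with the classic
    perceptron rule and return the number of misclassified inputs after
    training. Illustrates the one-layer learnability limit discussed above."""
    w = np.zeros(X.shape[1] + 1)
    Xb = np.hstack([X, np.ones((len(X), 1))])   # append a bias input
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            pred = 1 if xi @ w > 0 else 0
            w += lr * (yi - pred) * xi          # perceptron update rule
    return sum(1 for xi, yi in zip(Xb, y) if (1 if xi @ w > 0 else 0) != yi)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
and_errors = train_perceptron(X, np.array([0, 0, 0, 1]))   # linearly separable
xor_errors = train_perceptron(X, np.array([0, 1, 1, 0]))   # not separable
```

The AND dichotomy is learned exactly, while XOR retains errors regardless of training length, which is the situation the two-layered PCTLP construction is designed to repair.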
The Advanced Software Development and Commercialization Project
DOE Office of Scientific and Technical Information (OSTI.GOV)
Gallopoulos, E.; Canfield, T.R.; Minkoff, M.
1990-09-01
This is the first of a series of reports pertaining to progress in the Advanced Software Development and Commercialization Project, a joint collaborative effort between the Center for Supercomputing Research and Development of the University of Illinois and the Computing and Telecommunications Division of Argonne National Laboratory. The purpose of this work is to apply techniques of parallel computing that were pioneered by University of Illinois researchers to mature computational fluid dynamics (CFD) and structural dynamics (SD) computer codes developed at Argonne. The collaboration in this project will bring this unique combination of expertise to bear, for the first time, on industrially important problems. By so doing, it will expose the strengths and weaknesses of existing techniques for parallelizing programs and will identify those problems that need to be solved in order to enable widespread production use of parallel computers. Secondly, the increased efficiency of the CFD and SD codes themselves will enable the simulation of larger, more accurate engineering models that involve fluid and structural dynamics. In order to realize the above two goals, we are considering two production codes that have been developed at ANL and are widely used by both industry and universities: COMMIX and WHAMS-3D. The first is a computational fluid dynamics code that is used for nuclear reactor design and safety and as a design tool for the casting industry. The second is a three-dimensional structural dynamics code used in nuclear reactor safety as well as crashworthiness studies. These codes are currently available for sequential and vector computers only. Our main goal is to port and optimize these two codes on shared-memory multiprocessors. In so doing, we shall establish a process that can be followed in optimizing other sequential or vector engineering codes for parallel processors.
NASA Technical Reports Server (NTRS)
Fijany, Amir (Inventor); Bejczy, Antal K. (Inventor)
1993-01-01
This is a real-time robotic controller and simulator which is a MIMD-SIMD parallel architecture for interfacing with an external host computer and providing a high degree of parallelism in computations for robotic control and simulation. It includes a host processor for receiving instructions from the external host computer and for transmitting answers to the external host computer. There are a plurality of SIMD microprocessors, each SIMD processor being a SIMD parallel processor capable of exploiting fine grain parallelism and further being able to operate asynchronously to form a MIMD architecture. Each SIMD processor comprises a SIMD architecture capable of performing two matrix-vector operations in parallel while fully exploiting parallelism in each operation. There is a system bus connecting the host processor to the plurality of SIMD microprocessors and a common clock providing a continuous sequence of clock pulses. There is also a ring structure interconnecting the plurality of SIMD microprocessors and connected to the clock for providing the clock pulses to the SIMD microprocessors and for providing a path for the flow of data and instructions between the SIMD microprocessors. The host processor includes logic for controlling the RRCS by interpreting instructions sent by the external host computer, decomposing the instructions into a series of computations to be performed by the SIMD microprocessors, using the system bus to distribute associated data among the SIMD microprocessors, and initiating activity of the SIMD microprocessors to perform the computations on the data by procedure call.
Full-field 3D deformation measurement: comparison between speckle phase and displacement evaluation.
Khodadad, Davood; Singh, Alok Kumar; Pedrini, Giancarlo; Sjödahl, Mikael
2016-09-20
The objective of this paper is to describe a full-field deformation measurement method based on 3D speckle displacements. The deformation is evaluated from the slope of the speckle displacement function that connects the different reconstruction planes. For our experiment, a symmetrical arrangement with four illuminations parallel to the planes (x,z) and (y,z) was used. Four sets of speckle patterns were sequentially recorded by illuminating the object from the four directions, respectively. A single camera was used to record the holograms before and after deformation. Digital speckle photography is then used to calculate relative speckle displacements in each direction between two numerically propagated planes. The 3D speckle displacement vector is calculated as a combination of the speckle displacements from the holograms recorded in each illumination direction. Using the speckle displacements, problems associated with rigid-body movements and phase wrapping are avoided. In our experiment, the procedure is shown to give the theoretical accuracy of 0.17 pixels, yielding an accuracy of 2×10⁻³ in the measurement of deformation gradients.
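The speckle-displacement evaluation at each reconstruction plane reduces to locating the cross-correlation peak between subimages recorded before and after deformation. A minimal FFT-based sketch at integer-pixel resolution (the 0.17-pixel accuracy quoted above requires subpixel peak interpolation, omitted here):

```python
import numpy as np

def speckle_shift(ref, cur):
    """Estimate the in-plane displacement between two speckle subimages by
    locating the peak of their FFT-based circular cross-correlation, the
    core operation of digital speckle photography (integer pixels only)."""
    R = np.fft.fft2(ref)
    C = np.fft.fft2(cur)
    xc = np.fft.ifft2(np.conj(R) * C).real      # cross-correlation surface
    peak = np.unravel_index(np.argmax(xc), xc.shape)
    # unwrap peak coordinates into signed shifts
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, xc.shape))
```

Repeating this for each illumination direction and combining the four in-plane estimates is what yields the 3D displacement vector described above.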
Dimensional synthesis of a 3-DOF parallel manipulator with full circle rotation
NASA Astrophysics Data System (ADS)
Ni, Yanbing; Wu, Nan; Zhong, Xueyong; Zhang, Biao
2015-07-01
Parallel robots are widely used in the academic and industrial fields. In spite of the numerous achievements in the design and dimensional synthesis of low-mobility parallel robots, few research efforts are directed towards asymmetric 3-DOF parallel robots whose end-effector can realize 2 translational and 1 rotational (2T1R) motion. In order to develop a manipulator capable of full circle rotation to enlarge the workspace, a new 2T1R parallel mechanism is proposed. The modeling approach and kinematic analysis of this proposed mechanism are investigated. Using the method of vector analysis, the inverse kinematic equations are established. This is followed by a rigorous proof that this mechanism attains an annular workspace through its circular rotation and 2-dimensional translations. Taking the first-order perturbation of the kinematic equations, the error Jacobian matrix, which represents the mapping relationship between the error sources of geometric parameters and the end-effector position errors, is derived. With consideration of the constraint conditions of pressure angles and feasible workspace, the dimensional synthesis is conducted with the goal of minimizing a global comprehensive performance index. The dimension parameters giving the mechanism optimal error mapping and kinematic performance are obtained through the optimization algorithm. These research achievements lay the foundation for prototype building of such parallel robots.
Review of An Introduction to Parallel and Vector Scientific Computing
DOE Office of Scientific and Technical Information (OSTI.GOV)
Bailey, David H.; Lefton, Lew
2006-06-30
On one hand, the field of high-performance scientific computing is thriving beyond measure. Performance of leading-edge systems on scientific calculations, as measured say by the Top500 list, has increased by an astounding factor of 8000 during the 15-year period from 1993 to 2008, which is slightly faster even than Moore's Law. Even more importantly, remarkable advances in numerical algorithms, numerical libraries and parallel programming environments have led to improvements in the scope of what can be computed that are entirely on a par with the advances in computing hardware. And these successes have spread far beyond the confines of large government-operated laboratories: many universities, modest-sized research institutes and private firms now operate clusters that differ only in scale from the behemoth systems at the large-scale facilities. In the wake of these recent successes, researchers from fields that heretofore have not been part of the scientific computing world have been drawn into the arena. For example, at the recent SC07 conference, the exhibit hall, which long has hosted displays from leading computer systems vendors and government laboratories, featured some 70 exhibitors who had not previously participated. In spite of all these exciting developments, and in spite of the clear need to present these concepts to a much broader technical audience, there is a perplexing dearth of training material and textbooks in the field, particularly at the introductory level. Only a handful of universities offer coursework in the specific area of highly parallel scientific computing, and instructors of such courses typically rely on custom-assembled material. For example, the present reviewer and Robert F. Lucas relied on materials assembled in a somewhat ad-hoc fashion from colleagues and personal resources when presenting a course on parallel scientific computing at the University of California, Berkeley, a few years ago.
Thus it is indeed refreshing to see the publication of the book An Introduction to Parallel and Vector Scientific Computing, written by Ronald W. Shonkwiler and Lew Lefton, both of the Georgia Institute of Technology. They have taken the bull by the horns and produced a book that appears to be entirely satisfactory as an introductory textbook for use in such a course. It is also of interest to the much broader community of researchers who are already in the field, laboring day by day to improve the power and performance of their numerical simulations. The book is organized into 11 chapters, plus an appendix. The first three chapters describe the basics of system architecture, including vector, parallel and distributed memory systems, the details of task dependence and synchronization, and the various programming models currently in use: threads, MPI and OpenMP. Chapters four through nine provide a competent introduction to floating-point arithmetic, numerical error and numerical linear algebra. Some of the topics presented include Gaussian elimination, LU decomposition, tridiagonal systems, Givens rotations, QR decompositions, Gauss-Seidel iterations and Householder transformations. Chapters 10 and 11 introduce Monte Carlo methods and schemes for discrete optimization such as genetic algorithms.
Receptor-mediated gene transfer vectors: progress towards genetic pharmaceuticals.
Molas, M; Gómez-Valadés, A G; Vidal-Alabró, A; Miguel-Turu, M; Bermudez, J; Bartrons, R; Perales, J C
2003-10-01
Although specific delivery to tissues and unique cell types in vivo has been demonstrated for many non-viral vectors, current methods are still inadequate for human applications, mainly because of limitations on their efficiencies. All the steps required for an efficient receptor-mediated gene transfer process may in principle be exploited to enhance targeted gene delivery. These steps are: DNA/vector binding, internalization, subcellular trafficking, vesicular escape, nuclear import, and unpacking either for transcription or other functions (i.e., antisense, RNA interference, etc.). The large variety of vector designs that are currently available, usually aimed at improving the efficiency of these steps, has complicated the evaluation of data obtained from specific derivatives of such vectors. The importance of the structure of the final vector and the consequences of design decisions at specific steps on the overall efficiency of the vector will be discussed in detail. We emphasize in this review that stability in serum and thus, proper bioavailability of vectors to their specific receptors may be the single greatest limiting factor on the overall gene transfer efficiency in vivo. We discuss current approaches to overcome the intrinsic instability of synthetic vectors in the blood. In this regard, a summary of the structural features of the vectors obtained from current protocols will be presented and their functional characteristics evaluated. Dissecting information on molecular conjugates obtained by such methodologies, when carefully evaluated, should provide important guidelines for the creation of effective, targeted and safe DNA therapeutics.
Fast parallel image registration on CPU and GPU for diagnostic classification of Alzheimer's disease
Shamonin, Denis P.; Bron, Esther E.; Lelieveldt, Boudewijn P. F.; Smits, Marion; Klein, Stefan; Staring, Marius
2013-01-01
Nonrigid image registration is an important, but time-consuming task in medical image analysis. In typical neuroimaging studies, multiple image registrations are performed, i.e., for atlas-based segmentation or template construction. Faster image registration routines would therefore be beneficial. In this paper we explore acceleration of the image registration package elastix by a combination of several techniques: (i) parallelization on the CPU, to speed up the cost function derivative calculation; (ii) parallelization on the GPU building on and extending the OpenCL framework from ITKv4, to speed up the Gaussian pyramid computation and the image resampling step; (iii) exploitation of certain properties of the B-spline transformation model; (iv) further software optimizations. The accelerated registration tool is employed in a study on diagnostic classification of Alzheimer's disease and cognitively normal controls based on T1-weighted MRI. We selected 299 participants from the publicly available Alzheimer's Disease Neuroimaging Initiative database. Classification is performed with a support vector machine based on gray matter volumes as a marker for atrophy. We evaluated two types of strategies (voxel-wise and region-wise) that heavily rely on nonrigid image registration. Parallelization and optimization resulted in an acceleration factor of 4–5x on an 8-core machine. Using OpenCL a speedup factor of 2 was realized for computation of the Gaussian pyramids, and 15–60 for the resampling step, for larger images. The voxel-wise and the region-wise classification methods had an area under the receiver operator characteristic curve of 88 and 90%, respectively, both for standard and accelerated registration. We conclude that the image registration package elastix was substantially accelerated, with nearly identical results to the non-optimized version. The new functionality will become available in the next release of elastix as open source under the BSD license. 
PMID:24474917
Kaur, Navneet; Hasegawa, Daniel K; Ling, Kai-Shu; Wintermantel, William M
2016-10-01
The relationships between plant viruses and their vectors have evolved over the millennia, and yet, studies on viruses began <150 years ago and investigations into the virus and vector interactions even more recently. The advent of next generation sequencing, including rapid genome and transcriptome analysis, methods for evaluation of small RNAs, and the related disciplines of proteomics and metabolomics offer a significant shift in the ability to elucidate molecular mechanisms involved in virus infection and transmission by insect vectors. Genomic technologies offer an unprecedented opportunity to examine the response of insect vectors to the presence of ingested viruses through gene expression changes and altered biochemical pathways. This review focuses on the interactions between viruses and their whitefly or thrips vectors and on potential applications of genomics-driven control of the insect vectors. Recent studies have evaluated gene expression in vectors during feeding on plants infected with begomoviruses, criniviruses, and tospoviruses, which exhibit very different types of virus-vector interactions. These studies demonstrate the advantages of genomics and the potential complementary studies that rapidly advance our understanding of the biology of virus transmission by insect vectors and offer additional opportunities to design novel genetic strategies to manage insect vectors and the viruses they transmit.
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele; Borovikov, Anna Y.; Suarez, Max
1999-01-01
A massively parallel ensemble Kalman filter (EnKF) is used to assimilate temperature data from the TOGA/TAO array and altimetry from TOPEX/POSEIDON into a Pacific basin version of the NASA Seasonal to Interannual Prediction Project (NSIPP) quasi-isopycnal ocean general circulation model. The EnKF is an approximate Kalman filter in which the error-covariance propagation step is modeled by the integration of multiple instances of a numerical model. An estimate of the true error covariances is then inferred from the distribution of the ensemble of model state vectors. This implementation of the filter takes advantage of the inherent parallelism in the EnKF algorithm by running all the model instances concurrently. The Kalman filter update step also occurs in parallel by having each processor handle the observations that occur in the region of physical space for which it is responsible. The massively parallel data assimilation system is validated by withholding some of the data and then quantifying the extent to which the withheld information can be inferred from the assimilation of the remaining data. The distributions of the forecast and analysis error covariances predicted by the EnKF are also examined.
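A stochastic EnKF analysis step of the kind described, with error covariances estimated from the ensemble of state vectors and perturbed observations drawn per member, can be sketched as follows (a generic textbook formulation under a linear observation operator, not the NSIPP implementation):

```python
import numpy as np

def enkf_update(X, y, H, R, rng):
    """Stochastic EnKF analysis step (sketch): X is an (n, N) ensemble of
    model state vectors, y the observation vector, H a linear observation
    operator, R the observation-error covariance. Covariances are estimated
    from the ensemble spread, as in the filter described above."""
    n, N = X.shape
    A = X - X.mean(axis=1, keepdims=True)          # state anomalies
    HX = H @ X
    HA = HX - HX.mean(axis=1, keepdims=True)       # observed anomalies
    Pyy = HA @ HA.T / (N - 1) + R                  # innovation covariance
    Pxy = A @ HA.T / (N - 1)                       # state-obs cross covariance
    K = np.linalg.solve(Pyy.T, Pxy.T).T            # Kalman gain
    # perturbed observations: one independent draw per ensemble member
    Y = y[:, None] + rng.multivariate_normal(np.zeros(len(y)), R, N).T
    return X + K @ (Y - HX)
```

The parallelism exploited above comes from two places visible in the sketch: the N model integrations that produce X are independent, and the update can be partitioned over processors by assigning each one the observations in its region of physical space.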
Spatiotemporal Domain Decomposition for Massive Parallel Computation of Space-Time Kernel Density
NASA Astrophysics Data System (ADS)
Hohl, A.; Delmelle, E. M.; Tang, W.
2015-07-01
Accelerated processing capabilities are deemed critical when conducting analysis on spatiotemporal datasets of increasing size, diversity and availability. High-performance parallel computing offers the capacity to solve computationally demanding problems in a limited timeframe, but likewise poses the challenge of preventing processing inefficiency due to workload imbalance between computing resources. Therefore, when designing new algorithms capable of implementing parallel strategies, careful spatiotemporal domain decomposition is necessary to account for heterogeneity in the data. In this study, we perform octree-based adaptive decomposition of the spatiotemporal domain for parallel computation of space-time kernel density. In order to avoid edge effects near subdomain boundaries, we establish spatiotemporal buffers to include adjacent data points that are within the spatial and temporal kernel bandwidths. Then, we quantify the computational intensity of each subdomain to balance workloads among processors. We illustrate the benefits of our methodology using a space-time epidemiological dataset of Dengue fever, an infectious vector-borne disease that poses a severe threat to communities in tropical climates. Our parallel implementation of kernel density reaches substantial speedup compared to sequential processing, and achieves high levels of workload balance among processors due to great accuracy in quantifying computational intensity. Our approach is portable to other space-time analytical tests.
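The quantity whose computation is being decomposed above is a space-time kernel density. A deliberately naive serial sketch, using Epanechnikov kernels as an assumed kernel choice (the abstract does not fix one), shows the per-grid-point work that the octree subdomains would distribute:

```python
import numpy as np

def stkde(points, grid_xy, grid_t, hs, ht):
    """Naive space-time kernel density sketch: `points` is an (n, 3) array
    of (x, y, t) events, evaluated on a spatial grid and a set of time
    slices with spatial bandwidth hs and temporal bandwidth ht. Events
    outside either bandwidth contribute nothing, which is why subdomain
    buffers of width hs/ht suffice to avoid edge effects."""
    out = np.zeros((len(grid_xy), len(grid_t)))
    for i, (gx, gy) in enumerate(grid_xy):
        for j, gt in enumerate(grid_t):
            ds2 = ((points[:, 0] - gx) ** 2 + (points[:, 1] - gy) ** 2) / hs**2
            dt2 = ((points[:, 2] - gt) / ht) ** 2
            ks = np.where(ds2 < 1, 2 / np.pi * (1 - ds2), 0.0)  # spatial kernel
            kt = np.where(dt2 < 1, 0.75 * (1 - dt2), 0.0)       # temporal kernel
            out[i, j] = (ks * kt).sum() / (len(points) * hs**2 * ht)
    return out
```

The computational intensity of a subdomain in this formulation is roughly its grid-point count times the number of events within the buffered bounds, which is the quantity the load balancer above estimates.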
GSRP/David Marshall: Fully Automated Cartesian Grid CFD Application for MDO in High Speed Flows
NASA Technical Reports Server (NTRS)
2003-01-01
With the renewed interest in Cartesian gridding methodologies for the ease and speed of gridding complex geometries in addition to the simplicity of the control volumes used in the computations, it has become important to investigate ways of extending the existing Cartesian grid solver functionalities. This includes developing methods of modeling the viscous effects in order to utilize Cartesian grid solvers for accurate drag predictions and addressing the issues related to the distributed memory parallelization of Cartesian solvers. This research presents advances in two areas of interest in Cartesian grid solvers, viscous effects modeling and MPI parallelization. The development of viscous effects modeling using solely Cartesian grids has been hampered by the widely varying control volume sizes associated with the mesh refinement and the cut cells associated with the solid surface. This problem is being addressed by using physically based modeling techniques to update the state vectors of the cut cells and removing them from the finite volume integration scheme. This work is performed on a new Cartesian grid solver, NASCART-GT, with modifications to its cut cell functionality. The development of MPI parallelization addresses issues associated with utilizing Cartesian solvers on distributed memory parallel environments. This work is performed on an existing Cartesian grid solver, CART3D, with modifications to its parallelization methodology.
Gyrokinetic Magnetohydrodynamics and the Associated Equilibrium
NASA Astrophysics Data System (ADS)
Lee, W. W.; Hudson, S. R.; Ma, C. H.
2017-10-01
A proposed scheme for the calculations of gyrokinetic MHD and its associated equilibrium is discussed in relation to a recent paper on the subject. The scheme is based on the time-dependent gyrokinetic vorticity equation and parallel Ohm's law, as well as the associated gyrokinetic Ampere's law. This set of equations, in terms of the electrostatic potential, ϕ, and the vector potential, A, supports both spatially varying perpendicular and parallel pressure gradients and their associated currents. The MHD equilibrium can be reached when ϕ → 0 and A becomes constant in time, which, in turn, gives ∇ · (J|| + J⊥) = 0 and the associated magnetic islands. Examples in simple cylindrical geometry will be given. The present work is partially supported by US DoE Grant DE-AC02-09CH11466.
50 GFlops molecular dynamics on the Connection Machine 5
DOE Office of Scientific and Technical Information (OSTI.GOV)
Lomdahl, P.S.; Tamayo, P.; Groenbech-Jensen, N.
1993-12-31
The authors present timings and performance numbers for a new short range three dimensional (3D) molecular dynamics (MD) code, SPaSM, on the Connection Machine-5 (CM-5). They demonstrate that runs with more than 10^8 particles are now possible on massively parallel MIMD computers. To the best of their knowledge this is at least an order of magnitude more particles than what has previously been reported. Typical production runs show sustained performance (including communication) in the range of 47-50 GFlops on a 1024 node CM-5 with vector units (VUs). The speed of the code scales linearly with the number of processors and with the number of particles and shows 95% parallel efficiency in the speedup.
User's Manual for PCSMS (Parallel Complex Sparse Matrix Solver). Version 1.
NASA Technical Reports Server (NTRS)
Reddy, C. J.
2000-01-01
PCSMS (Parallel Complex Sparse Matrix Solver) is a computer code written to make use of existing real sparse direct solvers to solve complex, sparse matrix linear equations. PCSMS converts complex matrices into real matrices and uses real, sparse direct matrix solvers to factor and solve the real matrices. The solution vector is then reconverted to complex numbers. Though this utility is written for Silicon Graphics (SGI) real sparse matrix solution routines, it is general in nature and can be easily modified to work with any real sparse matrix solver. The User's Manual is written to acquaint the user with the installation and operation of the code. Driver routines are given to help users integrate PCSMS routines into their own codes.
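The complex-to-real conversion that PCSMS performs can be illustrated with the standard equivalent real formulation: (Ar + i·Ai)(xr + i·xi) = br + i·bi becomes a 2n × 2n real system. This sketch uses a dense NumPy solve as a stand-in for the real sparse direct solver:

```python
import numpy as np

def solve_complex_via_real(A, b):
    # Equivalent real formulation of a complex linear system:
    # [[Ar, -Ai], [Ai, Ar]] @ [xr; xi] = [br; bi]
    Ar, Ai = A.real, A.imag
    M = np.block([[Ar, -Ai], [Ai, Ar]])
    rhs = np.concatenate([b.real, b.imag])
    sol = np.linalg.solve(M, rhs)   # stand-in for a real sparse direct solver
    n = len(b)
    # Reconvert the real solution vector to complex numbers
    return sol[:n] + 1j * sol[n:]
```

In a sparse setting the 2n × 2n real matrix would be assembled in the solver's sparse format, but the block structure is the same.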
Chui, S T; Wang, Weihua; Zhou, L; Lin, Z F
2009-07-22
We study the propagation of plane electromagnetic waves through different systems consisting of arrays of split rings of different orientations. Many extraordinary EM phenomena were discovered in such systems, contributed by the off-diagonal magnetoelectric susceptibilities. We find a mode such that the electric field becomes elliptically polarized with a component in the longitudinal direction (i.e. parallel to the wavevector). Even though the group velocity v_g and the wavevector k are parallel, in the presence of damping, the Poynting vector does not just get 'broadened', but can possess a component perpendicular to the wavevector. The speed of light can be real even when the product ϵμ is negative. Other novel properties are explored.
DOE Office of Scientific and Technical Information (OSTI.GOV)
Spotz, William F.
PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of the underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.
Thomas, Shalu; Ravishankaran, Sangamithra; Johnson Amala Justin, N A; Asokan, Aswin; Maria Jusler Kalsingh, T; Mathai, Manu Thomas; Valecha, Neena; Eapen, Alex
2016-11-09
The physico-chemical characteristics of lentic aquatic habitats greatly influence mosquito species in selecting suitable oviposition sites; immature development, pupation and adult emergence are therefore considerations for their preferred ecological niche. Correlating water quality parameters with mosquito breeding, as well as with immature vector density, is useful for vector control operations in identifying and targeting potential breeding habitats. A total of 40 known habitats of Anopheles stephensi, randomly selected based on a vector survey in parallel, were inspected for the physical and chemical nature of the aquatic environment. Water samples were collected four times during 2013, representing four seasons (i.e., ten habitats per season). The physico-chemical variables and mosquito breeding were statistically analysed to find their correlation with immature density of An. stephensi and also co-inhabitation with other mosquito species. Anopheles stephensi prefer water with low nitrite content and high phosphate content. Parameters such as total dissolved solids, electrical conductivity, total hardness, chloride, fluoride and sulfate had a positive correlation in habitats with any mosquito species breeding (p < 0.05) and also in habitats with An. stephensi alone breeding. Fluoride was observed to have a strong positive correlation with immature density of An. stephensi in both overhead tanks and wells. Knowledge of larval ecology of vector mosquitoes is a key factor in risk assessment and for implementing appropriate and sustainable vector control operations. The presence of fluoride in potential breeding habitats and a strong positive correlation with An. stephensi immature density is useful information, as fluoride can be considered an indicator/predictor of vector breeding. Effective larval source management can be focussed on specified habitats in vulnerable areas to reduce vector abundance and malaria transmission.
Ripple formation on Si surfaces during plasma etching in Cl2
NASA Astrophysics Data System (ADS)
Nakazaki, Nobuya; Matsumoto, Haruka; Sonobe, Soma; Hatsuse, Takumi; Tsuda, Hirotaka; Takao, Yoshinori; Eriguchi, Koji; Ono, Kouichi
2018-05-01
Nanoscale surface roughening and ripple formation in response to ion incidence angle has been investigated during inductively coupled plasma etching of Si in Cl2, using sheath control plates to achieve the off-normal ion incidence on blank substrate surfaces. The sheath control plate consisted of an array of inclined trenches, being set into place on the rf-biased electrode, where their widths and depths were chosen in such a way that the sheath edge was pushed out of the trenches. The distortion of potential distributions and the consequent deflection of ion trajectories above and in the trenches were then analyzed based on electrostatic particle-in-cell simulations of the plasma sheath, to evaluate the angular distributions of ion fluxes incident on substrates pasted on sidewalls and/or at the bottom of the trenches. Experiments showed well-defined periodic sawtooth-like ripples with their wave vector oriented parallel to the direction of ion incidence at intermediate off-normal angles, while relatively weak corrugations or ripplelike structures with the wave vector perpendicular to it at high off-normal angles. Possible mechanisms for the formation of surface ripples during plasma etching are discussed with the help of Monte Carlo simulations of plasma-surface interactions and feature profile evolution. The results indicate the possibility of providing an alternative to ion beam sputtering for self-organized formation of ordered surface nanostructures.
Selot, Ruchita; Arumugam, Sathyathithan; Mary, Bertin; Cheemadan, Sabna; Jayandharan, Giridhara R.
2017-01-01
Of the 12 common serotypes used for gene delivery applications, Adeno-associated virus (AAV)rh.10 serotype has shown sustained hepatic transduction and has the lowest seropositivity in humans. We have evaluated if further modifications to AAVrh.10 at its phosphodegron-like regions or predicted immunogenic epitopes could improve its hepatic gene transfer and immune evasion potential. Mutant AAVrh.10 vectors were generated by site-directed mutagenesis of the predicted targets. These mutant vectors were first tested for their transduction efficiency in HeLa and HEK293T cells. The optimal vector was further evaluated for cellular uptake, entry, and intracellular trafficking by quantitative PCR and time-lapse confocal microscopy. To evaluate their potential during hepatic gene therapy, C57BL/6 mice were administered with wild-type or optimal mutant AAVrh.10 and the luciferase transgene expression was documented by serial bioluminescence imaging at 14, 30, 45, and 72 days post-gene transfer. Their hepatic transduction was further verified by a quantitative PCR analysis of AAV copy number in the liver tissue. The optimal AAVrh.10 vector was further evaluated for its immune escape potential in animals pre-immunized with human intravenous immunoglobulin. Our results demonstrate that a modified AAVrh.10 S671A vector had enhanced cellular entry (3.6 fold) and migrated rapidly to the perinuclear region (1 vs. >2 h for wild-type vectors) in vitro, which further translates to a modest increase in hepatic gene transfer efficiency in vivo. More importantly, the mutant AAVrh.10 vector was able to partially evade neutralizing antibodies (~27–64 fold) in pre-immunized animals. The development of an AAV vector system that can escape the circulating neutralizing antibodies in the host will substantially widen the scope of gene therapy applications in humans. PMID:28769791
NASA Astrophysics Data System (ADS)
Srivastava, D. P.; Sahni, V.; Satsangi, P. S.
2014-08-01
Graph-theoretic quantum system modelling (GTQSM) is facilitated by considering the fundamental unit of quantum computation and information, viz. a quantum bit or qubit as a basic building block. Unit directional vectors "ket 0" and "ket 1" constitute two distinct fundamental quantum across variable orthonormal basis vectors, for the Hilbert space, specifying the direction of propagation of information, or computation data, while complementary fundamental quantum through, or flow rate, variables specify probability parameters, or amplitudes, as surrogates for scalar quantum information measure (von Neumann entropy). This paper applies GTQSM in continuum of protein heterodimer tubulin molecules of self-assembling polymers, viz. microtubules in the brain as a holistic system of interacting components representing hierarchical clustered quantum Hopfield network, hQHN, of networks. The quantum input/output ports of the constituent elemental interaction components, or processes, of tunnelling interactions and Coulombic bidirectional interactions are in cascade and parallel interconnections with each other, while the classical output ports of all elemental components are interconnected in parallel to accumulate micro-energy functions generated in the system as Hamiltonian, or Lyapunov, energy function. The paper presents an insight, otherwise difficult to gain, for the complex system of systems represented by clustered quantum Hopfield network, hQHN, through the application of GTQSM construct.
Mechanical forces in plant growth and development
NASA Technical Reports Server (NTRS)
Fisher, D. D.; Cyr, R. J.
2000-01-01
Plant cells perceive forces that arise from the environment and from the biophysics of plant growth. These forces provide meaningful cues that can affect the development of the plant. Seedlings of Arabidopsis thaliana were used to examine the cytoplasmic tensile character of cells that have been implicated in the gravitropic response. Laser-trapping technology revealed that the starch-containing statoliths of the central columella cells in root caps are held loosely within the cytoplasm. In contrast, the peripheral cells have starch granules that are relatively resistant to movement. The role of the actin cytoskeleton in affecting the tensile character of these cells is discussed. To explore the role that biophysical forces might play in generating developmental cues, we have developed an experimental model system in which protoplasts, embedded in a synthetic agarose matrix, are subjected to stretching or compression. We have found that protoplasts subjected to these forces from five minutes to two hours will subsequently elongate either at right angles or parallel to the tensive or compressive force vector. Moreover, the cortical microtubules are found to be organized either at right angles or parallel to the tensive or compressive force vector. We discuss these results in terms of an interplay of information between the extracellular matrix and the underlying cytoskeleton.
Accelerating next generation sequencing data analysis with system level optimizations.
Kathiresan, Nagarajan; Temanni, Ramzi; Almabrazi, Hakeem; Syed, Najeeb; Jithesh, Puthen V; Al-Ali, Rashid
2017-08-22
Next generation sequencing (NGS) data analysis is highly compute intensive. In-memory computing, vectorization, bulk data transfer and CPU frequency scaling are some of the hardware features in modern computing architectures. To get the best execution time and utilize these hardware features, it is necessary to tune the system level parameters before running the application. We studied the GATK HaplotypeCaller, a component of common NGS workflows that consumes more than 43% of the total execution time. Multiple GATK 3.x versions were benchmarked and the execution time of HaplotypeCaller was optimized by various system level parameters, which included: (i) tuning the parallel garbage collection and kernel shared memory to simulate in-memory computing; (ii) architecture-specific tuning in the PairHMM library for vectorization; (iii) including Java 1.8 features through GATK source code compilation and building a runtime environment for parallel sorting and bulk data transfer; (iv) switching the CPU frequency governor from the default 'on-demand' mode to 'performance' mode to accelerate the Java multi-threads. As a result, the HaplotypeCaller execution time was reduced by 82.66% in GATK 3.3 and 42.61% in GATK 3.7. Overall, the execution time of the NGS pipeline was reduced to 70.60% and 34.14% for GATK 3.3 and GATK 3.7, respectively.
NASA Astrophysics Data System (ADS)
Haddock, C.; Crawford, B.; Fox, W.; Francis, I.; Holley, A.; Magers, S.; Sarsour, M.; Snow, W. M.; Vanderwerp, J.
2018-03-01
We discuss the design and construction of a novel target array of nonmagnetic test masses used in a neutron polarimetry measurement made in search of possible new exotic spin-dependent neutron-atom interactions of Nature at sub-mm length scales. This target was designed to accept and efficiently transmit a transversely polarized slow neutron beam through a series of long open parallel slots bounded by flat rectangular plates. These openings possessed equal atom density gradients normal to the slots from the flat test masses, with dimensions optimized to achieve maximum sensitivity to an exotic spin-dependent interaction from vector boson exchanges with ranges in the mm–μm regime. The parallel slots were oriented differently in four quadrants that can be rotated about the neutron beam axis in discrete 90° increments using a Geneva drive. The spin rotation signals from the 4 quadrants were measured using a segmented neutron ion chamber to suppress possible systematic errors from stray magnetic fields in the target region. We discuss the per-neutron sensitivity of the target to the exotic interaction, the design constraints, the potential sources of systematic errors which could be present in this design, and our estimate of the achievable sensitivity using this method.
NASA Astrophysics Data System (ADS)
Hayakawa, Hitoshi; Ogawa, Makoto; Shibata, Tadashi
2005-04-01
A very large scale integrated circuit (VLSI) architecture for a multiple-instruction-stream multiple-data-stream (MIMD) associative processor has been proposed. The processor employs an architecture that enables seamless switching from associative operations to arithmetic operations. The MIMD element is convertible to a regular central processing unit (CPU) while maintaining its high performance as an associative processor. Therefore, the MIMD associative processor can perform not only on-chip perception, i.e., searching for the vector most similar to an input vector throughout the on-chip cache memory, but also arithmetic and logic operations similar to those in ordinary CPUs, both simultaneously in parallel processing. Three key technologies have been developed to generate the MIMD element: associative-operation-and-arithmetic-operation switchable calculation units, a versatile register control scheme within the MIMD element for flexible operations, and a short instruction set for minimizing the memory size for program storage. Key circuit blocks were designed and fabricated using 0.18 μm complementary metal-oxide-semiconductor (CMOS) technology. As a result, the full-featured MIMD element is estimated to be 3 mm², showing the feasibility of an 8-parallel-MIMD-element associative processor in a single chip of 5 mm × 5 mm.
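The associative-search half of such a processor amounts to a nearest-vector lookup over on-chip memory. In software the operation looks like this (illustrative only; Euclidean distance is assumed here as the similarity measure, and the hardware performs the per-vector comparisons in parallel):

```python
import numpy as np

def associative_search(memory, query):
    # Return the index of the stored vector most similar to the query,
    # i.e. the one at the smallest Euclidean distance.
    d = np.linalg.norm(memory - query, axis=1)
    return int(np.argmin(d))
```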
DOE Office of Scientific and Technical Information (OSTI.GOV)
Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter
In this paper, we present a distributed-memory library for computations with dense structured matrices. A matrix is considered structured if its off-diagonal blocks can be approximated by a rank-deficient matrix with low numerical rank. Here, we use Hierarchically Semi-Separable (HSS) representations. Such matrices appear in many applications, for example, finite-element methods, boundary element methods, and so on. Exploiting this structure allows for fast solution of linear systems and/or fast computation of matrix-vector products, which are the two main building blocks of matrix computations. The compression algorithm that we use, that computes the HSS form of an input dense matrix, relies on randomized sampling with a novel adaptive sampling mechanism. We discuss the parallelization of this algorithm and also present the parallelization of structured matrix-vector product, structured factorization, and solution routines. The efficiency of the approach is demonstrated on large problems from different academic and industrial applications, on up to 8,000 cores. Finally, this work is part of a more global effort, the STRUctured Matrices PACKage (STRUMPACK) software package for computations with sparse and dense structured matrices. Hence, although useful on their own right, the routines also represent a step in the direction of a distributed-memory sparse solver.
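The benefit of the structured representation is easiest to see for the matrix-vector product: with off-diagonal blocks stored in factored low-rank form, the product touches only the factors. A one-level sketch follows (HSS proper nests this recursively; all names are illustrative, not STRUMPACK's API):

```python
import numpy as np

def block_lowrank_matvec(D1, D2, U12, V12, U21, V21, x):
    # Matrix-vector product with a 2x2 block matrix whose off-diagonal
    # blocks are stored in low-rank factored form: A12 = U12 @ V12.T,
    # A21 = U21 @ V21.T. Cost is O(n*r) per off-diagonal block instead
    # of O(n^2) for a dense block.
    n1 = D1.shape[0]
    x1, x2 = x[:n1], x[n1:]
    y1 = D1 @ x1 + U12 @ (V12.T @ x2)   # apply factors, never form A12
    y2 = U21 @ (V21.T @ x1) + D2 @ x2
    return np.concatenate([y1, y2])
```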
Rouet, François-Henry; Li, Xiaoye S.; Ghysels, Pieter; ...
2016-06-30
Benson, A. J.; Guedry, F. E.; Jones, G. Melvill
1970-01-01
1. Recent experiments have shown that rotation of a linear acceleration vector round the head can generate involuntary ocular nystagmus in the absence of angular acceleration. The present experiments examine the suggestion that adequate stimulation of the semicircular canals may contribute to this response. 2. Decerebrate cats were located in a stereotaxic device on a platform, slung from four parallel cables, which could be driven smoothly round a circular orbit without inducing significant angular movement of the platform. This Parallel Swing Rotation (PSR) generated a centripetal acceleration of 4.4 m/sec² which rotated round the head at 0.52 rev/sec. 3. The discharge frequency of specifically lateral canal-dependent neural units in the vestibular nuclei of cats was recorded during PSR to right and left, and in the absence of motion. The dynamic responses to purely angular motion were also examined on a servo-driven turntable. 4. Without exception all proven canal-dependent cells examined (twenty-nine cells in nine cats) were more active during PSR in the direction of endolymph circulation assessed to be excitatory to the unit, than during PSR in the opposite direction. 5. The observed changes in discharge frequency are assessed to have been of a magnitude appropriate for the generation of the involuntary oculomotor response induced by the same stimulus in the intact animal. 6. The findings suggest that a linear acceleration vector which rotates in the plane of the lateral semicircular canals can be an adequate stimulus to ampullary receptors, though an explanation which invokes the modulation of canal cells by a signal dependent upon the sequential activation of macular receptors cannot be positively excluded. PMID:5501270
NASA Astrophysics Data System (ADS)
Loring, B.; Karimabadi, H.; Rortershteyn, V.
2015-10-01
The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
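A minimal serial sketch of the LIC idea itself (not the screen-space parallel algorithm of the paper): for each pixel, trace a short streamline through the vector field and average an input noise texture along it, so that the output is smeared along field lines:

```python
import numpy as np

def lic(texture, vx, vy, length=10):
    # Serial line integral convolution on a 2D grid (illustrative sketch).
    h, w = texture.shape
    out = np.zeros_like(texture, dtype=float)
    for i in range(h):
        for j in range(w):
            acc, cnt = 0.0, 0
            for sign in (1.0, -1.0):        # integrate both directions
                x, y = float(j), float(i)
                for _ in range(length):
                    ii, jj = int(round(y)), int(round(x))
                    if not (0 <= ii < h and 0 <= jj < w):
                        break
                    acc += texture[ii, jj]  # sample noise along the streamline
                    cnt += 1
                    mag = np.hypot(vx[ii, jj], vy[ii, jj]) or 1.0
                    x += sign * vx[ii, jj] / mag
                    y += sign * vy[ii, jj] / mag
            out[i, j] = acc / max(cnt, 1)
    return out
```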
Traditional Tracking with Kalman Filter on Parallel Architectures
NASA Astrophysics Data System (ADS)
Cerati, Giuseppe; Elmer, Peter; Lantz, Steven; MacNeill, Ian; McDermott, Kevin; Riley, Dan; Tadel, Matevž; Wittich, Peter; Würthwein, Frank; Yagil, Avi
2015-05-01
Power density constraints are limiting the performance improvements of modern CPUs. To address this, we have seen the introduction of lower-power, multi-core processors, but the future will be even more exciting. In order to stay within the power density limits but still obtain Moore's Law performance/price gains, it will be necessary to parallelize algorithms to exploit larger numbers of lightweight cores and specialized functions like large vector units. Example technologies today include Intel's Xeon Phi and GPGPUs. Track finding and fitting is one of the most computationally challenging problems for event reconstruction in particle physics. At the High Luminosity LHC, for example, this will be by far the dominant problem. The most common track-finding techniques in use today are, however, those based on the Kalman filter. Significant experience has been accumulated with these techniques on real tracking detector systems, both in the trigger and offline. We report the results of our investigations into the potential and limitations of these algorithms on the new parallel hardware.
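The building block the authors target for parallelization is the standard linear Kalman filter predict/update step, which for a single track candidate can be sketched as follows (illustrative notation; real track fits use detector-specific state vectors and propagation matrices):

```python
import numpy as np

def kalman_step(x, P, z, F, H, Q, R):
    # One predict+update cycle of a linear Kalman filter.
    # Predict the state and covariance forward
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with measurement z
    S = H @ P @ H.T + R                  # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)       # Kalman gain
    x = x + K @ (z - H @ x)
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

In track fitting, many candidates run this step independently per hit layer, which is what makes the algorithm a target for vector units and many-core processors.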
O'keefe, Matthew; Parr, Terence; Edgar, B. Kevin; ...
1995-01-01
Massively parallel processors (MPPs) hold the promise of extremely high performance that, if realized, could be used to study problems of unprecedented size and complexity. One of the primary stumbling blocks to this promise has been the lack of tools to translate application codes to MPP form. In this article we show how applications codes written in a subset of Fortran 77, called Fortran-P, can be translated to achieve good performance on several massively parallel machines. This subset can express codes that are self-similar, where the algorithm applied to the global data domain is also applied to each subdomain. We have found many codes that match the Fortran-P programming style and have converted them using our tools. We believe a self-similar coding style will accomplish what a vectorizable style has accomplished for vector machines by allowing the construction of robust, user-friendly, automatic translation systems that increase programmer productivity and generate fast, efficient code for MPPs.
Rapid, parallel path planning by propagating wavefronts of spiking neural activity
Ponulak, Filip; Hopfield, John J.
2013-01-01
Efficient path planning and navigation is critical for animals, robotics, logistics and transportation. We study a model in which spatial navigation problems can rapidly be solved in the brain by parallel mental exploration of alternative routes using propagating waves of neural activity. A wave of spiking activity propagates through a hippocampus-like network, altering the synaptic connectivity. The resulting vector field of synaptic change then guides a simulated animal to the appropriate selected target locations. We demonstrate that the navigation problem can be solved using realistic, local synaptic plasticity rules during a single passage of a wavefront. Our model can find optimal solutions for competing possible targets or learn and navigate in multiple environments. The model provides a hypothesis on the possible computational mechanisms for optimal path planning in the brain, at the same time it is useful for neuromorphic implementations, where the parallelism of information processing proposed here can fully be harnessed in hardware. PMID:23882213
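The wavefront idea has a classical algorithmic counterpart: breadth-first propagation of a distance field outward from the goal, followed by descent on that field from the start. A grid-world sketch of that counterpart (not the spiking-network model itself):

```python
from collections import deque

def wavefront_plan(grid, start, goal):
    # grid: list of lists, 0 = free cell, 1 = obstacle.
    # Propagate a wavefront (BFS distance field) from the goal...
    h, w = len(grid), len(grid[0])
    dist = {goal: 0}
    q = deque([goal])
    while q:
        x, y = q.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < h and 0 <= ny < w and grid[nx][ny] == 0 \
                    and (nx, ny) not in dist:
                dist[(nx, ny)] = dist[(x, y)] + 1
                q.append((nx, ny))
    if start not in dist:
        return None                      # goal unreachable from start
    # ...then follow the field downhill from the start to the goal.
    path, cur = [start], start
    while cur != goal:
        cur = min(((cur[0] + dx, cur[1] + dy)
                   for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
                   if (cur[0] + dx, cur[1] + dy) in dist),
                  key=dist.get)
        path.append(cur)
    return path
```

The propagating wave of neural activity in the model plays the role of the BFS frontier, and the resulting vector field of synaptic change plays the role of the distance field's gradient.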
DOE Office of Scientific and Technical Information (OSTI.GOV)
Loring, Burlen; Karimabadi, Homa; Rortershteyn, Vadim
2014-07-01
The surface line integral convolution (LIC) visualization technique produces dense visualization of vector fields on arbitrary surfaces. We present a screen space surface LIC algorithm for use in distributed memory data parallel sort last rendering infrastructures. The motivations for our work are to support analysis of datasets that are too large to fit in the main memory of a single computer and compatibility with prevalent parallel scientific visualization tools such as ParaView and VisIt. By working in screen space using OpenGL we can leverage the computational power of GPUs when they are available and run without them when they are not. We address efficiency and performance issues that arise from the transformation of data from physical to screen space by selecting an alternate screen space domain decomposition. We analyze the algorithm's scaling behavior with and without GPUs on two high performance computing systems using data from turbulent plasma simulations.
Parallel and Scalable Clustering and Classification for Big Data in Geosciences
NASA Astrophysics Data System (ADS)
Riedel, M.
2015-12-01
Machine learning, data mining, and statistical computing are common techniques to perform analysis in earth sciences. This contribution will focus on two concrete and widely used data analytics methods suitable to analyse 'big data' in the context of geoscience use cases: clustering and classification. From the broad class of available clustering methods we focus on the density-based spatial clustering of applications with noise (DBSCAN) algorithm, which enables the identification of outliers or interesting anomalies. A new open source parallel and scalable DBSCAN implementation will be discussed in the light of a scientific use case that detects water mixing events in the Koljoefjords. The second technique we cover is classification, with a focus on the support vector machine (SVM) algorithm, one of the best out-of-the-box classification algorithms. A parallel and scalable SVM implementation will be discussed in the light of a scientific use case in the field of remote sensing with 52 different classes of land cover types.
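For concreteness, a minimal serial DBSCAN sketch follows; a parallel implementation like the one discussed above distributes exactly this neighborhood-expansion work across processors (this toy version computes the full pairwise distance matrix, which only scales to small inputs):

```python
import numpy as np

def dbscan(X, eps, min_pts):
    # Minimal DBSCAN: grow clusters from core points; -1 marks noise/outliers.
    n = len(X)
    labels = np.full(n, -1)
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    neighbors = [np.where(dist[i] <= eps)[0] for i in range(n)]
    cluster = 0
    for i in range(n):
        if labels[i] != -1 or len(neighbors[i]) < min_pts:
            continue                       # already assigned, or not a core point
        labels[i] = cluster
        stack = list(neighbors[i])
        while stack:                       # expand the cluster
            j = stack.pop()
            if labels[j] == -1:
                labels[j] = cluster
                if len(neighbors[j]) >= min_pts:   # core point: keep expanding
                    stack.extend(neighbors[j])
        cluster += 1
    return labels
```

Points left at -1 are the outliers/anomalies that make DBSCAN attractive for detecting events such as the water mixing episodes mentioned above.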
DOE Office of Scientific and Technical Information (OSTI.GOV)
You, Yang; Song, Shuaiwen; Fu, Haohuan
2014-08-16
Support Vector Machine (SVM) has been widely used in data-mining and Big Data applications as modern commercial databases start to attach an increasing importance to analytic capabilities. In recent years, SVM was adapted to the field of High Performance Computing for power/performance prediction, auto-tuning, and runtime scheduling. However, even at the risk of losing prediction accuracy due to insufficient runtime information, researchers can only afford to apply offline model training to avoid significant runtime training overhead. To address the challenges above, we designed and implemented MICSVM, a highly efficient parallel SVM for x86-based multi-core and many-core architectures, such as the Intel Ivy Bridge CPUs and the Intel Xeon Phi coprocessor (MIC).
Exploiting MIC architectures for the simulation of channeling of charged particles in crystals
NASA Astrophysics Data System (ADS)
Bagli, Enrico; Karpusenko, Vadim
2016-08-01
Coherent effects of ultra-relativistic particles in crystals are an area of science under development. DYNECHARM++ is a toolkit for the simulation of coherent interactions between high-energy charged particles and complex crystal structures. The particle trajectory in a crystal is computed through numerical integration of the equation of motion. The code was revised and improved in order to exploit parallelization on multi-core processors and vectorization of single instructions on multiple data. An Intel Xeon Phi card was adopted for the performance measurements. The computation time was shown to scale linearly as a function of the number of physical and virtual cores. By enabling the auto-vectorization flag of the compiler, a threefold speedup was obtained. The performance of the card was compared to that of a dual-Xeon system.
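The numerical integration of the equation of motion mentioned above is commonly done with a symplectic scheme such as velocity Verlet; the sketch below is generic (not DYNECHARM++'s actual integrator), with a harmonic restoring force standing in for a channeling potential:

```python
def velocity_verlet(x, v, accel, dt, steps):
    """Integrate x'' = accel(x) with the velocity-Verlet scheme."""
    a = accel(x)
    for _ in range(steps):
        x = x + v*dt + 0.5*a*dt*dt     # position update
        a_new = accel(x)               # acceleration at the new position
        v = v + 0.5*(a + a_new)*dt     # velocity update with averaged accel
        a = a_new
    return x, v
```

For a unit-frequency oscillator integrated over one full period, the scheme returns nearly to the initial state, reflecting its good long-term energy behavior; the per-trajectory independence of such loops is also what makes them attractive for the multi-core and SIMD parallelization described above.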
NASA Technical Reports Server (NTRS)
Laframboise, J. G.
1985-01-01
In low Earth orbit, the geomagnetic field B(vector) is strong enough that secondary electrons emitted from spacecraft surfaces have an average gyroradius much smaller than typical dimensions of large spacecraft. This implies that escape of secondaries will be strongly inhibited on surfaces which are nearly parallel to B(vector), even if a repelling electric field exists outside them. This effect is likely to make an important contribution to the current balance and hence the equilibrium potential of such surfaces, making high voltage charging of them more likely. Numerically calculated escaping secondary electron fluxes are presented for these conditions. For use in numerical spacecraft charging simulations, an analytic curve fit to these results is given which is accurate to within 3% of the emitted current.
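The gyroradius argument above can be made concrete with the standard formula r = m v_perp / (|q| B); the helper below is our own. For a ~2 eV secondary electron in a low-Earth-orbit field of ~3 × 10^-5 T the radius comes out around 0.16 m, indeed much smaller than large-spacecraft dimensions:

```python
import math

M_E = 9.109e-31      # electron mass [kg]
Q_E = 1.602e-19      # elementary charge [C]

def gyroradius(energy_ev, b_tesla, mass=M_E, charge=Q_E):
    """Gyroradius r = m*v_perp/(|q|*B), assuming all kinetic energy
    is in the motion perpendicular to B."""
    v = math.sqrt(2.0 * energy_ev * Q_E / mass)
    return mass * v / (charge * b_tesla)
```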
Investigations on the hierarchy of reference frames in geodesy and geodynamics
NASA Technical Reports Server (NTRS)
Grafarend, E. W.; Mueller, I. I.; Papo, H. B.; Richter, B.
1979-01-01
Problems related to reference directions were investigated. Space and time variant angular parameters are illustrated in hierarchic structures or towers. Using least squares techniques, model towers of triads are presented which allow the formation of linear observation equations. Translational and rotational degrees of freedom (origin and orientation) are discussed along with the notion of length and scale degrees of freedom. According to the notion of scale parallelism, scale factors with respect to a unit length are given. Three-dimensional geodesy was constructed from the set of three base vectors (gravity, earth-rotation and the ecliptic normal vector). Space and time variations are given with respect to a polar and singular value decomposition or in terms of changes in translation, rotation, deformation (shear, dilatation or angular and scale distortions).
FFTs in external or hierarchical memory
NASA Technical Reports Server (NTRS)
Bailey, David H.
1989-01-01
A description is given of advanced techniques for computing an ordered FFT on a computer with external or hierarchical memory. These algorithms (1) require as few as two passes through the external data set, (2) use strictly unit stride, long vector transfers between main memory and external storage, (3) require only a modest amount of scratch space in main memory, and (4) are well suited for vector and parallel computation. Performance figures are included for implementations of some of these algorithms on Cray supercomputers. Of interest is the fact that a main memory version outperforms the current Cray library FFT routines on the Cray-2, the Cray X-MP, and the Cray Y-MP systems. Using all eight processors on the Cray Y-MP, this main memory routine runs at nearly 2 Gflops.
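The external-memory algorithms referenced above build on the four-step FFT factorization: view a signal of length n = n1·n2 as a matrix, transform the n1 strided "columns", apply twiddle factors, transform across columns, and read out transposed. A compact in-memory sketch follows (naive O(n²) DFTs are used for the small transforms purely for clarity; a real code would recurse or call an optimized kernel):

```python
import cmath

def dft(x):
    """Naive reference DFT, used here both as a helper and as a check."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j*cmath.pi*j*k/n) for j in range(n))
            for k in range(n)]

def four_step_fft(x, n1, n2):
    """Four-step FFT of x (length n1*n2), indexing j = j1 + n1*j2,
    output k = k2 + n2*k1."""
    n = n1 * n2
    w = cmath.exp(-2j*cmath.pi/n)
    # Step 1: n2-point DFT of each of the n1 strided "columns"
    cols = [dft(x[j1::n1]) for j1 in range(n1)]
    # Step 2: twiddle-factor multiplication
    for j1 in range(n1):
        for k2 in range(n2):
            cols[j1][k2] *= w**(j1*k2)
    # Step 3: n1-point DFT across columns, read out transposed
    out = [0j] * n
    for k2 in range(n2):
        col = dft([cols[j1][k2] for j1 in range(n1)])
        for k1 in range(n1):
            out[k2 + n2*k1] = col[k1]
    return out
```

In the external-memory setting, steps 1 and 3 become the two passes over the data set, and the strided column accesses are what the unit-stride, long-vector transfer design above is arranged to avoid.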
Opportunities and choice in a new vector era
NASA Astrophysics Data System (ADS)
Nowak, A.
2014-06-01
This work discusses the significant changes in computing landscape related to the progression of Moore's Law, and the implications on scientific computing. Particular attention is devoted to the High Energy Physics domain (HEP), which has always made good use of threading, but levels of parallelism closer to the hardware were often left underutilized. Findings of the CERN openlab Platform Competence Center are reported in the context of expanding "performance dimensions", and especially the resurgence of vectors. These suggest that data oriented designs are feasible in HEP and have considerable potential for performance improvements on multiple levels, but will rarely trump algorithmic enhancements. Finally, an analysis of upcoming hardware and software technologies identifies heterogeneity as a major challenge for software, which will require more emphasis on scalable, efficient design.
Lidar detection of underwater objects using a neuro-SVM-based architecture.
Mitra, Vikramjit; Wang, Chia-Jiu; Banerjee, Satarupa
2006-05-01
This paper presents a neural network architecture using a support vector machine (SVM) as an inference engine (IE) for classification of light detection and ranging (Lidar) data. Lidar data gives a sequence of laser backscatter intensities obtained from laser shots generated from an airborne object at various altitudes above the earth surface. Lidar data is pre-filtered to remove high frequency noise. As the Lidar shots are taken from above the earth surface, the data contain some air backscatter information, which is of no importance for detecting underwater objects. Because of this, the air backscatter information is eliminated from the data, and a segment of this data is subsequently selected to extract features for classification. This is then encoded using linear predictive coding (LPC) and polynomial approximation. The coefficients thus generated are used as inputs to the two branches of a parallel neural architecture. The decisions obtained from the two branches are vector multiplied and the result is fed to an SVM-based IE that presents the final inference. Two parallel neural architectures, using a multilayer perceptron (MLP) and a hybrid radial basis function (HRBF), are considered in this paper. The proposed structure fits the Lidar data classification task well due to the inherent classification efficiency of neural networks and the accurate decision-making capability of SVM. A Bayesian classifier and a quadratic classifier were considered for the Lidar data classification task but they failed to offer high prediction accuracy. Furthermore, a single-layered artificial neural network (ANN) classifier was also considered and it failed to offer good accuracy. The parallel ANN architecture proposed in this paper offers high prediction accuracy (98.9%) and is found to be the most suitable architecture for the proposed task of Lidar data classification.
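The LPC feature extraction mentioned above fits an all-pole predictor to the signal's autocorrelation, classically via the Levinson-Durbin recursion. A minimal sketch (our own naming, not the paper's exact feature pipeline):

```python
def autocorr(x, maxlag):
    """Autocorrelation r[0..maxlag] of a real signal."""
    n = len(x)
    return [sum(x[t]*x[t+lag] for t in range(n-lag)) for lag in range(maxlag+1)]

def levinson_durbin(r, order):
    """Solve the Toeplitz normal equations; returns ([1, a1..ap], residual energy).
    The predictor is x[t] ~= -(a1*x[t-1] + ... + ap*x[t-p])."""
    a = [0.0] * (order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j]*r[i-j] for j in range(1, i))
        k = -acc / err                 # reflection coefficient
        a_new = a[:]
        for j in range(1, i):
            a_new[j] = a[j] + k*a[i-j]
        a_new[i] = k
        a = a_new
        err *= (1.0 - k*k)
    return a, err
```

For a signal that decays geometrically as 0.9^t, a first-order fit recovers the coefficient a1 ≈ -0.9, i.e. the predictor x[t] ≈ 0.9·x[t-1].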
Shared Memory Parallelism for 3D Cartesian Discrete Ordinates Solver
NASA Astrophysics Data System (ADS)
Moustafa, Salli; Dutka-Malen, Ivan; Plagne, Laurent; Ponçot, Angélique; Ramet, Pierre
2014-06-01
This paper describes the design and the performance of DOMINO, a 3D Cartesian SN solver that implements two nested levels of parallelism (multicore+SIMD) on shared memory computation nodes. DOMINO is written in C++, a multi-paradigm programming language that enables the use of powerful and generic parallel programming tools such as Intel TBB and Eigen. These two libraries allow us to combine multi-thread parallelism with vector operations in an efficient and yet portable way. As a result, DOMINO can exploit the full power of modern multi-core processors and is able to tackle, on a single computing node, very large simulations that usually require large HPC clusters. For example, DOMINO solves a 3D full core PWR eigenvalue problem involving 26 energy groups, 288 angular directions (S16), 46 × 10^6 spatial cells and 1 × 10^12 DoFs within 11 hours on a single 32-core SMP node. This represents a sustained performance of 235 GFlops and 40.74% of the SMP node peak performance for the DOMINO sweep implementation. The very high Flops/Watt ratio of DOMINO makes it a very interesting building block for a future many-node nuclear simulation tool.
The influence of the self-consistent mode structure on the Coriolis pinch effect
DOE Office of Scientific and Technical Information (OSTI.GOV)
Peeters, A. G.; Camenen, Y.; Casson, F. J.
This paper discusses the effect of the mode structure on the Coriolis pinch effect [A. G. Peeters, C. Angioni, and D. Strintzi, Phys. Rev. Lett. 98, 265003 (2007)]. It is shown that the Coriolis drift effect can be compensated for by a finite parallel wave vector, resulting in a reduced momentum pinch velocity. Gyrokinetic simulations in full toroidal geometry reveal that parallel dynamics effectively removes the Coriolis pinch for the case of adiabatic electrons, while the compensation due to the parallel dynamics is incomplete for the case of kinetic electrons, resulting in a finite pinch velocity. The finite flux in the case of kinetic electrons is interpreted to be related to electron trapping, which prevents a strong asymmetry in the electrostatic potential with respect to the low field side position. The physics picture developed here leads to the discovery and explanation of two unexpected effects: first, the pinch velocity scales with the trapped-particle fraction (the root of the inverse aspect ratio), and second, there is no strong collisionality dependence. The latter is related to the role of the trapped electrons, which retain some symmetry in the eigenmode, but play no role in the perturbed parallel velocity.
NASA Technical Reports Server (NTRS)
Keppenne, Christian L.; Rienecker, Michele M.; Koblinsky, Chester (Technical Monitor)
2001-01-01
A multivariate ensemble Kalman filter (MvEnKF) implemented on a massively parallel computer architecture has been implemented for the Poseidon ocean circulation model and tested with a Pacific Basin model configuration. There are about two million prognostic state-vector variables. Parallelism for the data assimilation step is achieved by regionalization of the background-error covariances that are calculated from the phase-space distribution of the ensemble. Each processing element (PE) collects elements of a matrix measurement functional from nearby PEs. To avoid the introduction of spurious long-range covariances associated with finite ensemble sizes, the background-error covariances are given compact support by means of a Hadamard (element by element) product with a three-dimensional canonical correlation function. The methodology and the MvEnKF configuration are discussed. It is shown that the regionalization of the background covariances; has a negligible impact on the quality of the analyses. The parallel algorithm is very efficient for large numbers of observations but does not scale well beyond 100 PEs at the current model resolution. On a platform with distributed memory, memory rather than speed is the limiting factor.
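The compact-support localization described above is an elementwise (Hadamard) product of the sample covariance with a taper that vanishes beyond a cutoff distance. In the sketch below a simple triangular taper in one dimension stands in for the paper's three-dimensional canonical correlation function; the function names and geometry are illustrative only:

```python
def sample_cov(ensemble):
    """Sample covariance from a list of ensemble-member state vectors."""
    m = len(ensemble)
    n = len(ensemble[0])
    mean = [sum(mem[i] for mem in ensemble)/m for i in range(n)]
    return [[sum((mem[i]-mean[i])*(mem[j]-mean[j]) for mem in ensemble)/(m-1)
             for j in range(n)] for i in range(n)]

def localize(cov, positions, radius):
    """Hadamard product with a compactly supported taper (triangular here):
    entries between points farther apart than `radius` are forced to zero,
    suppressing spurious long-range covariances from small ensembles."""
    n = len(cov)
    loc = [[0.0]*n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = abs(positions[i] - positions[j])
            taper = max(0.0, 1.0 - d/radius)
            loc[i][j] = cov[i][j] * taper
    return loc
```

Diagonal (zero-distance) variances are untouched, while long-range entries are exactly zero, which is also what makes the regionalized, per-PE analysis decomposition possible.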
An economic evaluation of vector control in the age of a dengue vaccine.
Fitzpatrick, Christopher; Haines, Alexander; Bangert, Mathieu; Farlow, Andrew; Hemingway, Janet; Velayudhan, Raman
2017-08-01
Dengue is a rapidly emerging vector-borne Neglected Tropical Disease, with a 30-fold increase in the number of cases reported since 1960. The economic cost of the illness is measured in the billions of dollars annually. Environmental change and unplanned urbanization are conspiring to raise the health and economic cost even further beyond the reach of health systems and households. The health-sector response has depended in large part on control of the Aedes aegypti and Ae. albopictus (mosquito) vectors. The cost-effectiveness of the first-ever dengue vaccine remains to be evaluated in the field. In this paper, we examine how it might affect the cost-effectiveness of sustained vector control. We employ a dynamic Markov model of the effects of vector control on dengue in both vectors and humans over a 15-year period, in six countries: Brazil, Colombia, Malaysia, Mexico, the Philippines, and Thailand. We evaluate the cost (direct medical costs and control programme costs) and cost-effectiveness of sustained vector control, outbreak response and/or medical case management, in the presence of a (hypothetical) highly targeted and low cost immunization strategy using a (non-hypothetical) medium-efficacy vaccine. Sustained vector control using existing technologies would cost little more than outbreak response, given the associated costs of medical case management. If sustained use of existing or upcoming technologies (of similar price) reduces vector populations by 70-90%, the cost per disability-adjusted life year averted is 2013 US$ 679-1331 (best estimates) relative to no intervention. Sustained vector control could be highly cost-effective even with less effective technologies (50-70% reduction in vector populations) and in the presence of a highly targeted and low cost immunization strategy using a medium-efficacy vaccine. Economic evaluation of the first-ever dengue vaccine is ongoing.
However, even under very optimistic assumptions about a highly targeted and low cost immunization strategy, our results suggest that sustained vector control will continue to play an important role in mitigating the impact of environmental change and urbanization on human health. If additional benefits for the control of other Aedes borne diseases, such as Chikungunya, yellow fever and Zika fever are taken into account, the investment case is even stronger. High-burden endemic countries should proceed to map populations to be covered by sustained vector control.
Test of understanding of vectors: A reliable multiple-choice vector concept test
NASA Astrophysics Data System (ADS)
Barniol, Pablo; Zavala, Genaro
2014-06-01
In this article we discuss the findings of our research on students' understanding of vector concepts in problems without physical context. First, we develop a complete taxonomy of the most frequent errors made by university students when learning vector concepts. This study is based on the results of several test administrations of open-ended problems in which a total of 2067 students participated. Using this taxonomy, we then designed a 20-item multiple-choice test [Test of understanding of vectors (TUV)] and administered it in English to 423 students who were completing the required sequence of introductory physics courses at a large private Mexican university. We evaluated the test's content validity, reliability, and discriminatory power. The results indicate that the TUV is a reliable assessment tool. We also conducted a detailed analysis of the students' understanding of the vector concepts evaluated in the test. The TUV is included in the Supplemental Material as a resource for other researchers studying vector learning, as well as instructors teaching the material.
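The reliability of a dichotomously scored multiple-choice instrument such as the TUV is commonly quantified with the Kuder-Richardson formula 20 (KR-20). A small sketch (the toy response matrix in the test is illustrative; the paper's raw data are not reproduced here):

```python
def kr20(responses):
    """KR-20 reliability coefficient.

    responses: list of per-student lists of 0/1 item scores.
    KR-20 = (k/(k-1)) * (1 - sum(p_j*q_j) / var(total scores)),
    where p_j is the fraction answering item j correctly and q_j = 1 - p_j.
    """
    n = len(responses)                       # students
    k = len(responses[0])                    # items
    p = [sum(s[j] for s in responses)/n for j in range(k)]
    totals = [sum(s) for s in responses]
    mean = sum(totals)/n
    var = sum((t - mean)**2 for t in totals)/n   # population variance
    return (k/(k-1)) * (1 - sum(pj*(1-pj) for pj in p)/var)
```

Values near 1 indicate internally consistent items; values near 0 indicate the items do not measure a common construct.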
Anticrossproducts and cross divisions.
de Leva, Paolo
2008-01-01
This paper defines, in the context of conventional vector algebra, the concept of anticrossproduct and a family of simple operations called cross or vector divisions. It is impossible to solve for a or b the equation a × b = c, where a and b are three-dimensional space vectors, and a × b is their cross product. However, the problem becomes solvable if some "knowledge about the unknown" (a or b) is available, consisting of one of its components, or the angle it forms with the other operand of the cross product. Independently of the selected reference frame orientation, the known component of a may be parallel to b, or vice versa. The cross divisions provide a compact and insightful symbolic representation of a family of algorithms specifically designed to solve problems of this kind. A generalized algorithm was also defined, incorporating the rules for selecting the appropriate kind of cross division, based on the type of input data. Four examples of practical application were provided, including the computation of the point of application of a force and the angular velocity of a rigid body. The definition and geometrical interpretation of the cross divisions stemmed from the concept of anticrossproduct. The "anticrossproducts of a × b" were defined as the infinitely many vectors x_i such that x_i × b = a × b.
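The solvability claim can be made concrete: when c ⊥ b, every solution of x × b = c has the form (b × c)/|b|² + t·b̂, so the extra "knowledge about the unknown" (here, the scalar component of x along b) pins down the free parameter t. A sketch in our own notation, not de Leva's cross-division symbols:

```python
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x*y for x, y in zip(a, b))

def solve_cross(b, c, x_par):
    """Recover x from x × b = c given x_par, the component of x along b.

    Requires c ⊥ b. (b × c)/|b|² is a particular solution perpendicular
    to b; the homogeneous solutions are the multiples of b, fixed by x_par."""
    nb = dot(b, b)**0.5
    bhat = tuple(bi/nb for bi in b)
    base = tuple(v/(nb*nb) for v in cross(b, c))
    return tuple(base[i] + x_par*bhat[i] for i in range(3))
```

Round-tripping a known x through c = x × b and its parallel component recovers x exactly.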
Prediction of Broadband Shock-Associated Noise Including Propagation Effects
NASA Technical Reports Server (NTRS)
Miller, Steven; Morris, Philip J.
2012-01-01
An acoustic analogy is developed based on the Euler equations for broadband shock-associated noise (BBSAN) that directly incorporates the vector Green's function of the linearized Euler equations and a steady Reynolds-Averaged Navier-Stokes solution (SRANS) to describe the mean flow. The vector Green's function allows the BBSAN propagation through the jet shear layer to be determined. The large-scale coherent turbulence is modeled by two-point second order velocity cross-correlations. Turbulent length and time scales are related to the turbulent kinetic energy and dissipation rate. An adjoint vector Green's function solver is implemented to determine the vector Green's function based on a locally parallel mean flow at different streamwise locations. The newly developed acoustic analogy can be simplified to one that uses the Green's function associated with the Helmholtz equation, which is consistent with a previous formulation by the authors. A large number of predictions are generated using three different nozzles over a wide range of fully-expanded jet Mach numbers and jet stagnation temperatures. These predictions are compared with experimental data from multiple jet noise experimental facilities. In addition, two models for the so-called fine-scale mixing noise are included in the comparisons. Improved BBSAN predictions are obtained relative to other models that do not include propagation effects.
NASA Astrophysics Data System (ADS)
Finsterbusch, Jürgen
2010-12-01
Double- or two-wave-vector diffusion-weighting experiments with short mixing times, in which two diffusion-weighting periods are applied in direct succession, are a promising tool to estimate cell sizes in living tissue. However, the underlying effect, a signal difference between parallel and antiparallel wave vector orientations, is considerably reduced for the long gradient pulses required on whole-body MR systems. Recently, it has been shown that multiple concatenations of the two wave vectors in a single acquisition can double the modulation amplitude if short gradient pulses are used. In this study, numerical simulations of such experiments were performed with parameters achievable on whole-body MR systems. It is shown that the theoretical model yields a good approximation of the signal behavior if an additional term describing free diffusion is included. More importantly, it is demonstrated that the shorter gradient pulses sufficient to achieve the desired diffusion weighting with multiple concatenations increase the signal modulation considerably, e.g. by a factor of about five for five concatenations. Even at identical echo times, achieved by a shortened diffusion time, a moderate number of concatenations significantly improves the signal modulation. Thus, experiments on whole-body MR systems may benefit from multiple concatenations.
Sub-Pixel Extraction of Laser Stripe Center Using an Improved Gray-Gravity Method †
Li, Yuehua; Zhou, Jingbo; Huang, Fengshan; Liu, Lijian
2017-01-01
Laser stripe center extraction is a key step for the profile measurement of line structured light sensors (LSLS). To accurately obtain the center coordinates at sub-pixel level, an improved gray-gravity method (IGGM) was proposed. Firstly, the center points of the stripe were computed using the gray-gravity method (GGM) for all columns of the image. By fitting these points using the moving least squares algorithm, the tangential vector, the normal vector and the radius of curvature can be robustly obtained. One rectangular region could be defined around each of the center points. Its two sides that are parallel to the tangential vector could alter their lengths according to the radius of the curvature. After that, the coordinate for each center point was recalculated within the rectangular region and in the direction of the normal vector. The center uncertainty was also analyzed based on the Monte Carlo method. The obtained experimental results indicate that the IGGM is suitable for both the smooth stripes and the ones with sharp corners. The high accuracy center points can be obtained at a relatively low computation cost. The measured results of the stairs and the screw surface further demonstrate the effectiveness of the method. PMID:28394288
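The baseline gray-gravity method (GGM) that the IGGM refines is simply a per-column intensity centroid; the moving-least-squares refinement, normal directions, and adaptive rectangular regions of the IGGM are not reproduced in this sketch (function name is our own):

```python
def gray_gravity(image):
    """Gray-gravity method: sub-pixel stripe center per image column.

    image[row][col] holds intensities; returns, for each column, the
    intensity-weighted mean row index (None for an all-dark column).
    """
    rows = len(image)
    cols = len(image[0])
    centers = []
    for c in range(cols):
        total = sum(image[r][c] for r in range(rows))
        if total == 0:
            centers.append(None)            # no stripe signal in this column
        else:
            centers.append(sum(r * image[r][c] for r in range(rows)) / total)
    return centers
```

As the abstract notes, this column-wise centroid degrades near sharp corners where the stripe is not vertical, which is what motivates recomputing centers along the local normal direction.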
Electromagnetically induced transparency in the case of elliptic polarization of interacting fields
NASA Astrophysics Data System (ADS)
Parshkov, Oleg M.
2018-04-01
Theoretical results are presented on the disintegration of short, elliptically polarized probe pulses under electromagnetically induced transparency in a counterintuitively superposed, elliptically polarized control field, within the weak-probe-field approximation. It is shown that this disintegration occurs because the probe field in the medium is the sum of two normal modes, corresponding to elliptically polarized pulses with different propagation speeds. The polarization ellipses of the normal modes have equal eccentricities and mutually perpendicular major axes. The major axis of the polarization ellipse of one normal mode is parallel to the major axis of the control field's polarization ellipse, and the electric vector of this mode rotates in the opposite direction to the electric vector of the control field. The electric vector of the other normal mode rotates in the same direction as the control-field electric vector. The propagation speed of the first type of normal mode is less than that of the second. The polarization characteristics of the normal modes depend uniquely on the polarization characteristics of the elliptically polarized control field and remain unchanged during propagation. The theoretical investigation is performed for a Λ-scheme of degenerate quantum transitions between the 3P0, 3P10 and 3P2 energy levels of the 208Pb isotope.
A Code Generation Approach for Auto-Vectorization in the Spade Compiler
NASA Astrophysics Data System (ADS)
Wang, Huayong; Andrade, Henrique; Gedik, Buğra; Wu, Kun-Lung
We describe an auto-vectorization approach for the Spade stream processing programming language, comprising two ideas. First, we provide support for vectors as a primitive data type. Second, we provide a C++ library with architecture-specific implementations of a large number of pre-vectorized operations as the means to support language extensions. We evaluate our approach with several stream processing operators, contrasting Spade's auto-vectorization with the native auto-vectorization provided by the GNU gcc and Intel icc compilers.
Topical Meeting on Optical Bistability Held at Rochester, New York on 15-17 June 1983.
1983-01-01
distortion of their initial directions of polarization: both of the beams are linearly polarized, with their electric vectors either (i) parallel to... Multistability, self-oscillation, and chaos in a model for polarization... a second circularly polarized pumping beam has been observed; a transition sequence arises that is consistent with recent observations
Hotez, Peter J
2017-09-25
New findings of widespread neglected diseases among the poor living in wealthy Group of 20 (G20) economies and the concept of "blue marble health" offer innovative mechanisms for urgently financing new vaccines, especially for vector-borne neglected tropical diseases (NTDs). This approach could complement or parallel a recently suggested global vaccine development fund for pandemic threats.
Fusion of Asynchronous, Parallel, Unreliable Data Streams
2010-09-01
channels that might be used. The two channels chosen for this study, galvanic skin response (GSR) and pulse rate, are convenient and reasonably well...vector as NA. The MDS software tool, PERMAP, uses this same abbreviation. The impact of the lack of information may vary depending on the situation...of how PERMAP (and MDS in general) functions when the input parameters are varied. That is outlined in this section; the impact of those choices is
A Higher-Order Trapezoidal Vector Vortex Panel for Subsonic Flow.
1980-12-01
Presented to the Faculty of the School of Engineering of the Air Force Institute of Technology, Air University, in Partial Fulfillment of the... Requirements for the Degree of Master of Science, by Ronald E. Luther, B.S., Capt USAF, Graduate Aeronautical Engineering, December 1980. Approved for public... The method also permits analysis of cranked leading and/or trailing edges. The root edge, tip edge and all chordwise boundaries are parallel to the x-axis
Using algebra for massively parallel processor design and utilization
NASA Technical Reports Server (NTRS)
Campbell, Lowell; Fellows, Michael R.
1990-01-01
This paper summarizes the author's advances in the design of dense processor networks. Reported within is a collection of recent constructions of dense symmetric networks that provide the largest known values for the number of nodes that can be placed in a network of a given degree and diameter. The constructions are in the range of current potential engineering significance and are based on groups of automorphisms of finite-dimensional vector spaces.
The ecological foundations of transmission potential and vector-borne disease in urban landscapes.
LaDeau, Shannon L; Allan, Brian F; Leisnham, Paul T; Levy, Michael Z
2015-07-01
Urban transmission of arthropod-vectored disease has increased in recent decades. Understanding and managing transmission potential in urban landscapes requires integration of sociological and ecological processes that regulate vector population dynamics, feeding behavior, and vector-pathogen interactions in these unique ecosystems. Vectorial capacity is a key metric for generating predictive understanding about transmission potential in systems with obligate vector transmission. This review evaluates how urban conditions, specifically habitat suitability and local temperature regimes, and the heterogeneity of urban landscapes can influence the biologically relevant parameters that define vectorial capacity: vector density, survivorship, biting rate, extrinsic incubation period, and vector competence. Urban landscapes represent unique mosaics of habitat. Incidence of vector-borne disease in urban host populations is rarely, if ever, evenly distributed across an urban area. The persistence and quality of vector habitat can vary significantly across socio-economic boundaries to influence vector species composition and abundance, often generating socio-economically distinct gradients of transmission potential across neighborhoods. Urban regions often experience unique temperature regimes, broadly termed urban heat islands (UHI). Arthropod vectors are ectothermic organisms, and their growth, survival, and behavior are highly sensitive to environmental temperatures. Vector response to UHI conditions is dependent on regional temperature profiles relative to the vector's thermal performance range. In temperate climates UHI can facilitate increased vector development rates while having a countervailing influence on survival and feeding behavior.
Understanding how urban heat island (UHI) conditions alter thermal and moisture constraints across the vector life cycle to influence transmission processes is an important direction for both empirical and modeling research. There remain persistent gaps in the understanding of vital rates and drivers in mosquito-vectored disease systems, and vast holes in understanding for other arthropod-vectored diseases. Empirical studies are needed to better understand the physiological constraints and socio-ecological processes that generate heterogeneity in critical transmission parameters, including vector survival and fitness. Likewise, laboratory experiments and transmission models must evaluate vector response to realistic field conditions, including variability in sociological and environmental conditions.
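In the classic Macdonald-style formulation, vectorial capacity combines exactly the parameters listed above: vector density m, biting rate a, daily survival p, extrinsic incubation period n, and vector competence b. A small helper using the standard textbook formula (not taken from this review):

```python
import math

def vectorial_capacity(m, a, p, n, b=1.0):
    """Macdonald-style vectorial capacity C = m * a^2 * p^n * b / (-ln p).

    m: vectors per host, a: daily bites per vector on hosts,
    p: daily survival probability, n: extrinsic incubation period (days),
    b: vector competence (transmission probability per infectious bite).
    """
    return m * a*a * p**n * b / (-math.log(p))
```

Because p enters both as p^n and through the expected infectious lifespan 1/(-ln p), modest UHI-driven changes in survival or incubation period shift C disproportionately, which is why the review singles these parameters out.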
Development of iterative techniques for the solution of unsteady compressible viscous flows
NASA Technical Reports Server (NTRS)
Sankar, Lakshmi N.; Hixon, Duane
1991-01-01
Efficient iterative solution methods are being developed for the numerical solution of two- and three-dimensional compressible Navier-Stokes equations. Iterative time marching methods have several advantages over classical multi-step explicit time marching schemes and non-iterative implicit time marching schemes. Iterative schemes have better stability characteristics than non-iterative explicit and implicit schemes. The extra work required by iterative schemes can also be designed to perform efficiently on current and future generations of scalable, massively parallel machines. An obvious candidate for iteratively solving the system of coupled nonlinear algebraic equations arising in CFD applications is the Newton method. Newton's method was implemented in existing finite difference and finite volume methods. Depending on the complexity of the problem, the number of Newton iterations needed per step to solve the discretized system of equations can, however, vary dramatically from a few to several hundred. Another popular approach, based on the classical conjugate gradient method and known as the GMRES (Generalized Minimum Residual) algorithm, is investigated. The GMRES algorithm was used in the past by a number of researchers for solving steady viscous and inviscid flow problems with considerable success. Here, the suitability of this algorithm is investigated for solving the system of nonlinear equations that arise in unsteady Navier-Stokes solvers at each time step. Unlike the Newton method, which attempts to drive the error in the solution at each and every node down to zero, the GMRES algorithm only seeks to minimize the L2 norm of the error. In the GMRES algorithm the changes in the flow properties from one time step to the next are assumed to be the sum of a set of orthogonal vectors.
By restricting the number of vectors to a reasonably small value N (between 5 and 20), the work required for advancing the solution from one time step to the next may be kept to (N+1) times that of a noniterative scheme. Many of the operations required by the GMRES algorithm, such as matrix-vector multiplies and matrix additions and subtractions, can be vectorized and parallelized efficiently.
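The restarted iteration sketched in this abstract can be made concrete with a minimal hand-rolled GMRES(N) cycle in Python/NumPy. This is an illustration of the general algorithm (Arnoldi basis plus a small least-squares solve for the L2-optimal update), not the authors' solver, and the test problem is hypothetical:

```python
import numpy as np

def gmres_step(A, b, x0, n_vectors=10):
    """One restart cycle of GMRES(N): minimize ||b - A x|| over
    x0 + span{r0, A r0, ..., A^(N-1) r0} built by Arnoldi iteration."""
    r0 = b - A @ x0
    beta = np.linalg.norm(r0)
    if beta == 0.0:
        return x0
    Q = np.zeros((len(b), n_vectors + 1))
    H = np.zeros((n_vectors + 1, n_vectors))
    Q[:, 0] = r0 / beta
    for j in range(n_vectors):            # Arnoldi: orthonormal Krylov basis
        w = A @ Q[:, j]
        for i in range(j + 1):
            H[i, j] = Q[:, i] @ w
            w -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(w)
        if H[j + 1, j] < 1e-14:           # lucky breakdown: subspace exhausted
            break
        Q[:, j + 1] = w / H[j + 1, j]
    # small least-squares problem: minimize ||beta*e1 - H y|| (the L2 residual)
    e1 = np.zeros(n_vectors + 1)
    e1[0] = beta
    y, *_ = np.linalg.lstsq(H, e1, rcond=None)
    return x0 + Q[:, :n_vectors] @ y
```

Each cycle costs N matrix-vector products plus an (N+1)xN least-squares solve, which is the "(N+1) times a noniterative scheme" accounting mentioned above.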
Spinozzi, Giulio; Calabria, Andrea; Brasca, Stefano; Beretta, Stefano; Merelli, Ivan; Milanesi, Luciano; Montini, Eugenio
2017-11-25
Bioinformatics tools designed to identify lentiviral or retroviral vector insertion sites in the genome of host cells are used to address the safety and long-term efficacy of hematopoietic stem cell gene therapy applications and to study the clonal dynamics of hematopoietic reconstitution. The increasing number of gene therapy clinical trials, combined with the increasing amount of Next Generation Sequencing data aimed at identifying integration sites, requires highly accurate and efficient computational software able to correctly process "big data" in a reasonable computational time. Here we present VISPA2 (Vector Integration Site Parallel Analysis, version 2), the latest optimized computational pipeline for integration site identification and analysis, with the following features: (1) the sequence analysis for integration site processing is fully compliant with paired-end reads and includes a sequence quality filter before and after alignment on the target genome; (2) a heuristic algorithm that reduces false positive integration sites at the nucleotide level, limiting the impact of Polymerase Chain Reaction or trimming/alignment artifacts; (3) a classification and annotation module for integration sites; (4) a user-friendly web interface as the researcher front-end for performing integration site analyses without computational skills; (5) the speedup of all steps through parallelization (Hadoop free). We tested VISPA2's performance using simulated and real datasets of lentiviral vector integration sites, previously obtained from patients enrolled in a hematopoietic stem cell gene therapy clinical trial, and compared the results with other preexisting tools for integration site analysis. On the computational side, VISPA2 showed a > 6-fold speedup and improved precision and recall metrics (1 and 0.97, respectively) compared to previously developed computational pipelines.
These performances indicate that VISPA2 is a fast, reliable and user-friendly tool for integration site analysis, which allows gene therapy integration data to be handled in a cost- and time-effective fashion. Moreover, the web access of VISPA2 ( http://openserver.itb.cnr.it/vispa/ ) ensures accessibility and ease of use of this complex analytical tool for researchers. We released the source code of VISPA2 in a public repository ( https://bitbucket.org/andreacalabria/vispa2 ).
The Prediction of Broadband Shock-Associated Noise Including Propagation Effects
NASA Technical Reports Server (NTRS)
Miller, Steven; Morris, Philip J.
2011-01-01
An acoustic analogy is developed based on the Euler equations for broadband shock-associated noise (BBSAN) that directly incorporates the vector Green's function of the linearized Euler equations and a steady Reynolds-Averaged Navier-Stokes solution (SRANS) as the mean flow. The vector Green's function allows the BBSAN propagation through the jet shear layer to be determined. The large-scale coherent turbulence is modeled by two-point second order velocity cross-correlations. Turbulent length and time scales are related to the turbulent kinetic energy and dissipation. An adjoint vector Green's function solver is implemented to determine the vector Green's function based on a locally parallel mean flow at streamwise locations of the SRANS solution. However, the developed acoustic analogy could easily be based on any adjoint vector Green's function solver, such as one that makes no assumptions about the mean flow. The newly developed acoustic analogy can be simplified to one that uses the Green's function associated with the Helmholtz equation, which is consistent with the formulation of Morris and Miller (AIAAJ 2010). A large number of predictions are generated using three different nozzles over a wide range of fully expanded Mach numbers and jet stagnation temperatures. These predictions are compared with experimental data from multiple jet noise labs. In addition, two models for the so-called 'fine-scale' mixing noise are included in the comparisons. Improved BBSAN predictions are obtained relative to other models that do not include the propagation effects, especially in the upstream direction of the jet.
Lichtenstein, DL; Spencer, JF; Doronin, K; Patra, D; Meyer, JM; Shashkova, EV; Kuppuswamy, M; Dhar, D; Thomas, MA; Tollefson, AE; Zumstein, LA; Wold, WSM; Toth, K
2012-01-01
Oncolytic (replication-competent) adenoviruses as anticancer agents provide new, promising tools to fight cancer. In support of a Phase I clinical trial, here we report safety data with INGN 007 (VRX-007), an oncolytic adenovirus with increased anti-tumor efficacy due to overexpression of the adenovirus-encoded ADP protein. Wild-type adenovirus type 5 (Ad5) and a replication-defective version of Ad5 were also studied as controls. A parallel study investigating the biodistribution of these viruses is described elsewhere in this issue. The toxicology experiments were conducted in two species, the Syrian hamster, which is permissive for INGN 007 and Ad5 replication and the poorly permissive mouse. The studies demonstrated that the safety profile of INGN 007 is similar to Ad5. Both viruses caused transient liver damage upon intravenous injection that resolved by 28 days post-infection. The No-Observable-Adverse-Effect-Level (NOAEL) for INGN 007 in hamsters was 3 × 10^10 viral particles per kg. In hamsters, the replication-defective vector caused less toxicity, indicating that replication of Ad vectors in the host is an important factor in pathogenesis. With mice, INGN 007 and Ad5 caused toxicity comparable to the replication-defective adenovirus vector. Partially based on these results, the FDA granted permission to enter into a Phase I clinical trial with INGN 007. PMID:19197324
Zhang, Li; Zhou, WeiDa
2013-12-01
This paper deals with fast methods for training a 1-norm support vector machine (SVM). First, we define a specific class of linear programming with many sparse constraints, i.e., row-column sparse constraint linear programming (RCSC-LP). By nature, the 1-norm SVM is a sort of RCSC-LP. In order to construct subproblems for RCSC-LP and solve them, a family of row-column generation (RCG) methods is introduced. RCG methods belong to a category of decomposition techniques, and perform row and column generations in a parallel fashion. Specifically, for the 1-norm SVM, the maximum size of subproblems of RCG is identical with the number of Support Vectors (SVs). We also introduce a semi-deleting rule for RCG methods and prove the convergence of RCG methods when using the semi-deleting rule. Experimental results on toy data and real-world datasets illustrate that it is efficient to use RCG to train the 1-norm SVM, especially when the number of SVs is small. Copyright © 2013 Elsevier Ltd. All rights reserved.
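To make the LP view of the 1-norm SVM concrete, here is a small dense sketch that solves the full linear program directly with scipy.optimize.linprog. This is a stand-in illustration on toy scale, not the paper's RCG decomposition (which generates rows and columns incrementally); the function name `l1_svm` and the splitting w = u - v are conventional assumptions:

```python
import numpy as np
from scipy.optimize import linprog

def l1_svm(X, y, C=1.0):
    """Train a 1-norm SVM by solving its LP directly:
    min ||w||_1 + C*sum(xi)  s.t.  y_i (w.x_i + b) >= 1 - xi, xi >= 0,
    with w = u - v, u, v >= 0 to linearize the 1-norm."""
    n, d = X.shape
    # variable vector: [u (d), v (d), b, xi (n)]
    c = np.concatenate([np.ones(2 * d), [0.0], C * np.ones(n)])
    Yx = y[:, None] * X
    # -(y_i (u - v).x_i + y_i b + xi_i) <= -1
    A_ub = np.hstack([-Yx, Yx, -y[:, None], -np.eye(n)])
    b_ub = -np.ones(n)
    bounds = [(0, None)] * (2 * d) + [(None, None)] + [(0, None)] * n
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=bounds, method="highs")
    w = res.x[:d] - res.x[d:2 * d]
    return w, res.x[2 * d]
```

In the RCG scheme the same LP would be attacked through subproblems whose size is bounded by the number of support vectors, rather than assembled densely as here.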
Color and Vector Flow Imaging in Parallel Ultrasound With Sub-Nyquist Sampling.
Madiena, Craig; Faurie, Julia; Poree, Jonathan; Garcia, Damien
2018-05-01
RF acquisition with a high-performance multichannel ultrasound system generates massive data sets in short periods of time, especially in "ultrafast" ultrasound when digital receive beamforming is required. Sampling at a rate four times the carrier frequency is the standard procedure since this rule complies with the Nyquist-Shannon sampling theorem and simplifies quadrature sampling. Bandpass sampling (or undersampling) outputs a bandpass signal at a rate lower than the maximal frequency without harmful aliasing. Advantages over Nyquist sampling are reduced storage volumes and data workflow, and simplified digital signal processing tasks. We used RF undersampling in color flow imaging (CFI) and vector flow imaging (VFI) to decrease data volume significantly (factor of 3 to 13 in our configurations). CFI and VFI with Nyquist and sub-Nyquist samplings were compared in vitro and in vivo. The estimate errors due to undersampling were small or marginal, which illustrates that Doppler and vector Doppler images can be correctly computed with a drastically reduced amount of RF samples. Undersampling can be a method of choice in CFI and VFI to avoid information overload and reduce data transfer and storage.
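The folding rule behind bandpass sampling can be demonstrated in a few lines of Python. This is a generic single-tone illustration of undersampling, not the authors' CFI/VFI pipeline, and the carrier and sampling rates are hypothetical:

```python
import numpy as np

def alias_frequency(f0, fs):
    """Apparent frequency of a tone at f0 after sampling at fs,
    folded into the first Nyquist zone [0, fs/2]."""
    f = f0 % fs
    return min(f, fs - f)

# a 5 MHz tone sampled at the conventional 4x-carrier rate and at a
# sub-Nyquist rate; in both cases the FFT peak lands exactly where the
# folding rule predicts, so narrowband content survives undersampling
f0, n = 5.0e6, 4096
peaks = {}
for fs in (20.0e6, 4.0e6):
    t = np.arange(n) / fs
    x = np.cos(2 * np.pi * f0 * t)
    peaks[fs] = np.argmax(np.abs(np.fft.rfft(x))) * fs / n
```

For a real RF line the tone becomes a band around the carrier; as long as that band fits inside one Nyquist zone of the reduced rate, the Doppler processing sees an intact, merely frequency-shifted signal.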
2007-01-19
Evaluation of the Protective Efficacy of Recombinant Vesicular Stomatitis Virus Vectors Against Marburg Hemorrhagic Fever in Nonhuman Primate Models
Daddario-DiCaprio, Kathleen
DOE Office of Scientific and Technical Information (OSTI.GOV)
Jiang, Huaiguang; Zhang, Yingchen; Muljadi, Eduard
In this paper, a short-term load forecasting approach for network reconfiguration is proposed in a parallel manner. Specifically, a support vector regression (SVR) based short-term load forecasting approach is designed to provide an accurate load prediction and benefit the network reconfiguration. Because of the nonconvexity of the three-phase balanced optimal power flow, a second-order cone program (SOCP) based approach is used to relax the optimal power flow problem. Then, the alternating direction method of multipliers (ADMM) is used to compute the optimal power flow in a distributed manner. Considering the limited number of switches and the increasing computation capability, the proposed network reconfiguration is solved in a parallel way. The numerical results demonstrate the feasibility and effectiveness of the proposed approach.
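As a hedged illustration of the SVR forecasting component only (not the SOCP/ADMM reconfiguration), the sketch below fits a lag-window SVR to a synthetic daily-periodic load series standing in for real feeder measurements; `make_lagged` and all parameter values are hypothetical choices:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

def make_lagged(load, n_lags=24):
    """Turn an hourly load series into (previous-24h window, next hour) pairs."""
    X = np.array([load[i:i + n_lags] for i in range(len(load) - n_lags)])
    return X, load[n_lags:]

# synthetic daily-periodic load plus noise (stand-in for feeder data)
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
load = 100 + 30 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 2, hours.size)

X, y = make_lagged(load)
model = make_pipeline(StandardScaler(), SVR(kernel="rbf", C=100.0, epsilon=0.5))
model.fit(X[:-24], y[:-24])                      # hold out the final day
mae = np.abs(model.predict(X[-24:]) - y[-24:]).mean()
```

The resulting hour-ahead predictions are the kind of load input the reconfiguration stage would then consume.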
NASA Astrophysics Data System (ADS)
Farengo, R.; Guzdar, P. N.; Lee, Y. C.
1989-08-01
The effect of finite parallel wavenumber and electron temperature gradients on the lower hybrid drift instability is studied in the parameter regime corresponding to the TRX-2 device [Fusion Technol. 9, 48 (1986)]. Perturbations in the electrostatic potential and all three components of the vector potential are considered, and finite beta electron orbit modifications are included. The electron temperature gradient decreases the growth rate of the instability but, for k_z = 0, unstable modes exist for η_e (= T_e′ n_0 / T_e n_0′) > 6. Since finite k_z effects completely stabilize the mode at small values of k_z/k_y (≈ 5×10^-3), magnetic shear could be responsible for stabilizing the lower hybrid drift instability in field-reversed configurations.
Implementation of a partitioned algorithm for simulation of large CSI problems
NASA Technical Reports Server (NTRS)
Alvin, Kenneth F.; Park, K. C.
1991-01-01
The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
Azil, Aishah H; Ritchie, Scott A; Williams, Craig R
2015-10-01
This qualitative study aimed to describe field worker perceptions, evaluations of worth, and time costs of routine dengue vector surveillance methods in Cairns (Australia), Kuala Lumpur and Petaling District (Malaysia). In Cairns, the BG-Sentinel trap is a favored method for field workers because of its user-friendliness, but is not as cost-efficient as the sticky ovitrap. In Kuala Lumpur, the Mosquito Larvae Trapping Device is perceived as a solution for the inaccessibility of premises to larval surveys. Nonetheless, the larval survey method is retained in Malaysia for prompt detection of dengue vectors. For dengue vector surveillance to be successful, there needs to be not only technical, quantitative evaluations of method performance but also an appreciation of how amenable field workers are to using particular methods. Here, we report novel field worker perceptions of dengue vector surveillance methods in addition to time analysis for each method. © 2014 APJPH.
Towards a comprehensive model of Earth's disk-integrated Stokes vector
NASA Astrophysics Data System (ADS)
García Muñoz, A.
2015-07-01
A significant body of work on simulating the remote appearance of Earth-like exoplanets has been done over the last decade. The research is driven by the prospect of characterizing habitable planets beyond the Solar System in the near future. In this work, I present a method to produce the disk-integrated signature of planets that are described in their three-dimensional complexity, i.e. with both horizontal and vertical variations in the optical properties of their envelopes. The approach is based on Pre-conditioned Backward Monte Carlo integration of the vector Radiative Transport Equation and yields the full Stokes vector for outgoing reflected radiation. The method is demonstrated through selected examples inspired by published work at wavelengths from the visible to the near infrared and terrestrial prescriptions of both cloud and surface albedo maps. I explore the performance of the method in terms of computational time and accuracy. A clear strength of this approach is that its computational cost does not appear to be significantly affected by non-uniformities in the planet optical properties. Earth's simulated appearance is strongly dependent on wavelength; both brightness and polarization undergo diurnal variations arising from changes in the planet cover, but polarization yields a better insight into variations with phase angle. There is partial cancellation of the polarized signal from the northern and southern hemispheres so that the outgoing polarization vector lies preferentially either in the plane parallel or perpendicular to the planet scattering plane, also for non-uniform cloud and albedo properties and various levels of absorption within the atmosphere. The evaluation of circular polarization is challenging; a number of one-photon experiments of 10^9 or more is needed to resolve hemispherically integrated degrees of circular polarization of a few times 10^-5.
Last, I introduce brightness curves of Earth obtained with one of the Messenger cameras at three wavelengths (0.48, 0.56 and 0.63 μm) during a flyby in 2005. The light curves show distinct structure associated with the varying aspect of the Earth's visible disk (phases of 98-107°) as the planet undergoes a full 24 h rotation; the structure is reasonably well reproduced with model simulations.
Energy flow of electric dipole radiation in between parallel mirrors
NASA Astrophysics Data System (ADS)
Xu, Zhangjin; Arnoldus, Henk F.
2017-11-01
We have studied the energy flow patterns of the radiation emitted by an electric dipole located in between parallel mirrors. It appears that the field lines of the Poynting vector (the flow lines of energy) can have very intricate structures, including many singularities and vortices. The flow line patterns depend on the distance between the mirrors, the distance of the dipole to one of the mirrors and the angle of oscillation of the dipole moment with respect to the normal of the mirror surfaces. Already for the simplest case of a dipole moment oscillating perpendicular to the mirrors, singularities appear at regular intervals along the direction of propagation (parallel to the mirrors). For a parallel dipole, vortices appear in the neighbourhood of the dipole. For a dipole oscillating under a finite angle with the surface normal, the radiation tends to swirl around the dipole before travelling off parallel to the mirrors. For relatively large mirror separations, vortices appear in the pattern. When the dipole is off-centred with respect to the midway point between the mirrors, the flow line structure becomes even more complicated, with numerous vortices in the pattern, and tiny loops near the dipole. We have also investigated the locations of the vortices and singularities, and these can be found without any specific knowledge about the flow lines. This provides an independent means of studying the propagation of dipole radiation between mirrors.
Accelerating Spaceborne SAR Imaging Using Multiple CPU/GPU Deep Collaborative Computing
Zhang, Fan; Li, Guojun; Li, Wei; Hu, Wei; Hu, Yuxin
2016-01-01
With the development of synthetic aperture radar (SAR) technologies in recent years, the huge amount of remote sensing data brings challenges for real-time imaging processing. Therefore, high performance computing (HPC) methods have been presented to accelerate SAR imaging, especially GPU based methods. In the classical GPU based imaging algorithm, the GPU is employed to accelerate image processing by massively parallel computing, and the CPU is only used to perform auxiliary work such as data input/output (IO). However, the computing capability of the CPU is ignored and underestimated. In this work, a new deep collaborative SAR imaging method based on multiple CPU/GPU is proposed to achieve real-time SAR imaging. Through the proposed task partitioning and scheduling strategy, the whole image can be generated with deep collaborative multiple CPU/GPU computing. For the CPU parallel imaging part, the advanced vector extension (AVX) method is first introduced into the multi-core CPU parallel method for higher efficiency. As for the GPU parallel imaging, not only are the bottlenecks of memory limitation and frequent data transferring broken, but several optimized strategies are also applied, such as streaming and parallel pipelining. Experimental results demonstrate that the deep CPU/GPU collaborative imaging method enhances the efficiency of SAR imaging on single-core CPU by 270 times and realizes real-time imaging in that the imaging rate exceeds the raw data generation rate. PMID:27070606
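The CPU-side task partitioning idea, splitting pulses across workers for range compression, can be sketched as follows. This is a simplified thread-based Python illustration of frequency-domain matched filtering, not the authors' CPU/GPU implementation; the function names and chirp parameters are hypothetical:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def pulse_compress(raw, chirp):
    """Range-compress pulses by frequency-domain matched filtering
    (circular cross-correlation with the reference chirp)."""
    n = raw.shape[-1]
    return np.fft.ifft(np.fft.fft(raw, n) * np.conj(np.fft.fft(chirp, n)))

def compress_block(raw_block, chirp, n_workers=4):
    """Partition the pulses across workers, mimicking the CPU-side task
    scheduling; NumPy's FFTs release the GIL, so threads overlap usefully."""
    chunks = np.array_split(raw_block, n_workers)
    with ThreadPoolExecutor(n_workers) as ex:
        parts = list(ex.map(lambda c: pulse_compress(c, chirp), chunks))
    return np.vstack(parts)
```

In the paper's design, blocks like these would be dispatched concurrently to AVX-accelerated CPU cores and to the GPU, with IO streamed in a pipeline.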
NASA Astrophysics Data System (ADS)
Schratz, Patrick; Herrmann, Tobias; Brenning, Alexander
2017-04-01
Computational and statistical prediction methods such as the support vector machine have gained popularity in remote-sensing applications in recent years and are often compared to more traditional approaches like maximum-likelihood classification. However, the accuracy assessment of such predictive models in a spatial context needs to account for the presence of spatial autocorrelation in geospatial data by using spatial cross-validation and bootstrap strategies instead of their now more widely used non-spatial equivalents. The R package sperrorest by A. Brenning [IEEE International Geoscience and Remote Sensing Symposium, 1, 374 (2012)] provides a generic interface for performing (spatial) cross-validation of any statistical or machine-learning technique available in R. Since spatial statistical models as well as flexible machine-learning algorithms can be computationally expensive, parallel computing strategies are required to perform cross-validation efficiently. The most recent major release of sperrorest therefore comes with two new features (aside from improved documentation): The first one is the parallelized version of sperrorest(), parsperrorest(). This function features two parallel modes to greatly speed up cross-validation runs. Both parallel modes are platform independent and provide progress information. par.mode = 1 relies on the pbapply package and internally calls, depending on the platform, parallel::mclapply() or parallel::parApply() in the background. While forking is used on Unix systems, Windows systems use a cluster approach for parallel execution. par.mode = 2 uses the foreach package to perform parallelization. This method uses a different way of cluster parallelization than the parallel package does. In summary, the robustness of parsperrorest() is increased with the implementation of two independent parallel modes. A new way of partitioning the data in sperrorest is provided by partition.factor.cv().
This function allows the user to perform cross-validation at the level of some grouping structure. As an example, in remote sensing of agricultural land uses, pixels from the same field contain nearly identical information and will thus be jointly placed in either the test set or the training set. Other spatial resampling strategies are already available and can be extended by the user.
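The grouping idea behind partition.factor.cv() has a close analogue in scikit-learn's GroupKFold, sketched here with synthetic "fields". This is an illustrative Python parallel, not the sperrorest R code; the data and model are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GroupKFold, cross_val_score

# synthetic "pixels": samples from the same field are near-duplicates,
# so plain random CV would leak information between train and test sets
rng = np.random.default_rng(0)
fields = np.repeat(np.arange(20), 10)        # 20 fields x 10 pixels each
centers = rng.normal(0.0, 2.0, (20, 3))
X = centers[fields] + rng.normal(0.0, 0.05, (200, 3))
y = (centers[fields, 0] > 0).astype(int)

# field-level CV: all pixels of a field land together in train or test,
# analogous to cross-validating at the level of a grouping factor
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y,
                         cv=GroupKFold(n_splits=5), groups=fields)
```

Group-level error estimates of this kind are typically more pessimistic, and more honest, than pixel-level random CV on spatially autocorrelated data.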
Van Roey, Karel; Sokny, Mao; Denis, Leen; Van den Broeck, Nick; Heng, Somony; Siv, Sovannaroth; Sluydts, Vincent; Sochantha, Tho; Coosemans, Marc; Durnez, Lies
2014-12-01
Scaling up of insecticide treated nets has contributed to a substantial malaria decline. However, some malaria vectors, and most arbovirus vectors, bite outdoors and in the early evening. Therefore, topically applied insect repellents may provide crucial additional protection against mosquito-borne pathogens. Among topical repellents, DEET is the most commonly used, followed by others such as picaridin. The protective efficacy of two formulated picaridin repellents against mosquito bites, including arbovirus and malaria vectors, was evaluated in a field study in Cambodia. Over a period of two years, human landing collections were performed on repellent treated persons, with rotation to account for the effect of collection place, time and individual collector. Based on a total of 4996 mosquitoes collected on negative control persons, the overall five-hour protection rate was 97.4% [95% CI: 97.1-97.8%], not decreasing over time. Picaridin 20% performed equally well as DEET 20% and better than picaridin 10%. Repellents performed better against Mansonia and Culex spp. as compared to aedines and anophelines. A lower performance was observed against Aedes albopictus as compared to Aedes aegypti, and against Anopheles barbirostris as compared to several vector species. Parity rates were higher in vectors collected on repellent treated persons as compared to control persons. As such, field evaluation shows that repellents can provide additional personal protection against early and outdoor biting malaria and arbovirus vectors, with excellent protection up to five hours after application. The heterogeneity in repellent sensitivity between mosquito genera and vector species could however impact the efficacy of repellents in public health programs.
Considering its excellent performance and potential to protect against early and outdoor biting vectors, as well as its higher acceptability as compared to DEET, picaridin is an appropriate product to evaluate the epidemiological impact of large scale use of topical repellents on arthropod borne diseases.
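The protection rate reported above is derived from landing counts on treated versus control collectors. A minimal sketch of that calculation follows; the treated-arm count used here is hypothetical, chosen only so the result reproduces the reported 97.4% against the 4996 control landings:

```python
def protection_rate(landings_treated, landings_control):
    """Percent protection of a repellent from human-landing counts:
    100 * (1 - landings on treated person / landings on control person)."""
    if landings_control <= 0:
        raise ValueError("control collections caught no mosquitoes")
    return 100.0 * (1.0 - landings_treated / landings_control)

# hypothetical treated-arm count yielding the reported overall rate
overall = protection_rate(130, 4996)
```

Confidence intervals in such studies are usually obtained from models that also adjust for place, time and collector, as the rotation design above implies.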
DOE Office of Scientific and Technical Information (OSTI.GOV)
Yang, R; Fallone, B; Cross Cancer Institute, Edmonton, AB
Purpose: To develop a Graphic Processor Unit (GPU) accelerated deterministic solution to the Linear Boltzmann Transport Equation (LBTE) for accurate dose calculations in radiotherapy (RT). A deterministic solution yields the potential for major speed improvements due to the sparse matrix-vector and vector-vector multiplications and would thus be of benefit to RT. Methods: In order to leverage the massively parallel architecture of GPUs, the first order LBTE was reformulated as a second order self-adjoint equation using the Least Squares Finite Element Method (LSFEM). This produces a symmetric positive-definite matrix which is efficiently solved using a parallelized conjugate gradient (CG) solver. The LSFEM formalism is applied in space, discrete ordinates is applied in angle, and the Multigroup method is applied in energy. The final linear system of equations produced is tightly coupled in space and angle. Our code written in CUDA-C was benchmarked on an Nvidia GeForce TITAN-X GPU against an Intel i7-6700K CPU. A spatial mesh of 30,950 tetrahedral elements was used with an S4 angular approximation. Results: To avoid repeating a full computationally intensive finite element matrix assembly at each Multigroup energy, a novel mapping algorithm was developed which minimized the operations required at each energy. Additionally, a parallelized memory mapping for the kronecker product between the sparse spatial and angular matrices, including Dirichlet boundary conditions, was created. Atomicity is preserved by graph-coloring overlapping nodes into separate kernel launches. The one-time mapping calculations for matrix assembly, kronecker product, and boundary condition application took 452±1ms on GPU. Matrix assembly for 16 energy groups took 556±3s on CPU, and 358±2ms on GPU using the mappings developed. The CG solver took 93±1s on CPU, and 468±2ms on GPU.
Conclusion: Three computationally intensive subroutines in deterministically solving the LBTE have been formulated on GPU, resulting in two orders of magnitude speedup. Funding support from Natural Sciences and Engineering Research Council and Alberta Innovates Health Solutions. Dr. Fallone is a co-founder and CEO of MagnetTx Oncology Solutions (under discussions to license Alberta bi-planar linac MR for commercialization).
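The conjugate gradient core whose sparse matrix-vector and vector-vector products dominate the runtime above can be sketched in a few lines. The version below is a plain CPU implementation on a small SPD model problem (an illustration of the algorithm, not the authors' CUDA-C code; the 1-D Laplacian is a hypothetical stand-in for the LSFEM system matrix):

```python
import numpy as np
from scipy.sparse import diags

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Plain CG for a symmetric positive-definite system; the per-iteration
    costs are one sparse mat-vec (A @ p) and a handful of dot products and
    axpy updates, exactly the kernels offloaded to the GPU in the abstract."""
    x = np.zeros_like(b)
    r = b - A @ x
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# 1-D Laplacian: a small SPD model problem
n = 200
A = diags([-1, 2, -1], [-1, 0, 1], shape=(n, n), format="csr")
b = np.ones(n)
x = conjugate_gradient(A, b)
```

Because every operation is a mat-vec or vector reduction, the same loop maps directly onto GPU kernels once the matrix is assembled, which is where the reported two-orders-of-magnitude speedup arises.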
Walsh-Hadamard transform kernel-based feature vector for shot boundary detection.
Lakshmi, Priya G G; Domnic, S
2014-12-01
Video shot boundary detection (SBD) is the first step of video analysis, summarization, indexing, and retrieval. In the SBD process, videos are segmented into basic units called shots. In this paper, a new SBD method is proposed using color, edge, texture, and motion strength as a vector of features (feature vector). Features are extracted by projecting the frames on selected basis vectors of the Walsh-Hadamard transform (WHT) kernel and WHT matrix. After extracting the features, weights are calculated based on the significance of the features. The weighted features are combined to form a single continuity signal, used as input for the Procedure Based shot transition Identification process (PBI). Using the procedure, shot transitions are classified into abrupt and gradual transitions. Experimental results are examined using large-scale test sets provided by TRECVID 2007, which evaluated hard cut and gradual transition detection. The robustness of the proposed method is also evaluated through a system evaluation. The proposed method yields an F1-Score of 97.4% for cut, 78% for gradual, and 96.1% for overall transitions. We have also evaluated the proposed feature vector with a support vector machine classifier. The results show that WHT-based features perform better than other existing methods. In addition, a few more video sequences are taken from the Openvideo project and the performance of the proposed method is compared with a recent existing SBD method.
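The WHT projection step can be sketched as follows. This is an illustrative Python reduction of the idea, assuming power-of-two block sizes; keeping a 4x4 block of low-order coefficients is a hypothetical simplification of the paper's basis-vector selection:

```python
import numpy as np
from scipy.linalg import hadamard

def wht_feature(block):
    """Project a square image block onto Walsh-Hadamard basis vectors and
    keep the low-order coefficients as a compact frame descriptor."""
    n = block.shape[0]                     # must be a power of two
    H = hadamard(n)
    coeffs = H @ block @ H.T / n           # separable 2-D Walsh-Hadamard transform
    return coeffs[:4, :4].ravel()          # low-order summary coefficients

def frame_distance(f1, f2):
    """Continuity signal between consecutive frames: small within a shot,
    large at a cut boundary."""
    return np.linalg.norm(wht_feature(f1) - wht_feature(f2))
```

A thresholding or classification stage on this continuity signal (PBI in the paper) would then label transitions as abrupt or gradual.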
A simple method of equine limb force vector analysis and its potential applications.
Hobbs, Sarah Jane; Robinson, Mark A; Clayton, Hilary M
2018-01-01
Ground reaction forces (GRF) measured during equine gait analysis are typically evaluated by analyzing discrete values obtained from continuous force-time data for the vertical, longitudinal and transverse GRF components. This paper describes a simple, temporo-spatial method of displaying and analyzing sagittal plane GRF vectors. In addition, the application of statistical parametric mapping (SPM) is introduced to analyse differences between contra-lateral fore and hindlimb force-time curves throughout the stance phase. The overall aim of the study was to demonstrate alternative methods of evaluating functional (a)symmetry within horses. GRF and kinematic data were collected from 10 horses trotting over a series of four force plates (120 Hz). The kinematic data were used to determine clean hoof contacts. The stance phase of each hoof was determined using a 50 N threshold. Vertical and longitudinal GRF for each stance phase were plotted both as force-time curves and as force vector diagrams in which vectors originating at the centre of pressure on the force plate were drawn at intervals of 8.3 ms for the duration of stance. Visual evaluation was facilitated by overlay of the vector diagrams for different limbs. Summary vectors representing the magnitude (VecMag) and direction (VecAng) of the mean force over the entire stance phase were superimposed on the force vector diagram. Typical measurements extracted from the force-time curves (peak forces, impulses) were compared with VecMag and VecAng using partial correlation (controlling for speed). Paired-samples t-tests (left v. right diagonal pair comparison and high v. low vertical force diagonal pair comparison) were performed on discrete and vector variables using traditional methods, and Hotelling's T² tests on normalized stance phase data using SPM.
Evidence from traditional statistical tests suggested that VecMag is more influenced by the vertical force and impulse, whereas VecAng is more influenced by the longitudinal force and impulse. When used to evaluate mean data from the group of ten sound horses, SPM did not identify differences between the left and right contralateral limb pairs or between limb pairs classified according to directional asymmetry. When evaluating a single horse, three periods were identified during which differences in the forces between the left and right forelimbs exceeded the critical threshold (p < .01). Traditional statistical analysis of 2D GRF peak values, summary vector variables and visual evaluation of force vector diagrams gave consistent results, and both methods identified the same inter-limb asymmetries. As alpha was more tightly controlled using SPM, significance was only found in the individual horse, although T² plots followed the same trends as discrete analysis for the group. The techniques of force vector analysis and SPM hold promise for investigations of sidedness and asymmetry in horses.
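The summary-vector computation described above can be sketched directly: the magnitude and direction of the mean sagittal-plane force over the stance phase. The angle convention (measured from the vertical, with longitudinal force giving the offset) is an assumption for illustration, as the abstract does not specify it:

```python
import numpy as np

def summary_vector(fz, fy):
    """Summary GRF vector over one stance phase.
    fz: vertical force samples, fy: longitudinal force samples.
    Returns (VecMag, VecAng): magnitude of the mean force and its
    direction in degrees, measured here from the vertical axis
    (assumed convention)."""
    mean_fz = np.mean(fz)
    mean_fy = np.mean(fy)
    vec_mag = np.hypot(mean_fy, mean_fz)
    vec_ang = np.degrees(np.arctan2(mean_fy, mean_fz))
    return vec_mag, vec_ang
```

A purely vertical mean force gives VecAng = 0; a net braking or propulsive longitudinal component tilts the summary vector accordingly.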
Influenza virus-specific TCR-transduced T cells as a model for adoptive immunotherapy
Berdien, Belinda; Reinhard, Henrike; Meyer, Sabrina; Spöck, Stefanie; Kröger, Nicolaus; Atanackovic, Djordje; Fehse, Boris
2013-01-01
Adoptive transfer of T lymphocytes equipped with tumor-antigen specific T-cell receptors (TCRs) represents a promising strategy in cancer immunotherapy, but the approach remains technically demanding. Using influenza virus (Flu)-specific T-cell responses as a model system we compared different methods for the generation of T-cell clones and isolation of antigen-specific TCRs. Altogether, we generated 12 CD8+ T-cell clones reacting to the Flu matrix protein (Flu-M) and 6 CD4+ T-cell clones reacting to the Flu nucleoprotein (Flu-NP) from 4 healthy donors. IFN-γ-secretion-based enrichment of antigen-specific cells, optionally combined with tetramer staining, was the most efficient way for generating T-cell clones. In contrast, the commonly used limiting dilution approach was least efficient. TCR genes were isolated from T-cell clones and cloned into both a previously used gammaretroviral LTR-vector, MP91, and the novel lentiviral self-inactivating vector LeGO-MP that contains MP91-derived promoter and regulatory elements. To directly compare their functional efficiencies, we in parallel transduced T-cell lines and primary T cells with the two vectors encoding identical TCRs. Transduction efficiencies were approximately twice as high with the gammaretroviral vector. Secretion of high amounts of IFN-γ, IL-2 and TNF-α by transduced cells after exposure to the respective influenza target epitope proved efficient specificity transfer of the isolated TCRs to primary T cells for both vectors, at the same time indicating superior functionality of MP91-transduced cells. In conclusion, we have developed optimized strategies to obtain and transfer antigen-specific TCRs as well as designed a novel lentiviral vector for TCR-gene transfer. Our data may help to improve adoptive T-cell therapies. PMID:23428899
NASA Astrophysics Data System (ADS)
Lindberg, P. A. P.; Shen, Z.-X.; Dessau, D. S.; Wells, B. O.; Mitzi, D. B.; Lindau, I.; Spicer, W. E.; Kapitulnik, A.
1989-09-01
Angle-resolved photoemission studies of single-crystalline La-doped Bi-Sr-Ca-Cu-O 90-K superconductors (Bi2.0Sr1.8Ca0.8La0.3Cu2.1O8+δ) were performed utilizing synchrotron radiation covering the photon energy range 10-40 eV. The data conclusively reveal a dispersionless character of the valence-band states as a function of the wave-vector component parallel to the c axis, in agreement with the predictions of band calculations. Band effects are evident from both intensity modulations of the spectral features in the valence band and from energy dispersions as a function of the wave-vector component lying in the basal a-b plane.
NASA Technical Reports Server (NTRS)
Stieler, B.
1971-01-01
An inertial navigation system is described and analyzed based on two two-degree-of-freedom Schuler-gyropendulums and one two-degree-of-freedom azimuth gyro. The three sensors, each base motion isolated about its two input axes, are mounted on a common base, strapped down to the vehicle. The up and down pointing spin vectors of the two properly tuned gyropendulums track the vertical and indicate physically their velocity with respect to inertial space. The spin vector of the azimuth gyro points north, parallel to the earth's axis. The system can be made self-aligning on a stationary base. If external measurements for the north direction and the vertical are available, initial disturbance torques can be measured and easily biased out. The error analysis shows that the system is practicable with today's technology.
Solving large sparse eigenvalue problems on supercomputers
NASA Technical Reports Server (NTRS)
Philippe, Bernard; Saad, Youcef
1988-01-01
An important problem in scientific computing consists of finding a few eigenvalues and corresponding eigenvectors of a very large and sparse matrix. The most popular methods for solving these problems are based on projection techniques onto appropriate subspaces. The main attraction of these methods is that they only require the use of the matrix in the form of matrix-vector multiplications. The implementations on supercomputers of two such methods for symmetric matrices, namely Lanczos' method and Davidson's method, are compared. Since one of the most important operations in these two methods is the multiplication of vectors by the sparse matrix, methods of performing this operation efficiently are discussed. The advantages and disadvantages of each method are compared and implementation aspects are discussed. Numerical experiments on a one-processor CRAY 2 and CRAY X-MP are reported. Possible parallel implementations are also discussed.
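The dominant kernel the abstract identifies, sparse matrix-vector multiplication, can be sketched for the common CSR (compressed sparse row) layout; this is illustrative of the operation both Lanczos and Davidson iterations spend most of their time in, not the paper's specific implementation:

```python
import numpy as np

def csr_matvec(data, indices, indptr, x):
    """y = A @ x for a sparse matrix A stored in CSR form:
    data    - nonzero values, row by row
    indices - column index of each nonzero
    indptr  - indptr[i]:indptr[i+1] delimits row i's nonzeros"""
    n = len(indptr) - 1
    y = np.zeros(n)
    for i in range(n):
        # Inner product of row i's nonzeros with the gathered entries of x.
        for k in range(indptr[i], indptr[i + 1]):
            y[i] += data[k] * x[indices[k]]
    return y

# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]] in CSR form:
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
indices = np.array([0, 2, 1, 0, 2])
indptr = np.array([0, 2, 3, 5])
y = csr_matvec(data, indices, indptr, np.ones(3))
```

On vector machines like the CRAY 2, the indirect gather `x[indices[k]]` is the step that determines how well this loop vectorizes, which is why storage format choice matters so much in practice.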
Spatial distribution of an infectious disease in a small mammal community
NASA Astrophysics Data System (ADS)
Correa, Juana P.; Bacigalupo, Antonella; Fontúrbel, Francisco E.; Oda, Esteban; Cattan, Pedro E.; Solari, Aldo; Botto-Mahan, Carezza
2015-10-01
Chagas disease is a zoonosis caused by the parasite Trypanosoma cruzi and transmitted by insect vectors to several mammals, but little is known about its spatial epidemiology. We assessed the spatial distribution of T. cruzi infection in vectors and small mammals to test if mammal infection status is related to the proximity to vector colonies. During four consecutive years we captured and georeferenced the locations of mammal species and colonies of Mepraia spinolai, a restricted-movement vector. Infection status on mammals and vectors was evaluated by molecular techniques. To examine the effect of vector colonies on mammal infection status, we constructed an infection distance index using the distance between the location of each captured mammal to each vector colony and the average T. cruzi prevalence of each vector colony, weighted by the number of colonies assessed. We collected and evaluated T. cruzi infection in 944 mammals and 1976 M. spinolai. We found a significant effect of the infection distance index in explaining their infection status, when considering all mammal species together. By examining the most abundant species separately, we found this effect only for the diurnal and gregarious rodent Octodon degus. Spatially explicit models involving the prevalence and location of infected vectors and hosts had not been reported previously for a wild disease.
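The infection distance index described above combines each mammal's distance to every vector colony with that colony's T. cruzi prevalence, averaged over colonies. The exact weighting is not given in the abstract; the inverse-distance form below is one plausible construction, and the function name is hypothetical:

```python
import numpy as np

def infection_distance_index(mammal_xy, colony_xy, colony_prev):
    """Hypothetical infection distance index for one mammal:
    average over colonies of (colony prevalence / distance to colony).
    mammal_xy   - (2,) georeferenced capture location
    colony_xy   - (n, 2) colony locations
    colony_prev - (n,) average T. cruzi prevalence per colony"""
    d = np.linalg.norm(colony_xy - mammal_xy, axis=1)
    # Guard against a zero distance if a mammal is captured at a colony.
    return np.mean(colony_prev / np.maximum(d, 1e-9))

idx = infection_distance_index(
    np.array([0.0, 0.0]),
    np.array([[1.0, 0.0], [2.0, 0.0]]),
    np.array([0.5, 0.5]),
)
```

Under this form, a nearby high-prevalence colony raises a mammal's index far more than a distant one, which is the spatial effect the study tests against observed infection status.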
Extreme Weather Events and Impacts on Vector-borne Diseases and Agriculture
USDA-ARS?s Scientific Manuscript database
Extreme weather events during the period 2010-2012 impacted agriculture and vector-borne disease throughout the world. We evaluated specific weather events with satellite remotely sensed environmental data and evaluated crop production and diseases associated with these events. Significant droughts ...
The kinematics and initiation mechanisms of the earthquake-triggered Daguangbao landslide
NASA Astrophysics Data System (ADS)
Yang, Che-Ming; Cheng, Hui-Yun; Tsao, Chia-Che; Wu, Wen-Jie; Dong, Jia-Jyun; Lee, Chyi-Tyi; Lin, Ming-Lang; Zhang, Wei-Fong; Pei, Xiang-Jun; Wang, Gong-Hui; Huang, Run-Qiu
2015-04-01
The Daguangbao (DGB) landslide is one of the largest earthquake-triggered landslides induced by the 2008 Wenchuan earthquake, and among the largest in the world over the past century. Based on remote sensing images, topography analysis and field investigation, this landslide was speculated to be a gigantic atypical wedge failure bounded by the folded bedding plane and a zigzag stepping-out joint system, which outcrop at the south and north, respectively. With the inferred failure surfaces, the volume of the DGB landslide is about 1,051 Mm³. The frequently adopted Rigid Wedge Method (RWM), which assumes zero shear stress on the sliding surface along the vectors perpendicular to the intersection line when evaluating wedge stability, may not be valid for this super-large DGB wedge. Under the assumption that the shear strength is fully mobilized on the sliding surface along the vectors perpendicular to the intersection line, this study proposes a Maximum Shear Stress Method (MSSM) to calculate the factor of safety (FOS) of the DGB wedge. Based on the assumptions of the two methods, the FOS values from the RWM and MSSM are the upper and lower bounds for the wedge stability analysis. Based on rotary shear tests, the averaged friction coefficients of the representative materials of the two sliding surfaces are 0.79 (bedding-parallel fault gouges) and 0.71 (dolomite joints). Without external force, the FOSs of the DGB landslide are 4.14 and 2.51 by the RWM and MSSM, respectively; that is, the wedge was stable before the 2008 Wenchuan earthquake. However, the DGB landslide can be triggered at 35.7 s based on the ground acceleration records of strong-motion station MZQP during the 2008 Wenchuan earthquake and a pseudo-static stability analysis incorporated into the MSSM (acceleration: EW = 0.272 g, NS = 0.152 g, vertical = 0.244 g).
Moreover, using the friction coefficients of the representative materials at large shear displacement under a shear velocity of 1.3 m/s (0.16 for bedding-parallel fault gouges and 0.1 for dolomite joints), the gigantic wedge could accelerate to a maximum velocity of 54 m/s, with a travel time of 70 s over a travel distance of 1.9 km.
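The pseudo-static idea, adding a horizontal seismic coefficient to the static force balance, can be sketched for the simplest sliding-block case. This textbook simplification is not the paper's MSSM formulation (which resolves shear on a three-dimensional wedge), but it shows how a seismic coefficient drives the factor of safety below 1:

```python
import numpy as np

def pseudo_static_fos(weight, mu, slope_deg, k_h):
    """Pseudo-static factor of safety of a block on a plane inclined at
    slope_deg, with friction coefficient mu and horizontal seismic
    coefficient k_h (fraction of g). A simplified planar analogue of
    the wedge analysis, for illustration only."""
    a = np.radians(slope_deg)
    # Driving force: downslope weight component plus the downslope
    # component of the pseudo-static inertial force k_h * W.
    driving = weight * (np.sin(a) + k_h * np.cos(a))
    # Resisting force: friction times the (reduced) normal force.
    resisting = mu * weight * (np.cos(a) - k_h * np.sin(a))
    return resisting / driving
```

With k_h = 0 this reduces to the static FOS = mu / tan(slope); increasing k_h lowers FOS, mirroring how the recorded Wenchuan accelerations push the initially stable wedge to failure.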
Comparison of the MPP with other supercomputers for LANDSAT data processing
NASA Technical Reports Server (NTRS)
Ozga, Martin
1987-01-01
The massively parallel processor is compared to the CRAY X-MP and the CYBER-205 for LANDSAT data processing. The maximum likelihood classification algorithm is the basis for comparison since this algorithm is simple to implement and vectorizes very well. The algorithm was implemented on all three machines and tested by classifying the same full scene of LANDSAT multispectral scan data. Timings are compared as well as features of the machines and available software.
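The benchmark algorithm, Gaussian maximum likelihood classification of multispectral pixels, can be sketched as follows. The per-pixel quadratic form is the part that vectorizes well, which is why the abstract uses it to compare the machines; the function name and class model are illustrative:

```python
import numpy as np

def ml_classify(pixels, means, covs):
    """Gaussian maximum likelihood classification: assign each pixel
    (a spectral vector) to the class whose Gaussian log-likelihood is
    highest. pixels: (n, d); means/covs: per-class (d,) and (d, d)."""
    n_classes = len(means)
    scores = np.empty((pixels.shape[0], n_classes))
    for c in range(n_classes):
        inv = np.linalg.inv(covs[c])
        _, logdet = np.linalg.slogdet(covs[c])
        diff = pixels - means[c]
        # Mahalanobis distance for every pixel at once - the hot loop
        # that maps naturally onto vector hardware.
        maha = np.einsum('ij,jk,ik->i', diff, inv, diff)
        scores[:, c] = -0.5 * (logdet + maha)
    return np.argmax(scores, axis=1)

means = [np.zeros(2), np.full(2, 10.0)]
covs = [np.eye(2), np.eye(2)]
labels = ml_classify(np.array([[1.0, 1.0], [9.0, 9.0]]), means, covs)
```

Class means and covariances would be estimated from training fields; classifying a full LANDSAT scene then amounts to evaluating this quadratic form for every pixel and band combination.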
NASA Technical Reports Server (NTRS)
Ortega, J. M.
1985-01-01
Synopses are given for NASA supported work in computer science at the University of Virginia. Some areas of research include: error seeding as a testing method; knowledge representation for engineering design; analysis of faults in a multi-version software experiment; implementation of a parallel programming environment; two computer graphics systems for visualization of pressure distribution and convective density particles; task decomposition for multiple robot arms; vectorized incomplete conjugate gradient; and iterative methods for solving linear equations on the Flex/32.
2014-08-29
lends itself to parallelization due to it’s discontinuous nature [27]. It has well-established stability properties and is actively being researched...from considering Conservation of Mass , Conservation of Momentum, and Conservation of Energy. This well-known result, known as the Navier-Stokes...internal plus kinetic) per unit mass and h = e+ Pρ as the enthalpy. qj is the jth component of the heat flux vector. This can be related to temperature