parallel linear system: Topics by Science.gov

Sample records for parallel linear system

Iterative algorithms for large sparse linear systems on parallel computers

NASA Technical Reports Server (NTRS)

Adams, L. M.

1982-01-01

Algorithms for assembling in parallel the sparse system of linear equations that result from finite difference or finite element discretizations of elliptic partial differential equations, such as those that arise in structural engineering are developed. Parallel linear stationary iterative algorithms and parallel preconditioned conjugate gradient algorithms are developed for solving these systems. In addition, a model for comparing parallel algorithms on array architectures is developed and results of this model for the algorithms are given.
Parallel/distributed direct method for solving linear systems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A new family of parallel schemes for directly solving linear systems is presented and analyzed. It is shown that these schemes exhibit a near optimal performance and enjoy several important features: (1) For large enough linear systems, the design of the appropriate paralleled algorithm is insensitive to the number of processors as its performance grows monotonically with them; (2) It is especially good for large matrices, with dimensions large relative to the number of processors in the system; (3) It can be used in both distributed parallel computing environments and tightly coupled parallel computing systems; and (4) This set of algorithms can be mapped onto any parallel architecture without any major programming difficulties or algorithmical changes.
LDRD final report on massively-parallel linear programming : the parPCx system.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Parekh, Ojas; Phillips, Cynthia Ann; Boman, Erik Gunnar

2005-02-01

This report summarizes the research and development performed from October 2002 to September 2004 at Sandia National Laboratories under the Laboratory-Directed Research and Development (LDRD) project ''Massively-Parallel Linear Programming''. We developed a linear programming (LP) solver designed to use a large number of processors. LP is the optimization of a linear objective function subject to linear constraints. Companies and universities have expended huge efforts over decades to produce fast, stable serial LP solvers. Previous parallel codes run on shared-memory systems and have little or no distribution of the constraint matrix. We have seen no reports of general LP solver runsmore » on large numbers of processors. Our parallel LP code is based on an efficient serial implementation of Mehrotra's interior-point predictor-corrector algorithm (PCx). The computational core of this algorithm is the assembly and solution of a sparse linear system. We have substantially rewritten the PCx code and based it on Trilinos, the parallel linear algebra library developed at Sandia. Our interior-point method can use either direct or iterative solvers for the linear system. To achieve a good parallel data distribution of the constraint matrix, we use a (pre-release) version of a hypergraph partitioner from the Zoltan partitioning library. We describe the design and implementation of our new LP solver called parPCx and give preliminary computational results. We summarize a number of issues related to efficient parallel solution of LPs with interior-point methods including data distribution, numerical stability, and solving the core linear system using both direct and iterative methods. We describe a number of applications of LP specific to US Department of Energy mission areas and we summarize our efforts to integrate parPCx (and parallel LP solvers in general) into Sandia's massively-parallel integer programming solver PICO (Parallel Interger and Combinatorial Optimizer). We conclude with directions for long-term future algorithmic research and for near-term development that could improve the performance of parPCx.« less
AZTEC. Parallel Iterative method Software for Solving Linear Systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.; Shadid, J.; Tuminaro, R.

1995-07-01

AZTEC is an interactive library that greatly simplifies the parrallelization process when solving the linear systems of equations Ax=b where A is a user supplied n X n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. AZTEC is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparse unstructured matricesmore » for parallel solutions.« less
Parallels between control PDE's (Partial Differential Equations) and systems of ODE's (Ordinary Differential Equations)

NASA Technical Reports Server (NTRS)

Hunt, L. R.; Villarreal, Ramiro

1987-01-01

System theorists understand that the same mathematical objects which determine controllability for nonlinear control systems of ordinary differential equations (ODEs) also determine hypoellipticity for linear partial differentail equations (PDEs). Moreover, almost any study of ODE systems begins with linear systems. It is remarkable that Hormander's paper on hypoellipticity of second order linear p.d.e.'s starts with equations due to Kolmogorov, which are shown to be analogous to the linear PDEs. Eigenvalue placement by state feedback for a controllable linear system can be paralleled for a Kolmogorov equation if an appropriate type of feedback is introduced. Results concerning transformations of nonlinear systems to linear systems are similar to results for transforming a linear PDE to a Kolmogorov equation.
A parallel solver for huge dense linear systems

NASA Astrophysics Data System (ADS)

Badia, J. M.; Movilla, J. L.; Climente, J. I.; Castillo, M.; Marqués, M.; Mayo, R.; Quintana-Ortí, E. S.; Planelles, J.

2011-11-01

HDSS (Huge Dense Linear System Solver) is a Fortran Application Programming Interface (API) to facilitate the parallel solution of very large dense systems to scientists and engineers. The API makes use of parallelism to yield an efficient solution of the systems on a wide range of parallel platforms, from clusters of processors to massively parallel multiprocessors. It exploits out-of-core strategies to leverage the secondary memory in order to solve huge linear systems O(100.000). The API is based on the parallel linear algebra library PLAPACK, and on its Out-Of-Core (OOC) extension POOCLAPACK. Both PLAPACK and POOCLAPACK use the Message Passing Interface (MPI) as the communication layer and BLAS to perform the local matrix operations. The API provides a friendly interface to the users, hiding almost all the technical aspects related to the parallel execution of the code and the use of the secondary memory to solve the systems. In particular, the API can automatically select the best way to store and solve the systems, depending of the dimension of the system, the number of processes and the main memory of the platform. Experimental results on several parallel platforms report high performance, reaching more than 1 TFLOP with 64 cores to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors. New version program summaryProgram title: Huge Dense System Solver (HDSS) Catalogue identifier: AEHU_v1_1 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEHU_v1_1.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 87 062 No. of bytes in distributed program, including test data, etc.: 1 069 110 Distribution format: tar.gz Programming language: Fortran90, C Computer: Parallel architectures: multiprocessors, computer clusters Operating system: Linux/Unix Has the code been vectorized or parallelized?: Yes, includes MPI primitives. RAM: Tested for up to 190 GB Classification: 6.5 External routines: MPI ( http://www.mpi-forum.org/), BLAS ( http://www.netlib.org/blas/), PLAPACK ( http://www.cs.utexas.edu/~plapack/), POOCLAPACK ( ftp://ftp.cs.utexas.edu/pub/rvdg/PLAPACK/pooclapack.ps) (code for PLAPACK and POOCLAPACK is included in the distribution). Catalogue identifier of previous version: AEHU_v1_0 Journal reference of previous version: Comput. Phys. Comm. 182 (2011) 533 Does the new version supersede the previous version?: Yes Nature of problem: Huge scale dense systems of linear equations, Ax=B, beyond standard LAPACK capabilities. Solution method: The linear systems are solved by means of parallelized routines based on the LU factorization, using efficient secondary storage algorithms when the available main memory is insufficient. Reasons for new version: In many applications we need to guarantee a high accuracy in the solution of very large linear systems and we can do it by using double-precision arithmetic. Summary of revisions: Version 1.1 Can be used to solve linear systems using double-precision arithmetic. New version of the initialization routine. The user can choose the kind of arithmetic and the values of several parameters of the environment. Running time: About 5 hours to solve a system with more than 200 000 equations and more than 10 000 right-hand side vectors using double-precision arithmetic on an eight-node commodity cluster with a total of 64 Intel cores.
Moving-Article X-Ray Imaging System and Method for 3-D Image Generation

NASA Technical Reports Server (NTRS)

Fernandez, Kenneth R. (Inventor)

2012-01-01

An x-ray imaging system and method for a moving article are provided for an article moved along a linear direction of travel while the article is exposed to non-overlapping x-ray beams. A plurality of parallel linear sensor arrays are disposed in the x-ray beams after they pass through the article. More specifically, a first half of the plurality are disposed in a first of the x-ray beams while a second half of the plurality are disposed in a second of the x-ray beams. Each of the parallel linear sensor arrays is oriented perpendicular to the linear direction of travel. Each of the parallel linear sensor arrays in the first half is matched to a corresponding one of the parallel linear sensor arrays in the second half in terms of an angular position in the first of the x-ray beams and the second of the x-ray beams, respectively.
AZTEC: A parallel iterative package for the solving linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1996-12-31

We describe a parallel linear system package, AZTEC. The package incorporates a number of parallel iterative methods (e.g. GMRES, biCGSTAB, CGS, TFQMR) and preconditioners (e.g. Jacobi, Gauss-Seidel, polynomial, domain decomposition with LU or ILU within subdomains). Additionally, AZTEC allows for the reuse of previous preconditioning factorizations within Newton schemes for nonlinear methods. Currently, a number of different users are using this package to solve a variety of PDE applications.
An efficient parallel algorithm for the solution of a tridiagonal linear system of equations

NASA Technical Reports Server (NTRS)

Stone, H. S.

1971-01-01

Tridiagonal linear systems of equations are solved on conventional serial machines in a time proportional to N, where N is the number of equations. The conventional algorithms do not lend themselves directly to parallel computations on computers of the ILLIAC IV class, in the sense that they appear to be inherently serial. An efficient parallel algorithm is presented in which computation time grows as log sub 2 N. The algorithm is based on recursive doubling solutions of linear recurrence relations, and can be used to solve recurrence relations of all orders.
On the parallel solution of parabolic equations

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Youcef

1989-01-01

Parallel algorithms for the solution of linear parabolic problems are proposed. The first of these methods is based on using polynomial approximation to the exponential. It does not require solving any linear systems and is highly parallelizable. The two other methods proposed are based on Pade and Chebyshev approximations to the matrix exponential. The parallelization of these methods is achieved by using partial fraction decomposition techniques to solve the resulting systems and thus offers the potential for increased time parallelism in time dependent problems. Experimental results from the Alliant FX/8 and the Cray Y-MP/832 vector multiprocessors are also presented.
Third-order linearization for self-beating filtered microwave photonic systems using a dual parallel Mach-Zehnder modulator.

PubMed

Pérez, Daniel; Gasulla, Ivana; Capmany, José; Fandiño, Javier S; Muñoz, Pascual; Alavi, Hossein

2016-09-05

We develop, analyze and apply a linearization technique based on dual parallel Mach-Zehnder modulator to self-beating microwave photonics systems. The approach enables broadband low-distortion transmission and reception at expense of a moderate electrical power penalty yielding a small optical power penalty (<1 dB).
Parallel computation using boundary elements in solid mechanics

NASA Technical Reports Server (NTRS)

Chien, L. S.; Sun, C. T.

1990-01-01

The inherent parallelism of the boundary element method is shown. The boundary element is formulated by assuming the linear variation of displacements and tractions within a line element. Moreover, MACSYMA symbolic program is employed to obtain the analytical results for influence coefficients. Three computational components are parallelized in this method to show the speedup and efficiency in computation. The global coefficient matrix is first formed concurrently. Then, the parallel Gaussian elimination solution scheme is applied to solve the resulting system of equations. Finally, and more importantly, the domain solutions of a given boundary value problem are calculated simultaneously. The linear speedups and high efficiencies are shown for solving a demonstrated problem on Sequent Symmetry S81 parallel computing system.
Robust control of a parallel hybrid drivetrain with a CVT

DOE Office of Scientific and Technical Information (OSTI.GOV)

Mayer, T.; Schroeder, D.

1996-09-01

In this paper the design of a robust control system for a parallel hybrid drivetrain is presented. The drivetrain is based on a continuously variable transmission (CVT) and is therefore a highly nonlinear multiple-input-multiple-output system (MIMO-System). Input-Output-Linearization offers the possibility of linearizing and of decoupling the system. Since for example the vehicle mass varies with the load and the efficiency of the gearbox depends strongly on the actual working point, an exact linearization of the plant will mostly fail. Therefore a robust control algorithm based on sliding mode is used to control the drivetrain.
Algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations with the use of parallel computations

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moryakov, A. V., E-mail: sailor@orc.ru

2016-12-15

An algorithm for solving the linear Cauchy problem for large systems of ordinary differential equations is presented. The algorithm for systems of first-order differential equations is implemented in the EDELWEISS code with the possibility of parallel computations on supercomputers employing the MPI (Message Passing Interface) standard for the data exchange between parallel processes. The solution is represented by a series of orthogonal polynomials on the interval [0, 1]. The algorithm is characterized by simplicity and the possibility to solve nonlinear problems with a correction of the operator in accordance with the solution obtained in the previous iterative process.
Parallelization of the FLAPW method and comparison with the PPW method

NASA Astrophysics Data System (ADS)

Canning, Andrew; Mannstadt, Wolfgang; Freeman, Arthur

2000-03-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining electronic and magnetic properties of crystals and surfaces. In the past the FLAPW method has been limited to systems of about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell running on up to 512 processors on a Cray T3E parallel supercomputer. Some results will also be presented on a comparison of the plane-wave pseudopotential method and the FLAPW method on large systems.
Adaptive Identification by Systolic Arrays.

DTIC Science & Technology

1987-12-01

BIBLIOGRIAPHY Anton , Howard, Elementary Linear Algebra , John Wiley & Sons, 19S4. Cristi, Roberto, A Parallel Structure Jor Adaptive Pole Placement...10 11. SYSTEM IDENTIFICATION M*YETHODS ....................... 12 A. LINEAR SYSTEM MODELING ......................... 12 B. SOLUTION OF SYSTEMS OF... LINEAR EQUATIONS ......... 13 C. QR DECOMPOSITION ................................ 14 D. RECURSIVE LEAST SQUARES ......................... 16 E. BLOCK
Solving very large, sparse linear systems on mesh-connected parallel computers

NASA Technical Reports Server (NTRS)

Opsahl, Torstein; Reif, John

1987-01-01

The implementation of Pan and Reif's Parallel Nested Dissection (PND) algorithm on mesh connected parallel computers is described. This is the first known algorithm that allows very large, sparse linear systems of equations to be solved efficiently in polylog time using a small number of processors. How the processor bound of PND can be matched to the number of processors available on a given parallel computer by slowing down the algorithm by constant factors is described. Also, for the important class of problems where G(A) is a grid graph, a unique memory mapping that reduces the inter-processor communication requirements of PND to those that can be executed on mesh connected parallel machines is detailed. A description of an implementation on the Goodyear Massively Parallel Processor (MPP), located at Goddard is given. Also, a detailed discussion of data mappings and performance issues is given.
Parallelization of the FLAPW method

NASA Astrophysics Data System (ADS)

Canning, A.; Mannstadt, W.; Freeman, A. J.

2000-08-01

The FLAPW (full-potential linearized-augmented plane-wave) method is one of the most accurate first-principles methods for determining structural, electronic and magnetic properties of crystals and surfaces. Until the present work, the FLAPW method has been limited to systems of less than about a hundred atoms due to the lack of an efficient parallel implementation to exploit the power and memory of parallel computers. In this work, we present an efficient parallelization of the method by division among the processors of the plane-wave components for each state. The code is also optimized for RISC (reduced instruction set computer) architectures, such as those found on most parallel computers, making full use of BLAS (basic linear algebra subprograms) wherever possible. Scaling results are presented for systems of up to 686 silicon atoms and 343 palladium atoms per unit cell, running on up to 512 processors on a CRAY T3E parallel supercomputer.
A survey of packages for large linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Wu, Kesheng; Milne, Brent

2000-02-11

This paper evaluates portable software packages for the iterative solution of very large sparse linear systems on parallel architectures. While we cannot hope to tell individual users which package will best suit their needs, we do hope that our systematic evaluation provides essential unbiased information about the packages and the evaluation process may serve as an example on how to evaluate these packages. The information contained here include feature comparisons, usability evaluations and performance characterizations. This review is primarily focused on self-contained packages that can be easily integrated into an existing program and are capable of computing solutions to verymore » large sparse linear systems of equations. More specifically, it concentrates on portable parallel linear system solution packages that provide iterative solution schemes and related preconditioning schemes because iterative methods are more frequently used than competing schemes such as direct methods. The eight packages evaluated are: Aztec, BlockSolve,ISIS++, LINSOL, P-SPARSLIB, PARASOL, PETSc, and PINEAPL. Among the eight portable parallel iterative linear system solvers reviewed, we recommend PETSc and Aztec for most application programmers because they have well designed user interface, extensive documentation and very responsive user support. Both PETSc and Aztec are written in the C language and are callable from Fortran. For those users interested in using Fortran 90, PARASOL is a good alternative. ISIS++is a good alternative for those who prefer the C++ language. Both PARASOL and ISIS++ are relatively new and are continuously evolving. Thus their user interface may change. In general, those packages written in Fortran 77 are more cumbersome to use because the user may need to directly deal with a number of arrays of varying sizes. Languages like C++ and Fortran 90 offer more convenient data encapsulation mechanisms which make it easier to implement a clean and intuitive user interface. In addition to reviewing these portable parallel iterative solver packages, we also provide a more cursory assessment of a range of related packages, from specialized parallel preconditioners to direct methods for sparse linear systems.« less
A high performance linear equation solver on the VPP500 parallel supercomputer

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nakanishi, Makoto; Ina, Hiroshi; Miura, Kenichi

1994-12-31

This paper describes the implementation of two high performance linear equation solvers developed for the Fujitsu VPP500, a distributed memory parallel supercomputer system. The solvers take advantage of the key architectural features of VPP500--(1) scalability for an arbitrary number of processors up to 222 processors, (2) flexible data transfer among processors provided by a crossbar interconnection network, (3) vector processing capability on each processor, and (4) overlapped computation and transfer. The general linear equation solver based on the blocked LU decomposition method achieves 120.0 GFLOPS performance with 100 processors in the LIN-PACK Highly Parallel Computing benchmark.

Matrix-Free Polynomial-Based Nonlinear Least Squares Optimized Preconditioning and its Application to Discontinuous Galerkin Discretizations of the Euler Equations

DTIC Science & Technology

2015-06-01

cient parallel code for applying the operator. Our method constructs a polynomial preconditioner using a nonlinear least squares (NLLS) algorithm. We show...apply the underlying operator. Such a preconditioner can be very attractive in scenarios where one has a highly efficient parallel code for applying...repeatedly solve a large system of linear equations where one has an extremely fast parallel code for applying an underlying fixed linear operator
Block iterative restoration of astronomical images with the massively parallel processor

NASA Technical Reports Server (NTRS)

Heap, Sara R.; Lindler, Don J.

1987-01-01

A method is described for algebraic image restoration capable of treating astronomical images. For a typical 500 x 500 image, direct algebraic restoration would require the solution of a 250,000 x 250,000 linear system. The block iterative approach is used to reduce the problem to solving 4900 121 x 121 linear systems. The algorithm was implemented on the Goddard Massively Parallel Processor, which can solve a 121 x 121 system in approximately 0.06 seconds. Examples are shown of the results for various astronomical images.
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE PAGES

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.; ...

2016-11-07

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Parallel Dynamics Simulation Using a Krylov-Schwarz Linear Solution Scheme

DOE Office of Scientific and Technical Information (OSTI.GOV)

Abhyankar, Shrirang; Constantinescu, Emil M.; Smith, Barry F.

Fast dynamics simulation of large-scale power systems is a computational challenge because of the need to solve a large set of stiff, nonlinear differential-algebraic equations at every time step. The main bottleneck in dynamic simulations is the solution of a linear system during each nonlinear iteration of Newton’s method. In this paper, we present a parallel Krylov- Schwarz linear solution scheme that uses the Krylov subspacebased iterative linear solver GMRES with an overlapping restricted additive Schwarz preconditioner. As a result, performance tests of the proposed Krylov-Schwarz scheme for several large test cases ranging from 2,000 to 20,000 buses, including amore » real utility network, show good scalability on different computing architectures.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE PAGES

Desai, Ajit; Khalil, Mohammad; Pettit, Chris; ...

2017-09-21

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Scalable domain decomposition solvers for stochastic PDEs in high performance computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Desai, Ajit; Khalil, Mohammad; Pettit, Chris

Stochastic spectral finite element models of practical engineering systems may involve solutions of linear systems or linearized systems for non-linear problems with billions of unknowns. For stochastic modeling, it is therefore essential to design robust, parallel and scalable algorithms that can efficiently utilize high-performance computing to tackle such large-scale systems. Domain decomposition based iterative solvers can handle such systems. And though these algorithms exhibit excellent scalabilities, significant algorithmic and implementational challenges exist to extend them to solve extreme-scale stochastic systems using emerging computing platforms. Intrusive polynomial chaos expansion based domain decomposition algorithms are extended here to concurrently handle high resolutionmore » in both spatial and stochastic domains using an in-house implementation. Sparse iterative solvers with efficient preconditioners are employed to solve the resulting global and subdomain level local systems through multi-level iterative solvers. We also use parallel sparse matrix–vector operations to reduce the floating-point operations and memory requirements. Numerical and parallel scalabilities of these algorithms are presented for the diffusion equation having spatially varying diffusion coefficient modeled by a non-Gaussian stochastic process. Scalability of the solvers with respect to the number of random variables is also investigated.« less
Propulsion and stabilization system for magnetically levitated vehicles

DOEpatents

Coffey, Howard T.

1993-06-29

A propulsion and stabilization system for an inductive repulsion type magnetically levitated vehicle which is propelled and stabilized by a system which includes propulsion windings mounted above and parallel to vehicle-borne suspension magnets. A linear synchronous motor is part of the vehicle guideway and is mounted above and parallel to superconducting magnets attached to the magnetically levitated vehicle.
Classification of hyperspectral imagery using MapReduce on a NVIDIA graphics processing unit (Conference Presentation)

NASA Astrophysics Data System (ADS)

Ramirez, Andres; Rahnemoonfar, Maryam

2017-04-01

A hyperspectral image provides multidimensional figure rich in data consisting of hundreds of spectral dimensions. Analyzing the spectral and spatial information of such image with linear and non-linear algorithms will result in high computational time. In order to overcome this problem, this research presents a system using a MapReduce-Graphics Processing Unit (GPU) model that can help analyzing a hyperspectral image through the usage of parallel hardware and a parallel programming model, which will be simpler to handle compared to other low-level parallel programming models. Additionally, Hadoop was used as an open-source version of the MapReduce parallel programming model. This research compared classification accuracy results and timing results between the Hadoop and GPU system and tested it against the following test cases: the CPU and GPU test case, a CPU test case and a test case where no dimensional reduction was applied.
Parallel dynamics between non-Hermitian and Hermitian systems

NASA Astrophysics Data System (ADS)

Wang, P.; Lin, S.; Jin, L.; Song, Z.

2018-06-01

We reveals a connection between non-Hermitian and Hermitian systems by studying the connection between a family of non-Hermitian and Hermitian Hamiltonians based on exact solutions. In general, for a dynamic process in a non-Hermitian system H , there always exists a parallel dynamic process governed by the corresponding Hermitian conjugate system H†. We show that a linear superposition of the two parallel dynamics is exactly equivalent to the time evolution of a state under a Hermitian Hamiltonian H , and we present the relations between {H ,H ,H†} .
On the time-weighted quadratic sum of linear discrete systems

NASA Technical Reports Server (NTRS)

Jury, E. I.; Gutman, S.

1975-01-01

A method is proposed for obtaining the time-weighted quadratic sum for linear discrete systems. The formula of the weighted quadratic sum is obtained from matrix z-transform formulation. In addition, it is shown that this quadratic sum can be derived in a recursive form for several useful weighted functions. The discussion presented parallels that of MacFarlane (1963) for weighted quadratic integral for linear continuous systems.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Luszczek, Piotr R; Tomov, Stanimire Z; Dongarra, Jack J

We present an efficient and scalable programming model for the development of linear algebra in heterogeneous multi-coprocessor environments. The model incorporates some of the current best design and implementation practices for the heterogeneous acceleration of dense linear algebra (DLA). Examples are given as the basis for solving linear systems' algorithms - the LU, QR, and Cholesky factorizations. To generate the extreme level of parallelism needed for the efficient use of coprocessors, algorithms of interest are redesigned and then split into well-chosen computational tasks. The tasks execution is scheduled over the computational components of a hybrid system of multi-core CPUs andmore » coprocessors using a light-weight runtime system. The use of lightweight runtime systems keeps scheduling overhead low, while enabling the expression of parallelism through otherwise sequential code. This simplifies the development efforts and allows the exploration of the unique strengths of the various hardware components.« less
Large-Scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation

DTIC Science & Technology

2016-08-10

AFRL-AFOSR-JP-TR-2016-0073 Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation ...2016 4. TITLE AND SUBTITLE Large-scale Linear Optimization through Machine Learning: From Theory to Practical System Design and Implementation 5a...performances on various machine learning tasks and it naturally lends itself to fast parallel implementations . Despite this, very little work has been
Some fast elliptic solvers on parallel architectures and their complexities

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Y.

1989-01-01

The discretization of separable elliptic partial differential equations leads to linear systems with special block tridiagonal matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconstant coefficients. A method was recently proposed to parallelize and vectorize BCR. In this paper, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational compelxity lower than that of parallel BCR.
Some fast elliptic solvers on parallel architectures and their complexities

NASA Technical Reports Server (NTRS)

Gallopoulos, E.; Saad, Youcef

1989-01-01

The discretization of separable elliptic partial differential equations leads to linear systems with special block triangular matrices. Several methods are known to solve these systems, the most general of which is the Block Cyclic Reduction (BCR) algorithm which handles equations with nonconsistant coefficients. A method was recently proposed to parallelize and vectorize BCR. Here, the mapping of BCR on distributed memory architectures is discussed, and its complexity is compared with that of other approaches, including the Alternating-Direction method. A fast parallel solver is also described, based on an explicit formula for the solution, which has parallel computational complexity lower than that of parallel BCR.
Efficient parallel architecture for highly coupled real-time linear system applications

NASA Technical Reports Server (NTRS)

Carroll, Chester C.; Homaifar, Abdollah; Barua, Soumavo

1988-01-01

A systematic procedure is developed for exploiting the parallel constructs of computation in a highly coupled, linear system application. An overall top-down design approach is adopted. Differential equations governing the application under consideration are partitioned into subtasks on the basis of a data flow analysis. The interconnected task units constitute a task graph which has to be computed in every update interval. Multiprocessing concepts utilizing parallel integration algorithms are then applied for efficient task graph execution. A simple scheduling routine is developed to handle task allocation while in the multiprocessor mode. Results of simulation and scheduling are compared on the basis of standard performance indices. Processor timing diagrams are developed on the basis of program output accruing to an optimal set of processors. Basic architectural attributes for implementing the system are discussed together with suggestions for processing element design. Emphasis is placed on flexible architectures capable of accommodating widely varying application specifics.
A parallel time integrator for noisy nonlinear oscillatory systems

NASA Astrophysics Data System (ADS)

Subber, Waad; Sarkar, Abhijit

2018-06-01

In this paper, we adapt a parallel time integration scheme to track the trajectories of noisy non-linear dynamical systems. Specifically, we formulate a parallel algorithm to generate the sample path of nonlinear oscillator defined by stochastic differential equations (SDEs) using the so-called parareal method for ordinary differential equations (ODEs). The presence of Wiener process in SDEs causes difficulties in the direct application of any numerical integration techniques of ODEs including the parareal algorithm. The parallel implementation of the algorithm involves two SDEs solvers, namely a fine-level scheme to integrate the system in parallel and a coarse-level scheme to generate and correct the required initial conditions to start the fine-level integrators. For the numerical illustration, a randomly excited Duffing oscillator is investigated in order to study the performance of the stochastic parallel algorithm with respect to a range of system parameters. The distributed implementation of the algorithm exploits Massage Passing Interface (MPI).
Aztec user`s guide. Version 1

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hutchinson, S.A.; Shadid, J.N.; Tuminaro, R.S.

1995-10-01

Aztec is an iterative library that greatly simplifies the parallelization process when solving the linear systems of equations Ax = b where A is a user supplied n x n sparse matrix, b is a user supplied vector of length n and x is a vector of length n to be computed. Aztec is intended as a software tool for users who want to avoid cumbersome parallel programming details but who have large sparse linear systems which require an efficiently utilized parallel processing system. A collection of data transformation tools are provided that allow for easy creation of distributed sparsemore » unstructured matrices for parallel solution. Once the distributed matrix is created, computation can be performed on any of the parallel machines running Aztec: nCUBE 2, IBM SP2 and Intel Paragon, MPI platforms as well as standard serial and vector platforms. Aztec includes a number of Krylov iterative methods such as conjugate gradient (CG), generalized minimum residual (GMRES) and stabilized biconjugate gradient (BICGSTAB) to solve systems of equations. These Krylov methods are used in conjunction with various preconditioners such as polynomial or domain decomposition methods using LU or incomplete LU factorizations within subdomains. Although the matrix A can be general, the package has been designed for matrices arising from the approximation of partial differential equations (PDEs). In particular, the Aztec package is oriented toward systems arising from PDE applications.« less
Nonlinear and parallel algorithms for finite element discretizations of the incompressible Navier-Stokes equations

NASA Astrophysics Data System (ADS)

Arteaga, Santiago Egido

1998-12-01

The steady-state Navier-Stokes equations are of considerable interest because they are used to model numerous common physical phenomena. The applications encountered in practice often involve small viscosities and complicated domain geometries, and they result in challenging problems in spite of the vast attention that has been dedicated to them. In this thesis we examine methods for computing the numerical solution of the primitive variable formulation of the incompressible equations on distributed memory parallel computers. We use the Galerkin method to discretize the differential equations, although most results are stated so that they apply also to stabilized methods. We also reformulate some classical results in a single framework and discuss some issues frequently dismissed in the literature, such as the implementation of pressure space basis and non- homogeneous boundary values. We consider three nonlinear methods: Newton's method, Oseen's (or Picard) iteration, and sequences of Stokes problems. All these iterative nonlinear methods require solving a linear system at every step. Newton's method has quadratic convergence while that of the others is only linear; however, we obtain theoretical bounds showing that Oseen's iteration is more robust, and we confirm it experimentally. In addition, although Oseen's iteration usually requires more iterations than Newton's method, the linear systems it generates tend to be simpler and its overall costs (in CPU time) are lower. The Stokes problems result in linear systems which are easier to solve, but its convergence is much slower, so that it is competitive only for large viscosities. Inexact versions of these methods are studied, and we explain why the best timings are obtained using relatively modest error tolerances in solving the corresponding linear systems. We also present a new damping optimization strategy based on the quadratic nature of the Navier-Stokes equations, which improves the robustness of all the linearization strategies considered and whose computational cost is negligible. The algebraic properties of these systems depend on both the discretization and nonlinear method used. We study in detail the positive definiteness and skewsymmetry of the advection submatrices (essentially, convection-diffusion problems). We propose a discretization based on a new trilinear form for Newton's method. We solve the linear systems using three Krylov subspace methods, GMRES, QMR and TFQMR, and compare the advantages of each. Our emphasis is on parallel algorithms, and so we consider preconditioners suitable for parallel computers such as line variants of the Jacobi and Gauss- Seidel methods, alternating direction implicit methods, and Chebyshev and least squares polynomial preconditioners. These work well for moderate viscosities (moderate Reynolds number). For small viscosities we show that effective parallel solution of the advection subproblem is a critical factor to improve performance. Implementation details on a CM-5 are presented.
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

DOE PAGES

Li, Ruipeng; Saad, Yousef

2017-08-01

This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less
Low-Rank Correction Methods for Algebraic Domain Decomposition Preconditioners

DOE Office of Scientific and Technical Information (OSTI.GOV)

Li, Ruipeng; Saad, Yousef

This study presents a parallel preconditioning method for distributed sparse linear systems, based on an approximate inverse of the original matrix, that adopts a general framework of distributed sparse matrices and exploits domain decomposition (DD) and low-rank corrections. The DD approach decouples the matrix and, once inverted, a low-rank approximation is applied by exploiting the Sherman--Morrison--Woodbury formula, which yields two variants of the preconditioning methods. The low-rank expansion is computed by the Lanczos procedure with reorthogonalizations. Numerical experiments indicate that, when combined with Krylov subspace accelerators, this preconditioner can be efficient and robust for solving symmetric sparse linear systems. Comparisonsmore » with pARMS, a DD-based parallel incomplete LU (ILU) preconditioning method, are presented for solving Poisson's equation and linear elasticity problems.« less

Using parallel banded linear system solvers in generalized eigenvalue problems

NASA Technical Reports Server (NTRS)

Zhang, Hong; Moss, William F.

1993-01-01

Subspace iteration is a reliable and cost effective method for solving positive definite banded symmetric generalized eigenproblems, especially in the case of large scale problems. This paper discusses an algorithm that makes use of two parallel banded solvers in subspace iteration. A shift is introduced to decompose the banded linear systems into relatively independent subsystems and to accelerate the iterations. With this shift, an eigenproblem is mapped efficiently into the memories of a multiprocessor and a high speed-up is obtained for parallel implementations. An optimal shift is a shift that balances total computation and communication costs. Under certain conditions, we show how to estimate an optimal shift analytically using the decay rate for the inverse of a banded matrix, and how to improve this estimate. Computational results on iPSC/2 and iPSC/860 multiprocessors are presented.
Parallel Spectral Acquisition with an Ion Cyclotron Resonance Cell Array.

PubMed

Park, Sung-Gun; Anderson, Gordon A; Navare, Arti T; Bruce, James E

2016-01-19

Mass measurement accuracy is a critical analytical figure-of-merit in most areas of mass spectrometry application. However, the time required for acquisition of high-resolution, high mass accuracy data limits many applications and is an aspect under continual pressure for development. Current efforts target implementation of higher electrostatic and magnetic fields because ion oscillatory frequencies increase linearly with field strength. As such, the time required for spectral acquisition of a given resolving power and mass accuracy decreases linearly with increasing fields. Mass spectrometer developments to include multiple high-resolution detectors that can be operated in parallel could further decrease the acquisition time by a factor of n, the number of detectors. Efforts described here resulted in development of an instrument with a set of Fourier transform ion cyclotron resonance (ICR) cells as detectors that constitute the first MS array capable of parallel high-resolution spectral acquisition. ICR cell array systems consisting of three or five cells were constructed with printed circuit boards and installed within a single superconducting magnet and vacuum system. Independent ion populations were injected and trapped within each cell in the array. Upon filling the array, all ions in all cells were simultaneously excited and ICR signals from each cell were independently amplified and recorded in parallel. Presented here are the initial results of successful parallel spectral acquisition, parallel mass spectrometry (MS) and MS/MS measurements, and parallel high-resolution acquisition with the MS array system.
Summer Proceedings 2016: The Center for Computing Research at Sandia National Laboratories

DOE Office of Scientific and Technical Information (OSTI.GOV)

Carleton, James Brian; Parks, Michael L.

Solving sparse linear systems from the discretization of elliptic partial differential equations (PDEs) is an important building block in many engineering applications. Sparse direct solvers can solve general linear systems, but are usually slower and use much more memory than effective iterative solvers. To overcome these two disadvantages, a hierarchical solver (LoRaSp) based on H2-matrices was introduced in [22]. Here, we have developed a parallel version of the algorithm in LoRaSp to solve large sparse matrices on distributed memory machines. On a single processor, the factorization time of our parallel solver scales almost linearly with the problem size for three-dimensionalmore » problems, as opposed to the quadratic scalability of many existing sparse direct solvers. Moreover, our solver leads to almost constant numbers of iterations, when used as a preconditioner for Poisson problems. On more than one processor, our algorithm has significant speedups compared to sequential runs. With this parallel algorithm, we are able to solve large problems much faster than many existing packages as demonstrated by the numerical experiments.« less
Highly efficient and exact method for parallelization of grid-based algorithms and its implementation in DelPhi

PubMed Central

Li, Chuan; Li, Lin; Zhang, Jie; Alexov, Emil

2012-01-01

The Gauss-Seidel method is a standard iterative numerical method widely used to solve a system of equations and, in general, is more efficient comparing to other iterative methods, such as the Jacobi method. However, standard implementation of the Gauss-Seidel method restricts its utilization in parallel computing due to its requirement of using updated neighboring values (i.e., in current iteration) as soon as they are available. Here we report an efficient and exact (not requiring assumptions) method to parallelize iterations and to reduce the computational time as a linear/nearly linear function of the number of CPUs. In contrast to other existing solutions, our method does not require any assumptions and is equally applicable for solving linear and nonlinear equations. This approach is implemented in the DelPhi program, which is a finite difference Poisson-Boltzmann equation solver to model electrostatics in molecular biology. This development makes the iterative procedure on obtaining the electrostatic potential distribution in the parallelized DelPhi several folds faster than that in the serial code. Further we demonstrate the advantages of the new parallelized DelPhi by computing the electrostatic potential and the corresponding energies of large supramolecular structures. PMID:22674480
Linked exploratory visualizations for uncertain MR spectroscopy data

NASA Astrophysics Data System (ADS)

Feng, David; Kwock, Lester; Lee, Yueh; Taylor, Russell M., II

2010-01-01

We present a system for visualizing magnetic resonance spectroscopy (MRS) data sets. Using MRS, radiologists generate multiple 3D scalar fields of metabolite concentrations within the brain and compare them to anatomical magnetic resonance imaging. By understanding the relationship between metabolic makeup and anatomical structure, radiologists hope to better diagnose and treat tumors and lesions. Our system consists of three linked visualizations: a spatial glyph-based technique we call Scaled Data-Driven Spheres, a parallel coordinates visualization augmented to incorporate uncertainty in the data, and a slice plane for accurate data value extraction. The parallel coordinates visualization uses specialized brush interactions designed to help users identify nontrivial linear relationships between scalar fields. We describe two novel contributions to parallel coordinates visualizations: linear function brushing and new axis construction. Users have discovered significant relationships among metabolites and anatomy by linking interactions between the three visualizations.
Linked Exploratory Visualizations for Uncertain MR Spectroscopy Data

PubMed Central

Feng, David; Kwock, Lester; Lee, Yueh; Taylor, Russell M.

2010-01-01

We present a system for visualizing magnetic resonance spectroscopy (MRS) data sets. Using MRS, radiologists generate multiple 3D scalar fields of metabolite concentrations within the brain and compare them to anatomical magnetic resonance imaging. By understanding the relationship between metabolic makeup and anatomical structure, radiologists hope to better diagnose and treat tumors and lesions. Our system consists of three linked visualizations: a spatial glyph-based technique we call Scaled Data-Driven Spheres, a parallel coordinates visualization augmented to incorporate uncertainty in the data, and a slice plane for accurate data value extraction. The parallel coordinates visualization uses specialized brush interactions designed to help users identify nontrivial linear relationships between scalar fields. We describe two novel contributions to parallel coordinates visualizations: linear function brushing and new axis construction. Users have discovered significant relationships among metabolites and anatomy by linking interactions between the three visualizations. PMID:21152337
An efficient implementation of a high-order filter for a cubed-sphere spectral element model

NASA Astrophysics Data System (ADS)

Kang, Hyun-Gyu; Cheong, Hyeong-Bin

2017-03-01

A parallel-scalable, isotropic, scale-selective spatial filter was developed for the cubed-sphere spectral element model on the sphere. The filter equation is a high-order elliptic (Helmholtz) equation based on the spherical Laplacian operator, which is transformed into cubed-sphere local coordinates. The Laplacian operator is discretized on the computational domain, i.e., on each cell, by the spectral element method with Gauss-Lobatto Lagrange interpolating polynomials (GLLIPs) as the orthogonal basis functions. On the global domain, the discrete filter equation yielded a linear system represented by a highly sparse matrix. The density of this matrix increases quadratically (linearly) with the order of GLLIP (order of the filter), and the linear system is solved in only O (Ng) operations, where Ng is the total number of grid points. The solution, obtained by a row reduction method, demonstrated the typical accuracy and convergence rate of the cubed-sphere spectral element method. To achieve computational efficiency on parallel computers, the linear system was treated by an inverse matrix method (a sparse matrix-vector multiplication). The density of the inverse matrix was lowered to only a few times of the original sparse matrix without degrading the accuracy of the solution. For better computational efficiency, a local-domain high-order filter was introduced: The filter equation is applied to multiple cells, and then the central cell was only used to reconstruct the filtered field. The parallel efficiency of applying the inverse matrix method to the global- and local-domain filter was evaluated by the scalability on a distributed-memory parallel computer. The scale-selective performance of the filter was demonstrated on Earth topography. The usefulness of the filter as a hyper-viscosity for the vorticity equation was also demonstrated.
Accelerated Electron-Beam Formation with a High Capture Coefficient in a Parallel Coupled Accelerating Structure

NASA Astrophysics Data System (ADS)

Chernousov, Yu. D.; Shebolaev, I. V.; Ikryanov, I. M.

2018-01-01

An electron beam with a high (close to 100%) coefficient of electron capture into the regime of acceleration has been obtained in a linear electron accelerator based on a parallel coupled slow-wave structure, electron gun with microwave-controlled injection current, and permanent-magnet beam-focusing system. The high capture coefficient was due to the properties of the accelerating structure, beam-focusing system, and electron-injection system. Main characteristics of the proposed systems are presented.
Optical computing using optical flip-flops in Fourier processors: use in matrix multiplication and discrete linear transforms.

PubMed

Ando, S; Sekine, S; Mita, M; Katsuo, S

1989-12-15

An architecture and the algorithms for matrix multiplication using optical flip-flops (OFFs) in optical processors are proposed based on residue arithmetic. The proposed system is capable of processing all elements of matrices in parallel utilizing the information retrieving ability of optical Fourier processors. The employment of OFFs enables bidirectional data flow leading to a simpler architecture and the burden of residue-to-decimal (or residue-to-binary) conversion to operation time can be largely reduced by processing all elements in parallel. The calculated characteristics of operation time suggest a promising use of the system in a real time 2-D linear transform.
Polymerase chain reaction system

DOEpatents

Benett, William J.; Richards, James B.; Stratton, Paul L.; Hadley, Dean R.; Milanovich, Fred P.; Belgrader, Phil; Meyer, Peter L.

2004-03-02

A portable polymerase chain reaction DNA amplification and detection system includes one or more chamber modules. Each module supports a duplex assay of a biological sample. Each module has two parallel interrogation ports with a linear optical system. The system is capable of being handheld.
Domain decomposition methods for the parallel computation of reacting flows

NASA Technical Reports Server (NTRS)

Keyes, David E.

1988-01-01

Domain decomposition is a natural route to parallel computing for partial differential equation solvers. Subdomains of which the original domain of definition is comprised are assigned to independent processors at the price of periodic coordination between processors to compute global parameters and maintain the requisite degree of continuity of the solution at the subdomain interfaces. In the domain-decomposed solution of steady multidimensional systems of PDEs by finite difference methods using a pseudo-transient version of Newton iteration, the only portion of the computation which generally stands in the way of efficient parallelization is the solution of the large, sparse linear systems arising at each Newton step. For some Jacobian matrices drawn from an actual two-dimensional reacting flow problem, comparisons are made between relaxation-based linear solvers and also preconditioned iterative methods of Conjugate Gradient and Chebyshev type, focusing attention on both iteration count and global inner product count. The generalized minimum residual method with block-ILU preconditioning is judged the best serial method among those considered, and parallel numerical experiments on the Encore Multimax demonstrate for it approximately 10-fold speedup on 16 processors.
Porting AMG2013 to Heterogeneous CPU+GPU Nodes

DOE Office of Scientific and Technical Information (OSTI.GOV)

Samfass, Philipp

LLNL's future advanced technology system SIERRA will feature heterogeneous compute nodes that consist of IBM PowerV9 CPUs and NVIDIA Volta GPUs. Conceptually, the motivation for such an architecture is quite straightforward: While GPUs are optimized for throughput on massively parallel workloads, CPUs strive to minimize latency for rather sequential operations. Yet, making optimal use of heterogeneous architectures raises new challenges for the development of scalable parallel software, e.g., with respect to work distribution. Porting LLNL's parallel numerical libraries to upcoming heterogeneous CPU+GPU architectures is therefore a critical factor for ensuring LLNL's future success in ful lling its national mission. Onemore » of these libraries, called HYPRE, provides parallel solvers and precondi- tioners for large, sparse linear systems of equations. In the context of this intern- ship project, I consider AMG2013 which is a proxy application for major parts of HYPRE that implements a benchmark for setting up and solving di erent systems of linear equations. In the following, I describe in detail how I ported multiple parts of AMG2013 to the GPU (Section 2) and present results for di erent experiments that demonstrate a successful parallel implementation on the heterogeneous ma- chines surface and ray (Section 3). In Section 4, I give guidelines on how my code should be used. Finally, I conclude and give an outlook for future work (Section 5).« less
Analysis of Monte Carlo accelerated iterative methods for sparse linear systems: Analysis of Monte Carlo accelerated iterative methods for sparse linear systems

DOE PAGES

Benzi, Michele; Evans, Thomas M.; Hamilton, Steven P.; ...

2017-03-05

Here, we consider hybrid deterministic-stochastic iterative algorithms for the solution of large, sparse linear systems. Starting from a convergent splitting of the coefficient matrix, we analyze various types of Monte Carlo acceleration schemes applied to the original preconditioned Richardson (stationary) iteration. We expect that these methods will have considerable potential for resiliency to faults when implemented on massively parallel machines. We also establish sufficient conditions for the convergence of the hybrid schemes, and we investigate different types of preconditioners including sparse approximate inverses. Numerical experiments on linear systems arising from the discretization of partial differential equations are presented.
Parallel Event Analysis Under Unix

NASA Astrophysics Data System (ADS)

Looney, S.; Nilsson, B. S.; Oest, T.; Pettersson, T.; Ranjard, F.; Thibonnier, J.-P.

The ALEPH experiment at LEP, the CERN CN division and Digital Equipment Corp. have, in a joint project, developed a parallel event analysis system. The parallel physics code is identical to ALEPH's standard analysis code, ALPHA, only the organisation of input/output is changed. The user may switch between sequential and parallel processing by simply changing one input "card". The initial implementation runs on an 8-node DEC 3000/400 farm, using the PVM software, and exhibits a near-perfect speed-up linearity, reducing the turn-around time by a factor of 8.
Massively parallel and linear-scaling algorithm for second-order Møller-Plesset perturbation theory applied to the study of supramolecular wires

NASA Astrophysics Data System (ADS)

Kjærgaard, Thomas; Baudin, Pablo; Bykov, Dmytro; Eriksen, Janus Juul; Ettenhuber, Patrick; Kristensen, Kasper; Larkin, Jeff; Liakh, Dmitry; Pawłowski, Filip; Vose, Aaron; Wang, Yang Min; Jørgensen, Poul

2017-03-01

We present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide-Expand-Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide-Expand-Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalability of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the "resolution of the identity second-order Møller-Plesset perturbation theory" (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.
Parallel and fault-tolerant algorithms for hypercube multiprocessors

DOE Office of Scientific and Technical Information (OSTI.GOV)

Aykanat, C.

1988-01-01

Several techniques for increasing the performance of parallel algorithms on distributed-memory message-passing multi-processor systems are investigated. These techniques are effectively implemented for the parallelization of the Scaled Conjugate Gradient (SCG) algorithm on a hypercube connected message-passing multi-processor. Significant performance improvement is achieved by using these techniques. The SCG algorithm is used for the solution phase of an FE modeling system. Almost linear speed-up is achieved, and it is shown that hypercube topology is scalable for an FE class of problem. The SCG algorithm is also shown to be suitable for vectorization, and near supercomputer performance is achieved on a vectormore » hypercube multiprocessor by exploiting both parallelization and vectorization. Fault-tolerance issues for the parallel SCG algorithm and for the hypercube topology are also addressed.« less
Parallel Algorithms for Least Squares and Related Computations.

DTIC Science & Technology

1991-03-22

for dense computations in linear algebra . The work has recently been published in a general reference book on parallel algorithms by SIAM. AFO SR...written his Ph.D. dissertation with the principal investigator. (See publication 6.) • Parallel Algorithms for Dense Linear Algebra Computations. Our...and describe and to put into perspective a selection of the more important parallel algorithms for numerical linear algebra . We give a major new
An M-step preconditioned conjugate gradient method for parallel computation

NASA Technical Reports Server (NTRS)

Adams, L.

1983-01-01

This paper describes a preconditioned conjugate gradient method that can be effectively implemented on both vector machines and parallel arrays to solve sparse symmetric and positive definite systems of linear equations. The implementation on the CYBER 203/205 and on the Finite Element Machine is discussed and results obtained using the method on these machines are given.
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation

PubMed Central

Lee, Jae H.; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T.; Seo, Youngho

2014-01-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting. PMID:27081299
Handling Big Data in Medical Imaging: Iterative Reconstruction with Large-Scale Automated Parallel Computation.

PubMed

Lee, Jae H; Yao, Yushu; Shrestha, Uttam; Gullberg, Grant T; Seo, Youngho

2014-11-01

The primary goal of this project is to implement the iterative statistical image reconstruction algorithm, in this case maximum likelihood expectation maximum (MLEM) used for dynamic cardiac single photon emission computed tomography, on Spark/GraphX. This involves porting the algorithm to run on large-scale parallel computing systems. Spark is an easy-to- program software platform that can handle large amounts of data in parallel. GraphX is a graph analytic system running on top of Spark to handle graph and sparse linear algebra operations in parallel. The main advantage of implementing MLEM algorithm in Spark/GraphX is that it allows users to parallelize such computation without any expertise in parallel computing or prior knowledge in computer science. In this paper we demonstrate a successful implementation of MLEM in Spark/GraphX and present the performance gains with the goal to eventually make it useable in clinical setting.

Efficient parallel simulation of CO2 geologic sequestration insaline aquifers

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Doughty, Christine; Wu, Yu-Shu

2007-01-01

An efficient parallel simulator for large-scale, long-termCO2 geologic sequestration in saline aquifers has been developed. Theparallel simulator is a three-dimensional, fully implicit model thatsolves large, sparse linear systems arising from discretization of thepartial differential equations for mass and energy balance in porous andfractured media. The simulator is based on the ECO2N module of the TOUGH2code and inherits all the process capabilities of the single-CPU TOUGH2code, including a comprehensive description of the thermodynamics andthermophysical properties of H2O-NaCl- CO2 mixtures, modeling singleand/or two-phase isothermal or non-isothermal flow processes, two-phasemixtures, fluid phases appearing or disappearing, as well as saltprecipitation or dissolution. The newmore » parallel simulator uses MPI forparallel implementation, the METIS software package for simulation domainpartitioning, and the iterative parallel linear solver package Aztec forsolving linear equations by multiple processors. In addition, theparallel simulator has been implemented with an efficient communicationscheme. Test examples show that a linear or super-linear speedup can beobtained on Linux clusters as well as on supercomputers. Because of thesignificant improvement in both simulation time and memory requirement,the new simulator provides a powerful tool for tackling larger scale andmore complex problems than can be solved by single-CPU codes. Ahigh-resolution simulation example is presented that models buoyantconvection, induced by a small increase in brine density caused bydissolution of CO2.« less
An Efficient Fuzzy Controller Design for Parallel Connected Induction Motor Drives

NASA Astrophysics Data System (ADS)

Usha, S.; Subramani, C.

2018-04-01

Generally, an induction motors are highly non-linear and has a complex time varying dynamics. This makes the speed control of an induction motor a challenging issue in the industries. But, due to the recent trends in the power electronic devices and intelligent controllers, the speed control of the induction motor is achieved by including non-linear characteristics also. Conventionally a single inverter is used to run one induction motor in industries. In the traction applications, two or more inductions motors are operated in parallel to reduce the size and cost of induction motors. In this application, the parallel connected induction motors can be driven by a single inverter unit. The stability problems may introduce in the parallel operation under low speed operating conditions. Hence, the speed deviations should be reduce with help of suitable controllers. The speed control of the parallel connected system is performed by PID controller and fuzzy logic controller. In this paper the speed response of the induction motor for the rating of IHP, 1440 rpm, and 50Hz with these controller are compared in time domain specifications. The stability analysis of the system also performed under low speed using matlab platform. The hardware model is developed for speed control using fuzzy logic controller which exhibited superior performances over the other controller.
Linearly exact parallel closures for slab geometry

NASA Astrophysics Data System (ADS)

Ji, Jeong-Young; Held, Eric D.; Jhang, Hogun

2013-08-01

Parallel closures are obtained by solving a linearized kinetic equation with a model collision operator using the Fourier transform method. The closures expressed in wave number space are exact for time-dependent linear problems to within the limits of the model collision operator. In the adiabatic, collisionless limit, an inverse Fourier transform is performed to obtain integral (nonlocal) parallel closures in real space; parallel heat flow and viscosity closures for density, temperature, and flow velocity equations replace Braginskii's parallel closure relations, and parallel flow velocity and heat flow closures for density and temperature equations replace Spitzer's parallel transport relations. It is verified that the closures reproduce the exact linear response function of Hammett and Perkins [Phys. Rev. Lett. 64, 3019 (1990)] for Landau damping given a temperature gradient. In contrast to their approximate closures where the vanishing viscosity coefficient numerically gives an exact response, our closures relate the heat flow and nonvanishing viscosity to temperature and flow velocity (gradients).
Parallel numerical modeling of hybrid-dimensional compositional non-isothermal Darcy flows in fractured porous media

NASA Astrophysics Data System (ADS)

Xing, F.; Masson, R.; Lopez, S.

2017-09-01

This paper introduces a new discrete fracture model accounting for non-isothermal compositional multiphase Darcy flows and complex networks of fractures with intersecting, immersed and non-immersed fractures. The so called hybrid-dimensional model using a 2D model in the fractures coupled with a 3D model in the matrix is first derived rigorously starting from the equi-dimensional matrix fracture model. Then, it is discretized using a fully implicit time integration combined with the Vertex Approximate Gradient (VAG) finite volume scheme which is adapted to polyhedral meshes and anisotropic heterogeneous media. The fully coupled systems are assembled and solved in parallel using the Single Program Multiple Data (SPMD) paradigm with one layer of ghost cells. This strategy allows for a local assembly of the discrete systems. An efficient preconditioner is implemented to solve the linear systems at each time step and each Newton type iteration of the simulation. The numerical efficiency of our approach is assessed on different meshes, fracture networks, and physical settings in terms of parallel scalability, nonlinear convergence and linear convergence.
On the modelling of linear-assisted DC-DC voltage regulators for photovoltaic solar energy systems

NASA Astrophysics Data System (ADS)

Martínez-García, Herminio; García-Vílchez, Encarna

2017-11-01

This paper shows the modelling of linear-assisted or hybrid (linear & switching) DC/DC voltage regulators. In this kind of regulators, an auxiliary linear regulator is used, which objective is to cancel the ripple at the output voltage and provide fast responses for load variations. On the other hand, a switching DC/DC converter, connected in parallel with the linear regulator, allows to supply almost the whole output current demanded by the load. The objective of this topology is to take advantage of the suitable regulation characteristics that series linear voltage regulators have, but almost achieving the high efficiency that switching DC/DC converters provide. Linear-assisted DC/DC regulators are feedback systems with potential instability. Therefore, their modelling is mandatory in order to obtain design guidelines and assure stability of the implemented power supply system.
Design Method of Digital Optimal Control Scheme and Multiple Paralleled Bridge Type Current Amplifier for Generating Gradient Magnetic Fields in MRI Systems

NASA Astrophysics Data System (ADS)

Watanabe, Shuji; Takano, Hiroshi; Fukuda, Hiroya; Hiraki, Eiji; Nakaoka, Mutsuo

This paper deals with a digital control scheme of multiple paralleled high frequency switching current amplifier with four-quadrant chopper for generating gradient magnetic fields in MRI (Magnetic Resonance Imaging) systems. In order to track high precise current pattern in Gradient Coils (GC), the proposal current amplifier cancels the switching current ripples in GC with each other and designed optimum switching gate pulse patterns without influences of the large filter current ripple amplitude. The optimal control implementation and the linear control theory in GC current amplifiers have affinity to each other with excellent characteristics. The digital control system can be realized easily through the digital control implementation, DSPs or microprocessors. Multiple-parallel operational microprocessors realize two or higher paralleled GC current pattern tracking amplifier with optimal control design and excellent results are given for improving the image quality of MRI systems.
Reduced-thickness backlighter for autostereoscopic display and display using the backlighter

NASA Technical Reports Server (NTRS)

Eichenlaub, Jesse B (Inventor); Gruhlke, Russell W (Inventor)

1999-01-01

A reduced-thickness backlighter for an autostereoscopic display is disclosed having a lightguide and at least one light source parallel to an edge of the lightguide so as to be substantially coplanar with the lightguide. The lightguide is provided with a first surface which has a plurality of reflective linear regions, such as elongated grooves or glossy lines, parallel to the illuminated edge of the lightguide. Preferably the lightguide further has a second surface which has a plurality of lenticular lenses for reimaging the reflected light from the linear regions into a series of thin vertical lines outside the guide. Because of the reduced thickness of the backlighter system, autostereoscopic viewing is enabled in applications requiring thin backlighter systems. In addition to taking up less space, the reduced-thickness backlighter uses less lamps and less power. For accommodating 2-D applications, a 2-D diffuser plate or a 2-D lightguide parallel to the 3-D backlighter is disclosed for switching back and forth between 3-D viewing and 2-D viewing.
Computing Gröbner and Involutive Bases for Linear Systems of Difference Equations

NASA Astrophysics Data System (ADS)

Yanovich, Denis

2018-02-01

The computation of involutive bases and Gröbner bases for linear systems of difference equations is solved and its importance for physical and mathematical problems is discussed. The algorithm and issues concerning its implementation in C are presented and calculation times are compared with the competing programs. The paper ends with consideration on the parallel version of this implementation and its scalability.
Advanced propulsion system concept for hybrid vehicles

NASA Technical Reports Server (NTRS)

Bhate, S.; Chen, H.; Dochat, G.

1980-01-01

A series hybrid system, utilizing a free piston Stirling engine with a linear alternator, and a parallel hybrid system, incorporating a kinematic Stirling engine, are analyzed for various specified reference missions/vehicles ranging from a small two passenger commuter vehicle to a van. Parametric studies for each configuration, detail tradeoff studies to determine engine, battery and system definition, short term energy storage evaluation, and detail life cycle cost studies were performed. Results indicate that the selection of a parallel Stirling engine/electric, hybrid propulsion system can significantly reduce petroleum consumption by 70 percent over present conventional vehicles.
Tri-state oriented parallel processing system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tenenbaum, J.; Wallach, Y.

1982-08-01

An alternating sequential/parallel system, the MOPPS was introduced a few years ago and is modified despite the fact that it solved satisfactorily a number of real-time problems. The new system, the TOPPS is described and compared to MOPPS and two applications are chosen to prove it to be superior. The advantage of having a third basic, the ring mode, is illustrated when solving sets of linear equations with band matrices. The advantage of having independent I/O for the slaves is illustrated for biomedical signal analysis. 11 references.
Massively parallel and linear-scaling algorithm for second-order Moller–Plesset perturbation theory applied to the study of supramolecular wires

DOE PAGES

Kjaergaard, Thomas; Baudin, Pablo; Bykov, Dmytro; ...

2016-11-16

Here, we present a scalable cross-platform hybrid MPI/OpenMP/OpenACC implementation of the Divide–Expand–Consolidate (DEC) formalism with portable performance on heterogeneous HPC architectures. The Divide–Expand–Consolidate formalism is designed to reduce the steep computational scaling of conventional many-body methods employed in electronic structure theory to linear scaling, while providing a simple mechanism for controlling the error introduced by this approximation. Our massively parallel implementation of this general scheme has three levels of parallelism, being a hybrid of the loosely coupled task-based parallelization approach and the conventional MPI +X programming model, where X is either OpenMP or OpenACC. We demonstrate strong and weak scalabilitymore » of this implementation on heterogeneous HPC systems, namely on the GPU-based Cray XK7 Titan supercomputer at the Oak Ridge National Laboratory. Using the “resolution of the identity second-order Moller–Plesset perturbation theory” (RI-MP2) as the physical model for simulating correlated electron motion, the linear-scaling DEC implementation is applied to 1-aza-adamantane-trione (AAT) supramolecular wires containing up to 40 monomers (2440 atoms, 6800 correlated electrons, 24 440 basis functions and 91 280 auxiliary functions). This represents the largest molecular system treated at the MP2 level of theory, demonstrating an efficient removal of the scaling wall pertinent to conventional quantum many-body methods.« less
Center for Parallel Optimization.

DTIC Science & Technology

1996-03-19

A NEW OPTIMIZATION BASED APPROACH TO IMPROVING GENERALIZATION IN MACHINE LEARNING HAS BEEN PROPOSED AND COMPUTATIONALLY VALIDATED ON SIMPLE LINEAR MODELS AS WELL AS ON HIGHLY NONLINEAR SYSTEMS SUCH AS NEURAL NETWORKS.
Geopotential Error Analysis from Satellite Gradiometer and Global Positioning System Observables on Parallel Architecture

NASA Technical Reports Server (NTRS)

Schutz, Bob E.; Baker, Gregory A.

1997-01-01

The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.
Geopotential error analysis from satellite gradiometer and global positioning system observables on parallel architectures

NASA Astrophysics Data System (ADS)

Baker, Gregory Allen

The recovery of a high resolution geopotential from satellite gradiometer observations motivates the examination of high performance computational techniques. The primary subject matter addresses specifically the use of satellite gradiometer and GPS observations to form and invert the normal matrix associated with a large degree and order geopotential solution. Memory resident and out-of-core parallel linear algebra techniques along with data parallel batch algorithms form the foundation of the least squares application structure. A secondary topic includes the adoption of object oriented programming techniques to enhance modularity and reusability of code. Applications implementing the parallel and object oriented methods successfully calculate the degree variance for a degree and order 110 geopotential solution on 32 processors of the Cray T3E. The memory resident gradiometer application exhibits an overall application performance of 5.4 Gflops, and the out-of-core linear solver exhibits an overall performance of 2.4 Gflops. The combination solution derived from a sun synchronous gradiometer orbit produce average geoid height variances of 17 millimeters.
Frequency-encoded photonic qubits for scalable quantum information processing

DOE PAGES

Lukens, Joseph M.; Lougovski, Pavel

2016-12-21

Among the objectives for large-scale quantum computation is the quantum interconnect: a device that uses photons to interface qubits that otherwise could not interact. However, the current approaches require photons indistinguishable in frequency—a major challenge for systems experiencing different local environments or of different physical compositions altogether. Here, we develop an entirely new platform that actually exploits such frequency mismatch for processing quantum information. Labeled “spectral linear optical quantum computation” (spectral LOQC), our protocol offers favorable linear scaling of optical resources and enjoys an unprecedented degree of parallelism, as an arbitrary Ν-qubit quantum gate may be performed in parallel onmore » multiple Ν-qubit sets in the same linear optical device. Here, not only does spectral LOQC offer new potential for optical interconnects, but it also brings the ubiquitous technology of high-speed fiber optics to bear on photonic quantum information, making wavelength-configurable and robust optical quantum systems within reach.« less
Frequency-encoded photonic qubits for scalable quantum information processing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lukens, Joseph M.; Lougovski, Pavel

Among the objectives for large-scale quantum computation is the quantum interconnect: a device that uses photons to interface qubits that otherwise could not interact. However, the current approaches require photons indistinguishable in frequency—a major challenge for systems experiencing different local environments or of different physical compositions altogether. Here, we develop an entirely new platform that actually exploits such frequency mismatch for processing quantum information. Labeled “spectral linear optical quantum computation” (spectral LOQC), our protocol offers favorable linear scaling of optical resources and enjoys an unprecedented degree of parallelism, as an arbitrary Ν-qubit quantum gate may be performed in parallel onmore » multiple Ν-qubit sets in the same linear optical device. Here, not only does spectral LOQC offer new potential for optical interconnects, but it also brings the ubiquitous technology of high-speed fiber optics to bear on photonic quantum information, making wavelength-configurable and robust optical quantum systems within reach.« less
A Numerical Study of Scalable Cardiac Electro-Mechanical Solvers on HPC Architectures

PubMed Central

Colli Franzone, Piero; Pavarino, Luca F.; Scacchi, Simone

2018-01-01

We introduce and study some scalable domain decomposition preconditioners for cardiac electro-mechanical 3D simulations on parallel HPC (High Performance Computing) architectures. The electro-mechanical model of the cardiac tissue is composed of four coupled sub-models: (1) the static finite elasticity equations for the transversely isotropic deformation of the cardiac tissue; (2) the active tension model describing the dynamics of the intracellular calcium, cross-bridge binding and myofilament tension; (3) the anisotropic Bidomain model describing the evolution of the intra- and extra-cellular potentials in the deforming cardiac tissue; and (4) the ionic membrane model describing the dynamics of ionic currents, gating variables, ionic concentrations and stretch-activated channels. This strongly coupled electro-mechanical model is discretized in time with a splitting semi-implicit technique and in space with isoparametric finite elements. The resulting scalable parallel solver is based on Multilevel Additive Schwarz preconditioners for the solution of the Bidomain system and on BDDC preconditioned Newton-Krylov solvers for the non-linear finite elasticity system. The results of several 3D parallel simulations show the scalability of both linear and non-linear solvers and their application to the study of both physiological excitation-contraction cardiac dynamics and re-entrant waves in the presence of different mechano-electrical feedbacks. PMID:29674971
Solar Wind Proton Temperature Anisotropy: Linear Theory and WIND/SWE Observations

NASA Technical Reports Server (NTRS)

Hellinger, P.; Travnicek, P.; Kasper, J. C.; Lazarus, A. J.

2006-01-01

We present a comparison between WIND/SWE observations (Kasper et al., 2006) of beta parallel to p and T perpendicular to p/T parallel to p (where beta parallel to p is the proton parallel beta and T perpendicular to p and T parallel to p are the perpendicular and parallel proton are the perpendicular and parallel proton temperatures, respectively; here parallel and perpendicular indicate directions with respect to the ambient magnetic field) and predictions of the Vlasov linear theory. In the slow solar wind, the observed proton temperature anisotropy seems to be constrained by oblique instabilities, by the mirror one and the oblique fire hose, contrary to the results of the linear theory which predicts a dominance of the proton cyclotron instability and the parallel fire hose. The fast solar wind core protons exhibit an anticorrelation between beta parallel to c and T perpendicular to c/T parallel to c (where beta parallel to c is the core proton parallel beta and T perpendicular to c and T parallel to c are the perpendicular and parallel core proton temperatures, respectively) similar to that observed in the HELIOS data (Marsch et al., 2004).
A Parallel Numerical Algorithm To Solve Linear Systems Of Equations Emerging From 3D Radiative Transfer

NASA Astrophysics Data System (ADS)

Wichert, Viktoria; Arkenberg, Mario; Hauschildt, Peter H.

2016-10-01

Highly resolved state-of-the-art 3D atmosphere simulations will remain computationally extremely expensive for years to come. In addition to the need for more computing power, rethinking coding practices is necessary. We take a dual approach by introducing especially adapted, parallel numerical methods and correspondingly parallelizing critical code passages. In the following, we present our respective work on PHOENIX/3D. With new parallel numerical algorithms, there is a big opportunity for improvement when iteratively solving the system of equations emerging from the operator splitting of the radiative transfer equation J = ΛS. The narrow-banded approximate Λ-operator Λ* , which is used in PHOENIX/3D, occurs in each iteration step. By implementing a numerical algorithm which takes advantage of its characteristic traits, the parallel code's efficiency is further increased and a speed-up in computational time can be achieved.
Optics Program Modified for Multithreaded Parallel Computing

NASA Technical Reports Server (NTRS)

Lou, John; Bedding, Dave; Basinger, Scott

2006-01-01

A powerful high-performance computer program for simulating and analyzing adaptive and controlled optical systems has been developed by modifying the serial version of the Modeling and Analysis for Controlled Optical Systems (MACOS) program to impart capabilities for multithreaded parallel processing on computing systems ranging from supercomputers down to Symmetric Multiprocessing (SMP) personal computers. The modifications included the incorporation of OpenMP, a portable and widely supported application interface software, that can be used to explicitly add multithreaded parallelism to an application program under a shared-memory programming model. OpenMP was applied to parallelize ray-tracing calculations, one of the major computing components in MACOS. Multithreading is also used in the diffraction propagation of light in MACOS based on pthreads [POSIX Thread, (where "POSIX" signifies a portable operating system for UNIX)]. In tests of the parallelized version of MACOS, the speedup in ray-tracing calculations was found to be linear, or proportional to the number of processors, while the speedup in diffraction calculations ranged from 50 to 60 percent, depending on the type and number of processors. The parallelized version of MACOS is portable, and, to the user, its interface is basically the same as that of the original serial version of MACOS.

Parallel and Preemptable Dynamically Dimensioned Search Algorithms for Single and Multi-objective Optimization in Water Resources

NASA Astrophysics Data System (ADS)

Tolson, B.; Matott, L. S.; Gaffoor, T. A.; Asadzadeh, M.; Shafii, M.; Pomorski, P.; Xu, X.; Jahanpour, M.; Razavi, S.; Haghnegahdar, A.; Craig, J. R.

2015-12-01

We introduce asynchronous parallel implementations of the Dynamically Dimensioned Search (DDS) family of algorithms including DDS, discrete DDS, PA-DDS and DDS-AU. These parallel algorithms are unique from most existing parallel optimization algorithms in the water resources field in that parallel DDS is asynchronous and does not require an entire population (set of candidate solutions) to be evaluated before generating and then sending a new candidate solution for evaluation. One key advance in this study is developing the first parallel PA-DDS multi-objective optimization algorithm. The other key advance is enhancing the computational efficiency of solving optimization problems (such as model calibration) by combining a parallel optimization algorithm with the deterministic model pre-emption concept. These two efficiency techniques can only be combined because of the asynchronous nature of parallel DDS. Model pre-emption functions to terminate simulation model runs early, prior to completely simulating the model calibration period for example, when intermediate results indicate the candidate solution is so poor that it will definitely have no influence on the generation of further candidate solutions. The computational savings of deterministic model preemption available in serial implementations of population-based algorithms (e.g., PSO) disappear in synchronous parallel implementations as these algorithms. In addition to the key advances above, we implement the algorithms across a range of computation platforms (Windows and Unix-based operating systems from multi-core desktops to a supercomputer system) and package these for future modellers within a model-independent calibration software package called Ostrich as well as MATLAB versions. Results across multiple platforms and multiple case studies (from 4 to 64 processors) demonstrate the vast improvement over serial DDS-based algorithms and highlight the important role model pre-emption plays in the performance of parallel, pre-emptable DDS algorithms. Case studies include single- and multiple-objective optimization problems in water resources model calibration and in many cases linear or near linear speedups are observed.
Communications oriented programming of parallel iterative solutions of sparse linear systems

NASA Technical Reports Server (NTRS)

Patrick, M. L.; Pratt, T. W.

1986-01-01

Parallel algorithms are developed for a class of scientific computational problems by partitioning the problems into smaller problems which may be solved concurrently. The effectiveness of the resulting parallel solutions is determined by the amount and frequency of communication and synchronization and the extent to which communication can be overlapped with computation. Three different parallel algorithms for solving the same class of problems are presented, and their effectiveness is analyzed from this point of view. The algorithms are programmed using a new programming environment. Run-time statistics and experience obtained from the execution of these programs assist in measuring the effectiveness of these algorithms.
Influence of equilibrium shear flow in the parallel magnetic direction on edge localized mode crash

DOE Office of Scientific and Technical Information (OSTI.GOV)

Luo, Y.; Xiong, Y. Y.; Chen, S. Y., E-mail: sychen531@163.com

2016-04-15

The influence of the parallel shear flow on the evolution of peeling-ballooning (P-B) modes is studied with the BOUT++ four-field code in this paper. The parallel shear flow has different effects in linear simulation and nonlinear simulation. In the linear simulations, the growth rate of edge localized mode (ELM) can be increased by Kelvin-Helmholtz term, which can be caused by the parallel shear flow. In the nonlinear simulations, the results accord with the linear simulations in the linear phase. However, the ELM size is reduced by the parallel shear flow in the beginning of the turbulence phase, which is recognizedmore » as the P-B filaments' structure. Then during the turbulence phase, the ELM size is decreased by the shear flow.« less
A scalable parallel black oil simulator on distributed memory parallel computers

NASA Astrophysics Data System (ADS)

Wang, Kun; Liu, Hui; Chen, Zhangxin

2015-11-01

This paper presents our work on developing a parallel black oil simulator for distributed memory computers based on our in-house parallel platform. The parallel simulator is designed to overcome the performance issues of common simulators that are implemented for personal computers and workstations. The finite difference method is applied to discretize the black oil model. In addition, some advanced techniques are employed to strengthen the robustness and parallel scalability of the simulator, including an inexact Newton method, matrix decoupling methods, and algebraic multigrid methods. A new multi-stage preconditioner is proposed to accelerate the solution of linear systems from the Newton methods. Numerical experiments show that our simulator is scalable and efficient, and is capable of simulating extremely large-scale black oil problems with tens of millions of grid blocks using thousands of MPI processes on parallel computers.
Graph-based linear scaling electronic structure theory.

PubMed

Niklasson, Anders M N; Mniszewski, Susan M; Negre, Christian F A; Cawkwell, Marc J; Swart, Pieter J; Mohd-Yusof, Jamal; Germann, Timothy C; Wall, Michael E; Bock, Nicolas; Rubensson, Emanuel H; Djidjev, Hristo

2016-06-21

We show how graph theory can be combined with quantum theory to calculate the electronic structure of large complex systems. The graph formalism is general and applicable to a broad range of electronic structure methods and materials, including challenging systems such as biomolecules. The methodology combines well-controlled accuracy, low computational cost, and natural low-communication parallelism. This combination addresses substantial shortcomings of linear scaling electronic structure theory, in particular with respect to quantum-based molecular dynamics simulations.
Graph-based linear scaling electronic structure theory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Niklasson, Anders M. N., E-mail: amn@lanl.gov; Negre, Christian F. A.; Cawkwell, Marc J.

2016-06-21

We show how graph theory can be combined with quantum theory to calculate the electronic structure of large complex systems. The graph formalism is general and applicable to a broad range of electronic structure methods and materials, including challenging systems such as biomolecules. The methodology combines well-controlled accuracy, low computational cost, and natural low-communication parallelism. This combination addresses substantial shortcomings of linear scaling electronic structure theory, in particular with respect to quantum-based molecular dynamics simulations.
Scalable Parallel Computation for Extended MHD Modeling of Fusion Plasmas

NASA Astrophysics Data System (ADS)

Glasser, Alan H.

2008-11-01

Parallel solution of a linear system is scalable if simultaneously doubling the number of dependent variables and the number of processors results in little or no increase in the computation time to solution. Two approaches have this property for parabolic systems: multigrid and domain decomposition. Since extended MHD is primarily a hyperbolic rather than a parabolic system, additional steps must be taken to parabolize the linear system to be solved by such a method. Such physics-based preconditioning (PBP) methods have been pioneered by Chac'on, using finite volumes for spatial discretization, multigrid for solution of the preconditioning equations, and matrix-free Newton-Krylov methods for the accurate solution of the full nonlinear preconditioned equations. The work described here is an extension of these methods using high-order spectral element methods and FETI-DP domain decomposition. Application of PBP to a flux-source representation of the physics equations is discussed. The resulting scalability will be demonstrated for simple wave and for ideal and Hall MHD waves.
Parallel iterative solution for h and p approximations of the shallow water equations

USGS Publications Warehouse

Barragy, E.J.; Walters, R.A.

1998-01-01

A p finite element scheme and parallel iterative solver are introduced for a modified form of the shallow water equations. The governing equations are the three-dimensional shallow water equations. After a harmonic decomposition in time and rearrangement, the resulting equations are a complex Helmholz problem for surface elevation, and a complex momentum equation for the horizontal velocity. Both equations are nonlinear and the resulting system is solved using the Picard iteration combined with a preconditioned biconjugate gradient (PBCG) method for the linearized subproblems. A subdomain-based parallel preconditioner is developed which uses incomplete LU factorization with thresholding (ILUT) methods within subdomains, overlapping ILUT factorizations for subdomain boundaries and under-relaxed iteration for the resulting block system. The method builds on techniques successfully applied to linear elements by introducing ordering and condensation techniques to handle uniform p refinement. The combined methods show good performance for a range of p (element order), h (element size), and N (number of processors). Performance and scalability results are presented for a field scale problem where up to 512 processors are used. ?? 1998 Elsevier Science Ltd. All rights reserved.
Multiscale asymmetric orthogonal wavelet kernel for linear programming support vector learning and nonlinear dynamic systems identification.

PubMed

Lu, Zhao; Sun, Jing; Butts, Kenneth

2014-05-01

Support vector regression for approximating nonlinear dynamic systems is more delicate than the approximation of indicator functions in support vector classification, particularly for systems that involve multitudes of time scales in their sampled data. The kernel used for support vector learning determines the class of functions from which a support vector machine can draw its solution, and the choice of kernel significantly influences the performance of a support vector machine. In this paper, to bridge the gap between wavelet multiresolution analysis and kernel learning, the closed-form orthogonal wavelet is exploited to construct new multiscale asymmetric orthogonal wavelet kernels for linear programming support vector learning. The closed-form multiscale orthogonal wavelet kernel provides a systematic framework to implement multiscale kernel learning via dyadic dilations and also enables us to represent complex nonlinear dynamics effectively. To demonstrate the superiority of the proposed multiscale wavelet kernel in identifying complex nonlinear dynamic systems, two case studies are presented that aim at building parallel models on benchmark datasets. The development of parallel models that address the long-term/mid-term prediction issue is more intricate and challenging than the identification of series-parallel models where only one-step ahead prediction is required. Simulation results illustrate the effectiveness of the proposed multiscale kernel learning.
Massive parallelization of serial inference algorithms for a complex generalized linear model

PubMed Central

Suchard, Marc A.; Simpson, Shawn E.; Zorych, Ivan; Ryan, Patrick; Madigan, David

2014-01-01

Following a series of high-profile drug safety disasters in recent years, many countries are redoubling their efforts to ensure the safety of licensed medical products. Large-scale observational databases such as claims databases or electronic health record systems are attracting particular attention in this regard, but present significant methodological and computational concerns. In this paper we show how high-performance statistical computation, including graphics processing units, relatively inexpensive highly parallel computing devices, can enable complex methods in large databases. We focus on optimization and massive parallelization of cyclic coordinate descent approaches to fit a conditioned generalized linear model involving tens of millions of observations and thousands of predictors in a Bayesian context. We find orders-of-magnitude improvement in overall run-time. Coordinate descent approaches are ubiquitous in high-dimensional statistics and the algorithms we propose open up exciting new methodological possibilities with the potential to significantly improve drug safety. PMID:25328363
Acoustooptic linear algebra processors - Architectures, algorithms, and applications

NASA Technical Reports Server (NTRS)

Casasent, D.

1984-01-01

Architectures, algorithms, and applications for systolic processors are described with attention to the realization of parallel algorithms on various optical systolic array processors. Systolic processors for matrices with special structure and matrices of general structure, and the realization of matrix-vector, matrix-matrix, and triple-matrix products and such architectures are described. Parallel algorithms for direct and indirect solutions to systems of linear algebraic equations and their implementation on optical systolic processors are detailed with attention to the pipelining and flow of data and operations. Parallel algorithms and their optical realization for LU and QR matrix decomposition are specifically detailed. These represent the fundamental operations necessary in the implementation of least squares, eigenvalue, and SVD solutions. Specific applications (e.g., the solution of partial differential equations, adaptive noise cancellation, and optimal control) are described to typify the use of matrix processors in modern advanced signal processing.
FoSSI: the family of simplified solver interfaces for the rapid development of parallel numerical atmosphere and ocean models

NASA Astrophysics Data System (ADS)

Frickenhaus, Stephan; Hiller, Wolfgang; Best, Meike

The portable software FoSSI is introduced that—in combination with additional free solver software packages—allows for an efficient and scalable parallel solution of large sparse linear equations systems arising in finite element model codes. FoSSI is intended to support rapid model code development, completely hiding the complexity of the underlying solver packages. In particular, the model developer need not be an expert in parallelization and is yet free to switch between different solver packages by simple modifications of the interface call. FoSSI offers an efficient and easy, yet flexible interface to several parallel solvers, most of them available on the web, such as PETSC, AZTEC, MUMPS, PILUT and HYPRE. FoSSI makes use of the concept of handles for vectors, matrices, preconditioners and solvers, that is frequently used in solver libraries. Hence, FoSSI allows for a flexible treatment of several linear equations systems and associated preconditioners at the same time, even in parallel on separate MPI-communicators. The second special feature in FoSSI is the task specifier, being a combination of keywords, each configuring a certain phase in the solver setup. This enables the user to control a solver over one unique subroutine. Furthermore, FoSSI has rather similar features for all solvers, making a fast solver intercomparison or exchange an easy task. FoSSI is a community software, proven in an adaptive 2D-atmosphere model and a 3D-primitive equation ocean model, both formulated in finite elements. The present paper discusses perspectives of an OpenMP-implementation of parallel iterative solvers based on domain decomposition methods. This approach to OpenMP solvers is rather attractive, as the code for domain-local operations of factorization, preconditioning and matrix-vector product can be readily taken from a sequential implementation that is also suitable to be used in an MPI-variant. Code development in this direction is in an advanced state under the name ScOPES: the Scalable Open Parallel sparse linear Equations Solver.
FaCSI: A block parallel preconditioner for fluid-structure interaction in hemodynamics

NASA Astrophysics Data System (ADS)

Deparis, Simone; Forti, Davide; Grandperrin, Gwenol; Quarteroni, Alfio

2016-12-01

Modeling Fluid-Structure Interaction (FSI) in the vascular system is mandatory to reliably compute mechanical indicators in vessels undergoing large deformations. In order to cope with the computational complexity of the coupled 3D FSI problem after discretizations in space and time, a parallel solution is often mandatory. In this paper we propose a new block parallel preconditioner for the coupled linearized FSI system obtained after space and time discretization. We name it FaCSI to indicate that it exploits the Factorized form of the linearized FSI matrix, the use of static Condensation to formally eliminate the interface degrees of freedom of the fluid equations, and the use of a SIMPLE preconditioner for saddle-point problems. FaCSI is built upon a block Gauss-Seidel factorization of the FSI Jacobian matrix and it uses ad-hoc preconditioners for each physical component of the coupled problem, namely the fluid, the structure and the geometry. In the fluid subproblem, after operating static condensation of the interface fluid variables, we use a SIMPLE preconditioner on the reduced fluid matrix. Moreover, to efficiently deal with a large number of processes, FaCSI exploits efficient single field preconditioners, e.g., based on domain decomposition or the multigrid method. We measure the parallel performances of FaCSI on a benchmark cylindrical geometry and on a problem of physiological interest, namely the blood flow through a patient-specific femoropopliteal bypass. We analyze the dependence of the number of linear solver iterations on the cores count (scalability of the preconditioner) and on the mesh size (optimality).
Simulated parallel annealing within a neighborhood for optimization of biomechanical systems.

PubMed

Higginson, J S; Neptune, R R; Anderson, F C

2005-09-01

Optimization problems for biomechanical systems have become extremely complex. Simulated annealing (SA) algorithms have performed well in a variety of test problems and biomechanical applications; however, despite advances in computer speed, convergence to optimal solutions for systems of even moderate complexity has remained prohibitive. The objective of this study was to develop a portable parallel version of a SA algorithm for solving optimization problems in biomechanics. The algorithm for simulated parallel annealing within a neighborhood (SPAN) was designed to minimize interprocessor communication time and closely retain the heuristics of the serial SA algorithm. The computational speed of the SPAN algorithm scaled linearly with the number of processors on different computer platforms for a simple quadratic test problem and for a more complex forward dynamic simulation of human pedaling.
Power/Performance Trade-offs of Small Batched LU Based Solvers on GPUs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Villa, Oreste; Fatica, Massimiliano; Gawande, Nitin A.

In this paper we propose and analyze a set of batched linear solvers for small matrices on Graphic Processing Units (GPUs), evaluating the various alternatives depending on the size of the systems to solve. We discuss three different solutions that operate with different level of parallelization and GPU features. The first, exploiting the CUBLAS library, manages matrices of size up to 32x32 and employs Warp level (one matrix, one Warp) parallelism and shared memory. The second works at Thread-block level parallelism (one matrix, one Thread-block), still exploiting shared memory but managing matrices up to 76x76. The third is Thread levelmore » parallel (one matrix, one thread) and can reach sizes up to 128x128, but it does not exploit shared memory and only relies on the high memory bandwidth of the GPU. The first and second solution only support partial pivoting, the third one easily supports partial and full pivoting, making it attractive to problems that require greater numerical stability. We analyze the trade-offs in terms of performance and power consumption as function of the size of the linear systems that are simultaneously solved. We execute the three implementations on a Tesla M2090 (Fermi) and on a Tesla K20 (Kepler).« less
Parallel Implementation of Triangular Cellular Automata for Computing Two-Dimensional Elastodynamic Response on Arbitrary Domains

NASA Astrophysics Data System (ADS)

Leamy, Michael J.; Springer, Adam C.

In this research we report parallel implementation of a Cellular Automata-based simulation tool for computing elastodynamic response on complex, two-dimensional domains. Elastodynamic simulation using Cellular Automata (CA) has recently been presented as an alternative, inherently object-oriented technique for accurately and efficiently computing linear and nonlinear wave propagation in arbitrarily-shaped geometries. The local, autonomous nature of the method should lead to straight-forward and efficient parallelization. We address this notion on symmetric multiprocessor (SMP) hardware using a Java-based object-oriented CA code implementing triangular state machines (i.e., automata) and the MPI bindings written in Java (MPJ Express). We use MPJ Express to reconfigure our existing CA code to distribute a domain's automata to cores present on a dual quad-core shared-memory system (eight total processors). We note that this message passing parallelization strategy is directly applicable to computer clustered computing, which will be the focus of follow-on research. Results on the shared memory platform indicate nearly-ideal, linear speed-up. We conclude that the CA-based elastodynamic simulator is easily configured to run in parallel, and yields excellent speed-up on SMP hardware.
Domain decomposition in time for PDE-constrained optimization

DOE PAGES

Barker, Andrew T.; Stoll, Martin

2015-08-28

Here, PDE-constrained optimization problems have a wide range of applications, but they lead to very large and ill-conditioned linear systems, especially if the problems are time dependent. In this paper we outline an approach for dealing with such problems by decomposing them in time and applying an additive Schwarz preconditioner in time, so that we can take advantage of parallel computers to deal with the very large linear systems. We then illustrate the performance of our method on a variety of problems.
Mechanical design of deformation compensated flexural pivots structured for linear nanopositioning stages

DOEpatents

Shu, Deming; Kearney, Steven P.; Preissner, Curt A.

2015-02-17

A method and deformation compensated flexural pivots structured for precision linear nanopositioning stages are provided. A deformation-compensated flexural linear guiding mechanism includes a basic parallel mechanism including a U-shaped member and a pair of parallel bars linked to respective pairs of I-link bars and each of the I-bars coupled by a respective pair of flexural pivots. The basic parallel mechanism includes substantially evenly distributed flexural pivots minimizing center shift dynamic errors.
Parallel SOR methods with a parabolic-diffusion acceleration technique for solving an unstructured-grid Poisson equation on 3D arbitrary geometries

NASA Astrophysics Data System (ADS)

Zapata, M. A. Uh; Van Bang, D. Pham; Nguyen, K. D.

2016-05-01

This paper presents a parallel algorithm for the finite-volume discretisation of the Poisson equation on three-dimensional arbitrary geometries. The proposed method is formulated by using a 2D horizontal block domain decomposition and interprocessor data communication techniques with message passing interface. The horizontal unstructured-grid cells are reordered according to the neighbouring relations and decomposed into blocks using a load-balanced distribution to give all processors an equal amount of elements. In this algorithm, two parallel successive over-relaxation methods are presented: a multi-colour ordering technique for unstructured grids based on distributed memory and a block method using reordering index following similar ideas of the partitioning for structured grids. In all cases, the parallel algorithms are implemented with a combination of an acceleration iterative solver. This solver is based on a parabolic-diffusion equation introduced to obtain faster solutions of the linear systems arising from the discretisation. Numerical results are given to evaluate the performances of the methods showing speedups better than linear.
Robust parallel iterative solvers for linear and least-squares problems, Final Technical Report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Saad, Yousef

2014-01-16

The primary goal of this project is to study and develop robust iterative methods for solving linear systems of equations and least squares systems. The focus of the Minnesota team is on algorithms development, robustness issues, and on tests and validation of the methods on realistic problems. 1. The project begun with an investigation on how to practically update a preconditioner obtained from an ILU-type factorization, when the coefficient matrix changes. 2. We investigated strategies to improve robustness in parallel preconditioners in a specific case of a PDE with discontinuous coefficients. 3. We explored ways to adapt standard preconditioners formore » solving linear systems arising from the Helmholtz equation. These are often difficult linear systems to solve by iterative methods. 4. We have also worked on purely theoretical issues related to the analysis of Krylov subspace methods for linear systems. 5. We developed an effective strategy for performing ILU factorizations for the case when the matrix is highly indefinite. The strategy uses shifting in some optimal way. The method was extended to the solution of Helmholtz equations by using complex shifts, yielding very good results in many cases. 6. We addressed the difficult problem of preconditioning sparse systems of equations on GPUs. 7. A by-product of the above work is a software package consisting of an iterative solver library for GPUs based on CUDA. This was made publicly available. It was the first such library that offers complete iterative solvers for GPUs. 8. We considered another form of ILU which blends coarsening techniques from Multigrid with algebraic multilevel methods. 9. We have released a new version on our parallel solver - called pARMS [new version is version 3]. As part of this we have tested the code in complex settings - including the solution of Maxwell and Helmholtz equations and for a problem of crystal growth.10. As an application of polynomial preconditioning we considered the problem of evaluating f(A)v which arises in statistical sampling. 11. As an application to the methods we developed, we tackled the problem of computing the diagonal of the inverse of a matrix. This arises in statistical applications as well as in many applications in physics. We explored probing methods as well as domain-decomposition type methods. 12. A collaboration with researchers from Toulouse, France, considered the important problem of computing the Schur complement in a domain-decomposition approach. 13. We explored new ways of preconditioning linear systems, based on low-rank approximations.« less

Linear and nonlinear stability of the Blasius boundary layer

NASA Technical Reports Server (NTRS)

Bertolotti, F. P.; Herbert, TH.; Spalart, P. R.

1992-01-01

Two new techniques for the study of the linear and nonlinear instability in growing boundary layers are presented. The first technique employs partial differential equations of parabolic type exploiting the slow change of the mean flow, disturbance velocity profiles, wavelengths, and growth rates in the streamwise direction. The second technique solves the Navier-Stokes equation for spatially evolving disturbances using buffer zones adjacent to the inflow and outflow boundaries. Results of both techniques are in excellent agreement. The linear and nonlinear development of Tollmien-Schlichting (TS) waves in the Blasius boundary layer is investigated with both techniques and with a local procedure based on a system of ordinary differential equations. The results are compared with previous work and the effects of non-parallelism and nonlinearity are clarified. The effect of nonparallelism is confirmed to be weak and, consequently, not responsible for the discrepancies between measurements and theoretical results for parallel flow.
Linear quadratic regulators with eigenvalue placement in a specified region

NASA Technical Reports Server (NTRS)

Shieh, Leang S.; Dib, Hani M.; Ganesan, Sekar

1988-01-01

A linear optimal quadratic regulator is developed for optimally placing the closed-loop poles of multivariable continuous-time systems within the common region of an open sector, bounded by lines inclined at + or - pi/2k (k = 2 or 3) from the negative real axis with a sector angle of pi/2 or less, and the left-hand side of a line parallel to the imaginary axis in the complex s-plane. The design method is mainly based on the solution of a linear matrix Liapunov equation, and the resultant closed-loop system with its eigenvalues in the desired region is optimal with respect to a quadratic performance index.
Multirate parallel distributed compensation of a cluster in wireless sensor and actor networks

NASA Astrophysics Data System (ADS)

Yang, Chun-xi; Huang, Ling-yun; Zhang, Hao; Hua, Wang

2016-01-01

The stabilisation problem for one of the clusters with bounded multiple random time delays and packet dropouts in wireless sensor and actor networks is investigated in this paper. A new multirate switching model is constructed to describe the feature of this single input multiple output linear system. According to the difficulty of controller design under multi-constraints in multirate switching model, this model can be converted to a Takagi-Sugeno fuzzy model. By designing a multirate parallel distributed compensation, a sufficient condition is established to ensure this closed-loop fuzzy control system to be globally exponentially stable. The solution of the multirate parallel distributed compensation gains can be obtained by solving an auxiliary convex optimisation problem. Finally, two numerical examples are given to show, compared with solving switching controller, multirate parallel distributed compensation can be obtained easily. Furthermore, it has stronger robust stability than arbitrary switching controller and single-rate parallel distributed compensation under the same conditions.
A class of parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Fijany, Amir; Bejczy, Antal K.

1989-01-01

Parallel and parallel/pipeline algorithms for computation of the manipulator inertia matrix are presented. An algorithm based on composite rigid-body spatial inertia method, which provides better features for parallelization, is used for the computation of the inertia matrix. Two parallel algorithms are developed which achieve the time lower bound in computation. Also described is the mapping of these algorithms with topological variation on a two-dimensional processor array, with nearest-neighbor connection, and with cardinality variation on a linear processor array. An efficient parallel/pipeline algorithm for the linear array was also developed, but at significantly higher efficiency.
Real-time electron dynamics for massively parallel excited-state simulations

NASA Astrophysics Data System (ADS)

Andrade, Xavier

The simulation of the real-time dynamics of electrons, based on time dependent density functional theory (TDDFT), is a powerful approach to study electronic excited states in molecular and crystalline systems. What makes the method attractive is its flexibility to simulate different kinds of phenomena beyond the linear-response regime, including strongly-perturbed electronic systems and non-adiabatic electron-ion dynamics. Electron-dynamics simulations are also attractive from a computational point of view. They can run efficiently on massively parallel architectures due to the low communication requirements. Our implementations of electron dynamics, based on the codes Octopus (real-space) and Qball (plane-waves), allow us to simulate systems composed of thousands of atoms and to obtain good parallel scaling up to 1.6 million processor cores. Due to the versatility of real-time electron dynamics and its parallel performance, we expect it to become the method of choice to apply the capabilities of exascale supercomputers for the simulation of electronic excited states.
Optical ranked-order filtering using threshold decomposition

DOEpatents

Allebach, Jan P.; Ochoa, Ellen; Sweeney, Donald W.

1990-01-01

A hybrid optical/electronic system performs median filtering and related ranked-order operations using threshold decomposition to encode the image. Threshold decomposition transforms the nonlinear neighborhood ranking operation into a linear space-invariant filtering step followed by a point-to-point threshold comparison step. Spatial multiplexing allows parallel processing of all the threshold components as well as recombination by a second linear, space-invariant filtering step. An incoherent optical correlation system performs the linear filtering, using a magneto-optic spatial light modulator as the input device and a computer-generated hologram in the filter plane. Thresholding is done electronically. By adjusting the value of the threshold, the same architecture is used to perform median, minimum, and maximum filtering of images. A totally optical system is also disclosed.
Gauss Elimination: Workhorse of Linear Algebra.

DTIC Science & Technology

1995-08-05

linear algebra computation for solving systems, computing determinants and determining the rank of matrix. All of these are discussed in varying contexts. These include different arithmetic or algebraic setting such as integer arithmetic or polynomial rings as well as conventional real (floating-point) arithmetic. These have effects on both accuracy and complexity analyses of the algorithm. These, too, are covered here. The impact of modern parallel computer architecture on GE is also
TU-H-BRA-02: The Physics of Magnetic Field Isolation in a Novel Compact Linear Accelerator Based MRI-Guided Radiation Therapy System

DOE Office of Scientific and Technical Information (OSTI.GOV)

Low, D; Mutic, S; Shvartsman, S

Purpose: To develop a method for isolating the MRI magnetic field from field-sensitive linear accelerator components at distances close to isocenter. Methods: A MRI-guided radiation therapy system has been designed that integrates a linear accelerator with simultaneous MR imaging. In order to accomplish this, the magnetron, port circulator, radiofrequency waveguide, gun driver, and linear accelerator needed to be placed in locations with low magnetic fields. The system was also required to be compact, so moving these components far from the main magnetic field and isocenter was not an option. The magnetic field sensitive components (exclusive of the waveguide) were placedmore » in coaxial steel sleeves that were electrically and mechanically isolated and whose thickness and placement were optimized using E&M modeling software. Six sets of sleeves were placed 60° apart, 85 cm from isocenter. The Faraday effect occurs when the direction of propagation is parallel to the magnetic RF field component, rotating the RF polarization, subsequently diminishing RF power. The Faraday effect was avoided by orienting the waveguides such that the magnetic field RF component was parallel to the magnetic field. Results: The magnetic field within the shields was measured to be less than 40 Gauss, significantly below the amount needed for the magnetron and port circulator. Additional mu-metal was employed to reduce the magnetic field at the linear accelerator to less than 1 Gauss. The orientation of the RF waveguides allowed the RT transport with minimal loss and reflection. Conclusion: One of the major challenges in designing a compact linear accelerator based MRI-guided radiation therapy system, that of creating low magnetic field environments for the magnetic-field sensitive components, has been solved. The measured magnetic fields are sufficiently small to enable system integration. This work supported by ViewRay, Inc.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chen, Chao; Pouransari, Hadi; Rajamanickam, Sivasankaran

We present a parallel hierarchical solver for general sparse linear systems on distributed-memory machines. For large-scale problems, this fully algebraic algorithm is faster and more memory-efficient than sparse direct solvers because it exploits the low-rank structure of fill-in blocks. Depending on the accuracy of low-rank approximations, the hierarchical solver can be used either as a direct solver or as a preconditioner. The parallel algorithm is based on data decomposition and requires only local communication for updating boundary data on every processor. Moreover, the computation-to-communication ratio of the parallel algorithm is approximately the volume-to-surface-area ratio of the subdomain owned by everymore » processor. We also provide various numerical results to demonstrate the versatility and scalability of the parallel algorithm.« less
Implementation of parallel moment equations in NIMROD

NASA Astrophysics Data System (ADS)

Lee, Hankyu Q.; Held, Eric D.; Ji, Jeong-Young

2017-10-01

As collisionality is low (the Knudsen number is large) in many plasma applications, kinetic effects become important, particularly in parallel dynamics for magnetized plasmas. Fluid models can capture some kinetic effects when integral parallel closures are adopted. The adiabatic and linear approximations are used in solving general moment equations to obtain the integral closures. In this work, we present an effort to incorporate non-adiabatic (time-dependent) and nonlinear effects into parallel closures. Instead of analytically solving the approximate moment system, we implement exact parallel moment equations in the NIMROD fluid code. The moment code is expected to provide a natural convergence scheme by increasing the number of moments. Work in collaboration with the PSI Center and supported by the U.S. DOE under Grant Nos. DE-SC0014033, DE-SC0016256, and DE-FG02-04ER54746.
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware

PubMed Central

Zheng, Da; Burns, Randal; Szalay, Alexander S.

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads. PMID:24402052
Toward Millions of File System IOPS on Low-Cost, Commodity Hardware.

PubMed

Zheng, Da; Burns, Randal; Szalay, Alexander S

2013-01-01

We describe a storage system that removes I/O bottlenecks to achieve more than one million IOPS based on a user-space file abstraction for arrays of commodity SSDs. The file abstraction refactors I/O scheduling and placement for extreme parallelism and non-uniform memory and I/O. The system includes a set-associative, parallel page cache in the user space. We redesign page caching to eliminate CPU overhead and lock-contention in non-uniform memory architecture machines. We evaluate our design on a 32 core NUMA machine with four, eight-core processors. Experiments show that our design delivers 1.23 million 512-byte read IOPS. The page cache realizes the scalable IOPS of Linux asynchronous I/O (AIO) and increases user-perceived I/O performance linearly with cache hit rates. The parallel, set-associative cache matches the cache hit rates of the global Linux page cache under real workloads.
Scalable parallel communications

NASA Technical Reports Server (NTRS)

Maly, K.; Khanna, S.; Overstreet, C. M.; Mukkamala, R.; Zubair, M.; Sekhar, Y. S.; Foudriat, E. C.

1992-01-01

Coarse-grain parallelism in networking (that is, the use of multiple protocol processors running replicated software sending over several physical channels) can be used to provide gigabit communications for a single application. Since parallel network performance is highly dependent on real issues such as hardware properties (e.g., memory speeds and cache hit rates), operating system overhead (e.g., interrupt handling), and protocol performance (e.g., effect of timeouts), we have performed detailed simulations studies of both a bus-based multiprocessor workstation node (based on the Sun Galaxy MP multiprocessor) and a distributed-memory parallel computer node (based on the Touchstone DELTA) to evaluate the behavior of coarse-grain parallelism. Our results indicate: (1) coarse-grain parallelism can deliver multiple 100 Mbps with currently available hardware platforms and existing networking protocols (such as Transmission Control Protocol/Internet Protocol (TCP/IP) and parallel Fiber Distributed Data Interface (FDDI) rings); (2) scale-up is near linear in n, the number of protocol processors, and channels (for small n and up to a few hundred Mbps); and (3) since these results are based on existing hardware without specialized devices (except perhaps for some simple modifications of the FDDI boards), this is a low cost solution to providing multiple 100 Mbps on current machines. In addition, from both the performance analysis and the properties of these architectures, we conclude: (1) multiple processors providing identical services and the use of space division multiplexing for the physical channels can provide better reliability than monolithic approaches (it also provides graceful degradation and low-cost load balancing); (2) coarse-grain parallelism supports running several transport protocols in parallel to provide different types of service (for example, one TCP handles small messages for many users, other TCP's running in parallel provide high bandwidth service to a single application); and (3) coarse grain parallelism will be able to incorporate many future improvements from related work (e.g., reduced data movement, fast TCP, fine-grain parallelism) also with near linear speed-ups.
Multilevel Parallelization of AutoDock 4.2.

PubMed

Norgan, Andrew P; Coffman, Paul K; Kocher, Jean-Pierre A; Katzmann, David J; Sosa, Carlos P

2011-04-28

Virtual (computational) screening is an increasingly important tool for drug discovery. AutoDock is a popular open-source application for performing molecular docking, the prediction of ligand-receptor interactions. AutoDock is a serial application, though several previous efforts have parallelized various aspects of the program. In this paper, we report on a multi-level parallelization of AutoDock 4.2 (mpAD4). Using MPI and OpenMP, AutoDock 4.2 was parallelized for use on MPI-enabled systems and to multithread the execution of individual docking jobs. In addition, code was implemented to reduce input/output (I/O) traffic by reusing grid maps at each node from docking to docking. Performance of mpAD4 was examined on two multiprocessor computers. Using MPI with OpenMP multithreading, mpAD4 scales with near linearity on the multiprocessor systems tested. In situations where I/O is limiting, reuse of grid maps reduces both system I/O and overall screening time. Multithreading of AutoDock's Lamarkian Genetic Algorithm with OpenMP increases the speed of execution of individual docking jobs, and when combined with MPI parallelization can significantly reduce the execution time of virtual screens. This work is significant in that mpAD4 speeds the execution of certain molecular docking workloads and allows the user to optimize the degree of system-level (MPI) and node-level (OpenMP) parallelization to best fit both workloads and computational resources.
Development of a linear induction motor based artificial muscle system.

PubMed

Gruber, A; Arguello, E; Silva, R

2013-01-01

We present the design of a linear induction motor based on electromagnetic interactions. The engine is capable of producing a linear movement from electricity. The design consists of stators arranged in parallel, which produce a magnetic field sufficient to displace a plunger along its axial axis. Furthermore, the winding has a shell and cap of ferromagnetic material that amplifies the magnetic field. This produces a force along the length of the motor that is similar to that of skeletal muscle. In principle, the objective is to use the engine in the development of an artificial muscle system for prosthetic applications, but it could have multiple applications, not only in the medical field, but in other industries.
On a model of three-dimensional bursting and its parallel implementation

NASA Astrophysics Data System (ADS)

Tabik, S.; Romero, L. F.; Garzón, E. M.; Ramos, J. I.

2008-04-01

A mathematical model for the simulation of three-dimensional bursting phenomena and its parallel implementation are presented. The model consists of four nonlinearly coupled partial differential equations that include fast and slow variables, and exhibits bursting in the absence of diffusion. The differential equations have been discretized by means of a second-order accurate in both space and time, linearly-implicit finite difference method in equally-spaced grids. The resulting system of linear algebraic equations at each time level has been solved by means of the Preconditioned Conjugate Gradient (PCG) method. Three different parallel implementations of the proposed mathematical model have been developed; two of these implementations, i.e., the MPI and the PETSc codes, are based on a message passing paradigm, while the third one, i.e., the OpenMP code, is based on a shared space address paradigm. These three implementations are evaluated on two current high performance parallel architectures, i.e., a dual-processor cluster and a Shared Distributed Memory (SDM) system. A novel representation of the results that emphasizes the most relevant factors that affect the performance of the paralled implementations, is proposed. The comparative analysis of the computational results shows that the MPI and the OpenMP implementations are about twice more efficient than the PETSc code on the SDM system. It is also shown that, for the conditions reported here, the nonlinear dynamics of the three-dimensional bursting phenomena exhibits three stages characterized by asynchronous, synchronous and then asynchronous oscillations, before a quiescent state is reached. It is also shown that the fast system reaches steady state in much less time than the slow variables.
An analysis of a nonlinear instability in the implementation of a VTOL control system

NASA Technical Reports Server (NTRS)

Weber, J. M.

1982-01-01

The contributions to nonlinear behavior and unstable response of the model following yaw control system of a VTOL aircraft during hover were determined. The system was designed as a state rate feedback implicit model follower that provided yaw rate command/heading hold capability and used combined full authority parallel and limited authority series servo actuators to generate an input to the yaw reaction control system of the aircraft. Both linear and nonlinear system models, as well as describing function linearization techniques were used to determine the influence on the control system instability of input magnitude and bandwidth, series servo authority, and system bandwidth. Results of the analysis describe stability boundaries as a function of these system design characteristics.
TMVOC-MP: a parallel numerical simulator for Three-PhaseNon-isothermal Flows of Multicomponent Hydrocarbon Mixtures inporous/fractured media

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Yamamoto, Hajime; Pruess, Karsten

2008-02-15

TMVOC-MP is a massively parallel version of the TMVOC code (Pruess and Battistelli, 2002), a numerical simulator for three-phase non-isothermal flow of water, gas, and a multicomponent mixture of volatile organic chemicals (VOCs) in multidimensional heterogeneous porous/fractured media. TMVOC-MP was developed by introducing massively parallel computing techniques into TMVOC. It retains the physical process model of TMVOC, designed for applications to contamination problems that involve hydrocarbon fuels or organic solvents in saturated and unsaturated zones. TMVOC-MP can model contaminant behavior under 'natural' environmental conditions, as well as for engineered systems, such as soil vapor extraction, groundwater pumping, or steam-assisted sourcemore » remediation. With its sophisticated parallel computing techniques, TMVOC-MP can handle much larger problems than TMVOC, and can be much more computationally efficient. TMVOC-MP models multiphase fluid systems containing variable proportions of water, non-condensible gases (NCGs), and water-soluble volatile organic chemicals (VOCs). The user can specify the number and nature of NCGs and VOCs. There are no intrinsic limitations to the number of NCGs or VOCs, although the arrays for fluid components are currently dimensioned as 20, accommodating water plus 19 components that may be either NCGs or VOCs. Among them, NCG arrays are dimensioned as 10. The user may select NCGs from a data bank provided in the software. The currently available choices include O{sub 2}, N{sub 2}, CO{sub 2}, CH{sub 4}, ethane, ethylene, acetylene, and air (a pseudo-component treated with properties averaged from N{sub 2} and O{sub 2}). Thermophysical property data of VOCs can be selected from a chemical data bank, included with TMVOC-MP, that provides parameters for 26 commonly encountered chemicals. Users also can input their own data for other fluids. The fluid components may partition (volatilize and/or dissolve) among gas, aqueous, and NAPL phases. Any combination of the three phases may present, and phases may appear and disappear in the course of a simulation. In addition, VOCs may be adsorbed by the porous medium, and may biodegrade according to a simple half-life model. Detailed discussion of physical processes, assumptions, and fluid properties used in TMVOC-MP can be found in the TMVOC user's guide (Pruess and Battistelli, 2002). TMVOC-MP was developed based on the parallel framework of the TOUGH2-MP code (Zhang et al. 2001, Wu et al. 2002). It uses the MPI (Message Passing Forum, 1994) for parallel implementation. A domain decomposition approach is adopted for the parallelization. The code partitions a simulation domain, defined by an unstructured grid, using partitioning algorithm from the METIS software package (Karypsis and Kumar, 1998). In parallel simulation, each processor is in charge of one part of the simulation domain for assembling mass and energy balance equations, solving linear equation systems, updating thermophysical properties, and performing other local computations. The local linear-equation systems are solved in parallel by multiple processors with the Aztec linear solver package (Tuminaro et al., 1999). Although each processor solves the linearized equations of subdomains independently, the entire linear equation system is solved together by all processors collaboratively via communication between neighboring processors during each iteration. Detailed discussion of the prototype of the data-exchange scheme can be found in Elmroth et al. (2001). In addition, FORTRAN 90 features are introduced to TMVOC-MP, such as dynamic memory allocation, array operation, matrix manipulation, and replacing 'common blocks' (used in the original TMVOC) with modules. All new subroutines are written in FORTRAN 90. Program units imported from the original TMVOC remain in standard FORTRAN 77. This report provides a quick starting guide for using the TMVOC-MP program. We suppose that the users have basic knowledge of using the original TMVOC code. The users can find the detailed technical description of the physical processes modeled, and the mathematical and numerical methods in the user's guide for TMVOC (Pruess and Battistelli, 2002).« less
Research in Computational Aeroscience Applications Implemented on Advanced Parallel Computing Systems

NASA Technical Reports Server (NTRS)

Wigton, Larry

1996-01-01

Improving the numerical linear algebra routines for use in new Navier-Stokes codes, specifically Tim Barth's unstructured grid code, with spin-offs to TRANAIR is reported. A fast distance calculation routine for Navier-Stokes codes using the new one-equation turbulence models is written. The primary focus of this work was devoted to improving matrix-iterative methods. New algorithms have been developed which activate the full potential of classical Cray-class computers as well as distributed-memory parallel computers.
Optical ranked-order filtering using threshold decomposition

DOEpatents

Allebach, J.P.; Ochoa, E.; Sweeney, D.W.

1987-10-09

A hybrid optical/electronic system performs median filtering and related ranked-order operations using threshold decomposition to encode the image. Threshold decomposition transforms the nonlinear neighborhood ranking operation into a linear space-invariant filtering step followed by a point-to-point threshold comparison step. Spatial multiplexing allows parallel processing of all the threshold components as well as recombination by a second linear, space-invariant filtering step. An incoherent optical correlation system performs the linear filtering, using a magneto-optic spatial light modulator as the input device and a computer-generated hologram in the filter plane. Thresholding is done electronically. By adjusting the value of the threshold, the same architecture is used to perform median, minimum, and maximum filtering of images. A totally optical system is also disclosed. 3 figs.

Segmented amplifier configurations for laser amplifier

DOEpatents

Hagen, Wilhelm F.

1979-01-01

An amplifier system for high power lasers, the system comprising a compact array of segments which (1) preserves high, large signal gain with improved pumping efficiency and (2) allows the total amplifier length to be shortened by as much as one order of magnitude. The system uses a three dimensional array of segments, with the plane of each segment being oriented at substantially the amplifier medium Brewster angle relative to the incident laser beam and with one or more linear arrays of flashlamps positioned between adjacent rows of amplifier segments, with the plane of the linear array of flashlamps being substantially parallel to the beam propagation direction.
A nonrecursive order N preconditioned conjugate gradient: Range space formulation of MDOF dynamics

NASA Technical Reports Server (NTRS)

Kurdila, Andrew J.

1990-01-01

While excellent progress has been made in deriving algorithms that are efficient for certain combinations of system topologies and concurrent multiprocessing hardware, several issues must be resolved to incorporate transient simulation in the control design process for large space structures. Specifically, strategies must be developed that are applicable to systems with numerous degrees of freedom. In addition, the algorithms must have a growth potential in that they must also be amenable to implementation on forthcoming parallel system architectures. For mechanical system simulation, this fact implies that algorithms are required that induce parallelism on a fine scale, suitable for the emerging class of highly parallel processors; and transient simulation methods must be automatically load balancing for a wider collection of system topologies and hardware configurations. These problems are addressed by employing a combination range space/preconditioned conjugate gradient formulation of multi-degree-of-freedom dynamics. The method described has several advantages. In a sequential computing environment, the method has the features that: by employing regular ordering of the system connectivity graph, an extremely efficient preconditioner can be derived from the 'range space metric', as opposed to the system coefficient matrix; because of the effectiveness of the preconditioner, preliminary studies indicate that the method can achieve performance rates that depend linearly upon the number of substructures, hence the title 'Order N'; and the method is non-assembling. Furthermore, the approach is promising as a potential parallel processing algorithm in that the method exhibits a fine parallel granularity suitable for a wide collection of combinations of physical system topologies/computer architectures; and the method is easily load balanced among processors, and does not rely upon system topology to induce parallelism.
Parallel Numerical Simulations of Water Reservoirs

NASA Astrophysics Data System (ADS)

Torres, Pedro; Mangiavacchi, Norberto

2010-11-01

The study of the water flow and scalar transport in water reservoirs is important for the determination of the water quality during the initial stages of the reservoir filling and during the life of the reservoir. For this scope, a parallel 2D finite element code for solving the incompressible Navier-Stokes equations coupled with scalar transport was implemented using the message-passing programming model, in order to perform simulations of hidropower water reservoirs in a computer cluster environment. The spatial discretization is based on the MINI element that satisfies the Babuska-Brezzi (BB) condition, which provides sufficient conditions for a stable mixed formulation. All the distributed data structures needed in the different stages of the code, such as preprocessing, solving and post processing, were implemented using the PETSc library. The resulting linear systems for the velocity and the pressure fields were solved using the projection method, implemented by an approximate block LU factorization. In order to increase the parallel performance in the solution of the linear systems, we employ the static condensation method for solving the intermediate velocity at vertex and centroid nodes separately. We compare performance results of the static condensation method with the approach of solving the complete system. In our tests the static condensation method shows better performance for large problems, at the cost of an increased memory usage. Performance results for other intensive parts of the code in a computer cluster are also presented.
The effect of beam pre-bunching on the excitation of terahertz plasmons in a parallel plane guiding system

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sharma, Suresh C.; Malik, Pratibha

2015-04-15

The excitation of terahertz (THz) plasmons by a pre-bunched relativistic electron beam propagating in a parallel plane semiconducting guiding system is studied. It is found that the n-InSb semiconductor strongly supports the confined surface plasmons in the terahertz frequency range. The growth rate and efficiency of the THz surface plasmons increase linearly with modulation index and show the largest value as modulation index approaches unity. Moreover, the growth rate of the instability scales as one-third power of the beam density and inverse one-third power of the THz radiation frequency.
High Performance Radiation Transport Simulations on TITAN

DOE Office of Scientific and Technical Information (OSTI.GOV)

Baker, Christopher G; Davidson, Gregory G; Evans, Thomas M

2012-01-01

In this paper we describe the Denovo code system. Denovo solves the six-dimensional, steady-state, linear Boltzmann transport equation, of central importance to nuclear technology applications such as reactor core analysis (neutronics), radiation shielding, nuclear forensics and radiation detection. The code features multiple spatial differencing schemes, state-of-the-art linear solvers, the Koch-Baker-Alcouffe (KBA) parallel-wavefront sweep algorithm for inverting the transport operator, a new multilevel energy decomposition method scaling to hundreds of thousands of processing cores, and a modern, novel code architecture that supports straightforward integration of new features. In this paper we discuss the performance of Denovo on the 10--20 petaflop ORNLmore » GPU-based system, Titan. We describe algorithms and techniques used to exploit the capabilities of Titan's heterogeneous compute node architecture and the challenges of obtaining good parallel performance for this sparse hyperbolic PDE solver containing inherently sequential computations. Numerical results demonstrating Denovo performance on early Titan hardware are presented.« less
Linear solver performance in elastoplastic problem solution on GPU cluster

NASA Astrophysics Data System (ADS)

Khalevitsky, Yu. V.; Konovalov, A. V.; Burmasheva, N. V.; Partin, A. S.

2017-12-01

Applying the finite element method to severe plastic deformation problems involves solving linear equation systems. While the solution procedure is relatively hard to parallelize and computationally intensive by itself, a long series of large scale systems need to be solved for each problem. When dealing with fine computational meshes, such as in the simulations of three-dimensional metal matrix composite microvolume deformation, tens and hundreds of hours may be needed to complete the whole solution procedure, even using modern supercomputers. In general, one of the preconditioned Krylov subspace methods is used in a linear solver for such problems. The method convergence highly depends on the operator spectrum of a problem stiffness matrix. In order to choose the appropriate method, a series of computational experiments is used. Different methods may be preferable for different computational systems for the same problem. In this paper we present experimental data obtained by solving linear equation systems from an elastoplastic problem on a GPU cluster. The data can be used to substantiate the choice of the appropriate method for a linear solver to use in severe plastic deformation simulations.
Development and parallelization of a direct numerical simulation to study the formation and transport of nanoparticle clusters in a viscous fluid

NASA Astrophysics Data System (ADS)

Sloan, Gregory James

The direct numerical simulation (DNS) offers the most accurate approach to modeling the behavior of a physical system, but carries an enormous computation cost. There exists a need for an accurate DNS to model the coupled solid-fluid system seen in targeted drug delivery (TDD), nanofluid thermal energy storage (TES), as well as other fields where experiments are necessary, but experiment design may be costly. A parallel DNS can greatly reduce the large computation times required, while providing the same results and functionality of the serial counterpart. A D2Q9 lattice Boltzmann method approach was implemented to solve the fluid phase. The use of domain decomposition with message passing interface (MPI) parallelism resulted in an algorithm that exhibits super-linear scaling in testing, which may be attributed to the caching effect. Decreased performance on a per-node basis for a fixed number of processes confirms this observation. A multiscale approach was implemented to model the behavior of nanoparticles submerged in a viscous fluid, and used to examine the mechanisms that promote or inhibit clustering. Parallelization of this model using a masterworker algorithm with MPI gives less-than-linear speedup for a fixed number of particles and varying number of processes. This is due to the inherent inefficiency of the master-worker approach. Lastly, these separate simulations are combined, and two-way coupling is implemented between the solid and fluid.
Accelerate quasi Monte Carlo method for solving systems of linear algebraic equations through shared memory

NASA Astrophysics Data System (ADS)

Lai, Siyan; Xu, Ying; Shao, Bo; Guo, Menghan; Lin, Xiaola

2017-04-01

In this paper we study on Monte Carlo method for solving systems of linear algebraic equations (SLAE) based on shared memory. Former research demostrated that GPU can effectively speed up the computations of this issue. Our purpose is to optimize Monte Carlo method simulation on GPUmemoryachritecture specifically. Random numbers are organized to storein shared memory, which aims to accelerate the parallel algorithm. Bank conflicts can be avoided by our Collaborative Thread Arrays(CTA)scheme. The results of experiments show that the shared memory based strategy can speed up the computaions over than 3X at most.
Intrinsic suppression of turbulence in linear plasma devices

NASA Astrophysics Data System (ADS)

Leddy, J.; Dudson, B.

2017-12-01

Plasma turbulence is the dominant transport mechanism for heat and particles in magnetised plasmas in linear devices and tokamaks, so the study of turbulence is important in limiting and controlling this transport. Linear devices provide an axial magnetic field that serves to confine a plasma in cylindrical geometry as it travels along the magnetic field from the source to the strike point. Due to perpendicular transport, the plasma density and temperature have a roughly Gaussian radial profile with gradients that drive instabilities, such as resistive drift-waves and Kelvin-Helmholtz. If unstable, these instabilities cause perturbations to grow resulting in saturated turbulence, increasing the cross-field transport of heat and particles. When the plasma emerges from the source, there is a time, {τ }\\parallel , that describes the lifetime of the plasma based on parallel velocity and length of the device. As the plasma moves down the device, it also moves azimuthally according to E × B and diamagnetic velocities. There is a balance point in these parallel and perpendicular times that sets the stabilisation threshold. We simulate plasmas with a variety of parallel lengths and magnetic fields to vary the parallel and perpendicular lifetimes, respectively, and find that there is a clear correlation between the saturated RMS density perturbation level and the balance between these lifetimes. The threshold of marginal stability is seen to exist where {τ }\\parallel ≈ 11{τ }\\perp . This is also associated with the product {τ }\\parallel {γ }* , where {γ }* is the drift-wave linear growth rate, indicating that the instability must exist for roughly 100 times the growth time for the instability to enter the nonlinear growth phase. We explore the root of this correlation and the implications for linear device design.
MPF: A portable message passing facility for shared memory multiprocessors

NASA Technical Reports Server (NTRS)

Malony, Allen D.; Reed, Daniel A.; Mcguire, Patrick J.

1987-01-01

The design, implementation, and performance evaluation of a message passing facility (MPF) for shared memory multiprocessors are presented. The MPF is based on a message passing model conceptually similar to conversations. Participants (parallel processors) can enter or leave a conversation at any time. The message passing primitives for this model are implemented as a portable library of C function calls. The MPF is currently operational on a Sequent Balance 21000, and several parallel applications were developed and tested. Several simple benchmark programs are presented to establish interprocess communication performance for common patterns of interprocess communication. Finally, performance figures are presented for two parallel applications, linear systems solution, and iterative solution of partial differential equations.
Parallel algorithms for boundary value problems

NASA Technical Reports Server (NTRS)

Lin, Avi

1990-01-01

A general approach to solve boundary value problems numerically in a parallel environment is discussed. The basic algorithm consists of two steps: the local step where all the P available processors work in parallel, and the global step where one processor solves a tridiagonal linear system of the order P. The main advantages of this approach are two fold. First, this suggested approach is very flexible, especially in the local step and thus the algorithm can be used with any number of processors and with any of the SIMD or MIMD machines. Secondly, the communication complexity is very small and thus can be used as easily with shared memory machines. Several examples for using this strategy are discussed.
A dynamic system with digital lock-in-photon-counting for pharmacokinetic diffuse fluorescence tomography

NASA Astrophysics Data System (ADS)

Yin, Guoyan; Zhang, Limin; Zhang, Yanqi; Liu, Han; Du, Wenwen; Ma, Wenjuan; Zhao, Huijuan; Gao, Feng

2018-02-01

Pharmacokinetic diffuse fluorescence tomography (DFT) can describe the metabolic processes of fluorescent agents in biomedical tissue and provide helpful information for tumor differentiation. In this paper, a dynamic DFT system was developed by employing digital lock-in-photon-counting with square wave modulation, which predominates in ultra-high sensitivity and measurement parallelism. In this system, 16 frequency-encoded laser diodes (LDs) driven by self-designed light source system were distributed evenly in the imaging plane and irradiated simultaneously. Meanwhile, 16 detection fibers collected emission light in parallel by the digital lock-in-photon-counting module. The fundamental performances of the proposed system were assessed with phantom experiments in terms of stability, linearity, anti-crosstalk as well as images reconstruction. The results validated the availability of the proposed dynamic DFT system.
Parallel Optimization of Polynomials for Large-scale Problems in Stability and Control

NASA Astrophysics Data System (ADS)

Kamyar, Reza

In this thesis, we focus on some of the NP-hard problems in control theory. Thanks to the converse Lyapunov theory, these problems can often be modeled as optimization over polynomials. To avoid the problem of intractability, we establish a trade off between accuracy and complexity. In particular, we develop a sequence of tractable optimization problems --- in the form of Linear Programs (LPs) and/or Semi-Definite Programs (SDPs) --- whose solutions converge to the exact solution of the NP-hard problem. However, the computational and memory complexity of these LPs and SDPs grow exponentially with the progress of the sequence - meaning that improving the accuracy of the solutions requires solving SDPs with tens of thousands of decision variables and constraints. Setting up and solving such problems is a significant challenge. The existing optimization algorithms and software are only designed to use desktop computers or small cluster computers --- machines which do not have sufficient memory for solving such large SDPs. Moreover, the speed-up of these algorithms does not scale beyond dozens of processors. This in fact is the reason we seek parallel algorithms for setting-up and solving large SDPs on large cluster- and/or super-computers. We propose parallel algorithms for stability analysis of two classes of systems: 1) Linear systems with a large number of uncertain parameters; 2) Nonlinear systems defined by polynomial vector fields. First, we develop a distributed parallel algorithm which applies Polya's and/or Handelman's theorems to some variants of parameter-dependent Lyapunov inequalities with parameters defined over the standard simplex. The result is a sequence of SDPs which possess a block-diagonal structure. We then develop a parallel SDP solver which exploits this structure in order to map the computation, memory and communication to a distributed parallel environment. Numerical tests on a supercomputer demonstrate the ability of the algorithm to efficiently utilize hundreds and potentially thousands of processors, and analyze systems with 100+ dimensional state-space. Furthermore, we extend our algorithms to analyze robust stability over more complicated geometries such as hypercubes and arbitrary convex polytopes. Our algorithms can be readily extended to address a wide variety of problems in control such as Hinfinity synthesis for systems with parametric uncertainty and computing control Lyapunov functions.
The Television Generation, Television Literacy, and Television Trends.

ERIC Educational Resources Information Center

Cohen, Jodi R.

Unlike the linear, serial process of reading books, learning to "read" television is a parallel process in which multiple pieces of information are simultaneously received. Perceiving images, only one aspect of understanding television, requires the concurrent processing of information that is compounded within a symbol system. The…
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters: Preprint

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnson, Brian B; Purba, Victor; Jafarpour, Saber

Given that next-generation infrastructures will contain large numbers of grid-connected inverters and these interfaces will be satisfying a growing fraction of system load, it is imperative to analyze the impacts of power electronics on such systems. However, since each inverter model has a relatively large number of dynamic states, it would be impractical to execute complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the pointmore » of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loop for grid synchronization. We outline a structure-preserving reduced-order inverter model for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. That is, we show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as an individual inverter in the paralleled system. Numerical simulations validate the reduced-order models.« less
Linear-scaling density-functional simulations of charged point defects in Al2O3 using hierarchical sparse matrix algebra.

PubMed

Hine, N D M; Haynes, P D; Mostofi, A A; Payne, M C

2010-09-21

We present calculations of formation energies of defects in an ionic solid (Al(2)O(3)) extrapolated to the dilute limit, corresponding to a simulation cell of infinite size. The large-scale calculations required for this extrapolation are enabled by developments in the approach to parallel sparse matrix algebra operations, which are central to linear-scaling density-functional theory calculations. The computational cost of manipulating sparse matrices, whose sizes are determined by the large number of basis functions present, is greatly improved with this new approach. We present details of the sparse algebra scheme implemented in the ONETEP code using hierarchical sparsity patterns, and demonstrate its use in calculations on a wide range of systems, involving thousands of atoms on hundreds to thousands of parallel processes.
Master-slave interferometry for parallel spectral domain interferometry sensing and versatile 3D optical coherence tomography.

PubMed

Podoleanu, Adrian Gh; Bradu, Adrian

2013-08-12

Conventional spectral domain interferometry (SDI) methods suffer from the need of data linearization. When applied to optical coherence tomography (OCT), conventional SDI methods are limited in their 3D capability, as they cannot deliver direct en-face cuts. Here we introduce a novel SDI method, which eliminates these disadvantages. We denote this method as Master - Slave Interferometry (MSI), because a signal is acquired by a slave interferometer for an optical path difference (OPD) value determined by a master interferometer. The MSI method radically changes the main building block of an SDI sensor and of a spectral domain OCT set-up. The serially provided signal in conventional technology is replaced by multiple signals, a signal for each OPD point in the object investigated. This opens novel avenues in parallel sensing and in parallelization of signal processing in 3D-OCT, with applications in high- resolution medical imaging and microscopy investigation of biosamples. Eliminating the need of linearization leads to lower cost OCT systems and opens potential avenues in increasing the speed of production of en-face OCT images in comparison with conventional SDI.
Parallel iterative methods for sparse linear and nonlinear equations

NASA Technical Reports Server (NTRS)

Saad, Youcef

1989-01-01

As three-dimensional models are gaining importance, iterative methods will become almost mandatory. Among these, preconditioned Krylov subspace methods have been viewed as the most efficient and reliable, when solving linear as well as nonlinear systems of equations. There has been several different approaches taken to adapt iterative methods for supercomputers. Some of these approaches are discussed and the methods that deal more specifically with general unstructured sparse matrices, such as those arising from finite element methods, are emphasized.
Accurate and Efficient Parallel Implementation of an Effective Linear-Scaling Direct Random Phase Approximation Method.

PubMed

Graf, Daniel; Beuerle, Matthias; Schurkus, Henry F; Luenser, Arne; Savasci, Gökcen; Ochsenfeld, Christian

2018-05-08

An efficient algorithm for calculating the random phase approximation (RPA) correlation energy is presented that is as accurate as the canonical molecular orbital resolution-of-the-identity RPA (RI-RPA) with the important advantage of an effective linear-scaling behavior (instead of quartic) for large systems due to a formulation in the local atomic orbital space. The high accuracy is achieved by utilizing optimized minimax integration schemes and the local Coulomb metric attenuated by the complementary error function for the RI approximation. The memory bottleneck of former atomic orbital (AO)-RI-RPA implementations ( Schurkus, H. F.; Ochsenfeld, C. J. Chem. Phys. 2016 , 144 , 031101 and Luenser, A.; Schurkus, H. F.; Ochsenfeld, C. J. Chem. Theory Comput. 2017 , 13 , 1647 - 1655 ) is addressed by precontraction of the large 3-center integral matrix with the Cholesky factors of the ground state density reducing the memory requirements of that matrix by a factor of [Formula: see text]. Furthermore, we present a parallel implementation of our method, which not only leads to faster RPA correlation energy calculations but also to a scalable decrease in memory requirements, opening the door for investigations of large molecules even on small- to medium-sized computing clusters. Although it is known that AO methods are highly efficient for extended systems, where sparsity allows for reaching the linear-scaling regime, we show that our work also extends the applicability when considering highly delocalized systems for which no linear scaling can be achieved. As an example, the interlayer distance of two covalent organic framework pore fragments (comprising 384 atoms in total) is analyzed.
Noncoherent parallel optical processor for discrete two-dimensional linear transformations.

PubMed

Glaser, I

1980-10-01

We describe a parallel optical processor, based on a lenslet array, that provides general linear two-dimensional transformations using noncoherent light. Such a processor could become useful in image- and signal-processing applications in which the throughput requirements cannot be adequately satisfied by state-of-the-art digital processors. Experimental results that illustrate the feasibility of the processor by demonstrating its use in parallel optical computation of the two-dimensional Walsh-Hadamard transformation are presented.

Three pillars for achieving quantum mechanical molecular dynamics simulations of huge systems: Divide-and-conquer, density-functional tight-binding, and massively parallel computation.

PubMed

Nishizawa, Hiroaki; Nishimura, Yoshifumi; Kobayashi, Masato; Irle, Stephan; Nakai, Hiromi

2016-08-05

The linear-scaling divide-and-conquer (DC) quantum chemical methodology is applied to the density-functional tight-binding (DFTB) theory to develop a massively parallel program that achieves on-the-fly molecular reaction dynamics simulations of huge systems from scratch. The functions to perform large scale geometry optimization and molecular dynamics with DC-DFTB potential energy surface are implemented to the program called DC-DFTB-K. A novel interpolation-based algorithm is developed for parallelizing the determination of the Fermi level in the DC method. The performance of the DC-DFTB-K program is assessed using a laboratory computer and the K computer. Numerical tests show the high efficiency of the DC-DFTB-K program, a single-point energy gradient calculation of a one-million-atom system is completed within 60 s using 7290 nodes of the K computer. © 2016 Wiley Periodicals, Inc. © 2016 Wiley Periodicals, Inc.
Computational efficiency of parallel combinatorial OR-tree searches

NASA Technical Reports Server (NTRS)

Li, Guo-Jie; Wah, Benjamin W.

1990-01-01

The performance of parallel combinatorial OR-tree searches is analytically evaluated. This performance depends on the complexity of the problem to be solved, the error allowance function, the dominance relation, and the search strategies. The exact performance may be difficult to predict due to the nondeterminism and anomalies of parallelism. The authors derive the performance bounds of parallel OR-tree searches with respect to the best-first, depth-first, and breadth-first strategies, and verify these bounds by simulation. They show that a near-linear speedup can be achieved with respect to a large number of processors for parallel OR-tree searches. Using the bounds developed, the authors derive sufficient conditions for assuring that parallelism will not degrade performance and necessary conditions for allowing parallelism to have a speedup greater than the ratio of the numbers of processors. These bounds and conditions provide the theoretical foundation for determining the number of processors required to assure a near-linear speedup.
Linear static structural and vibration analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Baddourah, M. A.; Storaasli, O. O.; Bostic, S. W.

1993-01-01

Parallel computers offer the oppurtunity to significantly reduce the computation time necessary to analyze large-scale aerospace structures. This paper presents algorithms developed for and implemented on massively-parallel computers hereafter referred to as Scalable High-Performance Computers (SHPC), for the most computationally intensive tasks involved in structural analysis, namely, generation and assembly of system matrices, solution of systems of equations and calculation of the eigenvalues and eigenvectors. Results on SHPC are presented for large-scale structural problems (i.e. models for High-Speed Civil Transport). The goal of this research is to develop a new, efficient technique which extends structural analysis to SHPC and makes large-scale structural analyses tractable.
Parallel Preconditioning for CFD Problems on the CM-5

NASA Technical Reports Server (NTRS)

Simon, Horst D.; Kremenetsky, Mark D.; Richardson, John; Lasinski, T. A. (Technical Monitor)

1994-01-01

Up to today, preconditioning methods on massively parallel systems have faced a major difficulty. The most successful preconditioning methods in terms of accelerating the convergence of the iterative solver such as incomplete LU factorizations are notoriously difficult to implement on parallel machines for two reasons: (1) the actual computation of the preconditioner is not very floating-point intensive, but requires a large amount of unstructured communication, and (2) the application of the preconditioning matrix in the iteration phase (i.e. triangular solves) are difficult to parallelize because of the recursive nature of the computation. Here we present a new approach to preconditioning for very large, sparse, unsymmetric, linear systems, which avoids both difficulties. We explicitly compute an approximate inverse to our original matrix. This new preconditioning matrix can be applied most efficiently for iterative methods on massively parallel machines, since the preconditioning phase involves only a matrix-vector multiplication, with possibly a dense matrix. Furthermore the actual computation of the preconditioning matrix has natural parallelism. For a problem of size n, the preconditioning matrix can be computed by solving n independent small least squares problems. The algorithm and its implementation on the Connection Machine CM-5 are discussed in detail and supported by extensive timings obtained from real problem data.
Parallelization of a Monte Carlo particle transport simulation code

NASA Astrophysics Data System (ADS)

Hadjidoukas, P.; Bousis, C.; Emfietzoglou, D.

2010-05-01

We have developed a high performance version of the Monte Carlo particle transport simulation code MC4. The original application code, developed in Visual Basic for Applications (VBA) for Microsoft Excel, was first rewritten in the C programming language for improving code portability. Several pseudo-random number generators have been also integrated and studied. The new MC4 version was then parallelized for shared and distributed-memory multiprocessor systems using the Message Passing Interface. Two parallel pseudo-random number generator libraries (SPRNG and DCMT) have been seamlessly integrated. The performance speedup of parallel MC4 has been studied on a variety of parallel computing architectures including an Intel Xeon server with 4 dual-core processors, a Sun cluster consisting of 16 nodes of 2 dual-core AMD Opteron processors and a 200 dual-processor HP cluster. For large problem size, which is limited only by the physical memory of the multiprocessor server, the speedup results are almost linear on all systems. We have validated the parallel implementation against the serial VBA and C implementations using the same random number generator. Our experimental results on the transport and energy loss of electrons in a water medium show that the serial and parallel codes are equivalent in accuracy. The present improvements allow for studying of higher particle energies with the use of more accurate physical models, and improve statistics as more particles tracks can be simulated in low response time.
Parallel robot for micro assembly with integrated innovative optical 3D-sensor

NASA Astrophysics Data System (ADS)

Hesselbach, Juergen; Ispas, Diana; Pokar, Gero; Soetebier, Sven; Tutsch, Rainer

2002-10-01

Recent advances in the fields of MEMS and MOEMS often require precise assembly of very small parts with an accuracy of a few microns. In order to meet this demand, a new approach using a robot based on parallel mechanisms in combination with a novel 3D-vision system has been chosen. The planar parallel robot structure with 2 DOF provides a high resolution in the XY-plane. It carries two additional serial axes for linear and rotational movement in/about z direction. In order to achieve high precision as well as good dynamic capabilities, the drive concept for the parallel (main) axes incorporates air bearings in combination with a linear electric servo motors. High accuracy position feedback is provided by optical encoders with a resolution of 0.1 μm. To allow for visualization and visual control of assembly processes, a camera module fits into the hollow tool head. It consists of a miniature CCD camera and a light source. In addition a modular gripper support is integrated into the tool head. To increase the accuracy a control loop based on an optoelectronic sensor will be implemented. As a result of an in-depth analysis of different approaches a photogrammetric system using one single camera and special beam-splitting optics was chosen. A pattern of elliptical marks is applied to the surfaces of workpiece and gripper. Using a model-based recognition algorithm the image processing software identifies the gripper and the workpiece and determines their relative position. A deviation vector is calculated and fed into the robot control to guide the gripper.
Hybrid MPI-OpenMP Parallelism in the ONETEP Linear-Scaling Electronic Structure Code: Application to the Delamination of Cellulose Nanofibrils.

PubMed

Wilkinson, Karl A; Hine, Nicholas D M; Skylaris, Chris-Kriton

2014-11-11

We present a hybrid MPI-OpenMP implementation of Linear-Scaling Density Functional Theory within the ONETEP code. We illustrate its performance on a range of high performance computing (HPC) platforms comprising shared-memory nodes with fast interconnect. Our work has focused on applying OpenMP parallelism to the routines which dominate the computational load, attempting where possible to parallelize different loops from those already parallelized within MPI. This includes 3D FFT box operations, sparse matrix algebra operations, calculation of integrals, and Ewald summation. While the underlying numerical methods are unchanged, these developments represent significant changes to the algorithms used within ONETEP to distribute the workload across CPU cores. The new hybrid code exhibits much-improved strong scaling relative to the MPI-only code and permits calculations with a much higher ratio of cores to atoms. These developments result in a significantly shorter time to solution than was possible using MPI alone and facilitate the application of the ONETEP code to systems larger than previously feasible. We illustrate this with benchmark calculations from an amyloid fibril trimer containing 41,907 atoms. We use the code to study the mechanism of delamination of cellulose nanofibrils when undergoing sonification, a process which is controlled by a large number of interactions that collectively determine the structural properties of the fibrils. Many energy evaluations were needed for these simulations, and as these systems comprise up to 21,276 atoms this would not have been feasible without the developments described here.
Parallel discrete event simulation: A shared memory approach

NASA Technical Reports Server (NTRS)

Reed, Daniel A.; Malony, Allen D.; Mccredie, Bradley D.

1987-01-01

With traditional event list techniques, evaluating a detailed discrete event simulation model can often require hours or even days of computation time. Parallel simulation mimics the interacting servers and queues of a real system by assigning each simulated entity to a processor. By eliminating the event list and maintaining only sufficient synchronization to insure causality, parallel simulation can potentially provide speedups that are linear in the number of processors. A set of shared memory experiments is presented using the Chandy-Misra distributed simulation algorithm to simulate networks of queues. Parameters include queueing network topology and routing probabilities, number of processors, and assignment of network nodes to processors. These experiments show that Chandy-Misra distributed simulation is a questionable alternative to sequential simulation of most queueing network models.
A parallel reaction-transport model applied to cement hydration and microstructure development

NASA Astrophysics Data System (ADS)

Bullard, Jeffrey W.; Enjolras, Edith; George, William L.; Satterfield, Steven G.; Terrill, Judith E.

2010-03-01

A recently described stochastic reaction-transport model on three-dimensional lattices is parallelized and is used to simulate the time-dependent structural and chemical evolution in multicomponent reactive systems. The model, called HydratiCA, uses probabilistic rules to simulate the kinetics of diffusion, homogeneous reactions and heterogeneous phenomena such as solid nucleation, growth and dissolution in complex three-dimensional systems. The algorithms require information only from each lattice site and its immediate neighbors, and this localization enables the parallelized model to exhibit near-linear scaling up to several hundred processors. Although applicable to a wide range of material systems, including sedimentary rock beds, reacting colloids and biochemical systems, validation is performed here on two minerals that are commonly found in Portland cement paste, calcium hydroxide and ettringite, by comparing their simulated dissolution or precipitation rates far from equilibrium to standard rate equations, and also by comparing simulated equilibrium states to thermodynamic calculations, as a function of temperature and pH. Finally, we demonstrate how HydratiCA can be used to investigate microstructure characteristics, such as spatial correlations between different condensed phases, in more complex microstructures.
Design of linear quadratic regulators with eigenvalue placement in a specified region

NASA Technical Reports Server (NTRS)

Shieh, Leang-San; Zhen, Liu; Coleman, Norman P.

1990-01-01

Two linear quadratic regulators are developed for placing the closed-loop poles of linear multivariable continuous-time systems within the common region of an open sector, bounded by lines inclined at +/- pi/2k (for a specified integer k not less than 1) from the negative real axis, and the left-hand side of a line parallel to the imaginary axis in the complex s-plane, and simultaneously minimizing a quadratic performance index. The design procedure mainly involves the solution of either Liapunov equations or Riccati equations. The general expression for finding the lower bound of a constant gain gamma is also developed.
Linear control theory for gene network modeling.

PubMed

Shin, Yong-Jun; Bleris, Leonidas

2010-09-16

Systems biology is an interdisciplinary field that aims at understanding complex interactions in cells. Here we demonstrate that linear control theory can provide valuable insight and practical tools for the characterization of complex biological networks. We provide the foundation for such analyses through the study of several case studies including cascade and parallel forms, feedback and feedforward loops. We reproduce experimental results and provide rational analysis of the observed behavior. We demonstrate that methods such as the transfer function (frequency domain) and linear state-space (time domain) can be used to predict reliably the properties and transient behavior of complex network topologies and point to specific design strategies for synthetic networks.
Performance Models for the Spike Banded Linear System Solver

DOE PAGES

Manguoglu, Murat; Saied, Faisal; Sameh, Ahmed; ...

2011-01-01

With availability of large-scale parallel platforms comprised of tens-of-thousands of processors and beyond, there is significant impetus for the development of scalable parallel sparse linear system solvers and preconditioners. An integral part of this design process is the development of performance models capable of predicting performance and providing accurate cost models for the solvers and preconditioners. There has been some work in the past on characterizing performance of the iterative solvers themselves. In this paper, we investigate the problem of characterizing performance and scalability of banded preconditioners. Recent work has demonstrated the superior convergence properties and robustness of banded preconditioners,more » compared to state-of-the-art ILU family of preconditioners as well as algebraic multigrid preconditioners. Furthermore, when used in conjunction with efficient banded solvers, banded preconditioners are capable of significantly faster time-to-solution. Our banded solver, the Truncated Spike algorithm is specifically designed for parallel performance and tolerance to deep memory hierarchies. Its regular structure is also highly amenable to accurate performance characterization. Using these characteristics, we derive the following results in this paper: (i) we develop parallel formulations of the Truncated Spike solver, (ii) we develop a highly accurate pseudo-analytical parallel performance model for our solver, (iii) we show excellent predication capabilities of our model – based on which we argue the high scalability of our solver. Our pseudo-analytical performance model is based on analytical performance characterization of each phase of our solver. These analytical models are then parameterized using actual runtime information on target platforms. An important consequence of our performance models is that they reveal underlying performance bottlenecks in both serial and parallel formulations. All of our results are validated on diverse heterogeneous multiclusters – platforms for which performance prediction is particularly challenging. Finally, we provide predict the scalability of the Spike algorithm using up to 65,536 cores with our model. In this paper we extend the results presented in the Ninth International Symposium on Parallel and Distributed Computing.« less
The Scalable Checkpoint/Restart Library

DOE Office of Scientific and Technical Information (OSTI.GOV)

Moody, A.

The Scalable Checkpoint/Restart (SCR) library provides an interface that codes may use to worite our and read in application-level checkpoints in a scalable fashion. In the current implementation, checkpoint files are cached in local storage (hard disk or RAM disk) on the compute nodes. This technique provides scalable aggregate bandwidth and uses storage resources that are fully dedicated to the job. This approach addresses the two common drawbacks of checkpointing a large-scale application to a shared parallel file system, namely, limited bandwidth and file system contention. In fact, on current platforms, SCR scales linearly with the number of compute nodes.more » It has been benchmarked as high as 720GB/s on 1094 nodes of Atlas, which is nearly two orders of magnitude faster thanthe parallel file system.« less
Extending substructure based iterative solvers to multiple load and repeated analyses

NASA Technical Reports Server (NTRS)

Farhat, Charbel

1993-01-01

Direct solvers currently dominate commercial finite element structural software, but do not scale well in the fine granularity regime targeted by emerging parallel processors. Substructure based iterative solvers--often called also domain decomposition algorithms--lend themselves better to parallel processing, but must overcome several obstacles before earning their place in general purpose structural analysis programs. One such obstacle is the solution of systems with many or repeated right hand sides. Such systems arise, for example, in multiple load static analyses and in implicit linear dynamics computations. Direct solvers are well-suited for these problems because after the system matrix has been factored, the multiple or repeated solutions can be obtained through relatively inexpensive forward and backward substitutions. On the other hand, iterative solvers in general are ill-suited for these problems because they often must restart from scratch for every different right hand side. In this paper, we present a methodology for extending the range of applications of domain decomposition methods to problems with multiple or repeated right hand sides. Basically, we formulate the overall problem as a series of minimization problems over K-orthogonal and supplementary subspaces, and tailor the preconditioned conjugate gradient algorithm to solve them efficiently. The resulting solution method is scalable, whereas direct factorization schemes and forward and backward substitution algorithms are not. We illustrate the proposed methodology with the solution of static and dynamic structural problems, and highlight its potential to outperform forward and backward substitutions on parallel computers. As an example, we show that for a linear structural dynamics problem with 11640 degrees of freedom, every time-step beyond time-step 15 is solved in a single iteration and consumes 1.0 second on a 32 processor iPSC-860 system; for the same problem and the same parallel processor, a pair of forward/backward substitutions at each step consumes 15.0 seconds.
Containment of groundwater contamination plumes: minimizing drawdown by aligning capture wells parallel to regional flow

NASA Astrophysics Data System (ADS)

Christ, John A.; Goltz, Mark N.

2004-01-01

Pump-and-treat systems that are installed to contain contaminated groundwater migration typically involve placement of extraction wells perpendicular to the regional groundwater flow direction at the down gradient edge of a contaminant plume. These wells capture contaminated water for above ground treatment and disposal, thereby preventing further migration of contaminated water down gradient. In this work, examining two-, three-, and four-well systems, we compare well configurations that are parallel and perpendicular to the regional groundwater flow direction. We show that orienting extraction wells co-linearly, parallel to regional flow, results in (1) a larger area of aquifer influenced by the wells at a given total well flow rate, (2) a center and ultimate capture zone width equal to the perpendicular configuration, and (3) more flexibility with regard to minimizing drawdown. Although not suited for some scenarios, we found orienting extraction wells parallel to regional flow along a plume centerline, when compared to a perpendicular configuration, reduces drawdown by up to 7% and minimizes the fraction of uncontaminated water captured.
A scalable geometric multigrid solver for nonsymmetric elliptic systems with application to variable-density flows

NASA Astrophysics Data System (ADS)

Esmaily, M.; Jofre, L.; Mani, A.; Iaccarino, G.

2018-03-01

A geometric multigrid algorithm is introduced for solving nonsymmetric linear systems resulting from the discretization of the variable density Navier-Stokes equations on nonuniform structured rectilinear grids and high-Reynolds number flows. The restriction operation is defined such that the resulting system on the coarser grids is symmetric, thereby allowing for the use of efficient smoother algorithms. To achieve an optimal rate of convergence, the sequence of interpolation and restriction operations are determined through a dynamic procedure. A parallel partitioning strategy is introduced to minimize communication while maintaining the load balance between all processors. To test the proposed algorithm, we consider two cases: 1) homogeneous isotropic turbulence discretized on uniform grids and 2) turbulent duct flow discretized on stretched grids. Testing the algorithm on systems with up to a billion unknowns shows that the cost varies linearly with the number of unknowns. This O (N) behavior confirms the robustness of the proposed multigrid method regarding ill-conditioning of large systems characteristic of multiscale high-Reynolds number turbulent flows. The robustness of our method to density variations is established by considering cases where density varies sharply in space by a factor of up to 104, showing its applicability to two-phase flow problems. Strong and weak scalability studies are carried out, employing up to 30,000 processors, to examine the parallel performance of our implementation. Excellent scalability of our solver is shown for a granularity as low as 104 to 105 unknowns per processor. At its tested peak throughput, it solves approximately 4 billion unknowns per second employing over 16,000 processors with a parallel efficiency higher than 50%.
Parallel Reconstruction Using Null Operations (PRUNO)

PubMed Central

Zhang, Jian; Liu, Chunlei; Moseley, Michael E.

2011-01-01

A novel iterative k-space data-driven technique, namely Parallel Reconstruction Using Null Operations (PRUNO), is presented for parallel imaging reconstruction. In PRUNO, both data calibration and image reconstruction are formulated into linear algebra problems based on a generalized system model. An optimal data calibration strategy is demonstrated by using Singular Value Decomposition (SVD). And an iterative conjugate- gradient approach is proposed to efficiently solve missing k-space samples during reconstruction. With its generalized formulation and precise mathematical model, PRUNO reconstruction yields good accuracy, flexibility, stability. Both computer simulation and in vivo studies have shown that PRUNO produces much better reconstruction quality than autocalibrating partially parallel acquisition (GRAPPA), especially under high accelerating rates. With the aid of PRUO reconstruction, ultra high accelerating parallel imaging can be performed with decent image quality. For example, we have done successful PRUNO reconstruction at a reduction factor of 6 (effective factor of 4.44) with 8 coils and only a few autocalibration signal (ACS) lines. PMID:21604290
Linear Arrangement of Motor Protein on a Mechanically Deposited Fluoropolymer Thin Film

NASA Astrophysics Data System (ADS)

Suzuki, Hitoshi; Oiwa, Kazuhiro; Yamada, Akira; Sakakibara, Hitoshi; Nakayama, Haruto; Mashiko, Shinro

1995-07-01

Motor protein molecules such as heavy meromyosin (HMM), one of the major components of skeletal muscle, were arranged linearly on a mechanically deposited fluoropolymer thin film substrate in order to regulate the direction of movement generated by the motor protein. The fluoropolymer film consisted of many linear parallel ridges whose heights and widths were 10 to 20 nm and 10 to 100 nm, respectively. The fluoropolymer ridges adsorbed HMM molecules that were applied onto the film. Actin filaments labeled with rhodamine-phalloidin were observed under a fluorescence microscope moving linearly on the HMM-coated ridges. The observation indicates that HMM molecules were aligned on the fluoropolymer ridges while retaining their function. The velocity of actin movement was measured in this system.
A Navier-Strokes Chimera Code on the Connection Machine CM-5: Design and Performance

NASA Technical Reports Server (NTRS)

Jespersen, Dennis C.; Levit, Creon; Kwak, Dochan (Technical Monitor)

1994-01-01

We have implemented a three-dimensional compressible Navier-Stokes code on the Connection Machine CM-5. The code is set up for implicit time-stepping on single or multiple structured grids. For multiple grids and geometrically complex problems, we follow the 'chimera' approach, where flow data on one zone is interpolated onto another in the region of overlap. We will describe our design philosophy and give some timing results for the current code. A parallel machine like the CM-5 is well-suited for finite-difference methods on structured grids. The regular pattern of connections of a structured mesh maps well onto the architecture of the machine. So the first design choice, finite differences on a structured mesh, is natural. We use centered differences in space, with added artificial dissipation terms. When numerically solving the Navier-Stokes equations, there are liable to be some mesh cells near a solid body that are small in at least one direction. This mesh cell geometry can impose a very severe CFL (Courant-Friedrichs-Lewy) condition on the time step for explicit time-stepping methods. Thus, though explicit time-stepping is well-suited to the architecture of the machine, we have adopted implicit time-stepping. We have further taken the approximate factorization approach. This creates the need to solve large banded linear systems and creates the first possible barrier to an efficient algorithm. To overcome this first possible barrier we have considered two options. The first is just to solve the banded linear systems with data spread over the whole machine, using whatever fast method is available. This option is adequate for solving scalar tridiagonal systems, but for scalar pentadiagonal or block tridiagonal systems it is somewhat slower than desired. The second option is to 'transpose' the flow and geometry variables as part of the time-stepping process: Start with x-lines of data in-processor. Form explicit terms in x, then transpose so y-lines of data are in-processor. Form explicit terms in y, then transpose so z-lines are in processor. Form explicit terms in z, then solve linear systems in the z-direction. Transpose to the y-direction, then solve linear systems in the y-direction. Finally transpose to the x direction and solve linear systems in the x-direction. This strategy avoids inter-processor communication when differencing and solving linear systems, but requires a large amount of communication when doing the transposes. The transpose method is more efficient than the non-transpose strategy when dealing with scalar pentadiagonal or block tridiagonal systems. For handling geometrically complex problems the chimera strategy was adopted. For multiple zone cases we compute on each zone sequentially (using the whole parallel machine), then send the chimera interpolation data to a distributed data structure (array) laid out over the whole machine. This information transfer implies an irregular communication pattern, and is the second possible barrier to an efficient algorithm. We have implemented these ideas on the CM-5 using CMF (Connection Machine Fortran), a data parallel language which combines elements of Fortran 90 and certain extensions, and which bears a strong similarity to High Performance Fortran. We make use of the Connection Machine Scientific Software Library (CMSSL) for the linear solver and array transpose operations.
Parallelization of implicit finite difference schemes in computational fluid dynamics

NASA Technical Reports Server (NTRS)

Decker, Naomi H.; Naik, Vijay K.; Nicoules, Michel

1990-01-01

Implicit finite difference schemes are often the preferred numerical schemes in computational fluid dynamics, requiring less stringent stability bounds than the explicit schemes. Each iteration in an implicit scheme involves global data dependencies in the form of second and higher order recurrences. Efficient parallel implementations of such iterative methods are considerably more difficult and non-intuitive. The parallelization of the implicit schemes that are used for solving the Euler and the thin layer Navier-Stokes equations and that require inversions of large linear systems in the form of block tri-diagonal and/or block penta-diagonal matrices is discussed. Three-dimensional cases are emphasized and schemes that minimize the total execution time are presented. Partitioning and scheduling schemes for alleviating the effects of the global data dependencies are described. An analysis of the communication and the computation aspects of these methods is presented. The effect of the boundary conditions on the parallel schemes is also discussed.

Advanced Computational Methods for Security Constrained Financial Transmission Rights: Structure and Parallelism

DOE Office of Scientific and Technical Information (OSTI.GOV)

Elbert, Stephen T.; Kalsi, Karanjit; Vlachopoulou, Maria

Financial Transmission Rights (FTRs) help power market participants reduce price risks associated with transmission congestion. FTRs are issued based on a process of solving a constrained optimization problem with the objective to maximize the FTR social welfare under power flow security constraints. Security constraints for different FTR categories (monthly, seasonal or annual) are usually coupled and the number of constraints increases exponentially with the number of categories. Commercial software for FTR calculation can only provide limited categories of FTRs due to the inherent computational challenges mentioned above. In this paper, a novel non-linear dynamical system (NDS) approach is proposed tomore » solve the optimization problem. The new formulation and performance of the NDS solver is benchmarked against widely used linear programming (LP) solvers like CPLEX™ and tested on large-scale systems using data from the Western Electricity Coordinating Council (WECC). The NDS is demonstrated to outperform the widely used CPLEX algorithms while exhibiting superior scalability. Furthermore, the NDS based solver can be easily parallelized which results in significant computational improvement.« less
Parallel algorithms for mapping pipelined and parallel computations

NASA Technical Reports Server (NTRS)

Nicol, David M.

1988-01-01

Many computational problems in image processing, signal processing, and scientific computing are naturally structured for either pipelined or parallel computation. When mapping such problems onto a parallel architecture it is often necessary to aggregate an obvious problem decomposition. Even in this context the general mapping problem is known to be computationally intractable, but recent advances have been made in identifying classes of problems and architectures for which optimal solutions can be found in polynomial time. Among these, the mapping of pipelined or parallel computations onto linear array, shared memory, and host-satellite systems figures prominently. This paper extends that work first by showing how to improve existing serial mapping algorithms. These improvements have significantly lower time and space complexities: in one case a published O(nm sup 3) time algorithm for mapping m modules onto n processors is reduced to an O(nm log m) time complexity, and its space requirements reduced from O(nm sup 2) to O(m). Run time complexity is further reduced with parallel mapping algorithms based on these improvements, which run on the architecture for which they create the mappings.
Nonlinear aeroacoustic characterization of Helmholtz resonators with a local-linear neuro-fuzzy network model

NASA Astrophysics Data System (ADS)

Förner, K.; Polifke, W.

2017-10-01

The nonlinear acoustic behavior of Helmholtz resonators is characterized by a data-based reduced-order model, which is obtained by a combination of high-resolution CFD simulation and system identification. It is shown that even in the nonlinear regime, a linear model is capable of describing the reflection behavior at a particular amplitude with quantitative accuracy. This observation motivates to choose a local-linear model structure for this study, which consists of a network of parallel linear submodels. A so-called fuzzy-neuron layer distributes the input signal over the linear submodels, depending on the root mean square of the particle velocity at the resonator surface. The resulting model structure is referred to as an local-linear neuro-fuzzy network. System identification techniques are used to estimate the free parameters of this model from training data. The training data are generated by CFD simulations of the resonator, with persistent acoustic excitation over a wide range of frequencies and sound pressure levels. The estimated nonlinear, reduced-order models show good agreement with CFD and experimental data over a wide range of amplitudes for several test cases.
Parallel Computation of Flow in Heterogeneous Media Modelled by Mixed Finite Elements

NASA Astrophysics Data System (ADS)

Cliffe, K. A.; Graham, I. G.; Scheichl, R.; Stals, L.

2000-11-01

In this paper we describe a fast parallel method for solving highly ill-conditioned saddle-point systems arising from mixed finite element simulations of stochastic partial differential equations (PDEs) modelling flow in heterogeneous media. Each realisation of these stochastic PDEs requires the solution of the linear first-order velocity-pressure system comprising Darcy's law coupled with an incompressibility constraint. The chief difficulty is that the permeability may be highly variable, especially when the statistical model has a large variance and a small correlation length. For reasonable accuracy, the discretisation has to be extremely fine. We solve these problems by first reducing the saddle-point formulation to a symmetric positive definite (SPD) problem using a suitable basis for the space of divergence-free velocities. The reduced problem is solved using parallel conjugate gradients preconditioned with an algebraically determined additive Schwarz domain decomposition preconditioner. The result is a solver which exhibits a good degree of robustness with respect to the mesh size as well as to the variance and to physically relevant values of the correlation length of the underlying permeability field. Numerical experiments exhibit almost optimal levels of parallel efficiency. The domain decomposition solver (DOUG, http://www.maths.bath.ac.uk/~parsoft) used here not only is applicable to this problem but can be used to solve general unstructured finite element systems on a wide range of parallel architectures.
The Event Based Language and Its Multiple Processor Implementations.

DTIC Science & Technology

1980-01-01

10 6.1 "Recursive" Linear Fibonacci ................................................ 105 6.2 The Readers Writers Problem...kinds. Examples of such systems are: C.mmp [Wu-72], Pluribus [He-73], Data Flow [ De -75], the boolean n-cube parallel machine [Su-77], and the MuNet [Wa...concurrency within programs; therefore, we hate concentrated on two types of systems which seem suitable: a processor network, and a data flow processor [ De -77
Reduced-Order Structure-Preserving Model for Parallel-Connected Three-Phase Grid-Tied Inverters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Johnson, Brian B; Purba, Victor; Jafarpour, Saber

Next-generation power networks will contain large numbers of grid-connected inverters satisfying a significant fraction of system load. Since each inverter model has a relatively large number of dynamic states, it is impractical to analyze complex system models where the full dynamics of each inverter are retained. To address this challenge, we derive a reduced-order structure-preserving model for parallel-connected grid-tied three-phase inverters. Here, each inverter in the system is assumed to have a full-bridge topology, LCL filter at the point of common coupling, and the control architecture for each inverter includes a current controller, a power controller, and a phase-locked loopmore » for grid synchronization. We outline a structure-preserving reduced-order inverter model with lumped parameters for the setting where the parallel inverters are each designed such that the filter components and controller gains scale linearly with the power rating. By structure preserving, we mean that the reduced-order three-phase inverter model is also composed of an LCL filter, a power controller, current controller, and PLL. We show that the system of parallel inverters can be modeled exactly as one aggregated inverter unit and this equivalent model has the same number of dynamical states as any individual inverter in the system. Numerical simulations validate the reduced-order model.« less
Parallel-vector computation for linear structural analysis and non-linear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, D. T.; Al-Nasra, M.; Zhang, Y.; Baddourah, M. A.; Agarwal, T. K.; Storaasli, O. O.; Carmona, E. A.

1991-01-01

Several parallel-vector computational improvements to the unconstrained optimization procedure are described which speed up the structural analysis-synthesis process. A fast parallel-vector Choleski-based equation solver, pvsolve, is incorporated into the well-known SAP-4 general-purpose finite-element code. The new code, denoted PV-SAP, is tested for static structural analysis. Initial results on a four processor CRAY 2 show that using pvsolve reduces the equation solution time by a factor of 14-16 over the original SAP-4 code. In addition, parallel-vector procedures for the Golden Block Search technique and the BFGS method are developed and tested for nonlinear unconstrained optimization. A parallel version of an iterative solver and the pvsolve direct solver are incorporated into the BFGS method. Preliminary results on nonlinear unconstrained optimization test problems, using pvsolve in the analysis, show excellent parallel-vector performance indicating that these parallel-vector algorithms can be used in a new generation of finite-element based structural design/analysis-synthesis codes.
Linearized Optically Phase-Modulated Fiber Optic Links for Microwave Signal Transport

DTIC Science & Technology

2009-03-03

detectors (with internal 50- Ohm resistors) capable of 40-mA dc current per detector. With this link, the linearized SFDR would improve to 133 dB/Hz4/5...the IF) limitation on the signal. All calculations consider the 3dB power loss from the hybrid combiner and 6dB loss from parallel 50- Ohm resistors...283. [25] M. Nazarathy, J. Berger, A. Ley , I. Levi, and Y. Kagan, “Externally Modulated 80 Channel Am Catv Fiber-to-feeder Distribution System Over
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE PAGES

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

2015-12-01

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
Anderson acceleration of the Jacobi iterative method: An efficient alternative to Krylov methods for large, sparse linear systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Pratapa, Phanisri P.; Suryanarayana, Phanish; Pask, John E.

We employ Anderson extrapolation to accelerate the classical Jacobi iterative method for large, sparse linear systems. Specifically, we utilize extrapolation at periodic intervals within the Jacobi iteration to develop the Alternating Anderson–Jacobi (AAJ) method. We verify the accuracy and efficacy of AAJ in a range of test cases, including nonsymmetric systems of equations. We demonstrate that AAJ possesses a favorable scaling with system size that is accompanied by a small prefactor, even in the absence of a preconditioner. In particular, we show that AAJ is able to accelerate the classical Jacobi iteration by over four orders of magnitude, with speed-upsmore » that increase as the system gets larger. Moreover, we find that AAJ significantly outperforms the Generalized Minimal Residual (GMRES) method in the range of problems considered here, with the relative performance again improving with size of the system. As a result, the proposed method represents a simple yet efficient technique that is particularly attractive for large-scale parallel solutions of linear systems of equations.« less
Parallel processor for real-time structural control

NASA Astrophysics Data System (ADS)

Tise, Bert L.

1993-07-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-to-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection to host computer, parallelizing code generator, and look- up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating- point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An OpenWindows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.
Development of a 3D parallel mechanism robot arm with three vertical-axial pneumatic actuators combined with a stereo vision system.

PubMed

Chiang, Mao-Hsiung; Lin, Hao-Ting

2011-01-01

This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot's end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H(∞) tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measuring position of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot consider the manufacturing and assembly tolerance of the joints and the parallel mechanism so that errors between the actual position and the calculated 3D position of the end-effector exist. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system combining two CCD serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. Furthermore, to verify the feasibility of the proposed parallel mechanism robot driven by three vertical pneumatic servo actuators, a full-scale test rig of the proposed parallel mechanism pneumatic robot is set up. Thus, simulations and experiments for different complex 3D motion profiles of the robot end-effector can be successfully achieved. The desired, the actual and the calculated 3D position of the end-effector can be compared in the complex 3D motion control.
Development of a 3D Parallel Mechanism Robot Arm with Three Vertical-Axial Pneumatic Actuators Combined with a Stereo Vision System

PubMed Central

Chiang, Mao-Hsiung; Lin, Hao-Ting

2011-01-01

This study aimed to develop a novel 3D parallel mechanism robot driven by three vertical-axial pneumatic actuators with a stereo vision system for path tracking control. The mechanical system and the control system are the primary novel parts for developing a 3D parallel mechanism robot. In the mechanical system, a 3D parallel mechanism robot contains three serial chains, a fixed base, a movable platform and a pneumatic servo system. The parallel mechanism are designed and analyzed first for realizing a 3D motion in the X-Y-Z coordinate system of the robot’s end-effector. The inverse kinematics and the forward kinematics of the parallel mechanism robot are investigated by using the Denavit-Hartenberg notation (D-H notation) coordinate system. The pneumatic actuators in the three vertical motion axes are modeled. In the control system, the Fourier series-based adaptive sliding-mode controller with H∞ tracking performance is used to design the path tracking controllers of the three vertical servo pneumatic actuators for realizing 3D path tracking control of the end-effector. Three optical linear scales are used to measure the position of the three pneumatic actuators. The 3D position of the end-effector is then calculated from the measuring position of the three pneumatic actuators by means of the kinematics. However, the calculated 3D position of the end-effector cannot consider the manufacturing and assembly tolerance of the joints and the parallel mechanism so that errors between the actual position and the calculated 3D position of the end-effector exist. In order to improve this situation, sensor collaboration is developed in this paper. A stereo vision system is used to collaborate with the three position sensors of the pneumatic actuators. The stereo vision system combining two CCD serves to measure the actual 3D position of the end-effector and calibrate the error between the actual and the calculated 3D position of the end-effector. Furthermore, to verify the feasibility of the proposed parallel mechanism robot driven by three vertical pneumatic servo actuators, a full-scale test rig of the proposed parallel mechanism pneumatic robot is set up. Thus, simulations and experiments for different complex 3D motion profiles of the robot end-effector can be successfully achieved. The desired, the actual and the calculated 3D position of the end-effector can be compared in the complex 3D motion control. PMID:22247676
Graph cuts via l1 norm minimization.

PubMed

Bhusnurmath, Arvind; Taylor, Camillo J

2008-10-01

Graph cuts have become an increasingly important tool for solving a number of energy minimization problems in computer vision and other fields. In this paper, the graph cut problem is reformulated as an unconstrained l1 norm minimization that can be solved effectively using interior point methods. This reformulation exposes connections between the graph cuts and other related continuous optimization problems. Eventually the problem is reduced to solving a sequence of sparse linear systems involving the Laplacian of the underlying graph. The proposed procedure exploits the structure of these linear systems in a manner that is easily amenable to parallel implementations. Experimental results obtained by applying the procedure to graphs derived from image processing problems are provided.
Iterative-method performance evaluation for multiple vectors associated with a large-scale sparse matrix

NASA Astrophysics Data System (ADS)

Imamura, Seigo; Ono, Kenji; Yokokawa, Mitsuo

2016-07-01

Ensemble computing, which is an instance of capacity computing, is an effective computing scenario for exascale parallel supercomputers. In ensemble computing, there are multiple linear systems associated with a common coefficient matrix. We improve the performance of iterative solvers for multiple vectors by solving them at the same time, that is, by solving for the product of the matrices. We implemented several iterative methods and compared their performance. The maximum performance on Sparc VIIIfx was 7.6 times higher than that of a naïve implementation. Finally, to deal with the different convergence processes of linear systems, we introduced a control method to eliminate the calculation of already converged vectors.
Quasi-disjoint pentadiagonal matrix systems for the parallelization of compact finite-difference schemes and filters

NASA Astrophysics Data System (ADS)

Kim, Jae Wook

2013-05-01

This paper proposes a novel systematic approach for the parallelization of pentadiagonal compact finite-difference schemes and filters based on domain decomposition. The proposed approach allows a pentadiagonal banded matrix system to be split into quasi-disjoint subsystems by using a linear-algebraic transformation technique. As a result the inversion of pentadiagonal matrices can be implemented within each subdomain in an independent manner subject to a conventional halo-exchange process. The proposed matrix transformation leads to new subdomain boundary (SB) compact schemes and filters that require three halo terms to exchange with neighboring subdomains. The internode communication overhead in the present approach is equivalent to that of standard explicit schemes and filters based on seven-point discretization stencils. The new SB compact schemes and filters demand additional arithmetic operations compared to the original serial ones. However, it is shown that the additional cost becomes sufficiently low by choosing optimal sizes of their discretization stencils. Compared to earlier published results, the proposed SB compact schemes and filters successfully reduce parallelization artifacts arising from subdomain boundaries to a level sufficiently negligible for sophisticated aeroacoustic simulations without degrading parallel efficiency. The overall performance and parallel efficiency of the proposed approach are demonstrated by stringent benchmark tests.
Parallel Symmetric Eigenvalue Problem Solvers

DTIC Science & Technology

2015-05-01

get research, tutoring, and mentoring experience as an undergraduate. Last but not least, I thank my family for their love and support. v TABLE OF...32 4.6.2 Choice of the Ritz shifts . . . . . . . . . . . . . . . . . . . . 37 4.7 Relationship between...pencil. I will conclude with a discussion of the relationship between Trace- Min and simultaneous iteration. If both methods solve the linear systems
Evaluating Sparse Linear System Solvers on Scalable Parallel Architectures

DTIC Science & Technology

2008-10-01

42 3.4 Residual history of WSO banded preconditioner for problem 2D 54019 HIGHK . . . . . . . . . . . . . . . . . . . . . . . . . . 43...3.5 Residual history of WSO banded preconditioner for problem Appu 43 3.6 Residual history of WSO banded preconditioner for problem ASIC 680k...44 3.7 Residual history of WSO banded preconditioner for problem BUN- DLE1
DOE Office of Scientific and Technical Information (OSTI.GOV)

Dritz, K.W.; Boyle, J.M.

This paper addresses the problem of measuring and analyzing the performance of fine-grained parallel programs running on shared-memory multiprocessors. Such processors use locking (either directly in the application program, or indirectly in a subroutine library or the operating system) to serialize accesses to global variables. Given sufficiently high rates of locking, the chief factor preventing linear speedup (besides lack of adequate inherent parallelism in the application) is lock contention - the blocking of processes that are trying to acquire a lock currently held by another process. We show how a high-resolution, low-overhead clock may be used to measure both lockmore » contention and lack of parallel work. Several ways of presenting the results are covered, culminating in a method for calculating, in a single multiprocessing run, both the speedup actually achieved and the speedup lost to contention for each lock and to lack of parallel work. The speedup losses are reported in the same units, ''processor-equivalents,'' as the speedup achieved. Both are obtained without having to perform the usual one-process comparison run. We chronicle also a variety of experiments motivated by actual results obtained with our measurement method. The insights into program performance that we gained from these experiments helped us to refine the parts of our programs concerned with communication and synchronization. Ultimately these improvements reduced lock contention to a negligible amount and yielded nearly linear speedup in applications not limited by lack of parallel work. We describe two generally applicable strategies (''code motion out of critical regions'' and ''critical-region fissioning'') for reducing lock contention and one (''lock/variable fusion'') applicable only on certain architectures.« less
Multimode power processor

DOEpatents

O'Sullivan, G.A.; O'Sullivan, J.A.

1999-07-27

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources. 31 figs.

Multimode power processor

DOEpatents

O'Sullivan, George A.; O'Sullivan, Joseph A.

1999-01-01

In one embodiment, a power processor which operates in three modes: an inverter mode wherein power is delivered from a battery to an AC power grid or load; a battery charger mode wherein the battery is charged by a generator; and a parallel mode wherein the generator supplies power to the AC power grid or load in parallel with the battery. In the parallel mode, the system adapts to arbitrary non-linear loads. The power processor may operate on a per-phase basis wherein the load may be synthetically transferred from one phase to another by way of a bumpless transfer which causes no interruption of power to the load when transferring energy sources. Voltage transients and frequency transients delivered to the load when switching between the generator and battery sources are minimized, thereby providing an uninterruptible power supply. The power processor may be used as part of a hybrid electrical power source system which may contain, in one embodiment, a photovoltaic array, diesel engine, and battery power sources.
Automated problem scheduling and reduction of synchronization delay effects

NASA Technical Reports Server (NTRS)

Saltz, Joel H.

1987-01-01

It is anticipated that in order to make effective use of many future high performance architectures, programs will have to exhibit at least a medium grained parallelism. A framework is presented for partitioning very sparse triangular systems of linear equations that is designed to produce favorable preformance results in a wide variety of parallel architectures. Efficient methods for solving these systems are of interest because: (1) they provide a useful model problem for use in exploring heuristics for the aggregation, mapping and scheduling of relatively fine grained computations whose data dependencies are specified by directed acrylic graphs, and (2) because such efficient methods can find direct application in the development of parallel algorithms for scientific computation. Simple expressions are derived that describe how to schedule computational work with varying degrees of granularity. The Encore Multimax was used as a hardware simulator to investigate the performance effects of using the partitioning techniques presented in shared memory architectures with varying relative synchronization costs.
New trends in Taylor series based applications

NASA Astrophysics Data System (ADS)

Kocina, Filip; Šátek, Václav; Veigend, Petr; Nečasová, Gabriela; Valenta, Václav; Kunovský, Jiří

2016-06-01

The paper deals with the solution of large system of linear ODEs when minimal comunication among parallel processors is required. The Modern Taylor Series Method (MTSM) is used. The MTSM allows using a higher order during the computation that means a larger integration step size while keeping desired accuracy. As an example of complex systems we can take the Telegraph Equation Model. Symbolic and numeric solutions are compared when harmonic input signal is used.
Estimation of vibration frequency of loudspeaker diaphragm by parallel phase-shifting digital holography

NASA Astrophysics Data System (ADS)

Kakue, T.; Endo, Y.; Shimobaba, T.; Ito, T.

2014-11-01

We report frequency estimation of loudspeaker diaphragm vibrating at high speed by parallel phase-shifting digital holography which is a technique of single-shot phase-shifting interferometry. This technique records multiple phaseshifted holograms required for phase-shifting interferometry by using space-division multiplexing. We constructed a parallel phase-shifting digital holography system consisting of a high-speed polarization-imaging camera. This camera has a micro-polarizer array which selects four linear polarization axes for 2 × 2 pixels. We set a loudspeaker as an object, and recorded vibration of diaphragm of the loudspeaker by the constructed system. By the constructed system, we demonstrated observation of vibration displacement of loudspeaker diaphragm. In this paper, we aim to estimate vibration frequency of the loudspeaker diaphragm by applying the experimental results to frequency analysis. Holograms consisting of 128 × 128 pixels were recorded at a frame rate of 262,500 frames per second by the camera. A sinusoidal wave was input to the loudspeaker via a phone connector. We observed displacement of the loudspeaker diaphragm vibrating by the system. We also succeeded in estimating vibration frequency of the loudspeaker diaphragm by applying frequency analysis to the experimental results.
Using earthquake clusters to identify fracture zones at Puna geothermal field, Hawaii

NASA Astrophysics Data System (ADS)

Lucas, A.; Shalev, E.; Malin, P.; Kenedi, C. L.

2010-12-01

The actively producing Puna geothermal system (PGS) is located on the Kilauea East Rift Zone (ERZ), which extends out from the active Kilauea volcano on Hawaii. In the Puna area the rift trend is identified as NE-SW from surface expressions of normal faulting with a corresponding strike; at PGS the surface expression offsets in a left step, but no rift perpendicular faulting is observed. An eight station borehole seismic network has been installed in the area of the geothermal system. Since June 2006, a total of 6162 earthquakes have been located close to or inside the geothermal system. The spread of earthquake locations follows the rift trend, but down rift to the NE of PGS almost no earthquakes are observed. Most earthquakes located within the PGS range between 2-3 km depth. Up rift to the SW of PGS the number of events decreases and the depth range increases to 3-4 km. All initial locations used Hypoinverse71 and showed no trends other than the dominant rift parallel. Double difference relocation of all earthquakes, using both catalog and cross-correlation, identified one large cluster but could not conclusively identify trends within the cluster. A large number of earthquake waveforms showed identifiable shear wave splitting. For five stations out of the six where shear wave splitting was observed, the dominant polarization direction was rift parallel. Two of the five stations also showed a smaller rift perpendicular signal. The sixth station (located close to the area of the rift offset) displayed a N-S polarization, approximately halfway between rift parallel and perpendicular. The shear wave splitting time delays indicate that fracture density is higher at the PGS compared to the surrounding ERZ. Correlation co-efficient clustering with independent P and S wave windows was used to identify clusters based on similar earthquake waveforms. In total, 40 localized clusters containing ten or more events were identified. The largest cluster was located in the production area for the power plant. Most of the clusters had linear features when their Hypoinverse locations were plotted. The concentration of individual linear features was higher in the PGS than the surrounding ERZ. The resolution of the features was resolved further by relocating each individual cluster through the catalog double difference method. Mapping of the linear features showed that a number of the larger features ran rift parallel. However a large number of rift perpendicular features were also identified. In the area where the anomalous (N-S) shear wave polarization was observed, a number of linear features with a similar orientation were identified. We assume that events occurring on the same fracture zone have similar source mechanisms and thus similar waveforms. It is concluded that the linear features identified by earthquake clustering are fracture zones. The orientation and concentration of the fracture zones is consistent with that of the shear wave splitting polarizations.
A fully non-linear multi-species Fokker–Planck–Landau collision operator for simulation of fusion plasma

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hager, Robert, E-mail: rhager@pppl.gov; Yoon, E.S., E-mail: yoone@rpi.edu; Ku, S., E-mail: sku@pppl.gov

2016-06-15

Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. In this article, the non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. The finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable onmore » high-performance computing systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. The collision operator's good weak and strong scaling behavior are shown.« less
A fully non-linear multi-species Fokker–Planck–Landau collision operator for simulation of fusion plasma

DOE PAGES

Hager, Robert; Yoon, E. S.; Ku, S.; ...

2016-04-04

Fusion edge plasmas can be far from thermal equilibrium and require the use of a non-linear collision operator for accurate numerical simulations. The non-linear single-species Fokker–Planck–Landau collision operator developed by Yoon and Chang (2014) [9] is generalized to include multiple particle species. Moreover, the finite volume discretization used in this work naturally yields exact conservation of mass, momentum, and energy. The implementation of this new non-linear Fokker–Planck–Landau operator in the gyrokinetic particle-in-cell codes XGC1 and XGCa is described and results of a verification study are discussed. Finally, the numerical techniques that make our non-linear collision operator viable on high-performance computingmore » systems are described, including specialized load balancing algorithms and nested OpenMP parallelization. As a result, the collision operator's good weak and strong scaling behavior are shown.« less
Imparting Motion to a Test Object Such as a Motor Vehicle in a Controlled Fashion

NASA Technical Reports Server (NTRS)

Southward, Stephen C. (Inventor); Reubush, Chandler (Inventor); Pittman, Bryan (Inventor); Roehrig, Kurt (Inventor); Gerard, Doug (Inventor)

2014-01-01

An apparatus imparts motion to a test object such as a motor vehicle in a controlled fashion. A base has mounted on it a linear electromagnetic motor having a first end and a second end, the first end being connected to the base. A pneumatic cylinder and piston combination have a first end and a second end, the first end connected to the base so that the pneumatic cylinder and piston combination is generally parallel with the linear electromagnetic motor. The second ends of the linear electromagnetic motor and pneumatic cylinder and piston combination being commonly linked to a mount for the test object. A control system for the linear electromagnetic motor and pneumatic cylinder and piston combination drives the pneumatic cylinder and piston combination to support a substantial static load of the test object and the linear electromagnetic motor to impart controlled motion to the test object.
BPF-type region-of-interest reconstruction for parallel translational computed tomography.

PubMed

Wu, Weiwen; Yu, Hengyong; Wang, Shaoyu; Liu, Fenglin

2017-01-01

The objective of this study is to present and test a new ultra-low-cost linear scan based tomography architecture. Similar to linear tomosynthesis, the source and detector are translated in opposite directions and the data acquisition system targets on a region-of-interest (ROI) to acquire data for image reconstruction. This kind of tomographic architecture was named parallel translational computed tomography (PTCT). In previous studies, filtered backprojection (FBP)-type algorithms were developed to reconstruct images from PTCT. However, the reconstructed ROI images from truncated projections have severe truncation artefact. In order to overcome this limitation, we in this study proposed two backprojection filtering (BPF)-type algorithms named MP-BPF and MZ-BPF to reconstruct ROI images from truncated PTCT data. A weight function is constructed to deal with data redundancy for multi-linear translations modes. Extensive numerical simulations are performed to evaluate the proposed MP-BPF and MZ-BPF algorithms for PTCT in fan-beam geometry. Qualitative and quantitative results demonstrate that the proposed BPF-type algorithms cannot only more accurately reconstruct ROI images from truncated projections but also generate high-quality images for the entire image support in some circumstances.
Implementation of a partitioned algorithm for simulation of large CSI problems

NASA Technical Reports Server (NTRS)

Alvin, Kenneth F.; Park, K. C.

1991-01-01

The implementation of a partitioned numerical algorithm for determining the dynamic response of coupled structure/controller/estimator finite-dimensional systems is reviewed. The partitioned approach leads to a set of coupled first and second-order linear differential equations which are numerically integrated with extrapolation and implicit step methods. The present software implementation, ACSIS, utilizes parallel processing techniques at various levels to optimize performance on a shared-memory concurrent/vector processing system. A general procedure for the design of controller and filter gains is also implemented, which utilizes the vibration characteristics of the structure to be solved. Also presented are: example problems; a user's guide to the software; the procedures and algorithm scripts; a stability analysis for the algorithm; and the source code for the parallel implementation.
Early stages of transition in viscosity-stratified channel flow

NASA Astrophysics Data System (ADS)

Govindarajan, Rama; Jose, Sharath; Brandt, Luca

2013-11-01

In parallel shear flows, it is well known that transition to turbulence usually occurs through a subcritical process. In this work we consider a flow through a channel across which there is a linear temperature variation. The temperature gradient leads to a viscosity variation across the channel. A large body of work has been done in the linear regime for this problem, and it has been seen that viscosity stratification can lead to considerable changes in stability and transient growth characteristics. Moreover contradictory effects of introducing a non uniform viscosity in the system have been reported. We conduct a linear stability analysis and direct numerical simulations (DNS) for this system. We show that the optimal initial structures in the viscosity-stratified case, unlike in unstratified flow, do not span the width of the channel, but are focussed near one wall. The nonlinear consequences of the localisation of the structures will be discussed.
A visual parallel-BCI speller based on the time-frequency coding strategy.

PubMed

Xu, Minpeng; Chen, Long; Zhang, Lixin; Qi, Hongzhi; Ma, Lan; Tang, Jiabei; Wan, Baikun; Ming, Dong

2014-04-01

Spelling is one of the most important issues in brain-computer interface (BCI) research. This paper is to develop a visual parallel-BCI speller system based on the time-frequency coding strategy in which the sub-speller switching among four simultaneously presented sub-spellers and the character selection are identified in a parallel mode. The parallel-BCI speller was constituted by four independent P300+SSVEP-B (P300 plus SSVEP blocking) spellers with different flicker frequencies, thereby all characters had a specific time-frequency code. To verify its effectiveness, 11 subjects were involved in the offline and online spellings. A classification strategy was designed to recognize the target character through jointly using the canonical correlation analysis and stepwise linear discriminant analysis. Online spellings showed that the proposed parallel-BCI speller had a high performance, reaching the highest information transfer rate of 67.4 bit min(-1), with an average of 54.0 bit min(-1) and 43.0 bit min(-1) in the three rounds and five rounds, respectively. The results indicated that the proposed parallel-BCI could be effectively controlled by users with attention shifting fluently among the sub-spellers, and highly improved the BCI spelling performance.
Development of parallel line analysis criteria for recombinant adenovirus potency assay and definition of a unit of potency.

PubMed

Ogawa, Yasushi; Fawaz, Farah; Reyes, Candice; Lai, Julie; Pungor, Erno

2007-01-01

Parameter settings of a parallel line analysis procedure were defined by applying statistical analysis procedures to the absorbance data from a cell-based potency bioassay for a recombinant adenovirus, Adenovirus 5 Fibroblast Growth Factor-4 (Ad5FGF-4). The parallel line analysis was performed with a commercially available software, PLA 1.2. The software performs Dixon outlier test on replicates of the absorbance data, performs linear regression analysis to define linear region of the absorbance data, and tests parallelism between the linear regions of standard and sample. Width of Fiducial limit, expressed as a percent of the measured potency, was developed as a criterion for rejection of the assay data and to significantly improve the reliability of the assay results. With the linear range-finding criteria of the software set to a minimum of 5 consecutive dilutions and best statistical outcome, and in combination with the Fiducial limit width acceptance criterion of <135%, 13% of the assay results were rejected. With these criteria applied, the assay was found to be linear over the range of 0.25 to 4 relative potency units, defined as the potency of the sample normalized to the potency of Ad5FGF-4 standard containing 6 x 10(6) adenovirus particles/mL. The overall precision of the assay was estimated to be 52%. Without the application of Fiducial limit width criterion, the assay results were not linear over the range, and an overall precision of 76% was calculated from the data. An absolute unit of potency for the assay was defined by using the parallel line analysis procedure as the amount of Ad5FGF-4 that results in an absorbance value that is 121% of the average absorbance readings of the wells containing cells not infected with the adenovirus.
Multi-threaded parallel simulation of non-local non-linear problems in ultrashort laser pulse propagation in the presence of plasma

NASA Astrophysics Data System (ADS)

Baregheh, Mandana; Mezentsev, Vladimir; Schmitz, Holger

2011-06-01

We describe a parallel multi-threaded approach for high performance modelling of wide class of phenomena in ultrafast nonlinear optics. Specific implementation has been performed using the highly parallel capabilities of a programmable graphics processor.
Efficient multitasking of Choleski matrix factorization on CRAY supercomputers

NASA Technical Reports Server (NTRS)

Overman, Andrea L.; Poole, Eugene L.

1991-01-01

A Choleski method is described and used to solve linear systems of equations that arise in large scale structural analysis. The method uses a novel variable-band storage scheme and is structured to exploit fast local memory caches while minimizing data access delays between main memory and vector registers. Several parallel implementations of this method are described for the CRAY-2 and CRAY Y-MP computers demonstrating the use of microtasking and autotasking directives. A portable parallel language, FORCE, is used for comparison with the microtasked and autotasked implementations. Results are presented comparing the matrix factorization times for three representative structural analysis problems from runs made in both dedicated and multi-user modes on both computers. CPU and wall clock timings are given for the parallel implementations and are compared to single processor timings of the same algorithm.
Review and analysis of dense linear system solver package for distributed memory machines

NASA Technical Reports Server (NTRS)

Narang, H. N.

1993-01-01

A dense linear system solver package recently developed at the University of Texas at Austin for distributed memory machine (e.g. Intel Paragon) has been reviewed and analyzed. The package contains about 45 software routines, some written in FORTRAN, and some in C-language, and forms the basis for parallel/distributed solutions of systems of linear equations encountered in many problems of scientific and engineering nature. The package, being studied by the Computer Applications Branch of the Analysis and Computation Division, may provide a significant computational resource for NASA scientists and engineers in parallel/distributed computing. Since the package is new and not well tested or documented, many of its underlying concepts and implementations were unclear; our task was to review, analyze, and critique the package as a step in the process that will enable scientists and engineers to apply it to the solution of their problems. All routines in the package were reviewed and analyzed. Underlying theory or concepts which exist in the form of published papers or technical reports, or memos, were either obtained from the author, or from the scientific literature; and general algorithms, explanations, examples, and critiques have been provided to explain the workings of these programs. Wherever the things were still unclear, communications were made with the developer (author), either by telephone or by electronic mail, to understand the workings of the routines. Whenever possible, tests were made to verify the concepts and logic employed in their implementations. A detailed report is being separately documented to explain the workings of these routines.
Optical Data Processing for Missile Guidance.

DTIC Science & Technology

1984-11-21

and architectures for back -substitution and the solution of triangular systems of LAEs (linear algebraic equations). Most recently, a parallel QR...Calculation of I1 is quite difficult since the o T exact Z matrix is quite ill-conditioned. The two VC choices considered in our system are E - I and E I - 0...shown in fig. 1. It These operations are most commonly referred to as shows the ship in water with a sky and shoreline back - segmentation and also
Negative base encoding in optical linear algebra processors

NASA Technical Reports Server (NTRS)

Perlee, C.; Casasent, D.

1986-01-01

In the digital multiplication by analog convolution algorithm, the bits of two encoded numbers are convolved to form the product of the two numbers in mixed binary representation; this output can be easily converted to binary. Attention is presently given to negative base encoding, treating base -2 initially, and then showing that the negative base system can be readily extended to any radix. In general, negative base encoding in optical linear algebra processors represents a more efficient technique than either sign magnitude or 2's complement encoding, when the additions of digitally encoded products are performed in parallel.
Efficient parallel linear scaling construction of the density matrix for Born-Oppenheimer molecular dynamics.

PubMed

Mniszewski, S M; Cawkwell, M J; Wall, M E; Mohd-Yusof, J; Bock, N; Germann, T C; Niklasson, A M N

2015-10-13

We present an algorithm for the calculation of the density matrix that for insulators scales linearly with system size and parallelizes efficiently on multicore, shared memory platforms with small and controllable numerical errors. The algorithm is based on an implementation of the second-order spectral projection (SP2) algorithm [ Niklasson, A. M. N. Phys. Rev. B 2002 , 66 , 155115 ] in sparse matrix algebra with the ELLPACK-R data format. We illustrate the performance of the algorithm within self-consistent tight binding theory by total energy calculations of gas phase poly(ethylene) molecules and periodic liquid water systems containing up to 15,000 atoms on up to 16 CPU cores. We consider algorithm-specific performance aspects, such as local vs nonlocal memory access and the degree of matrix sparsity. Comparisons to sparse matrix algebra implementations using off-the-shelf libraries on multicore CPUs, graphics processing units (GPUs), and the Intel many integrated core (MIC) architecture are also presented. The accuracy and stability of the algorithm are illustrated with long duration Born-Oppenheimer molecular dynamics simulations of 1000 water molecules and a 303 atom Trp cage protein solvated by 2682 water molecules.
Full Parallel Implementation of an All-Electron Four-Component Dirac-Kohn-Sham Program.

PubMed

Rampino, Sergio; Belpassi, Leonardo; Tarantelli, Francesco; Storchi, Loriano

2014-09-09

A full distributed-memory implementation of the Dirac-Kohn-Sham (DKS) module of the program BERTHA (Belpassi et al., Phys. Chem. Chem. Phys. 2011, 13, 12368-12394) is presented, where the self-consistent field (SCF) procedure is replicated on all the parallel processes, each process working on subsets of the global matrices. The key feature of the implementation is an efficient procedure for switching between two matrix distribution schemes, one (integral-driven) optimal for the parallel computation of the matrix elements and another (block-cyclic) optimal for the parallel linear algebra operations. This approach, making both CPU-time and memory scalable with the number of processors used, virtually overcomes at once both time and memory barriers associated with DKS calculations. Performance, portability, and numerical stability of the code are illustrated on the basis of test calculations on three gold clusters of increasing size, an organometallic compound, and a perovskite model. The calculations are performed on a Beowulf and a BlueGene/Q system.

Bit error rate tester using fast parallel generation of linear recurring sequences

DOEpatents

Pierson, Lyndon G.; Witzke, Edward L.; Maestas, Joseph H.

2003-05-06

A fast method for generating linear recurring sequences by parallel linear recurring sequence generators (LRSGs) with a feedback circuit optimized to balance minimum propagation delay against maximal sequence period. Parallel generation of linear recurring sequences requires decimating the sequence (creating small contiguous sections of the sequence in each LRSG). A companion matrix form is selected depending on whether the LFSR is right-shifting or left-shifting. The companion matrix is completed by selecting a primitive irreducible polynomial with 1's most closely grouped in a corner of the companion matrix. A decimation matrix is created by raising the companion matrix to the (n*k).sup.th power, where k is the number of parallel LRSGs and n is the number of bits to be generated at a time by each LRSG. Companion matrices with 1's closely grouped in a corner will yield sparse decimation matrices. A feedback circuit comprised of XOR logic gates implements the decimation matrix in hardware. Sparse decimation matrices can be implemented with minimum number of XOR gates, and therefore a minimum propagation delay through the feedback circuit. The LRSG of the invention is particularly well suited to use as a bit error rate tester on high speed communication lines because it permits the receiver to synchronize to the transmitted pattern within 2n bits.
Determination of statistics for any rotation of axes of a bivariate normal elliptical distribution. [of wind vector components

NASA Technical Reports Server (NTRS)

Falls, L. W.; Crutcher, H. L.

1976-01-01

Transformation of statistics from a dimensional set to another dimensional set involves linear functions of the original set of statistics. Similarly, linear functions will transform statistics within a dimensional set such that the new statistics are relevant to a new set of coordinate axes. A restricted case of the latter is the rotation of axes in a coordinate system involving any two correlated random variables. A special case is the transformation for horizontal wind distributions. Wind statistics are usually provided in terms of wind speed and direction (measured clockwise from north) or in east-west and north-south components. A direct application of this technique allows the determination of appropriate wind statistics parallel and normal to any preselected flight path of a space vehicle. Among the constraints for launching space vehicles are critical values selected from the distribution of the expected winds parallel to and normal to the flight path. These procedures are applied to space vehicle launches at Cape Kennedy, Florida.
Final Report: Subcontract B623868 Algebraic Multigrid solvers for coupled PDE systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Brannick, J.

The Pennsylvania State University (“Subcontractor”) continued to work on the design of algebraic multigrid solvers for coupled systems of partial differential equations (PDEs) arising in numerical modeling of various applications, with a main focus on solving the Dirac equation arising in Quantum Chromodynamics (QCD). The goal of the proposed work was to develop combined geometric and algebraic multilevel solvers that are robust and lend themselves to efficient implementation on massively parallel heterogeneous computers for these QCD systems. The research in these areas built on previous works, focusing on the following three topics: (1) the development of parallel full-multigrid (PFMG) andmore » non-Galerkin coarsening techniques in this frame work for solving the Wilson Dirac system; (2) the use of these same Wilson MG solvers for preconditioning the Overlap and Domain Wall formulations of the Dirac equation; and (3) the design and analysis of algebraic coarsening algorithms for coupled PDE systems including Stokes equation, Maxwell equation and linear elasticity.« less
Parallel processor for real-time structural control

DOE Office of Scientific and Technical Information (OSTI.GOV)

Tise, B.L.

1992-01-01

A parallel processor that is optimized for real-time linear control has been developed. This modular system consists of A/D modules, D/A modules, and floating-point processor modules. The scalable processor uses up to 1,000 Motorola DSP96002 floating-point processors for a peak computational rate of 60 GFLOPS. Sampling rates up to 625 kHz are supported by this analog-in to analog-out controller. The high processing rate and parallel architecture make this processor suitable for computing state-space equations and other multiply/accumulate-intensive digital filters. Processor features include 14-bit conversion devices, low input-output latency, 240 Mbyte/s synchronous backplane bus, low-skew clock distribution circuit, VME connection tomore » host computer, parallelizing code generator, and look-up-tables for actuator linearization. This processor was designed primarily for experiments in structural control. The A/D modules sample sensors mounted on the structure and the floating-point processor modules compute the outputs using the programmed control equations. The outputs are sent through the D/A module to the power amps used to drive the structure's actuators. The host computer is a Sun workstation. An Open Windows-based control panel is provided to facilitate data transfer to and from the processor, as well as to control the operating mode of the processor. A diagnostic mode is provided to allow stimulation of the structure and acquisition of the structural response via sensor inputs.« less
Parallel implementation of an adaptive and parameter-free N-body integrator

NASA Astrophysics Data System (ADS)

Pruett, C. David; Ingham, William H.; Herman, Ralph D.

2011-05-01

Previously, Pruett et al. (2003) [3] described an N-body integrator of arbitrarily high order M with an asymptotic operation count of O(MN). The algorithm's structure lends itself readily to data parallelization, which we document and demonstrate here in the integration of point-mass systems subject to Newtonian gravitation. High order is shown to benefit parallel efficiency. The resulting N-body integrator is robust, parameter-free, highly accurate, and adaptive in both time-step and order. Moreover, it exhibits linear speedup on distributed parallel processors, provided that each processor is assigned at least a handful of bodies. Program summaryProgram title: PNB.f90 Catalogue identifier: AEIK_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEIK_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC license, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 3052 No. of bytes in distributed program, including test data, etc.: 68 600 Distribution format: tar.gz Programming language: Fortran 90 and OpenMPI Computer: All shared or distributed memory parallel processors Operating system: Unix/Linux Has the code been vectorized or parallelized?: The code has been parallelized but has not been explicitly vectorized. RAM: Dependent upon N Classification: 4.3, 4.12, 6.5 Nature of problem: High accuracy numerical evaluation of trajectories of N point masses each subject to Newtonian gravitation. Solution method: Parallel and adaptive extrapolation in time via power series of arbitrary degree. Running time: 5.1 s for the demo program supplied with the package.
Realization of preconditioned Lanczos and conjugate gradient algorithms on optical linear algebra processors.

PubMed

Ghosh, A

1988-08-01

Lanczos and conjugate gradient algorithms are important in computational linear algebra. In this paper, a parallel pipelined realization of these algorithms on a ring of optical linear algebra processors is described. The flow of data is designed to minimize the idle times of the optical multiprocessor and the redundancy of computations. The effects of optical round-off errors on the solutions obtained by the optical Lanczos and conjugate gradient algorithms are analyzed, and it is shown that optical preconditioning can improve the accuracy of these algorithms substantially. Algorithms for optical preconditioning and results of numerical experiments on solving linear systems of equations arising from partial differential equations are discussed. Since the Lanczos algorithm is used mostly with sparse matrices, a folded storage scheme to represent sparse matrices on spatial light modulators is also described.
Performance of a parallel algebraic multilevel preconditioner for stabilized finite element semiconductor device modeling

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Paul T.; Shadid, John N.; Sala, Marzio

In this study results are presented for the large-scale parallel performance of an algebraic multilevel preconditioner for solution of the drift-diffusion model for semiconductor devices. The preconditioner is the key numerical procedure determining the robustness, efficiency and scalability of the fully-coupled Newton-Krylov based, nonlinear solution method that is employed for this system of equations. The coupled system is comprised of a source term dominated Poisson equation for the electric potential, and two convection-diffusion-reaction type equations for the electron and hole concentration. The governing PDEs are discretized in space by a stabilized finite element method. Solution of the discrete system ismore » obtained through a fully-implicit time integrator, a fully-coupled Newton-based nonlinear solver, and a restarted GMRES Krylov linear system solver. The algebraic multilevel preconditioner is based on an aggressive coarsening graph partitioning of the nonzero block structure of the Jacobian matrix. Representative performance results are presented for various choices of multigrid V-cycles and W-cycles and parameter variations for smoothers based on incomplete factorizations. Parallel scalability results are presented for solution of up to 10{sup 8} unknowns on 4096 processors of a Cray XT3/4 and an IBM POWER eServer system.« less
Modified Chebyshev Picard Iteration for Efficient Numerical Integration of Ordinary Differential Equations

NASA Astrophysics Data System (ADS)

Macomber, B.; Woollands, R. M.; Probe, A.; Younes, A.; Bai, X.; Junkins, J.

2013-09-01

Modified Chebyshev Picard Iteration (MCPI) is an iterative numerical method for approximating solutions of linear or non-linear Ordinary Differential Equations (ODEs) to obtain time histories of system state trajectories. Unlike other step-by-step differential equation solvers, the Runge-Kutta family of numerical integrators for example, MCPI approximates long arcs of the state trajectory with an iterative path approximation approach, and is ideally suited to parallel computation. Orthogonal Chebyshev Polynomials are used as basis functions during each path iteration; the integrations of the Picard iteration are then done analytically. Due to the orthogonality of the Chebyshev basis functions, the least square approximations are computed without matrix inversion; the coefficients are computed robustly from discrete inner products. As a consequence of discrete sampling and weighting adopted for the inner product definition, Runge phenomena errors are minimized near the ends of the approximation intervals. The MCPI algorithm utilizes a vector-matrix framework for computational efficiency. Additionally, all Chebyshev coefficients and integrand function evaluations are independent, meaning they can be simultaneously computed in parallel for further decreased computational cost. Over an order of magnitude speedup from traditional methods is achieved in serial processing, and an additional order of magnitude is achievable in parallel architectures. This paper presents a new MCPI library, a modular toolset designed to allow MCPI to be easily applied to a wide variety of ODE systems. Library users will not have to concern themselves with the underlying mathematics behind the MCPI method. Inputs are the boundary conditions of the dynamical system, the integrand function governing system behavior, and the desired time interval of integration, and the output is a time history of the system states over the interval of interest. Examples from the field of astrodynamics are presented to compare the output from the MCPI library to current state-of-practice numerical integration methods. It is shown that MCPI is capable of out-performing the state-of-practice in terms of computational cost and accuracy.
On the equivalence of Gaussian elimination and Gauss-Jordan reduction in solving linear equations

NASA Technical Reports Server (NTRS)

Tsao, Nai-Kuan

1989-01-01

A novel general approach to round-off error analysis using the error complexity concepts is described. This is applied to the analysis of the Gaussian Elimination and Gauss-Jordan scheme for solving linear equations. The results show that the two algorithms are equivalent in terms of our error complexity measures. Thus the inherently parallel Gauss-Jordan scheme can be implemented with confidence if parallel computers are available.
Increased Energy Delivery for Parallel Battery Packs with No Regulated Bus

NASA Astrophysics Data System (ADS)

Hsu, Chung-Ti

In this dissertation, a new approach to paralleling different battery types is presented. A method for controlling charging/discharging of different battery packs by using low-cost bi-directional switches instead of DC-DC converters is proposed. The proposed system architecture, algorithms, and control techniques allow batteries with different chemistry, voltage, and SOC to be properly charged and discharged in parallel without causing safety problems. The physical design and cost for the energy management system is substantially reduced. Additionally, specific types of failures in the maximum power point tracking (MPPT) in a photovoltaic (PV) system when tracking only the load current of a DC-DC converter are analyzed. The periodic nonlinear load current will lead MPPT realized by the conventional perturb and observe (P&O) algorithm to be problematic. A modified MPPT algorithm is proposed and it still only requires typically measured signals, yet is suitable for both linear and periodic nonlinear loads. Moreover, for a modular DC-DC converter using several converters in parallel, the input power from PV panels is processed and distributed at the module level. Methods for properly implementing distributed MPPT are studied. A new approach to efficient MPPT under partial shading conditions is presented. The power stage architecture achieves fast input current change rate by combining a current-adjustable converter with a few converters operating at a constant current.
Many-to-one form-to-function mapping weakens parallel morphological evolution.

PubMed

Thompson, Cole J; Ahmed, Newaz I; Veen, Thor; Peichel, Catherine L; Hendry, Andrew P; Bolnick, Daniel I; Stuart, Yoel E

2017-11-01

Evolutionary ecologists aim to explain and predict evolutionary change under different selective regimes. Theory suggests that such evolutionary prediction should be more difficult for biomechanical systems in which different trait combinations generate the same functional output: "many-to-one mapping." Many-to-one mapping of phenotype to function enables multiple morphological solutions to meet the same adaptive challenges. Therefore, many-to-one mapping should undermine parallel morphological evolution, and hence evolutionary predictability, even when selection pressures are shared among populations. Studying 16 replicate pairs of lake- and stream-adapted threespine stickleback (Gasterosteus aculeatus), we quantified three parts of the teleost feeding apparatus and used biomechanical models to calculate their expected functional outputs. The three feeding structures differed in their form-to-function relationship from one-to-one (lower jaw lever ratio) to increasingly many-to-one (buccal suction index, opercular 4-bar linkage). We tested for (1) weaker linear correlations between phenotype and calculated function, and (2) less parallel evolution across lake-stream pairs, in the many-to-one systems relative to the one-to-one system. We confirm both predictions, thus supporting the theoretical expectation that increasing many-to-one mapping undermines parallel evolution. Therefore, sole consideration of morphological variation within and among populations might not serve as a proxy for functional variation when multiple adaptive trait combinations exist. © 2017 The Author(s). Evolution © 2017 The Society for the Study of Evolution.
An Efficient Objective Analysis System for Parallel Computers

NASA Technical Reports Server (NTRS)

Stobie, J.

1999-01-01

A new atmospheric objective analysis system designed for parallel computers will be described. The system can produce a global analysis (on a 1 X 1 lat-lon grid with 18 levels of heights and winds and 10 levels of moisture) using 120,000 observations in 17 minutes on 32 CPUs (SGI Origin 2000). No special parallel code is needed (e.g. MPI or multitasking) and the 32 CPUs do not have to be on the same platform. The system is totally portable and can run on several different architectures at once. In addition, the system can easily scale up to 100 or more CPUS. This will allow for much higher resolution and significant increases in input data. The system scales linearly as the number of observations and the number of grid points. The cost overhead in going from 1 to 32 CPUs is 18%. In addition, the analysis results are identical regardless of the number of processors used. This system has all the characteristics of optimal interpolation, combining detailed instrument and first guess error statistics to produce the best estimate of the atmospheric state. Static tests with a 2 X 2.5 resolution version of this system showed it's analysis increments are comparable to the latest NASA operational system including maintenance of mass-wind balance. Results from several months of cycling test in the Goddard EOS Data Assimilation System (GEOS DAS) show this new analysis retains the same level of agreement between the first guess and observations (O-F statistics) as the current operational system.
BioFVM: an efficient, parallelized diffusive transport solver for 3-D biological simulations

PubMed Central

Ghaffarizadeh, Ahmadreza; Friedman, Samuel H.; Macklin, Paul

2016-01-01

Motivation: Computational models of multicellular systems require solving systems of PDEs for release, uptake, decay and diffusion of multiple substrates in 3D, particularly when incorporating the impact of drugs, growth substrates and signaling factors on cell receptors and subcellular systems biology. Results: We introduce BioFVM, a diffusive transport solver tailored to biological problems. BioFVM can simulate release and uptake of many substrates by cell and bulk sources, diffusion and decay in large 3D domains. It has been parallelized with OpenMP, allowing efficient simulations on desktop workstations or single supercomputer nodes. The code is stable even for large time steps, with linear computational cost scalings. Solutions are first-order accurate in time and second-order accurate in space. The code can be run by itself or as part of a larger simulator. Availability and implementation: BioFVM is written in C ++ with parallelization in OpenMP. It is maintained and available for download at http://BioFVM.MathCancer.org and http://BioFVM.sf.net under the Apache License (v2.0). Contact: paul.macklin@usc.edu. Supplementary information: Supplementary data are available at Bioinformatics online. PMID:26656933
Astrophysical N-body Simulations Using Hierarchical Tree Data Structures

NASA Astrophysics Data System (ADS)

Warren, M. S.; Salmon, J. K.

The authors report on recent large astrophysical N-body simulations executed on the Intel Touchstone Delta system. They review the astrophysical motivation and the numerical techniques and discuss steps taken to parallelize these simulations. The methods scale as O(N log N), for large values of N, and also scale linearly with the number of processors. The performance sustained for a duration of 67 h, was between 5.1 and 5.4 Gflop/s on a 512-processor system.
Digital signal processing and control and estimation theory -- Points of tangency, area of intersection, and parallel directions

NASA Technical Reports Server (NTRS)

Willsky, A. S.

1976-01-01

A number of current research directions in the fields of digital signal processing and modern control and estimation theory were studied. Topics such as stability theory, linear prediction and parameter identification, system analysis and implementation, two-dimensional filtering, decentralized control and estimation, image processing, and nonlinear system theory were examined in order to uncover some of the basic similarities and differences in the goals, techniques, and philosophy of the two disciplines. An extensive bibliography is included.
Architecting the Finite Element Method Pipeline for the GPU.

PubMed

Fu, Zhisong; Lewis, T James; Kirby, Robert M; Whitaker, Ross T

2014-02-01

The finite element method (FEM) is a widely employed numerical technique for approximating the solution of partial differential equations (PDEs) in various science and engineering applications. Many of these applications benefit from fast execution of the FEM pipeline. One way to accelerate the FEM pipeline is by exploiting advances in modern computational hardware, such as the many-core streaming processors like the graphical processing unit (GPU). In this paper, we present the algorithms and data-structures necessary to move the entire FEM pipeline to the GPU. First we propose an efficient GPU-based algorithm to generate local element information and to assemble the global linear system associated with the FEM discretization of an elliptic PDE. To solve the corresponding linear system efficiently on the GPU, we implement a conjugate gradient method preconditioned with a geometry-informed algebraic multi-grid (AMG) method preconditioner. We propose a new fine-grained parallelism strategy, a corresponding multigrid cycling stage and efficient data mapping to the many-core architecture of GPU. Comparison of our on-GPU assembly versus a traditional serial implementation on the CPU achieves up to an 87 × speedup. Focusing on the linear system solver alone, we achieve a speedup of up to 51 × versus use of a comparable state-of-the-art serial CPU linear system solver. Furthermore, the method compares favorably with other GPU-based, sparse, linear solvers.
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the 'loop unrolling' technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large-scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
A parallel-vector algorithm for rapid structural analysis on high-performance computers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1990-01-01

A fast, accurate Choleski method for the solution of symmetric systems of linear equations is presented. This direct method is based on a variable-band storage scheme and takes advantage of column heights to reduce the number of operations in the Choleski factorization. The method employs parallel computation in the outermost DO-loop and vector computation via the loop unrolling technique in the innermost DO-loop. The method avoids computations with zeros outside the column heights, and as an option, zeros inside the band. The close relationship between Choleski and Gauss elimination methods is examined. The minor changes required to convert the Choleski code to a Gauss code to solve non-positive-definite symmetric systems of equations are identified. The results for two large scale structural analyses performed on supercomputers, demonstrate the accuracy and speed of the method.
Final Report, DE-FG01-06ER25718 Domain Decomposition and Parallel Computing

DOE Office of Scientific and Technical Information (OSTI.GOV)

Widlund, Olof B.

2015-06-09

The goal of this project is to develop and improve domain decomposition algorithms for a variety of partial differential equations such as those of linear elasticity and electro-magnetics.These iterative methods are designed for massively parallel computing systems and allow the fast solution of the very large systems of algebraic equations that arise in large scale and complicated simulations. A special emphasis is placed on problems arising from Maxwell's equation. The approximate solvers, the preconditioners, are combined with the conjugate gradient method and must always include a solver of a coarse model in order to have a performance which is independentmore » of the number of processors used in the computer simulation. A recent development allows for an adaptive construction of this coarse component of the preconditioner.« less
Evidence of a New Instability in Gyrokinetic Simulations of LAPD Plasmas

NASA Astrophysics Data System (ADS)

Terry, P. W.; Pueschel, M. J.; Rossi, G.; Jenko, F.; Told, D.; Carter, T. A.

2015-11-01

Recent experiments at the LArge Plasma Device (LAPD) have focused on structure formation driven by density and temperature gradients. A central difference relative to typical, tokamak-like plasmas stems from the linear geometry and absence of background magnetic shear. At sufficiently high β, strong excitation of parallel (compressional) magnetic fluctuations was observed. Here, linear and nonlinear simulations with the Gene code are used to demonstrate that these findings can be explained through the linear excitation of a Gradient-driven Drift Coupling mode (GDC). This recently-discovered instability, unlike other drift waves, relies on the grad-B drift due to parallel magnetic fluctuations in lieu of a parallel electron response, and can be driven by density or temperature gradients. The linear properties of the GDC for LAPD parameters are studied in detail, and the corresponding turbulence is investigated. It is found that, despite the very large collisionality in the experiment, many properties are recovered fairly well in the simulations. In addition to confirming the existence of the GDC, this opens up interesting questions regarding GDC activity in astrophysical and space plasmas. Supported by USDOE.

Phase space analysis in anisotropic optical systems

NASA Technical Reports Server (NTRS)

Rivera, Ana Leonor; Chumakov, Sergey M.; Wolf, Kurt Bernardo

1995-01-01

From the minimal action principle follows the Hamilton equations of evolution for geometric optical rays in anisotropic media. As in classical mechanics of velocity-dependent potentials, the velocity and the canonical momentum are not parallel, but differ by an anisotropy vector potential, similar to that of linear electromagnetism. Descartes' well known diagram for refraction is generalized and a factorization theorem holds for interfaces between two anisotropic media.
Analog Microcontroller Model for an Energy Harvesting Round Counter

DTIC Science & Technology

2012-07-01

densities representing the duration of ≥ for all scaled piezo ................................7 1 INTRODUCTION An accurate count...limited surface area available for mounting piezos on the gun system. Figure 1. Equivalent circuit model for a piezoelectric transducer...circuit model for the linear I-V relationships is parallel combination of six stages, each of which is comprised of a series combination of a resistor , DC
Feasibility of Decentralized Linear-Quadratic-Gaussian Control of Autonomous Distributed Spacecraft

NASA Technical Reports Server (NTRS)

Carpenter, J. Russell

1999-01-01

A distributed satellite formation, modeled as an arbitrary number of fully connected nodes in a network, could be controlled using a decentralized controller framework that distributes operations in parallel over the network. For such problems, a solution that minimizes data transmission requirements, in the context of linear-quadratic-Gaussian (LQG) control theory, was given by Speyer. This approach is advantageous because it is non-hierarchical, detected failures gracefully degrade system performance, fewer local computations are required than for a centralized controller, and it is optimal with respect to the standard LQG cost function. Disadvantages of the approach are the need for a fully connected communications network, the total operations performed over all the nodes are greater than for a centralized controller, and the approach is formulated for linear time-invariant systems. To investigate the feasibility of the decentralized approach to satellite formation flying, a simple centralized LQG design for a spacecraft orbit control problem is adapted to the decentralized framework. The simple design uses a fixed reference trajectory (an equatorial, Keplerian, circular orbit), and by appropriate choice of coordinates and measurements is formulated as a linear time-invariant system.
Storm Observations of Persistent Three-Dimensional Shoreline Morphology and Bathymetry Along a Geologically Influenced Shoreface Using X-Band Radar (BASIR)

NASA Astrophysics Data System (ADS)

Brodie, K. L.; McNinch, J. E.

2008-12-01

Accurate predictions of shoreline response to storms are contingent upon coastal-morphodynamic models effectively synthesizing the complex evolving relationships between beach topography, sandbar morphology, nearshore bathymetry, underlying geology, and the nearshore wave-field during storm events. Analysis of "pre" and "post" storm data sets have led to a common theory for event response of the nearshore system: pre-storm three-dimensional bar and shoreline configurations shift to two-dimensional, linear forms post- storm. A lack of data during storms has unfortunately left a gap in our knowledge of how the system explicitly changes during the storm event. This work presents daily observations of the beach and nearshore during high-energy storm events over a spatially extensive field site (order of magnitude: 10 km) using Bar and Swash Imaging Radar (BASIR), a mobile x-band radar system. The field site contains a complexity of features including shore-oblique bars and troughs, heterogeneous sediment, and an erosional hotspot. BASIR data provide observations of the evolution of shoreline and bar morphology, as well as nearshore bathymetry, throughout the storm events. Nearshore bathymetry is calculated using a bathymetry inversion from radar- derived wave celerity measurements. Preliminary results show a relatively stable but non-linear shore-parallel bar and a non-linear shoreline with megacusp and embayment features (order of magnitude: 1 km) that are enhanced during the wave events. Both the shoreline and shore-parallel bar undulate at a similar spatial frequency to the nearshore shore- oblique bar-field. Large-scale shore-oblique bars and troughs remain relatively static in position and morphology throughout the storm events. The persistence of a three-dimensional shoreline, shore-parallel bar, and large-scale shore-oblique bars and troughs, contradicts the idea of event-driven shifts to two- dimensional morphology and suggests that beach and nearshore response to storms may be location specific. We hypothesize that the influence of underlying geology, defined by (1) the introduction of heterogeneous sediment and (2) the possible creation of shore-oblique bars and troughs in the nearshore, may be responsible for the persistence of three-dimensional forms and the associated shoreline hotspots during storm events.
Numerical modelling of series-parallel cooling systems in power plant

NASA Astrophysics Data System (ADS)

Regucki, Paweł; Lewkowicz, Marek; Kucięba, Małgorzata

2017-11-01

The paper presents a mathematical model allowing one to study series-parallel hydraulic systems like, e.g., the cooling system of a power boiler's auxiliary devices or a closed cooling system including condensers and cooling towers. The analytical approach is based on a set of non-linear algebraic equations solved using numerical techniques. As a result of the iterative process, a set of volumetric flow rates of water through all the branches of the investigated hydraulic system is obtained. The calculations indicate the influence of changes in the pipeline's geometrical parameters on the total cooling water flow rate in the analysed installation. Such an approach makes it possible to analyse different variants of the modernization of the studied systems, as well as allowing for the indication of its critical elements. Basing on these results, an investor can choose the optimal variant of the reconstruction of the installation from the economic point of view. As examples of such a calculation, two hydraulic installations are described. One is a boiler auxiliary cooling installation including two screw ash coolers. The other is a closed cooling system consisting of cooling towers and condensers.
A biconjugate gradient type algorithm on massively parallel architectures

NASA Technical Reports Server (NTRS)

Freund, Roland W.; Hochbruck, Marlis

1991-01-01

The biconjugate gradient (BCG) method is the natural generalization of the classical conjugate gradient algorithm for Hermitian positive definite matrices to general non-Hermitian linear systems. Unfortunately, the original BCG algorithm is susceptible to possible breakdowns and numerical instabilities. Recently, Freund and Nachtigal have proposed a novel BCG type approach, the quasi-minimal residual method (QMR), which overcomes the problems of BCG. Here, an implementation is presented of QMR based on an s-step version of the nonsymmetric look-ahead Lanczos algorithm. The main feature of the s-step Lanczos algorithm is that, in general, all inner products, except for one, can be computed in parallel at the end of each block; this is unlike the other standard Lanczos process where inner products are generated sequentially. The resulting implementation of QMR is particularly attractive on massively parallel SIMD architectures, such as the Connection Machine.
Modeling and Control of the Redundant Parallel Adjustment Mechanism on a Deployable Antenna Panel

PubMed Central

Tian, Lili; Bao, Hong; Wang, Meng; Duan, Xuechao

2016-01-01

With the aim of developing multiple input and multiple output (MIMO) coupling systems with a redundant parallel adjustment mechanism on the deployable antenna panel, a structural control integrated design methodology is proposed in this paper. Firstly, the modal information from the finite element model of the structure of the antenna panel is extracted, and then the mathematical model is established with the Hamilton principle; Secondly, the discrete Linear Quadratic Regulator (LQR) controller is added to the model in order to control the actuators and adjust the shape of the panel. Finally, the engineering practicality of the modeling and control method based on finite element analysis simulation is verified. PMID:27706076
Image sensor with high dynamic range linear output

NASA Technical Reports Server (NTRS)

Yadid-Pecht, Orly (Inventor); Fossum, Eric R. (Inventor)

2007-01-01

Designs and operational methods to increase the dynamic range of image sensors and APS devices in particular by achieving more than one integration times for each pixel thereof. An APS system with more than one column-parallel signal chains for readout are described for maintaining a high frame rate in readout. Each active pixel is sampled for multiple times during a single frame readout, thus resulting in multiple integration times. The operation methods can also be used to obtain multiple integration times for each pixel with an APS design having a single column-parallel signal chain for readout. Furthermore, analog-to-digital conversion of high speed and high resolution can be implemented.
Fast, exact k-space sample density compensation for trajectories composed of rotationally symmetric segments, and the SNR-optimized image reconstruction from non-Cartesian samples.

PubMed

Mitsouras, Dimitris; Mulkern, Robert V; Rybicki, Frank J

2008-08-01

A recently developed method for exact density compensation of non uniformly arranged samples relies on the analytically known cross-correlations of Fourier basis functions corresponding to the traced k-space trajectory. This method produces a linear system whose solution represents compensated samples that normalize the contribution of each independent element of information that can be expressed by the underlying trajectory. Unfortunately, linear system-based density compensation approaches quickly become computationally demanding with increasing number of samples (i.e., image resolution). Here, it is shown that when a trajectory is composed of rotationally symmetric interleaves, such as spiral and PROPELLER trajectories, this cross-correlations method leads to a highly simplified system of equations. Specifically, it is shown that the system matrix is circulant block-Toeplitz so that the linear system is easily block-diagonalized. The method is described and demonstrated for 32-way interleaved spiral trajectories designed for 256 image matrices; samples are compensated non iteratively in a few seconds by solving the small independent block-diagonalized linear systems in parallel. Because the method is exact and considers all the interactions between all acquired samples, up to a 10% reduction in reconstruction error concurrently with an up to 30% increase in signal to noise ratio are achieved compared to standard density compensation methods. (c) 2008 Wiley-Liss, Inc.
Nonlinear study of the parallel velocity/tearing instability using an implicit, nonlinear resistive MHD solver

NASA Astrophysics Data System (ADS)

Chacon, L.; Finn, J. M.; Knoll, D. A.

2000-10-01

Recently, a new parallel velocity instability has been found.(J. M. Finn, Phys. Plasmas), 2, 12 (1995) This mode is a tearing mode driven unstable by curvature effects and sound wave coupling in the presence of parallel velocity shear. Under such conditions, linear theory predicts that tearing instabilities will grow even in situations in which the classical tearing mode is stable. This could then be a viable seed mechanism for the neoclassical tearing mode, and hence a non-linear study is of interest. Here, the linear and non-linear stages of this instability are explored using a fully implicit, fully nonlinear 2D reduced resistive MHD code,(L. Chacon et al), ``Implicit, Jacobian-free Newton-Krylov 2D reduced resistive MHD nonlinear solver,'' submitted to J. Comput. Phys. (2000) including viscosity and particle transport effects. The nonlinear implicit time integration is performed using the Newton-Raphson iterative algorithm. Krylov iterative techniques are employed for the required algebraic matrix inversions, implemented Jacobian-free (i.e., without ever forming and storing the Jacobian matrix), and preconditioned with a ``physics-based'' preconditioner. Nonlinear results indicate that, for large total plasma beta and large parallel velocity shear, the instability results in the generation of large poloidal shear flows and large magnetic islands even in regimes when the classical tearing mode is absolutely stable. For small viscosity, the time asymptotic state can be turbulent.
An intangible energy in the functioning biosystem. II: Useful parallels with circuit theory and with non-linear optics.

PubMed

Reid, B L

1995-06-01

The argument is developed that a structure and function already exists in selected inanimate systems for an intangible energy dissipating these systems and that, in so doing, this energy exhibits certain properties, readily recognised in the functioning biosystem. The central thesis is that, during dissipation, the structure of the biosystem affords opportunity for an enhanced display of these properties, so that this structure can be rationally recognised as obligatory in the transition, inanimate to animate matter. The systems chosen are those of reactance in linear circuit theory of electronics, and some recent developments in non-linear optics, both of which rely on imaginary or quantal force to display observable effects. Discussion occurs on the fashion which the development of a statistical formalism as a basis for the study of squeezed states of light in these non-linear systems, has, at the same time, overcome a long standing veto on the practical use of quantal energy associated with the Uncertainty Principle of Heisenberg. These ideas are used to vindicate the suggestion that a theoretical basis is presently available for an engineering type approach, toward an intangible force as it exists in the biosystem. The origins and properties of such a force continue to be considered by many as immersed in mysticism.
Parallel algorithms for computation of the manipulator inertia matrix

NASA Technical Reports Server (NTRS)

Amin-Javaheri, Masoud; Orin, David E.

1989-01-01

The development of an O(log2N) parallel algorithm for the manipulator inertia matrix is presented. It is based on the most efficient serial algorithm which uses the composite rigid body method. Recursive doubling is used to reformulate the linear recurrence equations which are required to compute the diagonal elements of the matrix. It results in O(log2N) levels of computation. Computation of the off-diagonal elements involves N linear recurrences of varying-size and a new method, which avoids redundant computation of position and orientation transforms for the manipulator, is developed. The O(log2N) algorithm is presented in both equation and graphic forms which clearly show the parallelism inherent in the algorithm.
Creating Very True Quantum Algorithms for Quantum Energy Based Computing

NASA Astrophysics Data System (ADS)

Nagata, Koji; Nakamura, Tadao; Geurdes, Han; Batle, Josep; Abdalla, Soliman; Farouk, Ahmed; Diep, Do Ngoc

2018-04-01

An interpretation of quantum mechanics is discussed. It is assumed that quantum is energy. An algorithm by means of the energy interpretation is discussed. An algorithm, based on the energy interpretation, for fast determining a homogeneous linear function f( x) := s. x = s 1 x 1 + s 2 x 2 + ⋯ + s N x N is proposed. Here x = ( x 1, … , x N ), x j ∈ R and the coefficients s = ( s 1, … , s N ), s j ∈ N. Given the interpolation values (f(1), f(2),...,f(N))=ěc {y}, the unknown coefficients s = (s1(ěc {y}),\\dots , sN(ěc {y})) of the linear function shall be determined, simultaneously. The speed of determining the values is shown to outperform the classical case by a factor of N. Our method is based on the generalized Bernstein-Vazirani algorithm to qudit systems. Next, by using M parallel quantum systems, M homogeneous linear functions are determined, simultaneously. The speed of obtaining the set of M homogeneous linear functions is shown to outperform the classical case by a factor of N × M.
Creating Very True Quantum Algorithms for Quantum Energy Based Computing

NASA Astrophysics Data System (ADS)

Nagata, Koji; Nakamura, Tadao; Geurdes, Han; Batle, Josep; Abdalla, Soliman; Farouk, Ahmed; Diep, Do Ngoc

2017-12-01

An interpretation of quantum mechanics is discussed. It is assumed that quantum is energy. An algorithm by means of the energy interpretation is discussed. An algorithm, based on the energy interpretation, for fast determining a homogeneous linear function f(x) := s.x = s 1 x 1 + s 2 x 2 + ⋯ + s N x N is proposed. Here x = (x 1, … , x N ), x j ∈ R and the coefficients s = (s 1, … , s N ), s j ∈ N. Given the interpolation values (f(1), f(2),...,f(N))=ěc {y}, the unknown coefficients s = (s1(ěc {y}),\\dots , sN(ěc {y})) of the linear function shall be determined, simultaneously. The speed of determining the values is shown to outperform the classical case by a factor of N. Our method is based on the generalized Bernstein-Vazirani algorithm to qudit systems. Next, by using M parallel quantum systems, M homogeneous linear functions are determined, simultaneously. The speed of obtaining the set of M homogeneous linear functions is shown to outperform the classical case by a factor of N × M.
A spectral analysis of the domain decomposed Monte Carlo method for linear systems

DOE PAGES

Slattery, Stuart R.; Evans, Thomas M.; Wilson, Paul P. H.

2015-09-08

The domain decomposed behavior of the adjoint Neumann-Ulam Monte Carlo method for solving linear systems is analyzed using the spectral properties of the linear oper- ator. Relationships for the average length of the adjoint random walks, a measure of convergence speed and serial performance, are made with respect to the eigenvalues of the linear operator. In addition, relationships for the effective optical thickness of a domain in the decomposition are presented based on the spectral analysis and diffusion theory. Using the effective optical thickness, the Wigner rational approxi- mation and the mean chord approximation are applied to estimate the leakagemore » frac- tion of random walks from a domain in the decomposition as a measure of parallel performance and potential communication costs. The one-speed, two-dimensional neutron diffusion equation is used as a model problem in numerical experiments to test the models for symmetric operators with spectral qualities similar to light water reactor problems. We find, in general, the derived approximations show good agreement with random walk lengths and leakage fractions computed by the numerical experiments.« less
PETSc Users Manual Revision 3.3

DOE Office of Scientific and Technical Information (OSTI.GOV)

Balay, S.; Brown, J.; Buschelman, K.

This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
PETSc Users Manual Revision 3.4

DOE Office of Scientific and Technical Information (OSTI.GOV)

Balay, S.; Brown, J.; Buschelman, K.

This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself; For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
PETSc Users Manual Revision 3.5

DOE Office of Scientific and Technical Information (OSTI.GOV)

Balay, S.; Abhyankar, S.; Adams, M.

This manual describes the use of PETSc for the numerical solution of partial differential equations and related problems on high-performance computers. The Portable, Extensible Toolkit for Scientific Computation (PETSc) is a suite of data structures and routines that provide the building blocks for the implementation of large-scale application codes on parallel (and serial) computers. PETSc uses the MPI standard for all message-passing communication. PETSc includes an expanding suite of parallel linear, nonlinear equation solvers and time integrators that may be used in application codes written in Fortran, C, C++, Python, and MATLAB (sequential). PETSc provides many of the mechanisms neededmore » within parallel application codes, such as parallel matrix and vector assembly routines. The library is organized hierarchically, enabling users to employ the level of abstraction that is most appropriate for a particular problem. By using techniques of object-oriented programming, PETSc provides enormous flexibility for users. PETSc is a sophisticated set of software tools; as such, for some users it initially has a much steeper learning curve than a simple subroutine library. In particular, for individuals without some computer science background, experience programming in C, C++ or Fortran and experience using a debugger such as gdb or dbx, it may require a significant amount of time to take full advantage of the features that enable efficient software use. However, the power of the PETSc design and the algorithms it incorporates may make the efficient implementation of many application codes simpler than “rolling them” yourself. ;For many tasks a package such as MATLAB is often the best tool; PETSc is not intended for the classes of problems for which effective MATLAB code can be written. PETSc also has a MATLAB interface, so portions of your code can be written in MATLAB to “try out” the PETSc solvers. The resulting code will not be scalable however because currently MATLAB is inherently not scalable; and PETSc should not be used to attempt to provide a “parallel linear solver” in an otherwise sequential code. Certainly all parts of a previously sequential code need not be parallelized but the matrix generation portion must be parallelized to expect any kind of reasonable performance. Do not expect to generate your matrix sequentially and then “use PETSc” to solve the linear system in parallel. Since PETSc is under continued development, small changes in usage and calling sequences of routines will occur. PETSc is supported; see the web site http://www.mcs.anl.gov/petsc for information on contacting support. A http://www.mcs.anl.gov/petsc/publications may be found a list of publications and web sites that feature work involving PETSc. We welcome any reports of corrections for this document.« less
Efficient Parallel Formulations of Hierarchical Methods and Their Applications

NASA Astrophysics Data System (ADS)

Grama, Ananth Y.

1996-01-01

Hierarchical methods such as the Fast Multipole Method (FMM) and Barnes-Hut (BH) are used for rapid evaluation of potential (gravitational, electrostatic) fields in particle systems. They are also used for solving integral equations using boundary element methods. The linear systems arising from these methods are dense and are solved iteratively. Hierarchical methods reduce the complexity of the core matrix-vector product from O(n^2) to O(n log n) and the memory requirement from O(n^2) to O(n). We have developed highly scalable parallel formulations of a hybrid FMM/BH method that are capable of handling arbitrarily irregular distributions. We apply these formulations to astrophysical simulations of Plummer and Gaussian galaxies. We have used our parallel formulations to solve the integral form of the Laplace equation. We show that our parallel hierarchical mat-vecs yield high efficiency and overall performance even on relatively small problems. A problem containing approximately 200K nodes takes under a second to compute on 256 processors and yet yields over 85% efficiency. The efficiency and raw performance is expected to increase for bigger problems. For the 200K node problem, our code delivers about 5 GFLOPS of performance on a 256 processor T3D. This is impressive considering the fact that the problem has floating point divides and roots, and very little locality resulting in poor cache performance. A dense matrix-vector product of the same dimensions would require about 0.5 TeraBytes of memory and about 770 TeraFLOPS of computing speed. Clearly, if the loss in accuracy resulting from the use of hierarchical methods is acceptable, our code yields significant savings in time and memory. We also study the convergence of a GMRES solver built around this mat-vec. We accelerate the convergence of the solver using three preconditioning techniques: diagonal scaling, block-diagonal preconditioning, and inner-outer preconditioning. We study the performance and parallel efficiency of these preconditioned solvers. Using this solver, we solve dense linear systems with hundreds of thousands of unknowns. Solving a 105K unknown problem takes about 10 minutes on a 64 processor T3D. Until very recently, boundary element problems of this magnitude could not even be generated, let alone solved.
A visual parallel-BCI speller based on the time-frequency coding strategy

NASA Astrophysics Data System (ADS)

Xu, Minpeng; Chen, Long; Zhang, Lixin; Qi, Hongzhi; Ma, Lan; Tang, Jiabei; Wan, Baikun; Ming, Dong

2014-04-01

Objective. Spelling is one of the most important issues in brain-computer interface (BCI) research. This paper is to develop a visual parallel-BCI speller system based on the time-frequency coding strategy in which the sub-speller switching among four simultaneously presented sub-spellers and the character selection are identified in a parallel mode. Approach. The parallel-BCI speller was constituted by four independent P300+SSVEP-B (P300 plus SSVEP blocking) spellers with different flicker frequencies, thereby all characters had a specific time-frequency code. To verify its effectiveness, 11 subjects were involved in the offline and online spellings. A classification strategy was designed to recognize the target character through jointly using the canonical correlation analysis and stepwise linear discriminant analysis. Main results. Online spellings showed that the proposed parallel-BCI speller had a high performance, reaching the highest information transfer rate of 67.4 bit min-1, with an average of 54.0 bit min-1 and 43.0 bit min-1 in the three rounds and five rounds, respectively. Significance. The results indicated that the proposed parallel-BCI could be effectively controlled by users with attention shifting fluently among the sub-spellers, and highly improved the BCI spelling performance.

Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers

PubMed Central

Chen, Weiliang; De Schutter, Erik

2017-01-01

Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation. PMID:28239346
Parallel STEPS: Large Scale Stochastic Spatial Reaction-Diffusion Simulation with High Performance Computers.

PubMed

Chen, Weiliang; De Schutter, Erik

2017-01-01

Stochastic, spatial reaction-diffusion simulations have been widely used in systems biology and computational neuroscience. However, the increasing scale and complexity of models and morphologies have exceeded the capacity of any serial implementation. This led to the development of parallel solutions that benefit from the boost in performance of modern supercomputers. In this paper, we describe an MPI-based, parallel operator-splitting implementation for stochastic spatial reaction-diffusion simulations with irregular tetrahedral meshes. The performance of our implementation is first examined and analyzed with simulations of a simple model. We then demonstrate its application to real-world research by simulating the reaction-diffusion components of a published calcium burst model in both Purkinje neuron sub-branch and full dendrite morphologies. Simulation results indicate that our implementation is capable of achieving super-linear speedup for balanced loading simulations with reasonable molecule density and mesh quality. In the best scenario, a parallel simulation with 2,000 processes runs more than 3,600 times faster than its serial SSA counterpart, and achieves more than 20-fold speedup relative to parallel simulation with 100 processes. In a more realistic scenario with dynamic calcium influx and data recording, the parallel simulation with 1,000 processes and no load balancing is still 500 times faster than the conventional serial SSA simulation.
Matrix preconditioning: a robust operation for optical linear algebra processors.

PubMed

Ghosh, A; Paparao, P

1987-07-15

Analog electrooptical processors are best suited for applications demanding high computational throughput with tolerance for inaccuracies. Matrix preconditioning is one such application. Matrix preconditioning is a preprocessing step for reducing the condition number of a matrix and is used extensively with gradient algorithms for increasing the rate of convergence and improving the accuracy of the solution. In this paper, we describe a simple parallel algorithm for matrix preconditioning, which can be implemented efficiently on a pipelined optical linear algebra processor. From the results of our numerical experiments we show that the efficacy of the preconditioning algorithm is affected very little by the errors of the optical system.
Parallel conjugate gradient algorithms for manipulator dynamic simulation

NASA Technical Reports Server (NTRS)

Fijany, Amir; Scheld, Robert E.

1989-01-01

Parallel conjugate gradient algorithms for the computation of multibody dynamics are developed for the specialized case of a robot manipulator. For an n-dimensional positive-definite linear system, the Classical Conjugate Gradient (CCG) algorithms are guaranteed to converge in n iterations, each with a computation cost of O(n); this leads to a total computational cost of O(n sq) on a serial processor. A conjugate gradient algorithms is presented that provide greater efficiency using a preconditioner, which reduces the number of iterations required, and by exploiting parallelism, which reduces the cost of each iteration. Two Preconditioned Conjugate Gradient (PCG) algorithms are proposed which respectively use a diagonal and a tridiagonal matrix, composed of the diagonal and tridiagonal elements of the mass matrix, as preconditioners. Parallel algorithms are developed to compute the preconditioners and their inversions in O(log sub 2 n) steps using n processors. A parallel algorithm is also presented which, on the same architecture, achieves the computational time of O(log sub 2 n) for each iteration. Simulation results for a seven degree-of-freedom manipulator are presented. Variants of the proposed algorithms are also developed which can be efficiently implemented on the Robot Mathematics Processor (RMP).
Multi-Window Controllers for Autonomous Space Systems

NASA Technical Reports Server (NTRS)

Lurie, B, J.; Hadaegh, F. Y.

1997-01-01

Multi-window controllers select between elementary linear controllers using nonlinear windows based on the amplitude and frequency content of the feedback error. The controllers are relatively simple to implement and perform much better than linear controllers. The commanders for such controllers only order the destination point and are freed from generating the command time-profiles. The robotic missions rely heavily on the tasks of acquisition and tracking. For autonomous and optimal control of the spacecraft, the control bandwidth must be larger while the feedback can (and, therefore, must) be reduced.. Combining linear compensators via multi-window nonlinear summer guarantees minimum phase character of the combined transfer function. It is shown that the solution may require using several parallel branches and windows. Several examples of multi-window nonlinear controller applications are presented.
Parallel-vector solution of large-scale structural analysis problems on supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Agarwal, Tarun K.

1989-01-01

A direct linear equation solution method based on the Choleski factorization procedure is presented which exploits both parallel and vector features of supercomputers. The new equation solver is described, and its performance is evaluated by solving structural analysis problems on three high-performance computers. The method has been implemented using Force, a generic parallel FORTRAN language.
A design concept of parallel elasticity extracted from biological muscles for engineered actuators.

PubMed

Chen, Jie; Jin, Hongzhe; Iida, Fumiya; Zhao, Jie

2016-08-23

Series elastic actuation that takes inspiration from biological muscle-tendon units has been extensively studied and used to address the challenges (e.g. energy efficiency, robustness) existing in purely stiff robots. However, there also exists another form of passive property in biological actuation, parallel elasticity within muscles themselves, and our knowledge of it is limited: for example, there is still no general design strategy for the elasticity profile. When we look at nature, on the other hand, there seems a universal agreement in biological systems: experimental evidence has suggested that a concave-upward elasticity behaviour is exhibited within the muscles of animals. Seeking to draw possible design clues for elasticity in parallel with actuators, we use a simplified joint model to investigate the mechanisms behind this biologically universal preference of muscles. Actuation of the model is identified from general biological joints and further reduced with a specific focus on muscle elasticity aspects, for the sake of easy implementation. By examining various elasticity scenarios, one without elasticity and three with elasticity of different profiles, we find that parallel elasticity generally exerts contradictory influences on energy efficiency and disturbance rejection, due to the mechanical impedance shift thus caused. The trade-off analysis between them also reveals that concave parallel elasticity is able to achieve a more advantageous balance than linear and convex ones. It is expected that the results could contribute to our further understanding of muscle elasticity and provide a theoretical guideline on how to properly design parallel elasticity behaviours for engineering systems such as artificial actuators and robotic joints.
Topology of polymer chains under nanoscale confinement.

PubMed

Satarifard, Vahid; Heidari, Maziar; Mashaghi, Samaneh; Tans, Sander J; Ejtehadi, Mohammad Reza; Mashaghi, Alireza

2017-08-24

Spatial confinement limits the conformational space accessible to biomolecules but the implications for bimolecular topology are not yet known. Folded linear biopolymers can be seen as molecular circuits formed by intramolecular contacts. The pairwise arrangement of intra-chain contacts can be categorized as parallel, series or cross, and has been identified as a topological property. Using molecular dynamics simulations, we determine the contact order distributions and topological circuits of short semi-flexible linear and ring polymer chains with a persistence length of l p under a spherical confinement of radius R c . At low values of l p /R c , the entropy of the linear chain leads to the formation of independent contacts along the chain and accordingly, increases the fraction of series topology with respect to other topologies. However, at high l p /R c , the fraction of cross and parallel topologies are enhanced in the chain topological circuits with cross becoming predominant. At an intermediate confining regime, we identify a critical value of l p /R c , at which all topological states have equal probability. Confinement thus equalizes the probability of more complex cross and parallel topologies to the level of the more simple, non-cooperative series topology. Moreover, our topology analysis reveals distinct behaviours for ring- and linear polymers under weak confinement; however, we find no difference between ring- and linear polymers under strong confinement. Under weak confinement, ring polymers adopt parallel and series topologies with equal likelihood, while linear polymers show a higher tendency for series arrangement. The radial distribution analysis of the topology reveals a non-uniform effect of confinement on the topology of polymer chains, thereby imposing more pronounced effects on the core region than on the confinement surface. Additionally, our results reveal that over a wide range of confining radii, loops arranged in parallel and cross topologies have nearly the same contact orders. Such degeneracy implies that the kinetics and transition rates between the topological states cannot be solely explained by contact order. We expect these findings to be of general importance in understanding chaperone assisted protein folding, chromosome architecture, and the evolution of molecular folds.
A Performance Comparison of the Parallel Preconditioners for Iterative Methods for Large Sparse Linear Systems Arising from Partial Differential Equations on Structured Grids

NASA Astrophysics Data System (ADS)

Ma, Sangback

In this paper we compare various parallel preconditioners such as Point-SSOR (Symmetric Successive OverRelaxation), ILU(0) (Incomplete LU) in the Wavefront ordering, ILU(0) in the Multi-color ordering, Multi-Color Block SOR (Successive OverRelaxation), SPAI (SParse Approximate Inverse) and pARMS (Parallel Algebraic Recursive Multilevel Solver) for solving large sparse linear systems arising from two-dimensional PDE (Partial Differential Equation)s on structured grids. Point-SSOR is well-known, and ILU(0) is one of the most popular preconditioner, but it is inherently serial. ILU(0) in the Wavefront ordering maximizes the parallelism in the natural order, but the lengths of the wave-fronts are often nonuniform. ILU(0) in the Multi-color ordering is a simple way of achieving a parallelism of the order N, where N is the order of the matrix, but its convergence rate often deteriorates as compared to that of natural ordering. We have chosen the Multi-Color Block SOR preconditioner combined with direct sparse matrix solver, since for the Laplacian matrix the SOR method is known to have a nondeteriorating rate of convergence when used with the Multi-Color ordering. By using block version we expect to minimize the interprocessor communications. SPAI computes the sparse approximate inverse directly by least squares method. Finally, ARMS is a preconditioner recursively exploiting the concept of independent sets and pARMS is the parallel version of ARMS. Experiments were conducted for the Finite Difference and Finite Element discretizations of five two-dimensional PDEs with large meshsizes up to a million on an IBM p595 machine with distributed memory. Our matrices are real positive, i. e., their real parts of the eigenvalues are positive. We have used GMRES(m) as our outer iterative method, so that the convergence of GMRES(m) for our test matrices are mathematically guaranteed. Interprocessor communications were done using MPI (Message Passing Interface) primitives. The results show that in general ILU(0) in the Multi-Color ordering ahd ILU(0) in the Wavefront ordering outperform the other methods but for symmetric and nearly symmetric 5-point matrices Multi-Color Block SOR gives the best performance, except for a few cases with a small number of processors.
Characterization of reticulated vitreous carbon foam using a frisch-grid parallel-plate ionization chamber

NASA Astrophysics Data System (ADS)

Edwards, Nathaniel S.; Conley, Jerrod C.; Reichenberger, Michael A.; Nelson, Kyle A.; Tiner, Christopher N.; Hinson, Niklas J.; Ugorowski, Philip B.; Fronk, Ryan G.; McGregor, Douglas S.

2018-06-01

The propagation of electrons through several linear pore densities of reticulated vitreous carbon (RVC) foam was studied using a Frisch-grid parallel-plate ionization chamber pressurized to 1 psig of P-10 proportional gas. The operating voltages of the electrodes contained within the Frisch-grid parallel-plate ionization chamber were defined by measuring counting curves using a collimated 241Am alpha-particle source with and without a Frisch grid. RVC foam samples with linear pore densities of 5, 10, 20, 30, 45, 80, and 100 pores per linear inch were separately positioned between the cathode and anode. Pulse-height spectra and count rates from a collimated 241Am alpha-particle source positioned between the cathode and each RVC foam sample were measured and compared to a measurement without an RVC foam sample. The Frisch grid was positioned in between the RVC foam sample and the anode. The measured pulse-height spectra were indiscernible from background and resulted in negligible net count rates for all RVC foam samples. The Frisch grid parallel-plate ionization chamber measurement results indicate that electrons do not traverse the bulk of RVC foam and consequently do not produce a pulse.
The stress shadow effect: a mechanical analysis of the evenly-spaced parallel strike-slip faults in the San Andreas fault system

NASA Astrophysics Data System (ADS)

Zuza, A. V.; Yin, A.; Lin, J. C.

2015-12-01

Parallel evenly-spaced strike-slip faults are prominent in the southern San Andreas fault system, as well as other settings along plate boundaries (e.g., the Alpine fault) and within continental interiors (e.g., the North Anatolian, central Asian, and northern Tibetan faults). In southern California, the parallel San Jacinto, Elsinore, Rose Canyon, and San Clemente faults to the west of the San Andreas are regularly spaced at ~40 km. In the Eastern California Shear Zone, east of the San Andreas, faults are spaced at ~15 km. These characteristic spacings provide unique mechanical constraints on how the faults interact. Despite the common occurrence of parallel strike-slip faults, the fundamental questions of how and why these fault systems form remain unanswered. We address this issue by using the stress shadow concept of Lachenbruch (1961)—developed to explain extensional joints by using the stress-free condition on the crack surface—to present a mechanical analysis of the formation of parallel strike-slip faults that relates fault spacing and brittle-crust thickness to fault strength, crustal strength, and the crustal stress state. We discuss three independent models: (1) a fracture mechanics model, (2) an empirical stress-rise function model embedded in a plastic medium, and (3) an elastic-plate model. The assumptions and predictions of these models are quantitatively tested using scaled analogue sandbox experiments that show that strike-slip fault spacing is linearly related to the brittle-crust thickness. We derive constraints on the mechanical properties of the southern San Andreas strike-slip faults and fault-bounded crust (e.g., local fault strength and crustal/regional stress) given the observed fault spacing and brittle-crust thickness, which is obtained by defining the base of the seismogenic zone with high-resolution earthquake data. Our models allow direct comparison of the parallel faults in the southern San Andreas system with other similar strike-slip fault systems, both on Earth and throughout the solar system (e.g., the Tiger Stripe Fractures on Enceladus).
Development of seismic tomography software for hybrid supercomputers

NASA Astrophysics Data System (ADS)

Nikitin, Alexandr; Serdyukov, Alexandr; Duchkov, Anton

2015-04-01

Seismic tomography is a technique used for computing velocity model of geologic structure from first arrival travel times of seismic waves. The technique is used in processing of regional and global seismic data, in seismic exploration for prospecting and exploration of mineral and hydrocarbon deposits, and in seismic engineering for monitoring the condition of engineering structures and the surrounding host medium. As a consequence of development of seismic monitoring systems and increasing volume of seismic data, there is a growing need for new, more effective computational algorithms for use in seismic tomography applications with improved performance, accuracy and resolution. To achieve this goal, it is necessary to use modern high performance computing systems, such as supercomputers with hybrid architecture that use not only CPUs, but also accelerators and co-processors for computation. The goal of this research is the development of parallel seismic tomography algorithms and software package for such systems, to be used in processing of large volumes of seismic data (hundreds of gigabytes and more). These algorithms and software package will be optimized for the most common computing devices used in modern hybrid supercomputers, such as Intel Xeon CPUs, NVIDIA Tesla accelerators and Intel Xeon Phi co-processors. In this work, the following general scheme of seismic tomography is utilized. Using the eikonal equation solver, arrival times of seismic waves are computed based on assumed velocity model of geologic structure being analyzed. In order to solve the linearized inverse problem, tomographic matrix is computed that connects model adjustments with travel time residuals, and the resulting system of linear equations is regularized and solved to adjust the model. The effectiveness of parallel implementations of existing algorithms on target architectures is considered. During the first stage of this work, algorithms were developed for execution on supercomputers using multicore CPUs only, with preliminary performance tests showing good parallel efficiency on large numerical grids. Porting of the algorithms to hybrid supercomputers is currently ongoing.
Time Parallel Solution of Linear Partial Differential Equations on the Intel Touchstone Delta Supercomputer

NASA Technical Reports Server (NTRS)

Toomarian, N.; Fijany, A.; Barhen, J.

1993-01-01

Evolutionary partial differential equations are usually solved by decretization in time and space, and by applying a marching in time procedure to data and algorithms potentially parallelized in the spatial domain.
Parallel Implicit Runge-Kutta Methods Applied to Coupled Orbit/Attitude Propagation

NASA Astrophysics Data System (ADS)

Hatten, Noble; Russell, Ryan P.

2017-12-01

A variable-step Gauss-Legendre implicit Runge-Kutta (GLIRK) propagator is applied to coupled orbit/attitude propagation. Concepts previously shown to improve efficiency in 3DOF propagation are modified and extended to the 6DOF problem, including the use of variable-fidelity dynamics models. The impact of computing the stage dynamics of a single step in parallel is examined using up to 23 threads and 22 associated GLIRK stages; one thread is reserved for an extra dynamics function evaluation used in the estimation of the local truncation error. Efficiency is found to peak for typical examples when using approximately 8 to 12 stages for both serial and parallel implementations. Accuracy and efficiency compare favorably to explicit Runge-Kutta and linear-multistep solvers for representative scenarios. However, linear-multistep methods are found to be more efficient for some applications, particularly in a serial computing environment, or when parallelism can be applied across multiple trajectories.
Monte Carlo MP2 on Many Graphical Processing Units.

PubMed

Doran, Alexander E; Hirata, So

2016-10-11

In the Monte Carlo second-order many-body perturbation (MC-MP2) method, the long sum-of-product matrix expression of the MP2 energy, whose literal evaluation may be poorly scalable, is recast into a single high-dimensional integral of functions of electron pair coordinates, which is evaluated by the scalable method of Monte Carlo integration. The sampling efficiency is further accelerated by the redundant-walker algorithm, which allows a maximal reuse of electron pairs. Here, a multitude of graphical processing units (GPUs) offers a uniquely ideal platform to expose multilevel parallelism: fine-grain data-parallelism for the redundant-walker algorithm in which millions of threads compute and share orbital amplitudes on each GPU; coarse-grain instruction-parallelism for near-independent Monte Carlo integrations on many GPUs with few and infrequent interprocessor communications. While the efficiency boost by the redundant-walker algorithm on central processing units (CPUs) grows linearly with the number of electron pairs and tends to saturate when the latter exceeds the number of orbitals, on a GPU it grows quadratically before it increases linearly and then eventually saturates at a much larger number of pairs. This is because the orbital constructions are nearly perfectly parallelized on a GPU and thus completed in a near-constant time regardless of the number of pairs. In consequence, an MC-MP2/cc-pVDZ calculation of a benzene dimer is 2700 times faster on 256 GPUs (using 2048 electron pairs) than on two CPUs, each with 8 cores (which can use only up to 256 pairs effectively). We also numerically determine that the cost to achieve a given relative statistical uncertainty in an MC-MP2 energy increases as O(n 3 ) or better with system size n, which may be compared with the O(n 5 ) scaling of the conventional implementation of deterministic MP2. We thus establish the scalability of MC-MP2 with both system and computer sizes.
Parallel computation of fluid-structural interactions using high resolution upwind schemes

NASA Astrophysics Data System (ADS)

Hu, Zongjun

An efficient and accurate solver is developed to simulate the non-linear fluid-structural interactions in turbomachinery flutter flows. A new low diffusion E-CUSP scheme, Zha CUSP scheme, is developed to improve the efficiency and accuracy of the inviscid flux computation. The 3D unsteady Navier-Stokes equations with the Baldwin-Lomax turbulence model are solved using the finite volume method with the dual-time stepping scheme. The linearized equations are solved with Gauss-Seidel line iterations. The parallel computation is implemented using MPI protocol. The solver is validated with 2D cases for its turbulence modeling, parallel computation and unsteady calculation. The Zha CUSP scheme is validated with 2D cases, including a supersonic flat plate boundary layer, a transonic converging-diverging nozzle and a transonic inlet diffuser. The Zha CUSP2 scheme is tested with 3D cases, including a circular-to-rectangular nozzle, a subsonic compressor cascade and a transonic channel. The Zha CUSP schemes are proved to be accurate, robust and efficient in these tests. The steady and unsteady separation flows in a 3D stationary cascade under high incidence and three inlet Mach numbers are calculated to study the steady state separation flow patterns and their unsteady oscillation characteristics. The leading edge vortex shedding is the mechanism behind the unsteady characteristics of the high incidence separated flows. The separation flow characteristics is affected by the inlet Mach number. The blade aeroelasticity of a linear cascade with forced oscillating blades is studied using parallel computation. A simplified two-passage cascade with periodic boundary condition is first calculated under a medium frequency and a low incidence. The full scale cascade with 9 blades and two end walls is then studied more extensively under three oscillation frequencies and two incidence angles. The end wall influence and the blade stability are studied and compared under different frequencies and incidence angles. The Zha CUSP schemes are the first time to be applied in moving grid systems and 2D and 3D calculations. The implicit Gauss-Seidel iteration with dual time stepping is the first time to be used for moving grid systems. The NASA flutter cascade is the first time to be calculated in full scale.
Microfabricated linear Paul-Straubel ion trap

DOEpatents

Mangan, Michael A [Albuquerque, NM; Blain, Matthew G [Albuquerque, NM; Tigges, Chris P [Albuquerque, NM; Linker, Kevin L [Albuquerque, NM

2011-04-19

An array of microfabricated linear Paul-Straubel ion traps can be used for mass spectrometric applications. Each ion trap comprises two parallel inner RF electrodes and two parallel outer DC control electrodes symmetric about a central trap axis and suspended over an opening in a substrate. Neighboring ion traps in the array can share a common outer DC control electrode. The ions confined transversely by an RF quadrupole electric field potential well on the ion trap axis. The array can trap a wide array of ions.
Beam dynamics in MABE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Poukey, J.W.; Coleman, P.D.; Sanford, T.W.L.

1985-10-01

MABE is a multistage linear electron accelerator which accelerates up to nine beams in parallel. Nominal parameters per beam are 25 kA, final energy 7 MeV, and guide field 20 kG. We report recent progress via theory and simulation in understanding the beam dynamics in such a system. In particular, we emphasize our results on the radial oscillations and emittance growth for a beam passing through a series of accelerating gaps.
spammpack, Version 2013-06-18

DOE Office of Scientific and Technical Information (OSTI.GOV)

2014-01-17

This library is an implementation of the Sparse Approximate Matrix Multiplication (SpAMM) algorithm introduced. It provides a matrix data type, and an approximate matrix product, which exhibits linear scaling computational complexity for matrices with decay. The product error and the performance of the multiply can be tuned by choosing an appropriate tolerance. The library can be compiled for serial execution or parallel execution on shared memory systems with an OpenMP capable compiler
Parallel Finite Element Domain Decomposition for Structural/Acoustic Analysis

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.; Tungkahotara, Siroj; Watson, Willie R.; Rajan, Subramaniam D.

2005-01-01

A domain decomposition (DD) formulation for solving sparse linear systems of equations resulting from finite element analysis is presented. The formulation incorporates mixed direct and iterative equation solving strategics and other novel algorithmic ideas that are optimized to take advantage of sparsity and exploit modern computer architecture, such as memory and parallel computing. The most time consuming part of the formulation is identified and the critical roles of direct sparse and iterative solvers within the framework of the formulation are discussed. Experiments on several computer platforms using several complex test matrices are conducted using software based on the formulation. Small-scale structural examples are used to validate thc steps in the formulation and large-scale (l,000,000+ unknowns) duct acoustic examples are used to evaluate the ORIGIN 2000 processors, and a duster of 6 PCs (running under the Windows environment). Statistics show that the formulation is efficient in both sequential and parallel computing environmental and that the formulation is significantly faster and consumes less memory than that based on one of the best available commercialized parallel sparse solvers.

On Parallelizing Single Dynamic Simulation Using HPC Techniques and APIs of Commercial Software

DOE Office of Scientific and Technical Information (OSTI.GOV)

Diao, Ruisheng; Jin, Shuangshuang; Howell, Frederic

Time-domain simulations are heavily used in today’s planning and operation practices to assess power system transient stability and post-transient voltage/frequency profiles following severe contingencies to comply with industry standards. Because of the increased modeling complexity, it is several times slower than real time for state-of-the-art commercial packages to complete a dynamic simulation for a large-scale model. With the growing stochastic behavior introduced by emerging technologies, power industry has seen a growing need for performing security assessment in real time. This paper presents a parallel implementation framework to speed up a single dynamic simulation by leveraging the existing stability model librarymore » in commercial tools through their application programming interfaces (APIs). Several high performance computing (HPC) techniques are explored such as parallelizing the calculation of generator current injection, identifying fast linear solvers for network solution, and parallelizing data outputs when interacting with APIs in the commercial package, TSAT. The proposed method has been tested on a WECC planning base case with detailed synchronous generator models and exhibits outstanding scalable performance with sufficient accuracy.« less
IMEX HDG-DG: A coupled implicit hybridized discontinuous Galerkin and explicit discontinuous Galerkin approach for Euler systems on cubed sphere.

NASA Astrophysics Data System (ADS)

Kang, S.; Muralikrishnan, S.; Bui-Thanh, T.

2017-12-01

We propose IMEX HDG-DG schemes for Euler systems on cubed sphere. Of interest is subsonic flow, where the speed of the acoustic wave is faster than that of the nonlinear advection. In order to simulate these flows efficiently, we split the governing system into stiff part describing the fast waves and non-stiff part associated with nonlinear advection. The former is discretized implicitly with HDG method while explicit Runge-Kutta DG discretization is employed for the latter. The proposed IMEX HDG-DG framework: 1) facilitates high-order solution both in time and space; 2) avoids overly small time stepsizes; 3) requires only one linear system solve per time step; and 4) relatively to DG generates smaller and sparser linear system while promoting further parallelism owing to HDG discretization. Numerical results for various test cases demonstrate that our methods are comparable to explicit Runge-Kutta DG schemes in terms of accuracy, while allowing for much larger time stepsizes.
Relationship between mathematical abstraction in learning parallel coordinates concept and performance in learning analytic geometry of pre-service mathematics teachers: an investigation

NASA Astrophysics Data System (ADS)

Nurhasanah, F.; Kusumah, Y. S.; Sabandar, J.; Suryadi, D.

2018-05-01

As one of the non-conventional mathematics concepts, Parallel Coordinates is potential to be learned by pre-service mathematics teachers in order to give them experiences in constructing richer schemes and doing abstraction process. Unfortunately, the study related to this issue is still limited. This study wants to answer a research question “to what extent the abstraction process of pre-service mathematics teachers in learning concept of Parallel Coordinates could indicate their performance in learning Analytic Geometry”. This is a case study that part of a larger study in examining mathematical abstraction of pre-service mathematics teachers in learning non-conventional mathematics concept. Descriptive statistics method is used in this study to analyze the scores from three different tests: Cartesian Coordinate, Parallel Coordinates, and Analytic Geometry. The participants in this study consist of 45 pre-service mathematics teachers. The result shows that there is a linear association between the score on Cartesian Coordinate and Parallel Coordinates. There also found that the higher levels of the abstraction process in learning Parallel Coordinates are linearly associated with higher student achievement in Analytic Geometry. The result of this study shows that the concept of Parallel Coordinates has a significant role for pre-service mathematics teachers in learning Analytic Geometry.
Vacuum-barrier window for wide-bandwidth high-power microwave transmission

DOEpatents

Caplan, M.; Shang, C.C.

1996-08-20

A vacuum output window comprises a planar dielectric material with identical systems of parallel ridges and valleys formed in opposite surfaces. The valleys in each surface neck together along parallel lines in the bulk of the dielectric. Liquid-coolant conduits are disposed linearly along such lines of necking and have water or even liquid nitrogen pumped through to remove heat. The dielectric material can be alumina, or its crystalline form, sapphire. The electric-field of a broadband incident megawatt millimeter-wave radio frequency energy is oriented perpendicular to the system of ridges and valleys. The ridges, about one wavelength tall and with a period of about one wavelength, focus the incident energy through in ribbons that squeeze between the liquid-coolant conduits without significant losses over very broad bands of the radio spectrum. In an alternative embodiment, the liquid-coolant conduits are encased in metal within the bulk of the dielectric. 4 figs.
Vacuum-barrier window for wide-bandwidth high-power microwave transmission

DOEpatents

Caplan, Malcolm; Shang, Clifford C.

1996-01-01

A vacuum output window comprises a planar dielectric material with identical systems of parallel ridges and valleys formed in opposite surfaces. The valleys in each surface neck together along parallel lines in the bulk of the dielectric. Liquid-coolant conduits are disposed linearly along such lines of necking and have water or even liquid nitrogen pumped through to remove heat. The dielectric material can be alumina, or its crystalline form, sapphire. The electric-field of a broadband incident megawatt millimeter-wave radio frequency energy is oriented perpendicular to the system of ridges and valleys. The ridges, about one wavelength tall and with a period of about one wavelength, focus the incident energy through in ribbons that squeeze between the liquid-coolant conduits without significant losses over very broad bands of the radio spectrum. In an alternative embodiment, the liquid-coolant conduits are encased in metal within the bulk of the dielectric.
A surgical parallel continuum manipulator with a cable-driven grasper.

PubMed

Orekhov, Andrew L; Bryson, Caroline E; Till, John; Chung, Scotty; Rucker, D Caleb

2015-01-01

In this paper, we present the design, construction, and control of a six-degree-of-freedom (DOF), 12 mm diameter, parallel continuum manipulator with a 2-DOF, cable-driven grasper. This work is a proof-of-concept first step towards miniaturization of this type of manipulator design to provide increased dexterity and stability in confined-space surgical applications, particularly for endoscopic procedures. Our robotic system consists of six superelastic NiTi (Nitinol) tubes in a standard Stewart-Gough configuration and an end effector with 180 degree motion of its two jaws. Two Kevlar cables pass through the centers of the tube legs to actuate the end effector. A computationally efficient inverse kinematics model provides low-level control inputs to ten independent linear actuators, which drive the Stewart-Gough platform and end-effector actuation cables. We demonstrate the performance and feasibility of this design by conducting open-loop range-of-motion tests for our system.
Three dimensional modelling of earthquake rupture cycles on frictional faults

NASA Astrophysics Data System (ADS)

Simpson, Guy; May, Dave

2017-04-01

We are developing an efficient MPI-parallel numerical method to simulate earthquake sequences on preexisting faults embedding within a three dimensional viscoelastic half-space. We solve the velocity form of the elasto(visco)dynamic equations using a continuous Galerkin Finite Element Method on an unstructured pentahedral mesh, which thus permits local spatial refinement in the vicinity of the fault. Friction sliding is coupled to the viscoelastic solid via rate- and state-dependent friction laws using the split-node technique. Our coupled formulation employs a picard-type non-linear solver with a fully implicit, first order accurate time integrator that utilises an adaptive time step that efficiently evolves the system through multiple seismic cycles. The implementation leverages advanced parallel solvers, preconditioners and linear algebra from the Portable Extensible Toolkit for Scientific Computing (PETSc) library. The model can treat heterogeneous frictional properties and stress states on the fault and surrounding solid as well as non-planar fault geometries. Preliminary tests show that the model successfully reproduces dynamic rupture on a vertical strike-slip fault in a half-space governed by rate-state friction with the ageing law.
An Efficient Objective Analysis System for Parallel Computers

NASA Technical Reports Server (NTRS)

Stobie, James G.

1999-01-01

A new objective analysis system designed for parallel computers will be described. The system can produce a global analysis (on a 2 x 2.5 lat-lon grid with 20 levels of heights and winds and 10 levels of moisture) using 120,000 observations in less than 3 minutes on 32 CPUs (SGI Origin 2000). No special parallel code is needed (e.g. MPI or multitasking) and the 32 CPUs do not have to be on the same platform. The system Ls totally portable and can run on -several different architectures at once. In addition, the system can easily scale up to 100 or more CPUS. This will allow for much higher resolution and significant increases in input data. The system scales linearly as the number of observations and the number of grid points. The cost overhead in going from I to 32 CPus is 18%. in addition, the analysis results are identical regardless of the number of processors used. T'his system has all the characteristics of optimal interpolation, combining detailed instrument and first guess error statistics to produce the best estimate of the atmospheric state. It also includes a new quality control (buddy check) system. Static tests with the system showed it's analysis increments are comparable to the latest NASA operational system including maintenance of mass-wind balance. Results from a 2-month cycling test in the Goddard EOS Data Assimilation System (GEOS DAS) show this new analysis retains the same level of agreement between the first guess and observations (0-F statistics) throughout the entire two months.
Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs. Inhibitory Interaction in Parallel Systems

PubMed Central

Eidels, Ami; Houpt, Joseph W.; Altieri, Nicholas; Pei, Lei; Townsend, James T.

2011-01-01

Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate, or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented plus the capacity oriented analyses provide for precise identification of the underlying system. PMID:21516183
Nice Guys Finish Fast and Bad Guys Finish Last: Facilitatory vs. Inhibitory Interaction in Parallel Systems.

PubMed

Eidels, Ami; Houpt, Joseph W; Altieri, Nicholas; Pei, Lei; Townsend, James T

2011-04-01

Systems Factorial Technology is a powerful framework for investigating the fundamental properties of human information processing such as architecture (i.e., serial or parallel processing) and capacity (how processing efficiency is affected by increased workload). The Survivor Interaction Contrast (SIC) and the Capacity Coefficient are effective measures in determining these underlying properties, based on response-time data. Each of the different architectures, under the assumption of independent processing, predicts a specific form of the SIC along with some range of capacity. In this study, we explored SIC predictions of discrete-state (Markov process) and continuous-state (Linear Dynamic) models that allow for certain types of cross-channel interaction. The interaction can be facilitatory or inhibitory: one channel can either facilitate, or slow down processing in its counterpart. Despite the relative generality of these models, the combination of the architecture-oriented plus the capacity oriented analyses provide for precise identification of the underlying system.
A simple-shear rheometer for linear viscoelastic characterization of vocal fold tissues at phonatory frequencies.

PubMed

Chan, Roger W; Rodriguez, Maritza L

2008-08-01

Previous studies reporting the linear viscoelastic shear properties of the human vocal fold cover or mucosa have been based on torsional rheometry, with measurements limited to low audio frequencies, up to around 80 Hz. This paper describes the design and validation of a custom-built, controlled-strain, linear, simple-shear rheometer system capable of direct empirical measurements of viscoelastic shear properties at phonatory frequencies. A tissue specimen was subjected to simple shear between two parallel, rigid acrylic plates, with a linear motor creating a translational sinusoidal displacement of the specimen via the upper plate, and the lower plate transmitting the harmonic shear force resulting from the viscoelastic response of the specimen. The displacement of the specimen was measured by a linear variable differential transformer whereas the shear force was detected by a piezoelectric transducer. The frequency response characteristics of these system components were assessed by vibration experiments with accelerometers. Measurements of the viscoelastic shear moduli (G' and G") of a standard ANSI S2.21 polyurethane material and those of human vocal fold cover specimens were made, along with estimation of the system signal and noise levels. Preliminary results showed that the rheometer can provide valid and reliable rheometric data of vocal fold lamina propria specimens at frequencies of up to around 250 Hz, well into the phonatory range.
Acceleration of linear stationary iterative processes in multiprocessor computers. II

DOE Office of Scientific and Technical Information (OSTI.GOV)

Romm, Ya.E.

1982-05-01

For pt.I, see Kibernetika, vol.18, no.1, p.47 (1982). For pt.I, see Cybernetics, vol.18, no.1, p.54 (1982). Considers a reduced system of linear algebraic equations x=ax+b, where a=(a/sub ij/) is a real n*n matrix; b is a real vector with common euclidean norm >>>. It is supposed that the existence and uniqueness of solution det (0-a) not equal to e is given, where e is a unit matrix. The linear iterative process converging to x x/sup (k+1)/=fx/sup (k)/, k=0, 1, 2, ..., where the operator f translates r/sup n/ into r/sup n/. In considering implementation of the iterative process (ip) inmore » a multiprocessor system, it is assumed that the number of processors is constant, and are various values of the latter investigated; it is assumed in addition, that the processors perform elementary binary arithmetic operations of addition and multiestimates only include the time of execution of arithmetic operations. With any paralleling of individual iteration, the execution time of the ip is proportional to the number of sequential steps k+1. The author sets the task of reducing the number of sequential steps in the ip so as to execute it in a time proportional to a value smaller than k+1. He also sets the goal of formulating a method of accelerated bit serial-parallel execution of each successive step of the ip, with, in the modification sought, a reduced number of steps in a time comparable to the operation time of logical elements. 6 references.« less
Parallel spatial direct numerical simulations on the Intel iPSC/860 hypercube

NASA Technical Reports Server (NTRS)

Joslin, Ronald D.; Zubair, Mohammad

1993-01-01

The implementation and performance of a parallel spatial direct numerical simulation (PSDNS) approach on the Intel iPSC/860 hypercube is documented. The direct numerical simulation approach is used to compute spatially evolving disturbances associated with the laminar-to-turbulent transition in boundary-layer flows. The feasibility of using the PSDNS on the hypercube to perform transition studies is examined. The results indicate that the direct numerical simulation approach can effectively be parallelized on a distributed-memory parallel machine. By increasing the number of processors nearly ideal linear speedups are achieved with nonoptimized routines; slower than linear speedups are achieved with optimized (machine dependent library) routines. This slower than linear speedup results because the Fast Fourier Transform (FFT) routine dominates the computational cost and because the routine indicates less than ideal speedups. However with the machine-dependent routines the total computational cost decreases by a factor of 4 to 5 compared with standard FORTRAN routines. The computational cost increases linearly with spanwise wall-normal and streamwise grid refinements. The hypercube with 32 processors was estimated to require approximately twice the amount of Cray supercomputer single processor time to complete a comparable simulation; however it is estimated that a subgrid-scale model which reduces the required number of grid points and becomes a large-eddy simulation (PSLES) would reduce the computational cost and memory requirements by a factor of 10 over the PSDNS. This PSLES implementation would enable transition simulations on the hypercube at a reasonable computational cost.
Incomplete Sparse Approximate Inverses for Parallel Preconditioning

DOE PAGES

Anzt, Hartwig; Huckle, Thomas K.; Bräckle, Jürgen; ...

2017-10-28

In this study, we propose a new preconditioning method that can be seen as a generalization of block-Jacobi methods, or as a simplification of the sparse approximate inverse (SAI) preconditioners. The “Incomplete Sparse Approximate Inverses” (ISAI) is in particular efficient in the solution of sparse triangular linear systems of equations. Those arise, for example, in the context of incomplete factorization preconditioning. ISAI preconditioners can be generated via an algorithm providing fine-grained parallelism, which makes them attractive for hardware with a high concurrency level. Finally, in a study covering a large number of matrices, we identify the ISAI preconditioner as anmore » attractive alternative to exact triangular solves in the context of incomplete factorization preconditioning.« less
Message passing with parallel queue traversal

DOEpatents

Underwood, Keith D [Albuquerque, NM; Brightwell, Ronald B [Albuquerque, NM; Hemmert, K Scott [Albuquerque, NM

2012-05-01

In message passing implementations, associative matching structures are used to permit list entries to be searched in parallel fashion, thereby avoiding the delay of linear list traversal. List management capabilities are provided to support list entry turnover semantics and priority ordering semantics.
A parallel algorithm for the eigenvalues and eigenvectors for a general complex matrix

NASA Technical Reports Server (NTRS)

Shroff, Gautam

1989-01-01

A new parallel Jacobi-like algorithm is developed for computing the eigenvalues of a general complex matrix. Most parallel methods for this parallel typically display only linear convergence. Sequential norm-reducing algorithms also exit and they display quadratic convergence in most cases. The new algorithm is a parallel form of the norm-reducing algorithm due to Eberlein. It is proven that the asymptotic convergence rate of this algorithm is quadratic. Numerical experiments are presented which demonstrate the quadratic convergence of the algorithm and certain situations where the convergence is slow are also identified. The algorithm promises to be very competitive on a variety of parallel architectures.
Parallel processing architecture for computing inverse differential kinematic equations of the PUMA arm

NASA Technical Reports Server (NTRS)

Hsia, T. C.; Lu, G. Z.; Han, W. H.

1987-01-01

In advanced robot control problems, on-line computation of inverse Jacobian solution is frequently required. Parallel processing architecture is an effective way to reduce computation time. A parallel processing architecture is developed for the inverse Jacobian (inverse differential kinematic equation) of the PUMA arm. The proposed pipeline/parallel algorithm can be inplemented on an IC chip using systolic linear arrays. This implementation requires 27 processing cells and 25 time units. Computation time is thus significantly reduced.
Equivalent model construction for a non-linear dynamic system based on an element-wise stiffness evaluation procedure and reduced analysis of the equivalent system

NASA Astrophysics Data System (ADS)

Kim, Euiyoung; Cho, Maenghyo

2017-11-01

In most non-linear analyses, the construction of a system matrix uses a large amount of computation time, comparable to the computation time required by the solving process. If the process for computing non-linear internal force matrices is substituted with an effective equivalent model that enables the bypass of numerical integrations and assembly processes used in matrix construction, efficiency can be greatly enhanced. A stiffness evaluation procedure (STEP) establishes non-linear internal force models using polynomial formulations of displacements. To efficiently identify an equivalent model, the method has evolved such that it is based on a reduced-order system. The reduction process, however, makes the equivalent model difficult to parameterize, which significantly affects the efficiency of the optimization process. In this paper, therefore, a new STEP, E-STEP, is proposed. Based on the element-wise nature of the finite element model, the stiffness evaluation is carried out element-by-element in the full domain. Since the unit of computation for the stiffness evaluation is restricted by element size, and since the computation is independent, the equivalent model can be constructed efficiently in parallel, even in the full domain. Due to the element-wise nature of the construction procedure, the equivalent E-STEP model is easily characterized by design parameters. Various reduced-order modeling techniques can be applied to the equivalent system in a manner similar to how they are applied in the original system. The reduced-order model based on E-STEP is successfully demonstrated for the dynamic analyses of non-linear structural finite element systems under varying design parameters.
Research in computer science

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1986-01-01

Various graduate research activities in the field of computer science are reported. Among the topics discussed are: (1) failure probabilities in multi-version software; (2) Gaussian Elimination on parallel computers; (3) three dimensional Poisson solvers on parallel/vector computers; (4) automated task decomposition for multiple robot arms; (5) multi-color incomplete cholesky conjugate gradient methods on the Cyber 205; and (6) parallel implementation of iterative methods for solving linear equations.
LSRN: A PARALLEL ITERATIVE SOLVER FOR STRONGLY OVER- OR UNDERDETERMINED SYSTEMS*

PubMed Central

Meng, Xiangrui; Saunders, Michael A.; Mahoney, Michael W.

2014-01-01

We describe a parallel iterative least squares solver named LSRN that is based on random normal projection. LSRN computes the min-length solution to minx∈ℝn ‖Ax − b‖2, where A ∈ ℝm × n with m ≫ n or m ≪ n, and where A may be rank-deficient. Tikhonov regularization may also be included. Since A is involved only in matrix-matrix and matrix-vector multiplications, it can be a dense or sparse matrix or a linear operator, and LSRN automatically speeds up when A is sparse or a fast linear operator. The preconditioning phase consists of a random normal projection, which is embarrassingly parallel, and a singular value decomposition of size ⌈γ min(m, n)⌉ × min(m, n), where γ is moderately larger than 1, e.g., γ = 2. We prove that the preconditioned system is well-conditioned, with a strong concentration result on the extreme singular values, and hence that the number of iterations is fully predictable when we apply LSQR or the Chebyshev semi-iterative method. As we demonstrate, the Chebyshev method is particularly efficient for solving large problems on clusters with high communication cost. Numerical results show that on a shared-memory machine, LSRN is very competitive with LAPACK’s DGELSD and a fast randomized least squares solver called Blendenpik on large dense problems, and it outperforms the least squares solver from SuiteSparseQR on sparse problems without sparsity patterns that can be exploited to reduce fill-in. Further experiments show that LSRN scales well on an Amazon Elastic Compute Cloud cluster. PMID:25419094

Beam dynamics in MABE

DOE Office of Scientific and Technical Information (OSTI.GOV)

Poukey, J.W.; Coleman, P.D.; Sanford, T.W.L.

1985-01-01

MABE is a multistage linear electron accelerator which accelerates up to nine beams in parallel. Nominal parameters per beam are 25 kA, final energy 7 MeV, and guide field 20 kG. We report recent progress via theory and simulation in understanding the beam dynamics in such a system. In particular, we emphasize our results on the radial oscillations and emittance growth for a beam passing through a series of accelerating gaps. 12 refs., 8 figs.
Preconditioned conjugate gradient methods for the compressible Navier-Stokes equations

NASA Technical Reports Server (NTRS)

Venkatakrishnan, V.

1990-01-01

The compressible Navier-Stokes equations are solved for a variety of two-dimensional inviscid and viscous problems by preconditioned conjugate gradient-like algorithms. Roe's flux difference splitting technique is used to discretize the inviscid fluxes. The viscous terms are discretized by using central differences. An algebraic turbulence model is also incorporated. The system of linear equations which arises out of the linearization of a fully implicit scheme is solved iteratively by the well known methods of GMRES (Generalized Minimum Residual technique) and Chebyschev iteration. Incomplete LU factorization and block diagonal factorization are used as preconditioners. The resulting algorithm is competitive with the best current schemes, but has wide applications in parallel computing and unstructured mesh computations.
Linear array optical edge sensor

NASA Technical Reports Server (NTRS)

Bejczy, Antal K. (Inventor); Primus, Howard C. (Inventor)

1987-01-01

A series of independent parallel pairs of light emitting and detecting diodes for a linear pixel array, which is laterally positioned over an edge-like discontinuity in a workpiece to be scanned, is disclosed. These independent pairs of light emitters and detectors sense along intersecting pairs of separate optical axes. A discontinuity, such as an edge in the sensed workpiece, reflects a detectable difference in the amount of light from that discontinuity in comparison to the amount of light that is reflected on either side of the discontinuity. A sequentially sychronized clamping and sampling circuit detects that difference as an electrical signal which is recovered by circuitry that exhibits an improved signal-to-noise capability for the system.
GPU computing with Kaczmarz’s and other iterative algorithms for linear systems

PubMed Central

Elble, Joseph M.; Sahinidis, Nikolaos V.; Vouzis, Panagiotis

2009-01-01

The graphics processing unit (GPU) is used to solve large linear systems derived from partial differential equations. The differential equations studied are strongly convection-dominated, of various sizes, and common to many fields, including computational fluid dynamics, heat transfer, and structural mechanics. The paper presents comparisons between GPU and CPU implementations of several well-known iterative methods, including Kaczmarz’s, Cimmino’s, component averaging, conjugate gradient normal residual (CGNR), symmetric successive overrelaxation-preconditioned conjugate gradient, and conjugate-gradient-accelerated component-averaged row projections (CARP-CG). Computations are preformed with dense as well as general banded systems. The results demonstrate that our GPU implementation outperforms CPU implementations of these algorithms, as well as previously studied parallel implementations on Linux clusters and shared memory systems. While the CGNR method had begun to fall out of favor for solving such problems, for the problems studied in this paper, the CGNR method implemented on the GPU performed better than the other methods, including a cluster implementation of the CARP-CG method. PMID:20526446
A scalable parallel algorithm for multiple objective linear programs

NASA Technical Reports Server (NTRS)

Wiecek, Malgorzata M.; Zhang, Hong

1994-01-01

This paper presents an ADBASE-based parallel algorithm for solving multiple objective linear programs (MOLP's). Job balance, speedup and scalability are of primary interest in evaluating efficiency of the new algorithm. Implementation results on Intel iPSC/2 and Paragon multiprocessors show that the algorithm significantly speeds up the process of solving MOLP's, which is understood as generating all or some efficient extreme points and unbounded efficient edges. The algorithm gives specially good results for large and very large problems. Motivation and justification for solving such large MOLP's are also included.
Parallel transmit beamforming using orthogonal frequency division multiplexing applied to harmonic imaging--a feasibility study.

PubMed

Demi, Libertario; Verweij, Martin D; Van Dongen, Koen W A

2012-11-01

Real-time 2-D or 3-D ultrasound imaging systems are currently used for medical diagnosis. To achieve the required data acquisition rate, these systems rely on parallel beamforming, i.e., a single wide-angled beam is used for transmission and several narrow parallel beams are used for reception. When applied to harmonic imaging, the demand for high-amplitude pressure wave fields, necessary to generate the harmonic components, conflicts with the use of a wide-angled beam in transmission because this results in a large spatial decay of the acoustic pressure. To enhance the amplitude of the harmonics, it is preferable to do the reverse: transmit several narrow parallel beams and use a wide-angled beam in reception. Here, this concept is investigated to determine whether it can be used for harmonic imaging. The method proposed in this paper relies on orthogonal frequency division multiplexing (OFDM), which is used to create distinctive parallel beams in transmission. To test the proposed method, a numerical study has been performed, in which the transmit, receive, and combined beam profiles generated by a linear array have been simulated for the second-harmonic component. Compared with standard parallel beamforming, application of the proposed technique results in a gain of 12 dB for the main beam and in a reduction of the side lobes. Experimental verification in water has also been performed. Measurements obtained with a single-element emitting transducer and a hydrophone receiver confirm the possibility of exciting a practical ultrasound transducer with multiple Gaussian modulated pulses, each having a different center frequency, and the capability to generate distinguishable second-harmonic components.
Parallel computing of physical maps--a comparative study in SIMD and MIMD parallelism.

PubMed

Bhandarkar, S M; Chirravuri, S; Arnold, J

1996-01-01

Ordering clones from a genomic library into physical maps of whole chromosomes presents a central computational problem in genetics. Chromosome reconstruction via clone ordering is usually isomorphic to the NP-complete Optimal Linear Arrangement problem. Parallel SIMD and MIMD algorithms for simulated annealing based on Markov chain distribution are proposed and applied to the problem of chromosome reconstruction via clone ordering. Perturbation methods and problem-specific annealing heuristics are proposed and described. The SIMD algorithms are implemented on a 2048 processor MasPar MP-2 system which is an SIMD 2-D toroidal mesh architecture whereas the MIMD algorithms are implemented on an 8 processor Intel iPSC/860 which is an MIMD hypercube architecture. A comparative analysis of the various SIMD and MIMD algorithms is presented in which the convergence, speedup, and scalability characteristics of the various algorithms are analyzed and discussed. On a fine-grained, massively parallel SIMD architecture with a low synchronization overhead such as the MasPar MP-2, a parallel simulated annealing algorithm based on multiple periodically interacting searches performs the best. For a coarse-grained MIMD architecture with high synchronization overhead such as the Intel iPSC/860, a parallel simulated annealing algorithm based on multiple independent searches yields the best results. In either case, distribution of clonal data across multiple processors is shown to exacerbate the tendency of the parallel simulated annealing algorithm to get trapped in a local optimum.
Efficient Scalable Median Filtering Using Histogram-Based Operations.

PubMed

Green, Oded

2018-05-01

Median filtering is a smoothing technique for noise removal in images. While there are various implementations of median filtering for a single-core CPU, there are few implementations for accelerators and multi-core systems. Many parallel implementations of median filtering use a sorting algorithm for rearranging the values within a filtering window and taking the median of the sorted value. While using sorting algorithms allows for simple parallel implementations, the cost of the sorting becomes prohibitive as the filtering windows grow. This makes such algorithms, sequential and parallel alike, inefficient. In this work, we introduce the first software parallel median filtering that is non-sorting-based. The new algorithm uses efficient histogram-based operations. These reduce the computational requirements of the new algorithm while also accessing the image fewer times. We show an implementation of our algorithm for both the CPU and NVIDIA's CUDA supported graphics processing unit (GPU). The new algorithm is compared with several other leading CPU and GPU implementations. The CPU implementation has near perfect linear scaling with a speedup on a quad-core system. The GPU implementation is several orders of magnitude faster than the other GPU implementations for mid-size median filters. For small kernels, and , comparison-based approaches are preferable as fewer operations are required. Lastly, the new algorithm is open-source and can be found in the OpenCV library.
Electromagnetically induced disintegration and polarization plane rotation of laser pulses

NASA Astrophysics Data System (ADS)

Parshkov, Oleg M.; Budyak, Victoria V.; Kochetkova, Anastasia E.

2017-04-01

The numerical simulation results of disintegration effect of linear polarized shot probe pulses of electromagnetically induced transparency in the counterintuitive superposed linear polarized control field are presented. It is shown, that this disintegration occurs, if linear polarizations of interacting pulses are not parallel or mutually perpendicular. In case of weak input probe field the polarization of one probe pulse in the medium is parallel, whereas the polarization of another probe pulse is perpendicular to polarization direction of input control radiation. The concerned effect is analogous to the effect, which must to take place when short laser pulse propagates along main axes of biaxial crystal because of group velocity of normal mod difference. The essential difference of probe pulse disintegration and linear process in biaxial crystal is that probe pulse preserves linear polarization in all stages of propagation. The numerical simulation is performed for scheme of degenerated quantum transitions between 3P0 , 3P01 and 3P2 energy levels of 208Pb isotope.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Spotz, William F.

PyTrilinos is a set of Python interfaces to compiled Trilinos packages. This collection supports serial and parallel dense linear algebra, serial and parallel sparse linear algebra, direct and iterative linear solution techniques, algebraic and multilevel preconditioners, nonlinear solvers and continuation algorithms, eigensolvers and partitioning algorithms. Also included are a variety of related utility functions and classes, including distributed I/O, coloring algorithms and matrix generation. PyTrilinos vector objects are compatible with the popular NumPy Python package. As a Python front end to compiled libraries, PyTrilinos takes advantage of the flexibility and ease of use of Python, and the efficiency of themore » underlying C++, C and Fortran numerical kernels. This paper covers recent, previously unpublished advances in the PyTrilinos package.« less
Field-Programmable Gate Array Computer in Structural Analysis: An Initial Exploration

NASA Technical Reports Server (NTRS)

Singleterry, Robert C., Jr.; Sobieszczanski-Sobieski, Jaroslaw; Brown, Samuel

2002-01-01

This paper reports on an initial assessment of using a Field-Programmable Gate Array (FPGA) computational device as a new tool for solving structural mechanics problems. A FPGA is an assemblage of binary gates arranged in logical blocks that are interconnected via software in a manner dependent on the algorithm being implemented and can be reprogrammed thousands of times per second. In effect, this creates a computer specialized for the problem that automatically exploits all the potential for parallel computing intrinsic in an algorithm. This inherent parallelism is the most important feature of the FPGA computational environment. It is therefore important that if a problem offers a choice of different solution algorithms, an algorithm of a higher degree of inherent parallelism should be selected. It is found that in structural analysis, an 'analog computer' style of programming, which solves problems by direct simulation of the terms in the governing differential equations, yields a more favorable solution algorithm than current solution methods. This style of programming is facilitated by a 'drag-and-drop' graphic programming language that is supplied with the particular type of FPGA computer reported in this paper. Simple examples in structural dynamics and statics illustrate the solution approach used. The FPGA system also allows linear scalability in computing capability. As the problem grows, the number of FPGA chips can be increased with no loss of computing efficiency due to data flow or algorithmic latency that occurs when a single problem is distributed among many conventional processors that operate in parallel. This initial assessment finds the FPGA hardware and software to be in their infancy in regard to the user conveniences; however, they have enormous potential for shrinking the elapsed time of structural analysis solutions if programmed with algorithms that exhibit inherent parallelism and linear scalability. This potential warrants further development of FPGA-tailored algorithms for structural analysis.
Computational time analysis of the numerical solution of 3D electrostatic Poisson's equation

NASA Astrophysics Data System (ADS)

Kamboh, Shakeel Ahmed; Labadin, Jane; Rigit, Andrew Ragai Henri; Ling, Tech Chaw; Amur, Khuda Bux; Chaudhary, Muhammad Tayyab

2015-05-01

3D Poisson's equation is solved numerically to simulate the electric potential in a prototype design of electrohydrodynamic (EHD) ion-drag micropump. Finite difference method (FDM) is employed to discretize the governing equation. The system of linear equations resulting from FDM is solved iteratively by using the sequential Jacobi (SJ) and sequential Gauss-Seidel (SGS) methods, simulation results are also compared to examine the difference between the results. The main objective was to analyze the computational time required by both the methods with respect to different grid sizes and parallelize the Jacobi method to reduce the computational time. In common, the SGS method is faster than the SJ method but the data parallelism of Jacobi method may produce good speedup over SGS method. In this study, the feasibility of using parallel Jacobi (PJ) method is attempted in relation to SGS method. MATLAB Parallel/Distributed computing environment is used and a parallel code for SJ method is implemented. It was found that for small grid size the SGS method remains dominant over SJ method and PJ method while for large grid size both the sequential methods may take nearly too much processing time to converge. Yet, the PJ method reduces computational time to some extent for large grid sizes.
[Comparative ultrasound visualization of clinically relevant structures for evaluating the infant hip joint utilizing trapezoidal vs. parallel transducers].

PubMed

Wunsch, R; Wegener-Panzer, A; Reinehr, T; Aurisch, E; Cleaveland, B; Wunsch, C; Dudwiesus, H

2011-01-01

Sonographic evaluation of the infant hip joint according to the method of Graf has proven to be an important pediatric investigative instrument. Our goal was to investigate quantitatively whether (and in what ways) the clinically relevant infant hip joint structures visualize differently when utilizing trapezoidal as opposed to linear transducers. Our approach was both theoretical via a mathematical model and practical with in-vivo measurements in neonates. In a prospective study: 1. theoretical and computed analyses were performed for both linear and trapezoidal transducers regarding their respective accuracy for demonstrating the anatomic geometry of the infant hip, assuming not only correctly centered transducer positioning but also cases with off-centered displacement in the cranial or caudal direction; 2. both hip joints in 97 infants were examined by experienced investigators with comparison of the results for parallel vs. trapezoidal transducers. Theoretical mathematical error analysis reveals no intrinsic systemic deviations between trapezoidal vs. parallel transducers in US scanning of the infant hip and furthermore no inherent disadvantages in the trapezoidal technique. Even when off-center transducer alignments of 1.5 cm are employed in the mathematical models, there is no significant relative distortion of the required anatomic structures when comparing the characteristics of both transducers. The practical in-vivo data from our 97 neonates confirmed the theoretical considerations. No loss of accuracy or other negative factors are evident when trapezoidal transducers are used to visualize the infant hip joint in comparison with the customary parallel technique. There are no significantly measurable differences between the two approaches. © Georg Thieme Verlag KG Stuttgart · New York.
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure

DOE PAGES

Persaud, A.; Ji, Q.; Feinberg, E.; ...

2017-06-08

Here, a new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number ofmore » parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further red ucing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Finally, ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.« less
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure

NASA Astrophysics Data System (ADS)

Persaud, A.; Ji, Q.; Feinberg, E.; Seidl, P. A.; Waldron, W. L.; Schenkel, T.; Lal, A.; Vinayakumar, K. B.; Ardanuc, S.; Hammer, D. A.

2017-06-01

A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.
A compact linear accelerator based on a scalable microelectromechanical-system RF-structure.

PubMed

Persaud, A; Ji, Q; Feinberg, E; Seidl, P A; Waldron, W L; Schenkel, T; Lal, A; Vinayakumar, K B; Ardanuc, S; Hammer, D A

2017-06-01

A new approach for a compact radio-frequency (RF) accelerator structure is presented. The new accelerator architecture is based on the Multiple Electrostatic Quadrupole Array Linear Accelerator (MEQALAC) structure that was first developed in the 1980s. The MEQALAC utilized RF resonators producing the accelerating fields and providing for higher beam currents through parallel beamlets focused using arrays of electrostatic quadrupoles (ESQs). While the early work obtained ESQs with lateral dimensions on the order of a few centimeters, using a printed circuit board (PCB), we reduce the characteristic dimension to the millimeter regime, while massively scaling up the potential number of parallel beamlets. Using Microelectromechanical systems scalable fabrication approaches, we are working on further reducing the characteristic dimension to the sub-millimeter regime. The technology is based on RF-acceleration components and ESQs implemented in the PCB or silicon wafers where each beamlet passes through beam apertures in the wafer. The complete accelerator is then assembled by stacking these wafers. This approach has the potential for fast and inexpensive batch fabrication of the components and flexibility in system design for application specific beam energies and currents. For prototyping the accelerator architecture, the components have been fabricated using the PCB. In this paper, we present proof of concept results of the principal components using the PCB: RF acceleration and ESQ focusing. Ongoing developments on implementing components in silicon and scaling of the accelerator technology to high currents and beam energies are discussed.
Self-motion magnitude estimation during linear oscillation - Changes with head orientation and following fatigue

NASA Technical Reports Server (NTRS)

Parker, D. E.; Wood, D. L.; Gulledge, W. L.; Goodrich, R. L.

1979-01-01

Two types of experiments concerning the estimated magnitude of self-motion during exposure to linear oscillation on a parallel swing are described in this paper. Experiment I examined changes in magnitude estimation as a function of variation of the subject's head orientation, and Experiments II a, II b, and II c assessed changes in magnitude estimation performance following exposure to sustained, 'intense' linear oscillation (fatigue-inducting stimulation). The subjects' performance was summarized employing Stevens' power law R = k x S to the nth, where R is perceived self-motion magnitude, k is a constant, S is amplitude of linear oscillation, and n is an exponent). The results of Experiment I indicated that the exponents, n, for the magnitude estimation functions varied with head orientation and were greatest when the head was oriented 135 deg off the vertical. In Experiments II a-c, the magnitude estimation function exponents were increased following fatigue. Both types of experiments suggest ways in which the vestibular system's contribution to a spatial orientation perceptual system may vary. This variability may be a contributing factor to the development of pilot/astronaut disorientation and may also be implicated in the occurrence of motion sickness.
Voltage regulation in linear induction accelerators

DOEpatents

Parsons, William M.

1992-01-01

Improvement in voltage regulation in a Linear Induction Accelerator wherein a varistor, such as a metal oxide varistor, is placed in parallel with the beam accelerating cavity and the magnetic core. The non-linear properties of the varistor result in a more stable voltage across the beam accelerating cavity than with a conventional compensating resistance.
A Model for Displacements Between Parallel Plates That Shows Change of Type from Hyperbolic to Elliptic

NASA Astrophysics Data System (ADS)

Shariati, Maryam; Yortsos, Yannis; Talon, Laurent; Martin, Jerome; Rakotomalala, Nicole; Salin, Dominique

2003-11-01

We consider miscible displacement between parallel plates, where the viscosity is a function of the concentration. By selecting a piece-wise representation, the problem can be considered as ``three-phase'' flow. Assuming a lubrication-type approximation, the mathematical description is in terms of two quasi-linear hyperbolic equations. When the mobility of the middle phase is smaller than its neighbors, the system is genuinely hyperbolic and can be solved analytically. However, when it is larger, an elliptic region develops. This change-of-type behavior is for the first time proved here based on sound physical principles. Numerical solutions with a small diffusion are presented. Good agreement is obtained outside the elliptic region, but not inside, where the numerical results show unstable behavior. We conjecture that for the solution of the real problem in the mixed-type case, the full higher-dimensionality problem must be considered inside the elliptic region, in which the lubrication (parallel-flow) approximation is no longer appropriate. This is discussed in a companion presentation.
A non-linear theory of the parallel firehose and gyrothermal instabilities in a weakly collisional plasma

NASA Astrophysics Data System (ADS)

Rosin, M. S.; Schekochihin, A. A.; Rincon, F.; Cowley, S. C.

2011-05-01

Weakly collisional magnetized cosmic plasmas have a dynamical tendency to develop pressure anisotropies with respect to the local direction of the magnetic field. These anisotropies trigger plasma instabilities at scales just above the ion Larmor radius ρi and much below the mean free path λmfp. They have growth rates of a fraction of the ion cyclotron frequency, which is much faster than either the global dynamics or even local turbulence. Despite their microscopic nature, these instabilities dramatically modify the transport properties and, therefore, the macroscopic dynamics of the plasma. The non-linear evolution of these instabilities is expected to drive pressure anisotropies towards marginal stability values, controlled by the plasma beta βi. Here this non-linear evolution is worked out in an ab initio kinetic calculation for the simplest analytically tractable example - the parallel (k⊥= 0) firehose instability in a high-beta plasma. An asymptotic theory is constructed, based on a particular physical ordering and leading to a closed non-linear equation for the firehose turbulence. In the non-linear regime, both the analytical theory and the numerical solution predict secular (∝t) growth of magnetic fluctuations. The fluctuations develop a k-3∥ spectrum, extending from scales somewhat larger than ρi to the maximum scale that grows secularly with time (∝t1/2); the relative pressure anisotropy (p⊥-p∥)/p∥ tends to the marginal value -2/βi. The marginal state is achieved via changes in the magnetic field, not particle scattering. When a parallel ion heat flux is present, the parallel firehose mutates into the new gyrothermal instability (GTI), which continues to exist up to firehose-stable values of pressure anisotropy, which can be positive and are limited by the magnitude of the ion heat flux. The non-linear evolution of the GTI also features secular growth of magnetic fluctuations, but the fluctuation spectrum is eventually dominated by modes around a maximal scale ˜ρilT/λmfp, where lT is the scale of the parallel temperature variation. Implications for momentum and heat transport are speculated about. This study is motivated by our interest in the dynamics of galaxy cluster plasmas (which are used as the main astrophysical example), but its relevance to solar wind and accretion flow plasmas is also briefly discussed.

Small amplitude waves and linear firehose and mirror instabilities in rotating polytropic quantum plasma

NASA Astrophysics Data System (ADS)

Bhakta, S.; Prajapati, R. P.; Dolai, B.

2017-08-01

The small amplitude quantum magnetohydrodynamic (QMHD) waves and linear firehose and mirror instabilities in uniformly rotating dense quantum plasma have been investigated using generalized polytropic pressure laws. The QMHD model and Chew-Goldberger-Low (CGL) set of equations are used to formulate the basic equations of the problem. The general dispersion relation is derived using normal mode analysis which is discussed in parallel, transverse, and oblique wave propagations. The fast, slow, and intermediate QMHD wave modes and linear firehose and mirror instabilities are analyzed for isotropic MHD and CGL quantum fluid plasmas. The firehose instability remains unaffected while the mirror instability is modified by polytropic exponents and quantum diffraction parameter. The graphical illustrations show that quantum corrections have a stabilizing influence on the mirror instability. The presence of uniform rotation stabilizes while quantum corrections destabilize the growth rate of the system. It is also observed that the growth rate stabilizes much faster in parallel wave propagation in comparison to the transverse mode of propagation. The quantum corrections and polytropic exponents also modify the pseudo-MHD and reverse-MHD modes in dense quantum plasma. The phase speed (Friedrichs) diagrams of slow, fast, and intermediate wave modes are illustrated for isotropic MHD and double adiabatic MHD or CGL quantum plasmas, where the significant role of magnetic field and quantum diffraction parameters on the phase speed is observed.
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

DOE PAGES

Grayver, Alexander V.; Kolev, Tzanio V.

2015-11-01

Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Large-scale 3D geoelectromagnetic modeling using parallel adaptive high-order finite element method

DOE Office of Scientific and Technical Information (OSTI.GOV)

Grayver, Alexander V.; Kolev, Tzanio V.

Here, we have investigated the use of the adaptive high-order finite-element method (FEM) for geoelectromagnetic modeling. Because high-order FEM is challenging from the numerical and computational points of view, most published finite-element studies in geoelectromagnetics use the lowest order formulation. Solution of the resulting large system of linear equations poses the main practical challenge. We have developed a fully parallel and distributed robust and scalable linear solver based on the optimal block-diagonal and auxiliary space preconditioners. The solver was found to be efficient for high finite element orders, unstructured and nonconforming locally refined meshes, a wide range of frequencies, largemore » conductivity contrasts, and number of degrees of freedom (DoFs). Furthermore, the presented linear solver is in essence algebraic; i.e., it acts on the matrix-vector level and thus requires no information about the discretization, boundary conditions, or physical source used, making it readily efficient for a wide range of electromagnetic modeling problems. To get accurate solutions at reduced computational cost, we have also implemented goal-oriented adaptive mesh refinement. The numerical tests indicated that if highly accurate modeling results were required, the high-order FEM in combination with the goal-oriented local mesh refinement required less computational time and DoFs than the lowest order adaptive FEM.« less
Fast Time and Space Parallel Algorithms for Solution of Parabolic Partial Differential Equations

NASA Technical Reports Server (NTRS)

Fijany, Amir

1993-01-01

In this paper, fast time- and Space -Parallel agorithms for solution of linear parabolic PDEs are developed. It is shown that the seemingly strictly serial iterations of the time-stepping procedure for solution of the problem can be completed decoupled.
A general parallel sparse-blocked matrix multiply for linear scaling SCF theory

NASA Astrophysics Data System (ADS)

Challacombe, Matt

2000-06-01

A general approach to the parallel sparse-blocked matrix-matrix multiply is developed in the context of linear scaling self-consistent-field (SCF) theory. The data-parallel message passing method uses non-blocking communication to overlap computation and communication. The space filling curve heuristic is used to achieve data locality for sparse matrix elements that decay with “separation”. Load balance is achieved by solving the bin packing problem for blocks with variable size.With this new method as the kernel, parallel performance of the simplified density matrix minimization (SDMM) for solution of the SCF equations is investigated for RHF/6-31G ∗∗ water clusters and RHF/3-21G estane globules. Sustained rates above 5.7 GFLOPS for the SDMM have been achieved for (H 2 O) 200 with 95 Origin 2000 processors. Scalability is found to be limited by load imbalance, which increases with decreasing granularity, due primarily to the inhomogeneous distribution of variable block sizes.
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, Olaf O.; Nguyen, Duc T.; Baddourah, Majdi; Qin, Jiangning

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigensolution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization search analysis and domain decomposition. The source code for many of these algorithms is available.
Using Coarrays to Parallelize Legacy Fortran Applications: Strategy and Case Study

DOE PAGES

Radhakrishnan, Hari; Rouson, Damian W. I.; Morris, Karla; ...

2015-01-01

This paper summarizes a strategy for parallelizing a legacy Fortran 77 program using the object-oriented (OO) and coarray features that entered Fortran in the 2003 and 2008 standards, respectively. OO programming (OOP) facilitates the construction of an extensible suite of model-verification and performance tests that drive the development. Coarray parallel programming facilitates a rapid evolution from a serial application to a parallel application capable of running on multicore processors and many-core accelerators in shared and distributed memory. We delineate 17 code modernization steps used to refactor and parallelize the program and study the resulting performance. Our initial studies were donemore » using the Intel Fortran compiler on a 32-core shared memory server. Scaling behavior was very poor, and profile analysis using TAU showed that the bottleneck in the performance was due to our implementation of a collective, sequential summation procedure. We were able to improve the scalability and achieve nearly linear speedup by replacing the sequential summation with a parallel, binary tree algorithm. We also tested the Cray compiler, which provides its own collective summation procedure. Intel provides no collective reductions. With Cray, the program shows linear speedup even in distributed-memory execution. We anticipate similar results with other compilers once they support the new collective procedures proposed for Fortran 2015.« less
A new Hysteretic Nonlinear Energy Sink (HNES)

NASA Astrophysics Data System (ADS)

Tsiatas, George C.; Charalampakis, Aristotelis E.

2018-07-01

The behavior of a new Hysteretic Nonlinear Energy Sink (HNES) coupled to a linear primary oscillator is investigated in shock mitigation. Apart from a small mass and a nonlinear elastic spring of the Duffing oscillator, the HNES is also comprised of a purely hysteretic and a linear elastic spring of potentially negative stiffness, connected in parallel. The Bouc-Wen model is used to describe the force produced by both the purely hysteretic and linear elastic springs. Coupling the primary oscillator with the HNES, three nonlinear equations of motion are derived in terms of the two displacements and the dimensionless hysteretic variable, which are integrated numerically using the analog equation method. The performance of the HNES is examined by quantifying the percentage of the initially induced energy in the primary system that is passively transferred and dissipated by the HNES. Remarkable results are achieved for a wide range of initial input energies. The great performance of the HNES is mostly evidenced when the linear spring stiffness takes on negative values.
Phase properties of elastic waves in systems constituted of adsorbed diatomic molecules on the (001) surface of a simple cubic crystal

NASA Astrophysics Data System (ADS)

Deymier, P. A.; Runge, K.

2018-03-01

A Green's function-based numerical method is developed to calculate the phase of scattered elastic waves in a harmonic model of diatomic molecules adsorbed on the (001) surface of a simple cubic crystal. The phase properties of scattered waves depend on the configuration of the molecules. The configurations of adsorbed molecules on the crystal surface such as parallel chain-like arrays coupled via kinks are used to demonstrate not only linear but also non-linear dependency of the phase on the number of kinks along the chains. Non-linear behavior arises for scattered waves with frequencies in the vicinity of a diatomic molecule resonance. In the non-linear regime, the variation in phase with the number of kinks is formulated mathematically as unitary matrix operations leading to an analogy between phase-based elastic unitary operations and quantum gates. The advantage of elastic based unitary operations is that they are easily realizable physically and measurable.
Step-response of a torsional device with multiple discontinuous non-linearities: Formulation of a vibratory experiment

NASA Astrophysics Data System (ADS)

Krak, Michael D.; Dreyer, Jason T.; Singh, Rajendra

2016-03-01

A vehicle clutch damper is intentionally designed to contain multiple discontinuous non-linearities, such as multi-staged springs, clearances, pre-loads, and multi-staged friction elements. The main purpose of this practical torsional device is to transmit a wide range of torque while isolating torsional vibration between an engine and transmission. Improved understanding of the dynamic behavior of the device could be facilitated by laboratory measurement, and thus a refined vibratory experiment is proposed. The experiment is conceptually described as a single degree of freedom non-linear torsional system that is excited by an external step torque. The single torsional inertia (consisting of a shaft and torsion arm) is coupled to ground through parallel production clutch dampers, which are characterized by quasi-static measurements provided by the manufacturer. Other experimental objectives address physical dimensions, system actuation, flexural modes, instrumentation, and signal processing issues. Typical measurements show that the step response of the device is characterized by three distinct non-linear regimes (double-sided impact, single-sided impact, and no-impact). Each regime is directly related to the non-linear features of the device and can be described by peak angular acceleration values. Predictions of a simplified single degree of freedom non-linear model verify that the experiment performs well and as designed. Accordingly, the benchmark measurements could be utilized to validate non-linear models and simulation codes, as well as characterize dynamic parameters of the device including its dissipative properties.
Nonlinear identification of the total baroreflex arc.

PubMed

Moslehpour, Mohsen; Kawada, Toru; Sunagawa, Kenji; Sugimachi, Masaru; Mukkamala, Ramakrishna

2015-12-15

The total baroreflex arc [the open-loop system relating carotid sinus pressure (CSP) to arterial pressure (AP)] is known to exhibit nonlinear behaviors. However, few studies have quantitatively characterized its nonlinear dynamics. The aim of this study was to develop a nonlinear model of the sympathetically mediated total arc without assuming any model form. Normal rats were studied under anesthesia. The vagal and aortic depressor nerves were sectioned, the carotid sinus regions were isolated and attached to a servo-controlled piston pump, and the AP and sympathetic nerve activity (SNA) were measured. CSP was perturbed using a Gaussian white noise signal. A second-order Volterra model was developed by applying nonparametric identification to the measurements. The second-order kernel was mainly diagonal, but the diagonal differed in shape from the first-order kernel. Hence, a reduced second-order model was similarly developed comprising a linear dynamic system in parallel with a squaring system in cascade with a slower linear dynamic system. This "Uryson" model predicted AP changes 12% better (P < 0.01) than a linear model in response to new Gaussian white noise CSP. The model also predicted nonlinear behaviors, including thresholding and mean responses to CSP changes about the mean. Models of the neural arc (the system relating CSP to SNA) and peripheral arc (the system relating SNA to AP) were likewise developed and tested. However, these models of subsystems of the total arc showed approximately linear behaviors. In conclusion, the validated nonlinear model of the total arc revealed that the system takes on an Uryson structure. Copyright © 2015 the American Physiological Society.
Nonlinear identification of the total baroreflex arc

PubMed Central

Moslehpour, Mohsen; Kawada, Toru; Sunagawa, Kenji; Sugimachi, Masaru

2015-01-01

The total baroreflex arc [the open-loop system relating carotid sinus pressure (CSP) to arterial pressure (AP)] is known to exhibit nonlinear behaviors. However, few studies have quantitatively characterized its nonlinear dynamics. The aim of this study was to develop a nonlinear model of the sympathetically mediated total arc without assuming any model form. Normal rats were studied under anesthesia. The vagal and aortic depressor nerves were sectioned, the carotid sinus regions were isolated and attached to a servo-controlled piston pump, and the AP and sympathetic nerve activity (SNA) were measured. CSP was perturbed using a Gaussian white noise signal. A second-order Volterra model was developed by applying nonparametric identification to the measurements. The second-order kernel was mainly diagonal, but the diagonal differed in shape from the first-order kernel. Hence, a reduced second-order model was similarly developed comprising a linear dynamic system in parallel with a squaring system in cascade with a slower linear dynamic system. This “Uryson” model predicted AP changes 12% better (P < 0.01) than a linear model in response to new Gaussian white noise CSP. The model also predicted nonlinear behaviors, including thresholding and mean responses to CSP changes about the mean. Models of the neural arc (the system relating CSP to SNA) and peripheral arc (the system relating SNA to AP) were likewise developed and tested. However, these models of subsystems of the total arc showed approximately linear behaviors. In conclusion, the validated nonlinear model of the total arc revealed that the system takes on an Uryson structure. PMID:26354845
Three-dimensional Finite Element Formulation and Scalable Domain Decomposition for High Fidelity Rotor Dynamic Analysis

NASA Technical Reports Server (NTRS)

Datta, Anubhav; Johnson, Wayne R.

2009-01-01

This paper has two objectives. The first objective is to formulate a 3-dimensional Finite Element Model for the dynamic analysis of helicopter rotor blades. The second objective is to implement and analyze a dual-primal iterative substructuring based Krylov solver, that is parallel and scalable, for the solution of the 3-D FEM analysis. The numerical and parallel scalability of the solver is studied using two prototype problems - one for ideal hover (symmetric) and one for a transient forward flight (non-symmetric) - both carried out on up to 48 processors. In both hover and forward flight conditions, a perfect linear speed-up is observed, for a given problem size, up to the point of substructure optimality. Substructure optimality and the linear parallel speed-up range are both shown to depend on the problem size as well as on the selection of the coarse problem. With a larger problem size, linear speed-up is restored up to the new substructure optimality. The solver also scales with problem size - even though this conclusion is premature given the small prototype grids considered in this study.
Portable parallel stochastic optimization for the design of aeropropulsion components

NASA Technical Reports Server (NTRS)

Sues, Robert H.; Rhodes, G. S.

1994-01-01

This report presents the results of Phase 1 research to develop a methodology for performing large-scale Multi-disciplinary Stochastic Optimization (MSO) for the design of aerospace systems ranging from aeropropulsion components to complete aircraft configurations. The current research recognizes that such design optimization problems are computationally expensive, and require the use of either massively parallel or multiple-processor computers. The methodology also recognizes that many operational and performance parameters are uncertain, and that uncertainty must be considered explicitly to achieve optimum performance and cost. The objective of this Phase 1 research was to initialize the development of an MSO methodology that is portable to a wide variety of hardware platforms, while achieving efficient, large-scale parallelism when multiple processors are available. The first effort in the project was a literature review of available computer hardware, as well as review of portable, parallel programming environments. The first effort was to implement the MSO methodology for a problem using the portable parallel programming language, Parallel Virtual Machine (PVM). The third and final effort was to demonstrate the example on a variety of computers, including a distributed-memory multiprocessor, a distributed-memory network of workstations, and a single-processor workstation. Results indicate the MSO methodology can be well-applied towards large-scale aerospace design problems. Nearly perfect linear speedup was demonstrated for computation of optimization sensitivity coefficients on both a 128-node distributed-memory multiprocessor (the Intel iPSC/860) and a network of workstations (speedups of almost 19 times achieved for 20 workstations). Very high parallel efficiencies (75 percent for 31 processors and 60 percent for 50 processors) were also achieved for computation of aerodynamic influence coefficients on the Intel. Finally, the multi-level parallelization strategy that will be needed for large-scale MSO problems was demonstrated to be highly efficient. The same parallel code instructions were used on both platforms, demonstrating portability. There are many applications for which MSO can be applied, including NASA's High-Speed-Civil Transport, and advanced propulsion systems. The use of MSO will reduce design and development time and testing costs dramatically.
Micromagnetic simulations of anisotropies in coupled and uncoupled ferromagnetic nanowire systems.

PubMed

Blachowicz, T; Ehrmann, A

2013-01-01

The influence of a variation of spatial relative orientations onto the coupling dynamics and subsequent magnetic anisotropies was modeled in ferromagnetic nanowires. The wires were analyzed in the most elementary configurations, thus, arranged in pairs perpendicular to each other, leading to one-dimensional (linear) and zero-dimensional (point-like) coupling. Different distances within each elementary pair of wires and between the pairs give rise to varying interactions between parallel and perpendicular wires, respectively. Simulated coercivities show an exchange of easy and hard axes for systems with different couplings. Additionally, two of the systems exhibit a unique switching behavior which can be utilized for developing new functionalities.
Detecting Super-Thin Clouds With Polarized Light

NASA Technical Reports Server (NTRS)

Sun, Wenbo; Videen, Gorden; Mishchenko, Michael I.

2014-01-01

We report a novel method for detecting cloud particles in the atmosphere. Solar radiation backscattered from clouds is studied with both satellite data and a radiative transfer model. A distinct feature is found in the angle of linear polarization of solar radiation that is backscattered from clouds. The dominant backscattered electric field from the clear-sky Earth-atmosphere system is nearly parallel to the Earth surface. However, when clouds are present, this electric field can rotate significantly away from the parallel direction. Model results demonstrate that this polarization feature can be used to detect super-thin cirrus clouds having an optical depth of only 0.06 and super-thin liquid water clouds having an optical depth of only 0.01. Such clouds are too thin to be sensed using any current passive satellite instruments.
The high performance parallel algorithm for Unified Gas-Kinetic Scheme

NASA Astrophysics Data System (ADS)

Li, Shiyi; Li, Qibing; Fu, Song; Xu, Jinxiu

2016-11-01

A high performance parallel algorithm for UGKS is developed to simulate three-dimensional flows internal and external on arbitrary grid system. The physical domain and velocity domain are divided into different blocks and distributed according to the two-dimensional Cartesian topology with intra-communicators in physical domain for data exchange and other intra-communicators in velocity domain for sum reduction to moment integrals. Numerical results of three-dimensional cavity flow and flow past a sphere agree well with the results from the existing studies and validate the applicability of the algorithm. The scalability of the algorithm is tested both on small (1-16) and large (729-5832) scale processors. The tested speed-up ratio is near linear ashind thus the efficiency is around 1, which reveals the good scalability of the present algorithm.
Detecting Super-Thin Clouds with Polarized Sunlight

NASA Technical Reports Server (NTRS)

Sun, Wenbo; Videen, Gorden; Mishchenko, Michael I.

2014-01-01

We report a novel method for detecting cloud particles in the atmosphere. Solar radiation backscattered from clouds is studied with both satellite data and a radiative transfer model. A distinct feature is found in the angle of linear polarization of solar radiation that is backscattered from clouds. The dominant backscattered electric field from the clear-sky Earth-atmosphere system is nearly parallel to the Earth surface. However, when clouds are present, this electric field can rotate significantly away from the parallel direction. Model results demonstrate that this polarization feature can be used to detect super-thin cirrus clouds having an optical depth of only 0.06 and super-thin liquid water clouds having an optical depth of only 0.01. Such clouds are too thin to be sensed using any current passive satellite instruments.
Research in computer science

NASA Technical Reports Server (NTRS)

Ortega, J. M.

1985-01-01

Synopses are given for NASA supported work in computer science at the University of Virginia. Some areas of research include: error seeding as a testing method; knowledge representation for engineering design; analysis of faults in a multi-version software experiment; implementation of a parallel programming environment; two computer graphics systems for visualization of pressure distribution and convective density particles; task decomposition for multiple robot arms; vectorized incomplete conjugate gradient; and iterative methods for solving linear equations on the Flex/32.
Fluid displacement between two parallel plates: a non-empirical model displaying change of type from hyperbolic to elliptic equations

NASA Astrophysics Data System (ADS)

Shariati, M.; Talon, L.; Martin, J.; Rakotomalala, N.; Salin, D.; Yortsos, Y. C.

2004-11-01

We consider miscible displacement between parallel plates in the absence of diffusion, with a concentration-dependent viscosity. By selecting a piecewise viscosity function, this can also be considered as ‘three-fluid’ flow in the same geometry. Assuming symmetry across the gap and based on the lubrication (‘equilibrium’) approximation, a description in terms of two quasi-linear hyperbolic equations is obtained. We find that the system is hyperbolic and can be solved analytically, when the mobility profile is monotonic, or when the mobility of the middle phase is smaller than its neighbours. When the mobility of the middle phase is larger, a change of type is displayed, an elliptic region developing in the composition space. Numerical solutions of Riemann problems of the hyperbolic system spanning the elliptic region, with small diffusion added, show good agreement with the analytical outside, but an unstable behaviour inside the elliptic region. In these problems, the elliptic region arises precisely at the displacement front. Crossing the elliptic region requires the solution of essentially an eigenvalue problem of the full higher-dimensional model, obtained here using lattice BGK simulations. The hyperbolic-to-elliptic change-of-type reflects the failing of the lubrication approximation, underlying the quasi-linear hyperbolic formalism, to describe the problem uniformly. The obtained solution is analogous to non-classical shocks recently suggested in problems with change of type.

Development and characterization of hollow microprobe array as a potential tool for versatile and massively parallel manipulation of single cells.

PubMed

Nagai, Moeto; Oohara, Kiyotaka; Kato, Keita; Kawashima, Takahiro; Shibata, Takayuki

2015-04-01

Parallel manipulation of single cells is important for reconstructing in vivo cellular microenvironments and studying cell functions. To manipulate single cells and reconstruct their environments, development of a versatile manipulation tool is necessary. In this study, we developed an array of hollow probes using microelectromechanical systems fabrication technology and demonstrated the manipulation of single cells. We conducted a cell aspiration experiment with a glass pipette and modeled a cell using a standard linear solid model, which provided information for designing hollow stepped probes for minimally invasive single-cell manipulation. We etched a silicon wafer on both sides and formed through holes with stepped structures. The inner diameters of the holes were reduced by SiO2 deposition of plasma-enhanced chemical vapor deposition to trap cells on the tips. This fabrication process makes it possible to control the wall thickness, inner diameter, and outer diameter of the probes. With the fabricated probes, single cells were manipulated and placed in microwells at a single-cell level in a parallel manner. We studied the capture, release, and survival rates of cells at different suction and release pressures and found that the cell trapping rate was directly proportional to the suction pressure, whereas the release rate and viability decreased with increasing the suction pressure. The proposed manipulation system makes it possible to place cells in a well array and observe the adherence, spreading, culture, and death of the cells. This system has potential as a tool for massively parallel manipulation and for three-dimensional hetero cellular assays.
Real-Time Spaceborne Synthetic Aperture Radar Float-Point Imaging System Using Optimized Mapping Methodology and a Multi-Node Parallel Accelerating Technique

PubMed Central

Li, Bingyi; Chen, Liang; Yu, Wenyue; Xie, Yizhuang; Bian, Mingming; Zhang, Qingjun; Pang, Long

2018-01-01

With the development of satellite load technology and very large-scale integrated (VLSI) circuit technology, on-board real-time synthetic aperture radar (SAR) imaging systems have facilitated rapid response to disasters. A key goal of the on-board SAR imaging system design is to achieve high real-time processing performance under severe size, weight, and power consumption constraints. This paper presents a multi-node prototype system for real-time SAR imaging processing. We decompose the commonly used chirp scaling (CS) SAR imaging algorithm into two parts according to the computing features. The linearization and logic-memory optimum allocation methods are adopted to realize the nonlinear part in a reconfigurable structure, and the two-part bandwidth balance method is used to realize the linear part. Thus, float-point SAR imaging processing can be integrated into a single Field Programmable Gate Array (FPGA) chip instead of relying on distributed technologies. A single-processing node requires 10.6 s and consumes 17 W to focus on 25-km swath width, 5-m resolution stripmap SAR raw data with a granularity of 16,384 × 16,384. The design methodology of the multi-FPGA parallel accelerating system under the real-time principle is introduced. As a proof of concept, a prototype with four processing nodes and one master node is implemented using a Xilinx xc6vlx315t FPGA. The weight and volume of one single machine are 10 kg and 32 cm × 24 cm × 20 cm, respectively, and the power consumption is under 100 W. The real-time performance of the proposed design is demonstrated on Chinese Gaofen-3 stripmap continuous imaging. PMID:29495637
A parallel algorithm for switch-level timing simulation on a hypercube multiprocessor

NASA Technical Reports Server (NTRS)

Rao, Hariprasad Nannapaneni

1989-01-01

The parallel approach to speeding up simulation is studied, specifically the simulation of digital LSI MOS circuitry on the Intel iPSC/2 hypercube. The simulation algorithm is based on RSIM, an event driven switch-level simulator that incorporates a linear transistor model for simulating digital MOS circuits. Parallel processing techniques based on the concepts of Virtual Time and rollback are utilized so that portions of the circuit may be simulated on separate processors, in parallel for as large an increase in speed as possible. A partitioning algorithm is also developed in order to subdivide the circuit for parallel processing.
Voltage regulation in linear induction accelerators

DOEpatents

Parsons, W.M.

1992-12-29

Improvement in voltage regulation in a linear induction accelerator wherein a varistor, such as a metal oxide varistor, is placed in parallel with the beam accelerating cavity and the magnetic core is disclosed. The non-linear properties of the varistor result in a more stable voltage across the beam accelerating cavity than with a conventional compensating resistance. 4 figs.
An efficient spectral method for the simulation of dynamos in Cartesian geometry and its implementation on massively parallel computers

NASA Astrophysics Data System (ADS)

Stellmach, Stephan; Hansen, Ulrich

2008-05-01

Numerical simulations of the process of convection and magnetic field generation in planetary cores still fail to reach geophysically realistic control parameter values. Future progress in this field depends crucially on efficient numerical algorithms which are able to take advantage of the newest generation of parallel computers. Desirable features of simulation algorithms include (1) spectral accuracy, (2) an operation count per time step that is small and roughly proportional to the number of grid points, (3) memory requirements that scale linear with resolution, (4) an implicit treatment of all linear terms including the Coriolis force, (5) the ability to treat all kinds of common boundary conditions, and (6) reasonable efficiency on massively parallel machines with tens of thousands of processors. So far, algorithms for fully self-consistent dynamo simulations in spherical shells do not achieve all these criteria simultaneously, resulting in strong restrictions on the possible resolutions. In this paper, we demonstrate that local dynamo models in which the process of convection and magnetic field generation is only simulated for a small part of a planetary core in Cartesian geometry can achieve the above goal. We propose an algorithm that fulfills the first five of the above criteria and demonstrate that a model implementation of our method on an IBM Blue Gene/L system scales impressively well for up to O(104) processors. This allows for numerical simulations at rather extreme parameter values.
Snapshot linear-Stokes imaging spectropolarimeter using division-of-focal-plane polarimetry and integral field spectroscopy.

PubMed

Mu, Tingkui; Pacheco, Shaun; Chen, Zeyu; Zhang, Chunmin; Liang, Rongguang

2017-02-13

In this paper, the design and experimental demonstration of a snapshot linear-Stokes imaging spectropolarimeter (SLSIS) is presented. The SLSIS, which is based on division-of-focal-plane polarimetry with four parallel linear polarization channels and integral field spectroscopy with numerous slit dispersive paths, has no moving parts and provides video-rate Stokes-vector hyperspectral datacubes. It does not need any scanning in the spectral, spatial or polarization dimension and offers significant advantages of rapid reconstruction without heavy computation during post-processing. The principle and the experimental setup of the SLSIS are described in detail. The image registration, Stokes spectral reconstruction and calibration procedures are included, and the system is validated using measurements of tungsten light and a static scene. The SLSIS's snapshot ability to resolve polarization spectral signatures is demonstrated using measurements of a dynamic scene.
Snapshot linear-Stokes imaging spectropolarimeter using division-of-focal-plane polarimetry and integral field spectroscopy

PubMed Central

Mu, Tingkui; Pacheco, Shaun; Chen, Zeyu; Zhang, Chunmin; Liang, Rongguang

2017-01-01

In this paper, the design and experimental demonstration of a snapshot linear-Stokes imaging spectropolarimeter (SLSIS) is presented. The SLSIS, which is based on division-of-focal-plane polarimetry with four parallel linear polarization channels and integral field spectroscopy with numerous slit dispersive paths, has no moving parts and provides video-rate Stokes-vector hyperspectral datacubes. It does not need any scanning in the spectral, spatial or polarization dimension and offers significant advantages of rapid reconstruction without heavy computation during post-processing. The principle and the experimental setup of the SLSIS are described in detail. The image registration, Stokes spectral reconstruction and calibration procedures are included, and the system is validated using measurements of tungsten light and a static scene. The SLSIS’s snapshot ability to resolve polarization spectral signatures is demonstrated using measurements of a dynamic scene. PMID:28191819
Least squares reconstruction of non-linear RF phase encoded MR data.

PubMed

Salajeghe, Somaie; Babyn, Paul; Sharp, Jonathan C; Sarty, Gordon E

2016-09-01

The numerical feasibility of reconstructing MRI signals generated by RF coils that produce B1 fields with a non-linearly varying spatial phase is explored. A global linear spatial phase variation of B1 is difficult to produce from current confined to RF coils. Here we use regularized least squares inversion, in place of the usual Fourier transform, to reconstruct signals generated in B1 fields with non-linear phase variation. RF encoded signals were simulated for three RF coil configurations: ideal linear, parallel conductors and, circular coil pairs. The simulated signals were reconstructed by Fourier transform and by regularized least squares. The Fourier reconstruction of simulated RF encoded signals from the parallel conductor coil set showed minor distortions over the reconstruction of signals from the ideal linear coil set but the Fourier reconstruction of signals from the circular coil set produced severe geometric distortion. Least squares inversion in all cases produced reconstruction errors comparable to the Fourier reconstruction of the simulated signal from the ideal linear coil set. MRI signals encoded in B1 fields with non-linearly varying spatial phase may be accurately reconstructed using regularized least squares thus pointing the way to the use of simple RF coil designs for RF encoded MRI. Crown Copyright © 2016. Published by Elsevier Inc. All rights reserved.
Computational mechanics analysis tools for parallel-vector supercomputers

NASA Technical Reports Server (NTRS)

Storaasli, O. O.; Nguyen, D. T.; Baddourah, M. A.; Qin, J.

1993-01-01

Computational algorithms for structural analysis on parallel-vector supercomputers are reviewed. These parallel algorithms, developed by the authors, are for the assembly of structural equations, 'out-of-core' strategies for linear equation solution, massively distributed-memory equation solution, unsymmetric equation solution, general eigen-solution, geometrically nonlinear finite element analysis, design sensitivity analysis for structural dynamics, optimization algorithm and domain decomposition. The source code for many of these algorithms is available from NASA Langley.
A New Approach to Parallel Dynamic Partitioning for Adaptive Unstructured Meshes

NASA Technical Reports Server (NTRS)

Heber, Gerd; Biswas, Rupak; Gao, Guang R.

1999-01-01

Classical mesh partitioning algorithms were designed for rather static situations, and their straightforward application in a dynamical framework may lead to unsatisfactory results, e.g., excessive data migration among processors. Furthermore, special attention should be paid to their amenability to parallelization. In this paper, a novel parallel method for the dynamic partitioning of adaptive unstructured meshes is described. It is based on a linear representation of the mesh using self-avoiding walks.
Parallel computation of transverse wakes in linear colliders

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhan, Xiaowei; Ko, Kwok

1996-11-01

SLAC has proposed the detuned structure (DS) as one possible design to control the emittance growth of long bunch trains due to transverse wakefields in the Next Linear Collider (NLC). The DS consists of 206 cells with tapering from cell to cell of the order of few microns to provide Gaussian detuning of the dipole modes. The decoherence of these modes leads to two orders of magnitude reduction in wakefield experienced by the trailing bunch. To model such a large heterogeneous structure realistically is impractical with finite-difference codes using structured grids. The authors have calculated the wakefield in the DSmore » on a parallel computer with a finite-element code using an unstructured grid. The parallel implementation issues are presented along with simulation results that include contributions from higher dipole bands and wall dissipation.« less
Imaging resolution and properties analysis of super resolution microscopy with parallel detection under different noise, detector and image restoration conditions

NASA Astrophysics Data System (ADS)

Yu, Zhongzhi; Liu, Shaocong; Sun, Shiyi; Kuang, Cuifang; Liu, Xu

2018-06-01

Parallel detection, which can use the additional information of a pinhole plane image taken at every excitation scan position, could be an efficient method to enhance the resolution of a confocal laser scanning microscope. In this paper, we discuss images obtained under different conditions and using different image restoration methods with parallel detection to quantitatively compare the imaging quality. The conditions include different noise levels and different detector array settings. The image restoration methods include linear deconvolution and pixel reassignment with Richard-Lucy deconvolution and with maximum-likelihood estimation deconvolution. The results show that the linear deconvolution share properties such as high-efficiency and the best performance under all different conditions, and is therefore expected to be of use for future biomedical routine research.
The Many Ways Data Must Flow.

ERIC Educational Resources Information Center

La Brecque, Mort

1984-01-01

To break the bottleneck inherent in today's linear computer architectures, parallel schemes (which allow computers to perform multiple tasks at one time) are being devised. Several of these schemes are described. Dataflow devices, parallel number-crunchers, programing languages, and a device based on a neurological model are among the areas…
Automation of multi-agent control for complex dynamic systems in heterogeneous computational network

NASA Astrophysics Data System (ADS)

Oparin, Gennady; Feoktistov, Alexander; Bogdanova, Vera; Sidorov, Ivan

2017-01-01

The rapid progress of high-performance computing entails new challenges related to solving large scientific problems for various subject domains in a heterogeneous distributed computing environment (e.g., a network, Grid system, or Cloud infrastructure). The specialists in the field of parallel and distributed computing give the special attention to a scalability of applications for problem solving. An effective management of the scalable application in the heterogeneous distributed computing environment is still a non-trivial issue. Control systems that operate in networks, especially relate to this issue. We propose a new approach to the multi-agent management for the scalable applications in the heterogeneous computational network. The fundamentals of our approach are the integrated use of conceptual programming, simulation modeling, network monitoring, multi-agent management, and service-oriented programming. We developed a special framework for an automation of the problem solving. Advantages of the proposed approach are demonstrated on the parametric synthesis example of the static linear regulator for complex dynamic systems. Benefits of the scalable application for solving this problem include automation of the multi-agent control for the systems in a parallel mode with various degrees of its detailed elaboration.
Modular and efficient ozone systems based on massively parallel chemical processing in microchannel plasma arrays: performance and commercialization

NASA Astrophysics Data System (ADS)

Kim, M.-H.; Cho, J. H.; Park, S.-J.; Eden, J. G.

2017-08-01

Plasmachemical systems based on the production of a specific molecule (O3) in literally thousands of microchannel plasmas simultaneously have been demonstrated, developed and engineered over the past seven years, and commercialized. At the heart of this new plasma technology is the plasma chip, a flat aluminum strip fabricated by photolithographic and wet chemical processes and comprising 24-48 channels, micromachined into nanoporous aluminum oxide, with embedded electrodes. By integrating 4-6 chips into a module, the mass output of an ozone microplasma system is scaled linearly with the number of modules operating in parallel. A 115 g/hr (2.7 kg/day) ozone system, for example, is realized by the combined output of 18 modules comprising 72 chips and 1,800 microchannels. The implications of this plasma processing architecture for scaling ozone production capability, and reducing capital and service costs when introducing redundancy into the system, are profound. In contrast to conventional ozone generator technology, microplasma systems operate reliably (albeit with reduced output) in ambient air and humidity levels up to 90%, a characteristic attributable to the water adsorption/desorption properties and electrical breakdown strength of nanoporous alumina. Extensive testing has documented chip and system lifetimes (MTBF) beyond 5,000 hours, and efficiencies >130 g/kWh when oxygen is the feedstock gas. Furthermore, the weight and volume of microplasma systems are a factor of 3-10 lower than those for conventional ozone systems of comparable output. Massively-parallel plasmachemical processing offers functionality, performance, and commercial value beyond that afforded by conventional technology, and is currently in operation in more than 30 countries worldwide.
FPGA-based real-time swept-source OCT systems for B-scan live-streaming or volumetric imaging

NASA Astrophysics Data System (ADS)

Bandi, Vinzenz; Goette, Josef; Jacomet, Marcel; von Niederhäusern, Tim; Bachmann, Adrian H.; Duelk, Marcus

2013-03-01

We have developed a Swept-Source Optical Coherence Tomography (Ss-OCT) system with high-speed, real-time signal processing on a commercially available Data-Acquisition (DAQ) board with a Field-Programmable Gate Array (FPGA). The Ss-OCT system simultaneously acquires OCT and k-clock reference signals at 500MS/s. From the k-clock signal of each A-scan we extract a remap vector for the k-space linearization of the OCT signal. The linear but oversampled interpolation is followed by a 2048-point FFT, additional auxiliary computations, and a data transfer to a host computer for real-time, live-streaming of B-scan or volumetric C-scan OCT visualization. We achieve a 100 kHz A-scan rate by parallelization of our hardware algorithms, which run on standard and affordable, commercially available DAQ boards. Our main development tool for signal analysis as well as for hardware synthesis is MATLAB® with add-on toolboxes and 3rd-party tools.
A parallel and modular deformable cell Car-Parrinello code

NASA Astrophysics Data System (ADS)

Cavazzoni, Carlo; Chiarotti, Guido L.

1999-12-01

We have developed a modular parallel code implementing the Car-Parrinello [Phys. Rev. Lett. 55 (1985) 2471] algorithm including the variable cell dynamics [Europhys. Lett. 36 (1994) 345; J. Phys. Chem. Solids 56 (1995) 510]. Our code is written in Fortran 90, and makes use of some new programming concepts like encapsulation, data abstraction and data hiding. The code has a multi-layer hierarchical structure with tree like dependences among modules. The modules include not only the variables but also the methods acting on them, in an object oriented fashion. The modular structure allows easier code maintenance, develop and debugging procedures, and is suitable for a developer team. The layer structure permits high portability. The code displays an almost linear speed-up in a wide range of number of processors independently of the architecture. Super-linear speed up is obtained with a "smart" Fast Fourier Transform (FFT) that uses the available memory on the single node (increasing for a fixed problem with the number of processing elements) as temporary buffer to store wave function transforms. This code has been used to simulate water and ammonia at giant planet conditions for systems as large as 64 molecules for ˜50 ps.
Numerical approach of the quantum circuit theory

DOE Office of Scientific and Technical Information (OSTI.GOV)

Silva, J.J.B., E-mail: jaedsonfisica@hotmail.com; Duarte-Filho, G.C.; Almeida, F.A.G.

2017-03-15

In this paper we develop a numerical method based on the quantum circuit theory to approach the coherent electronic transport in a network of quantum dots connected with arbitrary topology. The algorithm was employed in a circuit formed by quantum dots connected each other in a shape of a linear chain (associations in series), and of a ring (associations in series, and in parallel). For both systems we compute two current observables: conductance and shot noise power. We find an excellent agreement between our numerical results and the ones found in the literature. Moreover, we analyze the algorithm efficiency formore » a chain of quantum dots, where the mean processing time exhibits a linear dependence with the number of quantum dots in the array.« less
Direct Observation of Parallel Folding Pathways Revealed Using a Symmetric Repeat Protein System

PubMed Central

Aksel, Tural; Barrick, Doug

2014-01-01

Although progress has been made to determine the native fold of a polypeptide from its primary structure, the diversity of pathways that connect the unfolded and folded states has not been adequately explored. Theoretical and computational studies predict that proteins fold through parallel pathways on funneled energy landscapes, although experimental detection of pathway diversity has been challenging. Here, we exploit the high translational symmetry and the direct length variation afforded by linear repeat proteins to directly detect folding through parallel pathways. By comparing folding rates of consensus ankyrin repeat proteins (CARPs), we find a clear increase in folding rates with increasing size and repeat number, although the size of the transition states (estimated from denaturant sensitivity) remains unchanged. The increase in folding rate with chain length, as opposed to a decrease expected from typical models for globular proteins, is a clear demonstration of parallel pathways. This conclusion is not dependent on extensive curve-fitting or structural perturbation of protein structure. By globally fitting a simple parallel-Ising pathway model, we have directly measured nucleation and propagation rates in protein folding, and have quantified the fluxes along each path, providing a detailed energy landscape for folding. This finding of parallel pathways differs from results from kinetic studies of repeat-proteins composed of sequence-variable repeats, where modest repeat-to-repeat energy variation coalesces folding into a single, dominant channel. Thus, for globular proteins, which have much higher variation in local structure and topology, parallel pathways are expected to be the exception rather than the rule. PMID:24988356
Effects of Ordering Strategies and Programming Paradigms on Sparse Matrix Computations

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Li, Xiaoye; Husbands, Parry; Biswas, Rupak; Biegel, Bryan (Technical Monitor)

2002-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. For systems that are ill-conditioned, it is often necessary to use a preconditioning technique. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and ILU(O) preconditioned CG (PCG) using different programming paradigms and architectures. Results show that for this class of applications: ordering significantly improves overall performance on both distributed and distributed shared-memory systems, that cache reuse may be more important than reducing communication, that it is possible to achieve message-passing performance using shared-memory constructs through careful data ordering and distribution, and that a hybrid MPI+OpenMP paradigm increases programming complexity with little performance gains. A implementation of CG on the Cray MTA does not require special ordering or partitioning to obtain high efficiency and scalability, giving it a distinct advantage for adaptive applications; however, it shows limited scalability for PCG due to a lack of thread level parallelism.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry

Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factoriz ation leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite.more » The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.« less
An Efficient Multicore Implementation of a Novel HSS-Structured Multifrontal Solver Using Randomized Sampling

DOE PAGES

Ghysels, Pieter; Li, Xiaoye S.; Rouet, Francois -Henry; ...

2016-10-27

Here, we present a sparse linear system solver that is based on a multifrontal variant of Gaussian elimination and exploits low-rank approximation of the resulting dense frontal matrices. We use hierarchically semiseparable (HSS) matrices, which have low-rank off-diagonal blocks, to approximate the frontal matrices. For HSS matrix construction, a randomized sampling algorithm is used together with interpolative decompositions. The combination of the randomized compression with a fast ULV HSS factoriz ation leads to a solver with lower computational complexity than the standard multifrontal method for many applications, resulting in speedups up to 7 fold for problems in our test suite.more » The implementation targets many-core systems by using task parallelism with dynamic runtime scheduling. Numerical experiments show performance improvements over state-of-the-art sparse direct solvers. The implementation achieves high performance and good scalability on a range of modern shared memory parallel systems, including the Intel Xeon Phi (MIC). The code is part of a software package called STRUMPACK - STRUctured Matrices PACKage, which also has a distributed memory component for dense rank-structured matrices.« less
Parallel Online Temporal Difference Learning for Motor Control.

PubMed

Caarls, Wouter; Schuitema, Erik

2016-07-01

Temporal difference (TD) learning, a key concept in reinforcement learning, is a popular method for solving simulated control problems. However, in real systems, this method is often avoided in favor of policy search methods because of its long learning time. But policy search suffers from its own drawbacks, such as the necessity of informed policy parameterization and initialization. In this paper, we show that TD learning can work effectively in real robotic systems as well, using parallel model learning and planning. Using locally weighted linear regression and trajectory sampled planning with 14 concurrent threads, we can achieve a speedup of almost two orders of magnitude over regular TD control on simulated control benchmarks. For a real-world pendulum swing-up task and a two-link manipulator movement task, we report a speedup of 20× to 60× , with a real-time learning speed of less than half a minute. The results are competitive with state-of-the-art policy search.
Full waveform time domain solutions for source and induced magnetotelluric and controlled-source electromagnetic fields using quasi-equivalent time domain decomposition and GPU parallelization

NASA Astrophysics Data System (ADS)

Imamura, N.; Schultz, A.

2015-12-01

Recently, a full waveform time domain solution has been developed for the magnetotelluric (MT) and controlled-source electromagnetic (CSEM) methods. The ultimate goal of this approach is to obtain a computationally tractable direct waveform joint inversion for source fields and earth conductivity structure in three and four dimensions. This is desirable on several grounds, including the improved spatial resolving power expected from use of a multitude of source illuminations of non-zero wavenumber, the ability to operate in areas of high levels of source signal spatial complexity and non-stationarity, etc. This goal would not be obtainable if one were to adopt the finite difference time-domain (FDTD) approach for the forward problem. This is particularly true for the case of MT surveys, since an enormous number of degrees of freedom are required to represent the observed MT waveforms across the large frequency bandwidth. It means that for FDTD simulation, the smallest time steps should be finer than that required to represent the highest frequency, while the number of time steps should also cover the lowest frequency. This leads to a linear system that is computationally burdensome to solve. We have implemented our code that addresses this situation through the use of a fictitious wave domain method and GPUs to speed up the computation time. We also substantially reduce the size of the linear systems by applying concepts from successive cascade decimation, through quasi-equivalent time domain decomposition. By combining these refinements, we have made good progress toward implementing the core of a full waveform joint source field/earth conductivity inverse modeling method. From results, we found the use of previous generation of CPU/GPU speeds computations by an order of magnitude over a parallel CPU only approach. In part, this arises from the use of the quasi-equivalent time domain decomposition, which shrinks the size of the linear system dramatically.
Progress in linear optics, non-linear optics and surface alignment of liquid crystals

NASA Astrophysics Data System (ADS)

Ong, H. L.; Meyer, R. B.; Hurd, A. J.; Karn, A. J.; Arakelian, S. M.; Shen, Y. R.; Sanda, P. N.; Dove, D. B.; Jansen, S. A.; Hoffmann, R.

We first discuss the progress in linear optics, in particular, the formulation and application of geometrical-optics approximation and its generalization. We then discuss the progress in non-linear optics, in particular, the enhancement of a first-order Freedericksz transition and intrinsic optical bistability in homeotropic and parallel oriented nematic liquid crystal cells. Finally, we discuss the liquid crystal alignment and surface effects on field-induced Freedericksz transition.
Large Scale Document Inversion using a Multi-threaded Computing System

PubMed Central

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2018-01-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. CCS Concepts •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations. PMID:29861701
Large Scale Document Inversion using a Multi-threaded Computing System.

PubMed

Jung, Sungbo; Chang, Dar-Jen; Park, Juw Won

2017-06-01

Current microprocessor architecture is moving towards multi-core/multi-threaded systems. This trend has led to a surge of interest in using multi-threaded computing devices, such as the Graphics Processing Unit (GPU), for general purpose computing. We can utilize the GPU in computation as a massive parallel coprocessor because the GPU consists of multiple cores. The GPU is also an affordable, attractive, and user-programmable commodity. Nowadays a lot of information has been flooded into the digital domain around the world. Huge volume of data, such as digital libraries, social networking services, e-commerce product data, and reviews, etc., is produced or collected every moment with dramatic growth in size. Although the inverted index is a useful data structure that can be used for full text searches or document retrieval, a large number of documents will require a tremendous amount of time to create the index. The performance of document inversion can be improved by multi-thread or multi-core GPU. Our approach is to implement a linear-time, hash-based, single program multiple data (SPMD), document inversion algorithm on the NVIDIA GPU/CUDA programming platform utilizing the huge computational power of the GPU, to develop high performance solutions for document indexing. Our proposed parallel document inversion system shows 2-3 times faster performance than a sequential system on two different test datasets from PubMed abstract and e-commerce product reviews. •Information systems➝Information retrieval • Computing methodologies➝Massively parallel and high-performance simulations.
Evaluation of concurrent priority queue algorithms. Technical report

DOE Office of Scientific and Technical Information (OSTI.GOV)

Huang, Q.

1991-02-01

The priority queue is a fundamental data structure that is used in a large variety of parallel algorithms, such as multiprocessor scheduling and parallel best-first search of state-space graphs. This thesis addresses the design and experimental evaluation of two novel concurrent priority queues: a parallel Fibonacci heap and a concurrent priority pool, and compares them with the concurrent binary heap. The parallel Fibonacci heap is based on the sequential Fibonacci heap, which is theoretically the most efficient data structure for sequential priority queues. This scheme not only preserves the efficient operation time bounds of its sequential counterpart, but also hasmore » very low contention by distributing locks over the entire data structure. The experimental results show its linearly scalable throughput and speedup up to as many processors as tested (currently 18). A concurrent access scheme for a doubly linked list is described as part of the implementation of the parallel Fibonacci heap. The concurrent priority pool is based on the concurrent B-tree and the concurrent pool. The concurrent priority pool has the highest throughput among the priority queues studied. Like the parallel Fibonacci heap, the concurrent priority pool scales linearly up to as many processors as tested. The priority queues are evaluated in terms of throughput and speedup. Some applications of concurrent priority queues such as the vertex cover problem and the single source shortest path problem are tested.« less
Electron/ion whistler instabilities and magnetic noise bursts

NASA Technical Reports Server (NTRS)

Akimoto, K.; Gary, S. Peter; Omidi, N.

1987-01-01

Two whistler instabilities are investigated by means of the linear Vlasov dispersion equation. They are called the electron/ion parallel and oblique whistler instabilities, and are driven by electron/ion relative drifts along the magnetic field. It is demonstrated that the enhanced fluctuations from these instabilities can explain several properties of magnetic noise bursts in and near the plasma sheet in the presence of ion beams and/or field-aligned currents. At sufficiently high plasma beta, these instabilities may affect the current system in the magnetotail.
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chrisochoides, N.; Sukup, F.

In this paper we present a parallel implementation of the Bowyer-Watson (BW) algorithm using the task-parallel programming model. The BW algorithm constitutes an ideal mesh refinement strategy for implementing a large class of unstructured mesh generation techniques on both sequential and parallel computers, by preventing the need for global mesh refinement. Its implementation on distributed memory multicomputes using the traditional data-parallel model has been proven very inefficient due to excessive synchronization needed among processors. In this paper we demonstrate that with the task-parallel model we can tolerate synchronization costs inherent to data-parallel methods by exploring concurrency in the processor level.more » Our preliminary performance data indicate that the task- parallel approach: (i) is almost four times faster than the existing data-parallel methods, (ii) scales linearly, and (iii) introduces minimum overheads compared to the {open_quotes}best{close_quotes} sequential implementation of the BW algorithm.« less
High-speed extended-term time-domain simulation for online cascading analysis of power system

NASA Astrophysics Data System (ADS)

Fu, Chuan

A high-speed extended-term (HSET) time domain simulator (TDS), intended to become a part of an energy management system (EMS), has been newly developed for use in online extended-term dynamic cascading analysis of power systems. HSET-TDS includes the following attributes for providing situational awareness of high-consequence events: (i) online analysis, including n-1 and n-k events, (ii) ability to simulate both fast and slow dynamics for 1-3 hours in advance, (iii) inclusion of rigorous protection-system modeling, (iv) intelligence for corrective action ID, storage, and fast retrieval, and (v) high-speed execution. Very fast on-line computational capability is the most desired attribute of this simulator. Based on the process of solving algebraic differential equations describing the dynamics of power system, HSET-TDS seeks to develop computational efficiency at each of the following hierarchical levels, (i) hardware, (ii) strategies, (iii) integration methods, (iv) nonlinear solvers, and (v) linear solver libraries. This thesis first describes the Hammer-Hollingsworth 4 (HH4) implicit integration method. Like the trapezoidal rule, HH4 is symmetrically A-Stable but it possesses greater high-order precision (h4 ) than the trapezoidal rule. Such precision enables larger integration steps and therefore improves simulation efficiency for variable step size implementations. This thesis provides the underlying theory on which we advocate use of HH4 over other numerical integration methods for power system time-domain simulation. Second, motivated by the need to perform high speed extended-term time domain simulation (HSET-TDS) for on-line purposes, this thesis presents principles for designing numerical solvers of differential algebraic systems associated with power system time-domain simulation, including DAE construction strategies (Direct Solution Method), integration methods(HH4), nonlinear solvers(Very Dishonest Newton), and linear solvers(SuperLU). We have implemented a design appropriate for HSET-TDS, and we compare it to various solvers, including the commercial grade PSSE program, with respect to computational efficiency and accuracy, using as examples the New England 39 bus system, the expanded 8775 bus system, and PJM 13029 buses system. Third, we have explored a stiffness-decoupling method, intended to be part of parallel design of time domain simulation software for super computers. The stiffness-decoupling method is able to combine the advantages of implicit methods (A-stability) and explicit method(less computation). With the new stiffness detection method proposed herein, the stiffness can be captured. The expanded 975 buses system is used to test simulation efficiency. Finally, several parallel strategies for super computer deployment to simulate power system dynamics are proposed and compared. Design A partitions the task via scale with the stiffness decoupling method, waveform relaxation, and parallel linear solver. Design B partitions the task via the time axis using a highly precise integration method, the Kuntzmann-Butcher Method - order 8 (KB8). The strategy of partitioning events is designed to partition the whole simulation via the time axis through a simulated sequence of cascading events. For all strategies proposed, a strategy of partitioning cascading events is recommended, since the sub-tasks for each processor are totally independent, and therefore minimum communication time is needed.
A Generic Mesh Data Structure with Parallel Applications

ERIC Educational Resources Information Center

Cochran, William Kenneth, Jr.

2009-01-01

High performance, massively-parallel multi-physics simulations are built on efficient mesh data structures. Most data structures are designed from the bottom up, focusing on the implementation of linear algebra routines. In this thesis, we explore a top-down approach to design, evaluating the various needs of many aspects of simulation, not just…
Performance of the OVERFLOW-MLP and LAURA-MLP CFD Codes on the NASA Ames 512 CPU Origin System

NASA Technical Reports Server (NTRS)

Taft, James R.

2000-01-01

The shared memory Multi-Level Parallelism (MLP) technique, developed last year at NASA Ames has been very successful in dramatically improving the performance of important NASA CFD codes. This new and very simple parallel programming technique was first inserted into the OVERFLOW production CFD code in FY 1998. The OVERFLOW-MLP code's parallel performance scaled linearly to 256 CPUs on the NASA Ames 256 CPU Origin 2000 system (steger). Overall performance exceeded 20.1 GFLOP/s, or about 4.5x the performance of a dedicated 16 CPU C90 system. All of this was achieved without any major modification to the original vector based code. The OVERFLOW-MLP code is now in production on the inhouse Origin systems as well as being used offsite at commercial aerospace companies. Partially as a result of this work, NASA Ames has purchased a new 512 CPU Origin 2000 system to further test the limits of parallel performance for NASA codes of interest. This paper presents the performance obtained from the latest optimization efforts on this machine for the LAURA-MLP and OVERFLOW-MLP codes. The Langley Aerothermodynamics Upwind Relaxation Algorithm (LAURA) code is a key simulation tool in the development of the next generation shuttle, interplanetary reentry vehicles, and nearly all "X" plane development. This code sustains about 4-5 GFLOP/s on a dedicated 16 CPU C90. At this rate, expected workloads would require over 100 C90 CPU years of computing over the next few calendar years. It is not feasible to expect that this would be affordable or available to the user community. Dramatic performance gains on cheaper systems are needed. This code is expected to be perhaps the largest consumer of NASA Ames compute cycles per run in the coming year.The OVERFLOW CFD code is extensively used in the government and commercial aerospace communities to evaluate new aircraft designs. It is one of the largest consumers of NASA supercomputing cycles and large simulations of highly resolved full aircraft are routinely undertaken. Typical large problems might require 100s of Cray C90 CPU hours to complete. The dramatic performance gains with the 256 CPU steger system are exciting. Obtaining results in hours instead of months is revolutionizing the way in which aircraft manufacturers are looking at future aircraft simulation work. Figure 2 below is a current state of the art plot of OVERFLOW-MLP performance on the 512 CPU Lomax system. As can be seen, the chart indicates that OVERFLOW-MLP continues to scale linearly with CPU count up to 512 CPUs on a large 35 million point full aircraft RANS simulation. At this point performance is such that a fully converged simulation of 2500 time steps is completed in less than 2 hours of elapsed time. Further work over the next few weeks will improve the performance of this code even further.The LAURA code has been converted to the MLP format as well. This code is currently being optimized for the 512 CPU system. Performance statistics indicate that the goal of 100 GFLOP/s will be achieved by year's end. This amounts to 20x the 16 CPU C90 result and strongly demonstrates the viability of the new parallel systems rapidly solving very large simulations in a production environment.
Temperature responsive transmitter

NASA Technical Reports Server (NTRS)

Kleinberg, Leonard L. (Inventor)

1987-01-01

A temperature responsive transmitter is provided in which frequency varies linearly with temperature. The transmitter includes two identically biased transistors connected in parallel. A capacitor, which reflects into the common bases to generate negative resistance effectively in parallel with the capacitor, is connected to the common emitters. A crystal is effectively in parallel with the capacitor and the negative resistance. Oscillations occur if the magnitude of the absolute value of the negative resistance is less than the positive resistive impedance of the capacitor and the inductance of the crystal. The crystal has a large linear temperature coefficient and a resonant frequency which is substantially less than the gain-bandwidth product of the transistors to ensure that the crystal primarily determines the frequency of oscillation. A high-Q tank circuit having an inductor and a capacitor is connected to the common collectors to increase the collector current flow which in turn enhances the radiation of the oscillator frequency by the inductor.
Comparison of least squares and exponential sine sweep methods for Parallel Hammerstein Models estimation

NASA Astrophysics Data System (ADS)

Rebillat, Marc; Schoukens, Maarten

2018-05-01

Linearity is a common assumption for many real-life systems, but in many cases the nonlinear behavior of systems cannot be ignored and must be modeled and estimated. Among the various existing classes of nonlinear models, Parallel Hammerstein Models (PHM) are interesting as they are at the same time easy to interpret as well as to estimate. One way to estimate PHM relies on the fact that the estimation problem is linear in the parameters and thus that classical least squares (LS) estimation algorithms can be used. In that area, this article introduces a regularized LS estimation algorithm inspired on some of the recently developed regularized impulse response estimation techniques. Another mean to estimate PHM consists in using parametric or non-parametric exponential sine sweeps (ESS) based methods. These methods (LS and ESS) are founded on radically different mathematical backgrounds but are expected to tackle the same issue. A methodology is proposed here to compare them with respect to (i) their accuracy, (ii) their computational cost, and (iii) their robustness to noise. Tests are performed on simulated systems for several values of methods respective parameters and of signal to noise ratio. Results show that, for a given set of data points, the ESS method is less demanding in computational resources than the LS method but that it is also less accurate. Furthermore, the LS method needs parameters to be set in advance whereas the ESS method is not subject to conditioning issues and can be fully non-parametric. In summary, for a given set of data points, ESS method can provide a first, automatic, and quick overview of a nonlinear system than can guide more computationally demanding and precise methods, such as the regularized LS one proposed here.
A General-purpose Framework for Parallel Processing of Large-scale LiDAR Data

NASA Astrophysics Data System (ADS)

Li, Z.; Hodgson, M.; Li, W.

2016-12-01

Light detection and ranging (LiDAR) technologies have proven efficiency to quickly obtain very detailed Earth surface data for a large spatial extent. Such data is important for scientific discoveries such as Earth and ecological sciences and natural disasters and environmental applications. However, handling LiDAR data poses grand geoprocessing challenges due to data intensity and computational intensity. Previous studies received notable success on parallel processing of LiDAR data to these challenges. However, these studies either relied on high performance computers and specialized hardware (GPUs) or focused mostly on finding customized solutions for some specific algorithms. We developed a general-purpose scalable framework coupled with sophisticated data decomposition and parallelization strategy to efficiently handle big LiDAR data. Specifically, 1) a tile-based spatial index is proposed to manage big LiDAR data in the scalable and fault-tolerable Hadoop distributed file system, 2) two spatial decomposition techniques are developed to enable efficient parallelization of different types of LiDAR processing tasks, and 3) by coupling existing LiDAR processing tools with Hadoop, this framework is able to conduct a variety of LiDAR data processing tasks in parallel in a highly scalable distributed computing environment. The performance and scalability of the framework is evaluated with a series of experiments conducted on a real LiDAR dataset using a proof-of-concept prototype system. The results show that the proposed framework 1) is able to handle massive LiDAR data more efficiently than standalone tools; and 2) provides almost linear scalability in terms of either increased workload (data volume) or increased computing nodes with both spatial decomposition strategies. We believe that the proposed framework provides valuable references on developing a collaborative cyberinfrastructure for processing big earth science data in a highly scalable environment.
Computer simulations of electromagnetic cool ion beam instabilities. [in near earth space

NASA Technical Reports Server (NTRS)

Gary, S. P.; Madland, C. D.; Schriver, D.; Winske, D.

1986-01-01

Electromagnetic ion beam instabilities driven by cool ion beams at propagation parallel or antiparallel to a uniform magnetic field are studied using computer simulations. The elements of linear theory applicable to electromagnetic ion beam instabilities and the simulations derived from a one-dimensional hybrid computer code are described. The quasi-linear regime of the right-hand resonant ion beam instability, and the gyrophase bunching of the nonlinear regime of the right-hand resonant and nonresonant instabilities are examined. It is detected that in the quasi-linear regime the instability saturation is due to a reduction in the beam core relative drift speed and an increase in the perpendicular-to-parallel beam temperature; in the nonlinear regime the instabilities saturate when half the initial beam drift kinetic energy density is converted to fluctuating magnetic field energy density.
3D CSEM inversion based on goal-oriented adaptive finite element method

NASA Astrophysics Data System (ADS)

Zhang, Y.; Key, K.

2016-12-01

We present a parallel 3D frequency domain controlled-source electromagnetic inversion code name MARE3DEM. Non-linear inversion of observed data is performed with the Occam variant of regularized Gauss-Newton optimization. The forward operator is based on the goal-oriented finite element method that efficiently calculates the responses and sensitivity kernels in parallel using a data decomposition scheme where independent modeling tasks contain different frequencies and subsets of the transmitters and receivers. To accommodate complex 3D conductivity variation with high flexibility and precision, we adopt the dual-grid approach where the forward mesh conforms to the inversion parameter grid and is adaptively refined until the forward solution converges to the desired accuracy. This dual-grid approach is memory efficient, since the inverse parameter grid remains independent from fine meshing generated around the transmitter and receivers by the adaptive finite element method. Besides, the unstructured inverse mesh efficiently handles multiple scale structures and allows for fine-scale model parameters within the region of interest. Our mesh generation engine keeps track of the refinement hierarchy so that the map of conductivity and sensitivity kernel between the forward and inverse mesh is retained. We employ the adjoint-reciprocity method to calculate the sensitivity kernels which establish a linear relationship between changes in the conductivity model and changes in the modeled responses. Our code uses a direcy solver for the linear systems, so the adjoint problem is efficiently computed by re-using the factorization from the primary problem. Further computational efficiency and scalability is obtained in the regularized Gauss-Newton portion of the inversion using parallel dense matrix-matrix multiplication and matrix factorization routines implemented with the ScaLAPACK library. We show the scalability, reliability and the potential of the algorithm to deal with complex geological scenarios by applying it to the inversion of synthetic marine controlled source EM data generated for a complex 3D offshore model with significant seafloor topography.
Analysis of 2D THz-Raman spectroscopy using a non-Markovian Brownian oscillator model with nonlinear system-bath interactions.

PubMed

Ikeda, Tatsushi; Ito, Hironobu; Tanimura, Yoshitaka

2015-06-07

We explore and describe the roles of inter-molecular vibrations employing a Brownian oscillator (BO) model with linear-linear (LL) and square-linear (SL) system-bath interactions, which we use to analyze two-dimensional (2D) THz-Raman spectra obtained by means of molecular dynamics (MD) simulations. In addition to linear infrared absorption (1D IR), we calculated 2D Raman-THz-THz, THz-Raman-THz, and THz-THz-Raman signals for liquid formamide, water, and methanol using an equilibrium non-equilibrium hybrid MD simulation. The calculated 1D IR and 2D THz-Raman signals are compared with results obtained from the LL+SL BO model applied through use of hierarchal Fokker-Planck equations with non-perturbative and non-Markovian noise. We find that all of the qualitative features of the 2D profiles of the signals obtained from the MD simulations are reproduced with the LL+SL BO model, indicating that this model captures the essential features of the inter-molecular motion. We analyze the fitted 2D profiles in terms of anharmonicity, nonlinear polarizability, and dephasing time. The origins of the echo peaks of the librational motion and the elongated peaks parallel to the probe direction are elucidated using optical Liouville paths.
Three-dimensional finite amplitude electroconvection in dielectric liquids

NASA Astrophysics Data System (ADS)

Luo, Kang; Wu, Jian; Yi, Hong-Liang; Tan, He-Ping

2018-02-01

Charge injection induced electroconvection in a dielectric liquid lying between two parallel plates is numerically simulated in three dimensions (3D) using a unified lattice Boltzmann method (LBM). Cellular flow patterns and their subcritical bifurcation phenomena of 3D electroconvection are numerically investigated for the first time. A unit conversion is also derived to connect the LBM system to the real physical system. The 3D LBM codes are validated by three carefully chosen cases and all results are found to be highly consistent with the analytical solutions or other numerical studies. For strong injection, the steady state roll, polygon, and square flow patterns are observed under different initial disturbances. Numerical results show that the hexagonal cell with the central region being empty of charge and centrally downward flow is preferred in symmetric systems under random initial disturbance. For weak injection, the numerical results show that the flow directly passes from the motionless state to turbulence once the system loses its linear stability. In addition, the numerically predicted linear and finite amplitude stability criteria of different flow patterns are discussed.

DOE Office of Scientific and Technical Information (OSTI.GOV)

de Miguel, E.; Rull, L.F.; Gubbins, K.E.

Using molecular-dynamics computer simulation, we study the dynamical behavior of the isotropic and nematic phases of highly anisotropic molecular fluids. The interactions are modeled by means of the Gay-Berne potential with anisotropy parameters {kappa}=3 and {kappa}{prime}=5. The linear-velocity autocorrelation function shows no evidence of a negative region in the isotropic phase, even at the higher densities considered. The self-diffusion coefficient parallel to the molecular axis shows an anomalous increase with density as the system enters the nematic region. This enhancement in parallel diffusion is also observed in the isotropic side of the transition as a precursor effect. The molecular reorientationmore » is discussed in the light of different theoretical models. The Debye diffusion model appears to explain the reorientational mechanism in the nematic phase. None of the models gives a satisfactory account of the reorientation process in the isotropic phase.« less
Projective flatness in the quantisation of bosons and fermions

NASA Astrophysics Data System (ADS)

Wu, Siye

2015-07-01

We compare the quantisation of linear systems of bosons and fermions. We recall the appearance of projectively flat connection and results on parallel transport in the quantisation of bosons. We then discuss pre-quantisation and quantisation of fermions using the calculus of fermionic variables. We define a natural connection on the bundle of Hilbert spaces and show that it is projectively flat. This identifies, up to a phase, equivalent spinor representations constructed by various polarisations. We introduce the concept of metaplectic correction for fermions and show that the bundle of corrected Hilbert spaces is naturally flat. We then show that the parallel transport in the bundle of Hilbert spaces along a geodesic is a rescaled projection provided that the geodesic lies within the complement of a cut locus. Finally, we study the bundle of Hilbert spaces when there is a symmetry.
Parallel Conjugate Gradient: Effects of Ordering Strategies, Programming Paradigms, and Architectural Platforms

NASA Technical Reports Server (NTRS)

Oliker, Leonid; Heber, Gerd; Biswas, Rupak

2000-01-01

The Conjugate Gradient (CG) algorithm is perhaps the best-known iterative technique to solve sparse linear systems that are symmetric and positive definite. A sparse matrix-vector multiply (SPMV) usually accounts for most of the floating-point operations within a CG iteration. In this paper, we investigate the effects of various ordering and partitioning strategies on the performance of parallel CG and SPMV using different programming paradigms and architectures. Results show that for this class of applications, ordering significantly improves overall performance, that cache reuse may be more important than reducing communication, and that it is possible to achieve message passing performance using shared memory constructs through careful data ordering and distribution. However, a multi-threaded implementation of CG on the Tera MTA does not require special ordering or partitioning to obtain high efficiency and scalability.
Research on parallel load sharing principle of piezoelectric six-dimensional heavy force/torque sensor

NASA Astrophysics Data System (ADS)

Liu, Wei; Li, Ying-jun; Jia, Zhen-yuan; Zhang, Jun; Qian, Min

2011-01-01

In working process of huge heavy-load manipulators, such as the free forging machine, hydraulic die-forging press, forging manipulator, heavy grasping manipulator, large displacement manipulator, measurement of six-dimensional heavy force/torque and real-time force feedback of the operation interface are basis to realize coordinate operation control and force compliance control. It is also an effective way to raise the control accuracy and achieve highly efficient manufacturing. Facing to solve dynamic measurement problem on six-dimensional time-varying heavy load in extremely manufacturing process, the novel principle of parallel load sharing on six-dimensional heavy force/torque is put forward. The measuring principle of six-dimensional force sensor is analyzed, and the spatial model is built and decoupled. The load sharing ratios are analyzed and calculated in vertical and horizontal directions. The mapping relationship between six-dimensional heavy force/torque value to be measured and output force value is built. The finite element model of parallel piezoelectric six-dimensional heavy force/torque sensor is set up, and its static characteristics are analyzed by ANSYS software. The main parameters, which affect load sharing ratio, are analyzed. The experiments for load sharing with different diameters of parallel axis are designed. The results show that the six-dimensional heavy force/torque sensor has good linearity. Non-linearity errors are less than 1%. The parallel axis makes good effect of load sharing. The larger the diameter is, the better the load sharing effect is. The results of experiments are in accordance with the FEM analysis. The sensor has advantages of large measuring range, good linearity, high inherent frequency, and high rigidity. It can be widely used in extreme environments for real-time accurate measurement of six-dimensional time-varying huge loads on manipulators.
Wakefield Simulation of CLIC PETS Structure Using Parallel 3D Finite Element Time-Domain Solver T3P

DOE Office of Scientific and Technical Information (OSTI.GOV)

Candel, A.; Kabel, A.; Lee, L.

In recent years, SLAC's Advanced Computations Department (ACD) has developed the parallel 3D Finite Element electromagnetic time-domain code T3P. Higher-order Finite Element methods on conformal unstructured meshes and massively parallel processing allow unprecedented simulation accuracy for wakefield computations and simulations of transient effects in realistic accelerator structures. Applications include simulation of wakefield damping in the Compact Linear Collider (CLIC) power extraction and transfer structure (PETS).
User's guide of TOUGH2-EGS-MP: A Massively Parallel Simulator with Coupled Geomechanics for Fluid and Heat Flow in Enhanced Geothermal Systems VERSION 1.0

DOE Office of Scientific and Technical Information (OSTI.GOV)

Xiong, Yi; Fakcharoenphol, Perapon; Wang, Shihao

2013-12-01

TOUGH2-EGS-MP is a parallel numerical simulation program coupling geomechanics with fluid and heat flow in fractured and porous media, and is applicable for simulation of enhanced geothermal systems (EGS). TOUGH2-EGS-MP is based on the TOUGH2-MP code, the massively parallel version of TOUGH2. In TOUGH2-EGS-MP, the fully-coupled flow-geomechanics model is developed from linear elastic theory for thermo-poro-elastic systems and is formulated in terms of mean normal stress as well as pore pressure and temperature. Reservoir rock properties such as porosity and permeability depend on rock deformation, and the relationships between these two, obtained from poro-elasticity theories and empirical correlations, are incorporatedmore » into the simulation. This report provides the user with detailed information on the TOUGH2-EGS-MP mathematical model and instructions for using it for Thermal-Hydrological-Mechanical (THM) simulations. The mathematical model includes the fluid and heat flow equations, geomechanical equation, and discretization of those equations. In addition, the parallel aspects of the code, such as domain partitioning and communication between processors, are also included. Although TOUGH2-EGS-MP has the capability for simulating fluid and heat flows coupled with geomechanical effects, it is up to the user to select the specific coupling process, such as THM or only TH, in a simulation. There are several example problems illustrating applications of this program. These example problems are described in detail and their input data are presented. Their results demonstrate that this program can be used for field-scale geothermal reservoir simulation in porous and fractured media with fluid and heat flow coupled with geomechanical effects.« less
Three is much more than two in coarsening dynamics of cyclic competitions

NASA Astrophysics Data System (ADS)

Mitarai, Namiko; Gunnarson, Ivar; Pedersen, Buster Niels; Rosiek, Christian Anker; Sneppen, Kim

2016-04-01

The classical game of rock-paper-scissors has inspired experiments and spatial model systems that address the robustness of biological diversity. In particular, the game nicely illustrates that cyclic interactions allow multiple strategies to coexist for long-time intervals. When formulated in terms of a one-dimensional cellular automata, the spatial distribution of strategies exhibits coarsening with algebraically growing domain size over time, while the two-dimensional version allows domains to break and thereby opens the possibility for long-time coexistence. We consider a quasi-one-dimensional implementation of the cyclic competition, and study the long-term dynamics as a function of rare invasions between parallel linear ecosystems. We find that increasing the complexity from two to three parallel subsystems allows a transition from complete coarsening to an active steady state where the domain size stays finite. We further find that this transition happens irrespective of whether the update is done in parallel for all sites simultaneously or done randomly in sequential order. In both cases, the active state is characterized by localized bursts of dislocations, followed by longer periods of coarsening. In the case of the parallel dynamics, we find that there is another phase transition between the active steady state and the coarsening state within the three-line system when the invasion rate between the subsystems is varied. We identify the critical parameter for this transition and show that the density of active boundaries has critical exponents that are consistent with the directed percolation universality class. On the other hand, numerical simulations with the random sequential dynamics suggest that the system may exhibit an active steady state as long as the invasion rate is finite.
On Parallel Push-Relabel based Algorithms for Bipartite Maximum Matching

DOE Office of Scientific and Technical Information (OSTI.GOV)

Langguth, Johannes; Azad, Md Ariful; Halappanavar, Mahantesh

2014-07-01

We study multithreaded push-relabel based algorithms for computing maximum cardinality matching in bipartite graphs. Matching is a fundamental combinatorial (graph) problem with applications in a wide variety of problems in science and engineering. We are motivated by its use in the context of sparse linear solvers for computing maximum transversal of a matrix. We implement and test our algorithms on several multi-socket multicore systems and compare their performance to state-of-the-art augmenting path-based serial and parallel algorithms using a testset comprised of a wide range of real-world instances. Building on several heuristics for enhancing performance, we demonstrate good scaling for themore » parallel push-relabel algorithm. We show that it is comparable to the best augmenting path-based algorithms for bipartite matching. To the best of our knowledge, this is the first extensive study of multithreaded push-relabel based algorithms. In addition to a direct impact on the applications using matching, the proposed algorithmic techniques can be extended to preflow-push based algorithms for computing maximum flow in graphs.« less
Vectorial finite elements for solving the radiative transfer equation

NASA Astrophysics Data System (ADS)

Badri, M. A.; Jolivet, P.; Rousseau, B.; Le Corre, S.; Digonnet, H.; Favennec, Y.

2018-06-01

The discrete ordinate method coupled with the finite element method is often used for the spatio-angular discretization of the radiative transfer equation. In this paper we attempt to improve upon such a discretization technique. Instead of using standard finite elements, we reformulate the radiative transfer equation using vectorial finite elements. In comparison to standard finite elements, this reformulation yields faster timings for the linear system assemblies, as well as for the solution phase when using scattering media. The proposed vectorial finite element discretization for solving the radiative transfer equation is cross-validated against a benchmark problem available in literature. In addition, we have used the method of manufactured solutions to verify the order of accuracy for our discretization technique within different absorbing, scattering, and emitting media. For solving large problems of radiation on parallel computers, the vectorial finite element method is parallelized using domain decomposition. The proposed domain decomposition method scales on large number of processes, and its performance is unaffected by the changes in optical thickness of the medium. Our parallel solver is used to solve a large scale radiative transfer problem of the Kelvin-cell radiation.
A systematic approach to embedded biomedical decision making.

PubMed

Song, Zhe; Ji, Zhongkai; Ma, Jian-Guo; Sputh, Bernhard; Acharya, U Rajendra; Faust, Oliver

2012-11-01

An embedded decision making is a key feature for many biomedical systems. In most cases human life directly depends on correct decisions made by these systems, therefore they have to work reliably. This paper describes how we applied systems engineering principles to design a high performance embedded classification system in a systematic and well structured way. We introduce the structured design approach by discussing requirements capturing, specifications refinement, implementation and testing. Thereby, we follow systems engineering principles and execute each of these processes as formal as possible. The requirements, which motivate the system design, describe an automated decision making system for diagnostic support. These requirements are refined into the implementation of a support vector machine (SVM) algorithm which enables us to integrate automated decision making in embedded systems. With a formal model we establish functionality, stability and reliability of the system. Furthermore, we investigated different parallel processing configurations of this computationally complex algorithm. We found that, by adding SVM processes, an almost linear speedup is possible. Once we established these system properties, we translated the formal model into an implementation. The resulting implementation was tested using XMOS processors with both normal and failure cases, to build up trust in the implementation. Finally, we demonstrated that our parallel implementation achieves the speedup, predicted by the formal model. Copyright © 2011 Elsevier Ireland Ltd. All rights reserved.
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

DOE PAGES

Song, Fengguang; Dongarra, Jack

2014-10-01

Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasksmore » without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.« less
A scalable approach to solving dense linear algebra problems on hybrid CPU-GPU systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Song, Fengguang; Dongarra, Jack

Aiming to fully exploit the computing power of all CPUs and all graphics processing units (GPUs) on hybrid CPU-GPU systems to solve dense linear algebra problems, in this paper we design a class of heterogeneous tile algorithms to maximize the degree of parallelism, to minimize the communication volume, and to accommodate the heterogeneity between CPUs and GPUs. The new heterogeneous tile algorithms are executed upon our decentralized dynamic scheduling runtime system, which schedules a task graph dynamically and transfers data between compute nodes automatically. The runtime system uses a new distributed task assignment protocol to solve data dependencies between tasksmore » without any coordination between processing units. By overlapping computation and communication through dynamic scheduling, we are able to attain scalable performance for the double-precision Cholesky factorization and QR factorization. Finally, our approach demonstrates a performance comparable to Intel MKL on shared-memory multicore systems and better performance than both vendor (e.g., Intel MKL) and open source libraries (e.g., StarPU) in the following three environments: heterogeneous clusters with GPUs, conventional clusters without GPUs, and shared-memory systems with multiple GPUs.« less
Consensus Algorithms for Networks of Systems with Second- and Higher-Order Dynamics

NASA Astrophysics Data System (ADS)

Fruhnert, Michael

This thesis considers homogeneous networks of linear systems. We consider linear feedback controllers and require that the directed graph associated with the network contains a spanning tree and systems are stabilizable. We show that, in continuous-time, consensus with a guaranteed rate of convergence can always be achieved using linear state feedback. For networks of continuous-time second-order systems, we provide a new and simple derivation of the conditions for a second-order polynomials with complex coefficients to be Hurwitz. We apply this result to obtain necessary and sufficient conditions to achieve consensus with networks whose graph Laplacian matrix may have complex eigenvalues. Based on the conditions found, methods to compute feedback gains are proposed. We show that gains can be chosen such that consensus is achieved robustly over a variety of communication structures and system dynamics. We also consider the use of static output feedback. For networks of discrete-time second-order systems, we provide a new and simple derivation of the conditions for a second-order polynomials with complex coefficients to be Schur. We apply this result to obtain necessary and sufficient conditions to achieve consensus with networks whose graph Laplacian matrix may have complex eigenvalues. We show that consensus can always be achieved for marginally stable systems and discretized systems. Simple conditions for consensus achieving controllers are obtained when the Laplacian eigenvalues are all real. For networks of continuous-time time-variant higher-order systems, we show that uniform consensus can always be achieved if systems are quadratically stabilizable. In this case, we provide a simple condition to obtain a linear feedback control. For networks of discrete-time higher-order systems, we show that constant gains can be chosen such that consensus is achieved for a variety of network topologies. First, we develop simple results for networks of time-invariant systems and networks of time-variant systems that are given in controllable canonical form. Second, we formulate the problem in terms of Linear Matrix Inequalities (LMIs). The condition found simplifies the design process and avoids the parallel solution of multiple LMIs. The result yields a modified Algebraic Riccati Equation (ARE) for which we present an equivalent LMI condition.
A robust H∞-tracking design for uncertain Takagi-Sugeno fuzzy systems with unknown premise variables using descriptor redundancy approach

NASA Astrophysics Data System (ADS)

Hassan Asemani, Mohammad; Johari Majd, Vahid

2015-12-01

This paper addresses a robust H∞ fuzzy observer-based tracking design problem for uncertain Takagi-Sugeno fuzzy systems with external disturbances. To have a practical observer-based controller, the premise variables of the system are assumed to be not measurable in general, which leads to a more complex design process. The tracker is synthesised based on a fuzzy Lyapunov function approach and non-parallel distributed compensation (non-PDC) scheme. Using the descriptor redundancy approach, the robust stability conditions are derived in the form of strict linear matrix inequalities (LMIs) even in the presence of uncertainties in the system, input, and output matrices simultaneously. Numerical simulations are provided to show the effectiveness of the proposed method.
Can Multiple "Spatial" Virtual Timelines Convey the Relatedness of Chronological Knowledge across Parallel Domains?

ERIC Educational Resources Information Center

Korallo, Liliya; Foreman, Nigel; Boyd-Davis, Stephen; Moar, Magnus; Coulson, Mark

2012-01-01

Single linear virtual timelines have been used effectively with undergraduates and primary school children to convey the chronological ordering of historical items, improving on PowerPoint and paper/textual displays. In the present study, a virtual environment (VE) consisting of three parallel related timelines (world history and the histories of…
Multilevel decomposition of complete vehicle configuration in a parallel computing environment

NASA Technical Reports Server (NTRS)

Bhatt, Vinay; Ragsdell, K. M.

1989-01-01

This research summarizes various approaches to multilevel decomposition to solve large structural problems. A linear decomposition scheme based on the Sobieski algorithm is selected as a vehicle for automated synthesis of a complete vehicle configuration in a parallel processing environment. The research is in a developmental state. Preliminary numerical results are presented for several example problems.
Parallel Computation of the Jacobian Matrix for Nonlinear Equation Solvers Using MATLAB

NASA Technical Reports Server (NTRS)

Rose, Geoffrey K.; Nguyen, Duc T.; Newman, Brett A.

2017-01-01

Demonstrating speedup for parallel code on a multicore shared memory PC can be challenging in MATLAB due to underlying parallel operations that are often opaque to the user. This can limit potential for improvement of serial code even for the so-called embarrassingly parallel applications. One such application is the computation of the Jacobian matrix inherent to most nonlinear equation solvers. Computation of this matrix represents the primary bottleneck in nonlinear solver speed such that commercial finite element (FE) and multi-body-dynamic (MBD) codes attempt to minimize computations. A timing study using MATLAB's Parallel Computing Toolbox was performed for numerical computation of the Jacobian. Several approaches for implementing parallel code were investigated while only the single program multiple data (spmd) method using composite objects provided positive results. Parallel code speedup is demonstrated but the goal of linear speedup through the addition of processors was not achieved due to PC architecture.
Digital algorithms for parallel pipelined single-detector homodyne fringe counting in laser interferometry

NASA Astrophysics Data System (ADS)

Rerucha, Simon; Sarbort, Martin; Hola, Miroslava; Cizek, Martin; Hucl, Vaclav; Cip, Ondrej; Lazar, Josef

2016-12-01

The homodyne detection with only a single detector represents a promising approach in the interferometric application which enables a significant reduction of the optical system complexity while preserving the fundamental resolution and dynamic range of the single frequency laser interferometers. We present the design, implementation and analysis of algorithmic methods for computational processing of the single-detector interference signal based on parallel pipelined processing suitable for real time implementation on a programmable hardware platform (e.g. the FPGA - Field Programmable Gate Arrays or the SoC - System on Chip). The algorithmic methods incorporate (a) the single detector signal (sine) scaling, filtering, demodulations and mixing necessary for the second (cosine) quadrature signal reconstruction followed by a conic section projection in Cartesian plane as well as (a) the phase unwrapping together with the goniometric and linear transformations needed for the scale linearization and periodic error correction. The digital computing scheme was designed for bandwidths up to tens of megahertz which would allow to measure the displacements at the velocities around half metre per second. The algorithmic methods were tested in real-time operation with a PC-based reference implementation that employed the advantage pipelined processing by balancing the computational load among multiple processor cores. The results indicate that the algorithmic methods are suitable for a wide range of applications [3] and that they are bringing the fringe counting interferometry closer to the industrial applications due to their optical setup simplicity and robustness, computational stability, scalability and also a cost-effectiveness.
Multiphase three-dimensional direct numerical simulation of a rotating impeller with code Blue

NASA Astrophysics Data System (ADS)

Kahouadji, Lyes; Shin, Seungwon; Chergui, Jalel; Juric, Damir; Craster, Richard V.; Matar, Omar K.

2017-11-01

The flow driven by a rotating impeller inside an open fixed cylindrical cavity is simulated using code Blue, a solver for massively-parallel simulations of fully three-dimensional multiphase flows. The impeller is composed of four blades at a 45° inclination all attached to a central hub and tube stem. In Blue, solid forms are constructed through the definition of immersed objects via a distance function that accounts for the object's interaction with the flow for both single and two-phase flows. We use a moving frame technique for imposing translation and/or rotation. The variation of the Reynolds number, the clearance, and the tank aspect ratio are considered, and we highlight the importance of the confinement ratio (blade radius versus the tank radius) in the mixing process. Blue uses a domain decomposition strategy for parallelization with MPI. The fluid interface solver is based on a parallel implementation of a hybrid front-tracking/level-set method designed complex interfacial topological changes. Parallel GMRES and multigrid iterative solvers are applied to the linear systems arising from the implicit solution for the fluid velocities and pressure in the presence of strong density and viscosity discontinuities across fluid phases. EPSRC, UK, MEMPHIS program Grant (EP/K003976/1), RAEng Research Chair (OKM).
A Parallel Compact Multi-Dimensional Numerical Algorithm with Aeroacoustics Applications

NASA Technical Reports Server (NTRS)

Povitsky, Alex; Morris, Philip J.

1999-01-01

In this study we propose a novel method to parallelize high-order compact numerical algorithms for the solution of three-dimensional PDEs (Partial Differential Equations) in a space-time domain. For this numerical integration most of the computer time is spent in computation of spatial derivatives at each stage of the Runge-Kutta temporal update. The most efficient direct method to compute spatial derivatives on a serial computer is a version of Gaussian elimination for narrow linear banded systems known as the Thomas algorithm. In a straightforward pipelined implementation of the Thomas algorithm processors are idle due to the forward and backward recurrences of the Thomas algorithm. To utilize processors during this time, we propose to use them for either non-local data independent computations, solving lines in the next spatial direction, or local data-dependent computations by the Runge-Kutta method. To achieve this goal, control of processor communication and computations by a static schedule is adopted. Thus, our parallel code is driven by a communication and computation schedule instead of the usual "creative, programming" approach. The obtained parallelization speed-up of the novel algorithm is about twice as much as that for the standard pipelined algorithm and close to that for the explicit DRP algorithm.

A look at scalable dense linear algebra libraries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dongarra, J.J.; Van de Geijn, R.A.; Walker, D.W.

1992-01-01

We discuss the essential design features of a library of scalable software for performing dense linear algebra computations on distributed memory concurrent computers. The square block scattered decomposition is proposed as a flexible and general-purpose way of decomposing most, if not all, dense matrix problems. An object- oriented interface to the library permits more portable applications to be written, and is easy to learn and use, since details of the parallel implementation are hidden from the user. Experiments on the Intel Touchstone Delta system with a prototype code that uses the square block scattered decomposition to perform LU factorization aremore » presented and analyzed. It was found that the code was both scalable and efficient, performing at about 14 GFLOPS (double precision) for the largest problem considered.« less
A look at scalable dense linear algebra libraries

DOE Office of Scientific and Technical Information (OSTI.GOV)

Dongarra, J.J.; Van de Geijn, R.A.; Walker, D.W.

1992-08-01

We discuss the essential design features of a library of scalable software for performing dense linear algebra computations on distributed memory concurrent computers. The square block scattered decomposition is proposed as a flexible and general-purpose way of decomposing most, if not all, dense matrix problems. An object- oriented interface to the library permits more portable applications to be written, and is easy to learn and use, since details of the parallel implementation are hidden from the user. Experiments on the Intel Touchstone Delta system with a prototype code that uses the square block scattered decomposition to perform LU factorization aremore » presented and analyzed. It was found that the code was both scalable and efficient, performing at about 14 GFLOPS (double precision) for the largest problem considered.« less
DOE Office of Scientific and Technical Information (OSTI.GOV)

Chiang, Nai-Yuan; Zavala, Victor M.

We present a filter line-search algorithm that does not require inertia information of the linear system. This feature enables the use of a wide range of linear algebra strategies and libraries, which is essential to tackle large-scale problems on modern computing architectures. The proposed approach performs curvature tests along the search step to detect negative curvature and to trigger convexification. We prove that the approach is globally convergent and we implement the approach within a parallel interior-point framework to solve large-scale and highly nonlinear problems. Our numerical tests demonstrate that the inertia-free approach is as efficient as inertia detection viamore » symmetric indefinite factorizations. We also demonstrate that the inertia-free approach can lead to reductions in solution time because it reduces the amount of convexification needed.« less
Linear and nonlinear spectroscopy from quantum master equations.

PubMed

Fetherolf, Jonathan H; Berkelbach, Timothy C

2017-12-28

We investigate the accuracy of the second-order time-convolutionless (TCL2) quantum master equation for the calculation of linear and nonlinear spectroscopies of multichromophore systems. We show that even for systems with non-adiabatic coupling, the TCL2 master equation predicts linear absorption spectra that are accurate over an extremely broad range of parameters and well beyond what would be expected based on the perturbative nature of the approach; non-equilibrium population dynamics calculated with TCL2 for identical parameters are significantly less accurate. For third-order (two-dimensional) spectroscopy, the importance of population dynamics and the violation of the so-called quantum regression theorem degrade the accuracy of TCL2 dynamics. To correct these failures, we combine the TCL2 approach with a classical ensemble sampling of slow microscopic bath degrees of freedom, leading to an efficient hybrid quantum-classical scheme that displays excellent accuracy over a wide range of parameters. In the spectroscopic setting, the success of such a hybrid scheme can be understood through its separate treatment of homogeneous and inhomogeneous broadening. Importantly, the presented approach has the computational scaling of TCL2, with the modest addition of an embarrassingly parallel prefactor associated with ensemble sampling. The presented approach can be understood as a generalized inhomogeneous cumulant expansion technique, capable of treating multilevel systems with non-adiabatic dynamics.
Linear and nonlinear spectroscopy from quantum master equations

NASA Astrophysics Data System (ADS)

Fetherolf, Jonathan H.; Berkelbach, Timothy C.

2017-12-01

We investigate the accuracy of the second-order time-convolutionless (TCL2) quantum master equation for the calculation of linear and nonlinear spectroscopies of multichromophore systems. We show that even for systems with non-adiabatic coupling, the TCL2 master equation predicts linear absorption spectra that are accurate over an extremely broad range of parameters and well beyond what would be expected based on the perturbative nature of the approach; non-equilibrium population dynamics calculated with TCL2 for identical parameters are significantly less accurate. For third-order (two-dimensional) spectroscopy, the importance of population dynamics and the violation of the so-called quantum regression theorem degrade the accuracy of TCL2 dynamics. To correct these failures, we combine the TCL2 approach with a classical ensemble sampling of slow microscopic bath degrees of freedom, leading to an efficient hybrid quantum-classical scheme that displays excellent accuracy over a wide range of parameters. In the spectroscopic setting, the success of such a hybrid scheme can be understood through its separate treatment of homogeneous and inhomogeneous broadening. Importantly, the presented approach has the computational scaling of TCL2, with the modest addition of an embarrassingly parallel prefactor associated with ensemble sampling. The presented approach can be understood as a generalized inhomogeneous cumulant expansion technique, capable of treating multilevel systems with non-adiabatic dynamics.
Six-degree-of-freedom parallel minimanipulator with three inextensible limbs

NASA Technical Reports Server (NTRS)

Tahmasebi, Farhad (Inventor); Tsai, Lung-Wen (Inventor)

1994-01-01

A Six-Degree-of-Freedom Parallel-Manipulator having three inextensible limbs for manipulating a platform is described. The three inextensible limbs are attached via universal joints to the platform at non-collinear points. Each of the inextensible limbs is also attached via universal joints to a two-degree-of-freedom parallel driver such as a five-bar linkage, a pantograph, or a bidirectional linear stepper motor. The drivers move the lower ends of the limbs parallel to a fixed base and thereby provide manipulation of the platform. The actuators are mounted on the fixed base without using any power transmission devices such as gears or belts.
FWT2D: A massively parallel program for frequency-domain full-waveform tomography of wide-aperture seismic data—Part 1: Algorithm

NASA Astrophysics Data System (ADS)

Sourbier, Florent; Operto, Stéphane; Virieux, Jean; Amestoy, Patrick; L'Excellent, Jean-Yves

2009-03-01

This is the first paper in a two-part series that describes a massively parallel code that performs 2D frequency-domain full-waveform inversion of wide-aperture seismic data for imaging complex structures. Full-waveform inversion methods, namely quantitative seismic imaging methods based on the resolution of the full wave equation, are computationally expensive. Therefore, designing efficient algorithms which take advantage of parallel computing facilities is critical for the appraisal of these approaches when applied to representative case studies and for further improvements. Full-waveform modelling requires the resolution of a large sparse system of linear equations which is performed with the massively parallel direct solver MUMPS for efficient multiple-shot simulations. Efficiency of the multiple-shot solution phase (forward/backward substitutions) is improved by using the BLAS3 library. The inverse problem relies on a classic local optimization approach implemented with a gradient method. The direct solver returns the multiple-shot wavefield solutions distributed over the processors according to a domain decomposition driven by the distribution of the LU factors. The domain decomposition of the wavefield solutions is used to compute in parallel the gradient of the objective function and the diagonal Hessian, this latter providing a suitable scaling of the gradient. The algorithm allows one to test different strategies for multiscale frequency inversion ranging from successive mono-frequency inversion to simultaneous multifrequency inversion. These different inversion strategies will be illustrated in the following companion paper. The parallel efficiency and the scalability of the code will also be quantified.
Breaking Barriers to Low-Cost Modular Inverter Production & Use

DOE Office of Scientific and Technical Information (OSTI.GOV)

Bogdan Borowy; Leo Casey; Jerry Foshage

2005-05-31

The goal of this cost share contract is to advance key technologies to reduce size, weight and cost while enhancing performance and reliability of Modular Inverter Product for Distributed Energy Resources (DER). Efforts address technology development to meet technical needs of DER market protection, isolation, reliability, and quality. Program activities build on SatCon Technology Corporation inverter experience (e.g., AIPM, Starsine, PowerGate) for Photovoltaic, Fuel Cell, Energy Storage applications. Efforts focused four technical areas, Capacitors, Cooling, Voltage Sensing and Control of Parallel Inverters. Capacitor efforts developed a hybrid capacitor approach for conditioning SatCon's AIPM unit supply voltages by incorporating several typesmore » and sizes to store energy and filter at high, medium and low frequencies while minimizing parasitics (ESR and ESL). Cooling efforts converted the liquid cooled AIPM module to an air-cooled unit using augmented fin, impingement flow cooling. Voltage sensing efforts successfully modified the existing AIPM sensor board to allow several, application dependent configurations and enabling voltage sensor galvanic isolation. Parallel inverter control efforts realized a reliable technique to control individual inverters, connected in a parallel configuration, without a communication link. Individual inverter currents, AC and DC, were balanced in the paralleled modules by introducing a delay to the individual PWM gate pulses. The load current sharing is robust and independent of load types (i.e., linear and nonlinear, resistive and/or inductive). It is a simple yet powerful method for paralleling both individual devices dramatically improves reliability and fault tolerance of parallel inverter power systems. A patent application has been made based on this control technology.« less
Spiking Neural P Systems With Rules on Synapses Working in Maximum Spiking Strategy.

PubMed

Tao Song; Linqiang Pan

2015-06-01

Spiking neural P systems (called SN P systems for short) are a class of parallel and distributed neural-like computation models inspired by the way the neurons process information and communicate with each other by means of impulses or spikes. In this work, we introduce a new variant of SN P systems, called SN P systems with rules on synapses working in maximum spiking strategy, and investigate the computation power of the systems as both number and vector generators. Specifically, we prove that i) if no limit is imposed on the number of spikes in any neuron during any computation, such systems can generate the sets of Turing computable natural numbers and the sets of vectors of positive integers computed by k-output register machine; ii) if an upper bound is imposed on the number of spikes in each neuron during any computation, such systems can characterize semi-linear sets of natural numbers as number generating devices; as vector generating devices, such systems can only characterize the family of sets of vectors computed by sequential monotonic counter machine, which is strictly included in family of semi-linear sets of vectors. This gives a positive answer to the problem formulated in Song et al., Theor. Comput. Sci., vol. 529, pp. 82-95, 2014.
Statistics of vacuum breakdown in the high-gradient and low-rate regime

NASA Astrophysics Data System (ADS)

Wuensch, Walter; Degiovanni, Alberto; Calatroni, Sergio; Korsbäck, Anders; Djurabekova, Flyura; Rajamäki, Robin; Giner-Navarro, Jorge

2017-01-01

In an increasing number of high-gradient linear accelerator applications, accelerating structures must operate with both high surface electric fields and low breakdown rates. Understanding the statistical properties of breakdown occurrence in such a regime is of practical importance for optimizing accelerator conditioning and operation algorithms, as well as of interest for efforts to understand the physical processes which underlie the breakdown phenomenon. Experimental data of breakdown has been collected in two distinct high-gradient experimental set-ups: A prototype linear accelerating structure operated in the Compact Linear Collider Xbox 12 GHz test stands, and a parallel plate electrode system operated with pulsed DC in the kV range. Collected data is presented, analyzed and compared. The two systems show similar, distinctive, two-part distributions of number of pulses between breakdowns, with each part corresponding to a specific, constant event rate. The correlation between distance and number of pulses between breakdown indicates that the two parts of the distribution, and their corresponding event rates, represent independent primary and induced follow-up breakdowns. The similarity of results from pulsed DC to 12 GHz rf indicates a similar vacuum arc triggering mechanism over the range of conditions covered by the experiments.
Kinetics of Cyclic Oxidation and Cracking and Finite Element Analysis of MA956 and Sapphire/MA956 Composite System

NASA Technical Reports Server (NTRS)

Lee, Kang N.; Arya, Vinod K.; Halford, Gary R.; Barrett, Charles A.

1996-01-01

Sapphire fiber-reinforced MA956 composites hold promise for significant weight savings and increased high-temperature structural capability, as compared to unreinforced MA956. As part of an overall assessment of the high-temperature characteristics of this material system, cyclic oxidation behavior was studied at 1093 C and 1204 C. Initially, both sets of coupons exhibited parabolic oxidation kinetics. Later, monolithic MA956 exhibited spallation and a linear weight loss, whereas the composite showed a linear weight gain without spallation. Weight loss of the monolithic MA956 resulted from the linking of a multiplicity of randomly oriented and closely spaced surface cracks that facilitated ready spallation. By contrast, cracking of the composite's oxide layer was nonintersecting and aligned nominally parallel with the orientation of the subsurface reinforcing fibers. Oxidative lifetime of monolithic MA956 was projected from the observed oxidation kinetics. Linear elastic, finite element continuum, and micromechanics analyses were performed on coupons of the monolithic and composite materials. Results of the analyses qualitatively agreed well with the observed oxide cracking and spallation behavior of both the MA956 and the Sapphire/MA956 composite coupons.
Implementation of the DPM Monte Carlo code on a parallel architecture for treatment planning applications.

PubMed

Tyagi, Neelam; Bose, Abhijit; Chetty, Indrin J

2004-09-01

We have parallelized the Dose Planning Method (DPM), a Monte Carlo code optimized for radiotherapy class problems, on distributed-memory processor architectures using the Message Passing Interface (MPI). Parallelization has been investigated on a variety of parallel computing architectures at the University of Michigan-Center for Advanced Computing, with respect to efficiency and speedup as a function of the number of processors. We have integrated the parallel pseudo random number generator from the Scalable Parallel Pseudo-Random Number Generator (SPRNG) library to run with the parallel DPM. The Intel cluster consisting of 800 MHz Intel Pentium III processor shows an almost linear speedup up to 32 processors for simulating 1 x 10(8) or more particles. The speedup results are nearly linear on an Athlon cluster (up to 24 processors based on availability) which consists of 1.8 GHz+ Advanced Micro Devices (AMD) Athlon processors on increasing the problem size up to 8 x 10(8) histories. For a smaller number of histories (1 x 10(8)) the reduction of efficiency with the Athlon cluster (down to 83.9% with 24 processors) occurs because the processing time required to simulate 1 x 10(8) histories is less than the time associated with interprocessor communication. A similar trend was seen with the Opteron Cluster (consisting of 1400 MHz, 64-bit AMD Opteron processors) on increasing the problem size. Because of the 64-bit architecture Opteron processors are capable of storing and processing instructions at a faster rate and hence are faster as compared to the 32-bit Athlon processors. We have validated our implementation with an in-phantom dose calculation study using a parallel pencil monoenergetic electron beam of 20 MeV energy. The phantom consists of layers of water, lung, bone, aluminum, and titanium. The agreement in the central axis depth dose curves and profiles at different depths shows that the serial and parallel codes are equivalent in accuracy.
Valles Marineris as a Cryokarstic Structure Formed by a Giant Dyke System: Support From New Analogue Experiments

NASA Astrophysics Data System (ADS)

Ozeren, M. S.; Sengor, A. M. C.; Acar, D.; Ülgen, S. C.; Onsel, I. E.

2014-12-01

Valles Marineris is the most significant near-linear depression on Mars. It is some 4000 km long, up to about 200 km wide and some 7 km deep. Although its margins look parallel at first sight, the entire structure has a long spindle shape with significant enlargement in its middle (Melas Chasma) caused by cuspate slope retreat mechanisms. Farther to its north is Hebes Chasma which is an entirely closed depression with a more pronounced spindle shape. Tithonium Chasma is a parallel, but much narrower depression to its northeast. All these chasmae have axes parallel with one another and such structures occur nowhere else on Mars. A scabland surface exists to the east of the Valles Marineris and the causative water mass seems to have issued from it. The great resemblance of these chasmae on mars to poljes in the karstic regions on earth have led us to assume that they owed their existence to dissolution of rock layers underlying them. We assumed that the dissolving layer consisted of water ice forming substantial layers, in fact entirely frozen seas of several km depth. We have simulated this geometry by using bentonite and flour layers (in different experiments) overlying layers of ice in which a resistant coil was used to simulate a dyke. We used different thicknesses of bentonite and flour overlying ice layers again of various thicknesses. The flour seems to simulate the Martian crust better because on Mars, g is only about 3/8ths of its value on Earth, so (for equal crustal density) the depth to which the cohesion term C remains important in the Mohr-Coulomb shear failure criterion is about 8/3 times greater. As examples we show two of those experiments in which both the rock analogue and ice layers were of 1.5 cm. thick. Perfect analogues of the Valles Marineris formed above the dyke analogue thermal source complete with the near-linear structure, overall flat spindle shape, cuspate margins, a central ridge, parallel side faults, parallel depressions resembling the Tithonium Chasma. When water was allowed to drain from the beginning, closed depressions formed that have an amazing resemblance to Hebes chasma. We postulate that the entire system of chasmae here discussed formed atop a major dyke swarm some 4000 km length, not dissimilar to the 3500 km long Mesoproterozoic (Ectasian) dyke swarm disrupting the Canadian Shield.
High-Beta Electromagnetic Turbulence in LAPD Plasmas

NASA Astrophysics Data System (ADS)

Rossi, G.; Carter, T. A.; Pueschel, M. J.; Jenko, F.; Told, D.; Terry, P. W.

2015-11-01

The introduction of a new LaB6 cathode plasma source in the Large Plasma Device has enabled the study of pressure-gradient-driven turbulence and transport variations at significantly higher plasma β. Density fluctuations are observed to decrease with increasing β while magnetic fluctuations increase. Furthermore, the perpendicular magnetic fluctuations are seen to saturate while parallel (compressional) magnetic fluctuations increase continuously with β. These observations are compared to linear and nonlinear simulations with the GENE code. The results are consistent with the linear excitation of a Gradient-driven Drift Coupling mode (GDC) which relies on grad-B drift due to parallel magnetic fluctuations and can be driven by density or temperature gradients.
Sparse matrix-vector multiplication on network-on-chip

NASA Astrophysics Data System (ADS)

Sun, C.-C.; Götze, J.; Jheng, H.-Y.; Ruan, S.-J.

2010-12-01

In this paper, we present an idea for performing matrix-vector multiplication by using Network-on-Chip (NoC) architecture. In traditional IC design on-chip communications have been designed with dedicated point-to-point interconnections. Therefore, regular local data transfer is the major concept of many parallel implementations. However, when dealing with the parallel implementation of sparse matrix-vector multiplication (SMVM), which is the main step of all iterative algorithms for solving systems of linear equation, the required data transfers depend on the sparsity structure of the matrix and can be extremely irregular. Using the NoC architecture makes it possible to deal with arbitrary structure of the data transfers; i.e. with the irregular structure of the sparse matrices. So far, we have already implemented the proposed SMVM-NoC architecture with the size 4×4 and 5×5 in IEEE 754 single float point precision using FPGA.
Satellite Angular Rate Estimation From Vector Measurements

NASA Technical Reports Server (NTRS)

Azor, Ruth; Bar-Itzhack, Itzhack Y.; Harman, Richard R.

1996-01-01

This paper presents an algorithm for estimating the angular rate vector of a satellite which is based on the time derivatives of vector measurements expressed in a reference and body coordinate. The computed derivatives are fed into a spacial Kalman filter which yields an estimate of the spacecraft angular velocity. The filter, named Extended Interlaced Kalman Filter (EIKF), is an extension of the Kalman filter which, although being linear, estimates the state of a nonlinear dynamic system. It consists of two or three parallel Kalman filters whose individual estimates are fed to one another and are considered as known inputs by the other parallel filter(s). The nonlinear dynamics stem from the nonlinear differential equation that describes the rotation of a three dimensional body. Initial results, using simulated data, and real Rossi X ray Timing Explorer (RXTE) data indicate that the algorithm is efficient and robust.
Multidisciplinary systems optimization by linear decomposition

NASA Technical Reports Server (NTRS)

Sobieski, J.

1984-01-01

In a typical design process major decisions are made sequentially. An illustrated example is given for an aircraft design in which the aerodynamic shape is usually decided first, then the airframe is sized for strength and so forth. An analogous sequence could be laid out for any other major industrial product, for instance, a ship. The loops in the discipline boxes symbolize iterative design improvements carried out within the confines of a single engineering discipline, or subsystem. The loops spanning several boxes depict multidisciplinary design improvement iterations. Omitted for graphical simplicity is parallelism of the disciplinary subtasks. The parallelism is important in order to develop a broad workfront necessary to shorten the design time. If all the intradisciplinary and interdisciplinary iterations were carried out to convergence, the process could yield a numerically optimal design. However, it usually stops short of that because of time and money limitations. This is especially true for the interdisciplinary iterations.
Spiral: Automated Computing for Linear Transforms

NASA Astrophysics Data System (ADS)

Püschel, Markus

2010-09-01

Writing fast software has become extraordinarily difficult. For optimal performance, programs and their underlying algorithms have to be adapted to take full advantage of the platform's parallelism, memory hierarchy, and available instruction set. To make things worse, the best implementations are often platform-dependent and platforms are constantly evolving, which quickly renders libraries obsolete. We present Spiral, a domain-specific program generation system for important functionality used in signal processing and communication including linear transforms, filters, and other functions. Spiral completely replaces the human programmer. For a desired function, Spiral generates alternative algorithms, optimizes them, compiles them into programs, and intelligently searches for the best match to the computing platform. The main idea behind Spiral is a mathematical, declarative, domain-specific framework to represent algorithms and the use of rewriting systems to generate and optimize algorithms at a high level of abstraction. Experimental results show that the code generated by Spiral competes with, and sometimes outperforms, the best available human-written code.
Block Preconditioning to Enable Physics-Compatible Implicit Multifluid Plasma Simulations

NASA Astrophysics Data System (ADS)

Phillips, Edward; Shadid, John; Cyr, Eric; Miller, Sean

2017-10-01

Multifluid plasma simulations involve large systems of partial differential equations in which many time-scales ranging over many orders of magnitude arise. Since the fastest of these time-scales may set a restrictively small time-step limit for explicit methods, the use of implicit or implicit-explicit time integrators can be more tractable for obtaining dynamics at time-scales of interest. Furthermore, to enforce properties such as charge conservation and divergence-free magnetic field, mixed discretizations using volume, nodal, edge-based, and face-based degrees of freedom are often employed in some form. Together with the presence of stiff modes due to integrating over fast time-scales, the mixed discretization makes the required linear solves for implicit methods particularly difficult for black box and monolithic solvers. This work presents a block preconditioning strategy for multifluid plasma systems that segregates the linear system based on discretization type and approximates off-diagonal coupling in block diagonal Schur complement operators. By employing multilevel methods for the block diagonal subsolves, this strategy yields algorithmic and parallel scalability which we demonstrate on a range of problems.
Two-dimensional imaging via a narrowband MIMO radar system with two perpendicular linear arrays.

PubMed

Wang, Dang-wei; Ma, Xiao-yan; Su, Yi

2010-05-01

This paper presents a system model and method for the 2-D imaging application via a narrowband multiple-input multiple-output (MIMO) radar system with two perpendicular linear arrays. Furthermore, the imaging formulation for our method is developed through a Fourier integral processing, and the parameters of antenna array including the cross-range resolution, required size, and sampling interval are also examined. Different from the spatial sequential procedure sampling the scattered echoes during multiple snapshot illuminations in inverse synthetic aperture radar (ISAR) imaging, the proposed method utilizes a spatial parallel procedure to sample the scattered echoes during a single snapshot illumination. Consequently, the complex motion compensation in ISAR imaging can be avoided. Moreover, in our array configuration, multiple narrowband spectrum-shared waveforms coded with orthogonal polyphase sequences are employed. The mainlobes of the compressed echoes from the different filter band could be located in the same range bin, and thus, the range alignment in classical ISAR imaging is not necessary. Numerical simulations based on synthetic data are provided for testing our proposed method.

1988 IEEE Aerospace Applications Conference, Park City, UT, Feb. 7-12, 1988, Digest

NASA Astrophysics Data System (ADS)

The conference presents papers on microwave applications, data and signal processing applications, related aerospace applications, and advanced microelectronic products for the aerospace industry. Topics include a high-performance antenna measurement system, microwave power beaming from earth to space, the digital enhancement of microwave component performance, and a GaAs vector processor based on parallel RISC microprocessors. Consideration is also given to unique techniques for reliable SBNR architectures, a linear analysis subsystem for CSSL-IV, and a structured singular value approach to missile autopilot analysis.
N-MODY: A Code for Collisionless N-body Simulations in Modified Newtonian Dynamics

NASA Astrophysics Data System (ADS)

Londrillo, Pasquale; Nipoti, Carlo

2011-02-01

N-MODY is a parallel particle-mesh code for collisionless N-body simulations in modified Newtonian dynamics (MOND). N-MODY is based on a numerical potential solver in spherical coordinates that solves the non-linear MOND field equation, and is ideally suited to simulate isolated stellar systems. N-MODY can be used also to compute the MOND potential of arbitrary static density distributions. A few applications of N-MODY indicate that some astrophysically relevant dynamical processes are profoundly different in MOND and in Newtonian gravity with dark matter.
Parallel scheduling of recursively defined arrays

NASA Technical Reports Server (NTRS)

Myers, T. J.; Gokhale, M. B.

1986-01-01

A new method of automatic generation of concurrent programs which constructs arrays defined by sets of recursive equations is described. It is assumed that the time of computation of an array element is a linear combination of its indices, and integer programming is used to seek a succession of hyperplanes along which array elements can be computed concurrently. The method can be used to schedule equations involving variable length dependency vectors and mutually recursive arrays. Portions of the work reported here have been implemented in the PS automatic program generation system.
Linearized electrooptic polymeric directional coupler modulator

NASA Astrophysics Data System (ADS)

Hung, Yu-Chueh

External linearized modulators are required in high-performance analog optical communication systems since the performance of conventional modulators, such as Mach-Zehnder modulators, are degraded by distortions by the nonlinearity of their transfer functions. Various linearization schemes have been proposed to increase the dynamic range of an analog optical link. Most of the optical schemes involve multiple Mach-Zehnder modulators, either in parallel or series configuration, incorporated with strict balance of RF and bias control. This is a significant challenge when it comes to practical implementation. In this dissertation, a linearized two-section directional coupler modulator made from electrooptic polymer is presented. The coupling coefficient of each section is tailored by properly tuning the refractive index contrast, which can be easily employed using the photobleaching technique in polymer technology. A two-tone test was performed to evaluate the linearity of the modulator and the spur-free dynamic range shows a 7.5 dB improvement compared to a conventional Mach-Zehnder modulator. This scheme avoids multiple modulators or complicated modulation synchronization and demonstrates a compact design in real implementation. Most of the linearization schemes up to date consider only the direct detection mode of operation. However, the RF output characteristics at the detection side are determined differently by various system parameters if a coherent link is implemented instead. Therefore, different considerations of linearization have to be examined for this kind of application. In the second part of this dissertation, the impact of various modulation scenarios on the system performance of an analog coherent optical link will be addressed. It will be shown that a directional coupler modulator is better suited at increasing the dynamic range in coherent optical links. Specific designs of a directional coupler modulator shows an SFDR improvement of 20 dB compared to a Mach-Zehnder modulator. This new type of device can be easily fabricated using photobleaching technique in eletrooptic polymer and can be utilized in various applications.
Identification of wheat varieties with a parallel-plate capacitance sensor using fisher linear discriminant analysis

USDA-ARS?s Scientific Manuscript database

Fisher’s linear discriminant (FLD) models for wheat variety classification were developed and validated. The inputs to the FLD models were the capacitance (C), impedance (Z), and phase angle ('), measured at two frequencies. Classification of wheat varieties was obtained as output of the FLD mod...
Avoiding Communication in Dense Linear Algebra

DTIC Science & Technology

2013-08-16

Definitions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6 2.1.1 Asymptotic Notation . . . . . . . . . . . . . . . . . . . . . . . . . . . 6...and parallelizing Strassen’s matrix multiplication algorithm (Chapter 11). 6 Chapter 2 Preliminaries 2.1 Notation and Definitions In this section we...between computations and algo- rithms). The following definition is based on [56]: Definition 2.1. A classical algorithm in linear algebra is one that
An information-theoretic approach to motor action decoding with a reconfigurable parallel architecture.

PubMed

Craciun, Stefan; Brockmeier, Austin J; George, Alan D; Lam, Herman; Príncipe, José C

2011-01-01

Methods for decoding movements from neural spike counts using adaptive filters often rely on minimizing the mean-squared error. However, for non-Gaussian distribution of errors, this approach is not optimal for performance. Therefore, rather than using probabilistic modeling, we propose an alternate non-parametric approach. In order to extract more structure from the input signal (neuronal spike counts) we propose using minimum error entropy (MEE), an information-theoretic approach that minimizes the error entropy as part of an iterative cost function. However, the disadvantage of using MEE as the cost function for adaptive filters is the increase in computational complexity. In this paper we present a comparison between the decoding performance of the analytic Wiener filter and a linear filter trained with MEE, which is then mapped to a parallel architecture in reconfigurable hardware tailored to the computational needs of the MEE filter. We observe considerable speedup from the hardware design. The adaptation of filter weights for the multiple-input, multiple-output linear filters, necessary in motor decoding, is a highly parallelizable algorithm. It can be decomposed into many independent computational blocks with a parallel architecture readily mapped to a field-programmable gate array (FPGA) and scales to large numbers of neurons. By pipelining and parallelizing independent computations in the algorithm, the proposed parallel architecture has sublinear increases in execution time with respect to both window size and filter order.
Characterization of the seismically imaged Tuscarora fold system and implications for layer parallel shortening in the Pennsylvania salient

NASA Astrophysics Data System (ADS)

Mount, Van S.; Wilkins, Scott; Comiskey, Cody S.

2017-12-01

The Tuscarora fold system (TFS) is located in the Pennsylvania salient in the foreland of the Valley and Ridge province. The TFS is imaged in high quality 3D seismic data and comprises a system of small-scale folds within relatively flat-lying Lower Silurian Tuscarora Formation strata. We characterize the TFS structures and infer layer parallel shortening (LPS) directions and magnitudes associated with deformation during the Alleghany Orogeny. Previously reported LPS data in our study area are from shallow Devonian and Carboniferous strata (based on outcrop and core analyses) above the shallowest of three major detachments recognized in the region. Seismic data allows us to characterize LPS at depth in strata beneath the shallow detachment. Our LPS data (orientations and inferred magnitudes) are consistent with the shallow data leading us to surmise that LPS during Alleghanian deformation fanned around the salient and was distributed throughout the stratigraphic section - and not isolated to strata above the shallow detachment. We propose that a NW-SE oriented Alleghanian maximum principal stress was perturbed by deep structure associated with the non-linear margin of Laurentia resulting in fanning of shortening directions within the salient.
A high-speed linear algebra library with automatic parallelism

NASA Technical Reports Server (NTRS)

Boucher, Michael L.

1994-01-01

Parallel or distributed processing is key to getting highest performance workstations. However, designing and implementing efficient parallel algorithms is difficult and error-prone. It is even more difficult to write code that is both portable to and efficient on many different computers. Finally, it is harder still to satisfy the above requirements and include the reliability and ease of use required of commercial software intended for use in a production environment. As a result, the application of parallel processing technology to commercial software has been extremely small even though there are numerous computationally demanding programs that would significantly benefit from application of parallel processing. This paper describes DSSLIB, which is a library of subroutines that perform many of the time-consuming computations in engineering and scientific software. DSSLIB combines the high efficiency and speed of parallel computation with a serial programming model that eliminates many undesirable side-effects of typical parallel code. The result is a simple way to incorporate the power of parallel processing into commercial software without compromising maintainability, reliability, or ease of use. This gives significant advantages over less powerful non-parallel entries in the market.
Distributed Sensing and Processing for Multi-Camera Networks

NASA Astrophysics Data System (ADS)

Sankaranarayanan, Aswin C.; Chellappa, Rama; Baraniuk, Richard G.

Sensor networks with large numbers of cameras are becoming increasingly prevalent in a wide range of applications, including video conferencing, motion capture, surveillance, and clinical diagnostics. In this chapter, we identify some of the fundamental challenges in designing such systems: robust statistical inference, computationally efficiency, and opportunistic and parsimonious sensing. We show that the geometric constraints induced by the imaging process are extremely useful for identifying and designing optimal estimators for object detection and tracking tasks. We also derive pipelined and parallelized implementations of popular tools used for statistical inference in non-linear systems, of which multi-camera systems are examples. Finally, we highlight the use of the emerging theory of compressive sensing in reducing the amount of data sensed and communicated by a camera network.
Space, energy and anisotropy effects on effective cross sections and diffusion coefficients in the resonance region

DOE Office of Scientific and Technical Information (OSTI.GOV)

Meftah, B.

1982-01-01

Present methods used in reactor analysis do not include adequately the effect of anisotropic scattering in the calculation of resonance effective cross sections. Also the assumption that the streaming term ..cap omega...del Phi is conserved when the total, absorption and transfer cross sections are conserved, is bad because the leakage from a heterogeneous cell will not be conserved and is strongly anisotropic. A third major consideration is the coupling between different regions in a multiregion reactor; currently this effect is being completely ignored. To assess the magnitude of these effects, a code based on integral transport formalism with linear anisotropicmore » scattering was developed. Also, a more adequate formulation of the diffusion coefficient in a heterogeneous cell was derived. Two reactors, one fast, ZPR-6/5, and one thermal, TRX-3, were selected for the study. The study showed that, in general, the inclusion of linear scattering anisotropy increases the cell effective capture cross section of U-238. The increase was up to 2% in TRX-3 and 0.5% in ZPR-6/5. The effect on the multiplication factor was -0.003% ..delta..k/k for ZPR-6/5 and -0.05% ..delta..k/k for TRX-3. For the case of the diffusion coefficient, the combined effect of heterogeneity and linear anisotropy gave an increase of up to 29% in the parallel diffusion coefficient of TRX-3 and 5% in the parallel diffusion coefficient of ZPR-6/5. In contrast, the change in the perpendicular diffusion coefficient did not exceed 2% in both systems.« less
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

NASA Astrophysics Data System (ADS)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace such that the dimensionality of the problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2-D and a random hydraulic conductivity field in 3-D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ˜101 to ˜102 in a multicore computational environment. Therefore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate to large-scale problems.
Contributions of Mirror and Ion Bernstein Instabilities to the Scattering of Pickup Ions in the Outer Heliosheath

NASA Astrophysics Data System (ADS)

Min, Kyungguk; Liu, Kaijun

2018-01-01

Maintaining the stability of pickup ions in the outer heliosheath is a critical element for the secondary energetic neutral atom (ENA) mechanism, a theory put forth to explain the nearly annular band of ENA emission observed by the Interstellar Boundary EXplorer. A recent study showed that a pickup ion ring can remain stable to the Alfvén/ion cyclotron (AC) instability at propagation parallel to the background magnetic field when the parallel thermal spread of the ring is comparable to that of a background population. This study investigates the potential role that the mirror or ion Bernstein (IB) instabilities can play in the stability of pickup ions when conditions are such that the AC instability is suppressed. Linear Vlasov theory predicts relatively fast mirror and IB instability growth even though AC instability growth is suppressed. For a few such cases, two-dimensional hybrid and macroscopic quasi-linear simulations are carried out to examine how the unstable mirror and IB modes evolve and affect the pickup ion ring beyond the linear theory picture. For the parameters used, the mirror mode dominates initially and leads to a rapid parallel heating of the pickup ions in excess of the parallel temperature of the background protons. The heated pickup ions subsequently trigger onset of the AC mode, which grows sufficiently large to be the dominant pitch angle scattering agent after the mirror mode has decayed away. The present results indicate that the pickup ion stability needed may not be guaranteed once the mirror and IB instabilities are taken into account.
A Sparse Self-Consistent Field Algorithm and Its Parallel Implementation: Application to Density-Functional-Based Tight Binding.

PubMed

Scemama, Anthony; Renon, Nicolas; Rapacioli, Mathias

2014-06-10

We present an algorithm and its parallel implementation for solving a self-consistent problem as encountered in Hartree-Fock or density functional theory. The algorithm takes advantage of the sparsity of matrices through the use of local molecular orbitals. The implementation allows one to exploit efficiently modern symmetric multiprocessing (SMP) computer architectures. As a first application, the algorithm is used within the density-functional-based tight binding method, for which most of the computational time is spent in the linear algebra routines (diagonalization of the Fock/Kohn-Sham matrix). We show that with this algorithm (i) single point calculations on very large systems (millions of atoms) can be performed on large SMP machines, (ii) calculations involving intermediate size systems (1000-100 000 atoms) are also strongly accelerated and can run efficiently on standard servers, and (iii) the error on the total energy due to the use of a cutoff in the molecular orbital coefficients can be controlled such that it remains smaller than the SCF convergence criterion.
A new parallel-vector finite element analysis software on distributed-memory computers

NASA Technical Reports Server (NTRS)

Qin, Jiangning; Nguyen, Duc T.

1993-01-01

A new parallel-vector finite element analysis software package MPFEA (Massively Parallel-vector Finite Element Analysis) is developed for large-scale structural analysis on massively parallel computers with distributed-memory. MPFEA is designed for parallel generation and assembly of the global finite element stiffness matrices as well as parallel solution of the simultaneous linear equations, since these are often the major time-consuming parts of a finite element analysis. Block-skyline storage scheme along with vector-unrolling techniques are used to enhance the vector performance. Communications among processors are carried out concurrently with arithmetic operations to reduce the total execution time. Numerical results on the Intel iPSC/860 computers (such as the Intel Gamma with 128 processors and the Intel Touchstone Delta with 512 processors) are presented, including an aircraft structure and some very large truss structures, to demonstrate the efficiency and accuracy of MPFEA.
Efficient Parallel Algorithms for Landscape Evolution Modelling

NASA Astrophysics Data System (ADS)

Moresi, L. N.; Mather, B.; Beucher, R.

2017-12-01

Landscape erosion and the deposition of sediments by river systems are strongly controlled bytopography, rainfall patterns, and the susceptibility of the basement to the action ofrunning water. It is well understood that each of these processes depends on the other, for example:topography results from active tectonic processes; deformation, metamorphosis andexhumation alter the competence of the basement; rainfall patterns depend on topography;uplift and subsidence in response to tectonic stress can be amplified by erosionand sediment deposition. We typically gain understanding of such coupled systems through forward models which capture theessential interactions of the various components and attempt parameterise those parts of the individual systemthat are unresolvable at the scale of the interaction. Here we address the problem of predicting erosion and deposition rates at a continental scalewith a resolution of tens to hundreds of metres in a dynamic, Lagrangian framework. This isa typical requirement for a code to interface with a mantle / lithosphere dynamics model anddemands an efficient, unstructured, parallel implementation. We address this through a very general algorithm that treats all parts of the landscape evolution equationsin sparse-matrix form including those for stream-flow accumulation, dam-filling and catchment determination. This givesus considerable flexibility in developing unstructured, parallel code, and in creating a modular packagethat can be configured by users to work at different temporal and spatial scales, but is also has potential advantagesin treating the non-linear parts of the problem in a general manner.
Development and Verification of the Charring Ablating Thermal Protection Implicit System Solver

NASA Technical Reports Server (NTRS)

Amar, Adam J.; Calvert, Nathan D.; Kirk, Benjamin S.

2010-01-01

The development and verification of the Charring Ablating Thermal Protection Implicit System Solver is presented. This work concentrates on the derivation and verification of the stationary grid terms in the equations that govern three-dimensional heat and mass transfer for charring thermal protection systems including pyrolysis gas flow through the porous char layer. The governing equations are discretized according to the Galerkin finite element method with first and second order implicit time integrators. The governing equations are fully coupled and are solved in parallel via Newton's method, while the fully implicit linear system is solved with the Generalized Minimal Residual method. Verification results from exact solutions and the Method of Manufactured Solutions are presented to show spatial and temporal orders of accuracy as well as nonlinear convergence rates.
Development and Verification of the Charring, Ablating Thermal Protection Implicit System Simulator

NASA Technical Reports Server (NTRS)

Amar, Adam J.; Calvert, Nathan; Kirk, Benjamin S.

2011-01-01

The development and verification of the Charring Ablating Thermal Protection Implicit System Solver (CATPISS) is presented. This work concentrates on the derivation and verification of the stationary grid terms in the equations that govern three-dimensional heat and mass transfer for charring thermal protection systems including pyrolysis gas flow through the porous char layer. The governing equations are discretized according to the Galerkin finite element method (FEM) with first and second order fully implicit time integrators. The governing equations are fully coupled and are solved in parallel via Newton s method, while the linear system is solved via the Generalized Minimum Residual method (GMRES). Verification results from exact solutions and Method of Manufactured Solutions (MMS) are presented to show spatial and temporal orders of accuracy as well as nonlinear convergence rates.
A numerical differentiation library exploiting parallel architectures

NASA Astrophysics Data System (ADS)

Voglis, C.; Hadjidoukas, P. E.; Lagaris, I. E.; Papageorgiou, D. G.

2009-08-01

We present a software library for numerically estimating first and second order partial derivatives of a function by finite differencing. Various truncation schemes are offered resulting in corresponding formulas that are accurate to order O(h), O(h), and O(h), h being the differencing step. The derivatives are calculated via forward, backward and central differences. Care has been taken that only feasible points are used in the case where bound constraints are imposed on the variables. The Hessian may be approximated either from function or from gradient values. There are three versions of the software: a sequential version, an OpenMP version for shared memory architectures and an MPI version for distributed systems (clusters). The parallel versions exploit the multiprocessing capability offered by computer clusters, as well as modern multi-core systems and due to the independent character of the derivative computation, the speedup scales almost linearly with the number of available processors/cores. Program summaryProgram title: NDL (Numerical Differentiation Library) Catalogue identifier: AEDG_v1_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEDG_v1_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 73 030 No. of bytes in distributed program, including test data, etc.: 630 876 Distribution format: tar.gz Programming language: ANSI FORTRAN-77, ANSI C, MPI, OPENMP Computer: Distributed systems (clusters), shared memory systems Operating system: Linux, Solaris Has the code been vectorised or parallelized?: Yes RAM: The library uses O(N) internal storage, N being the dimension of the problem Classification: 4.9, 4.14, 6.5 Nature of problem: The numerical estimation of derivatives at several accuracy levels is a common requirement in many computational tasks, such as optimization, solution of nonlinear systems, etc. The parallel implementation that exploits systems with multiple CPUs is very important for large scale and computationally expensive problems. Solution method: Finite differencing is used with carefully chosen step that minimizes the sum of the truncation and round-off errors. The parallel versions employ both OpenMP and MPI libraries. Restrictions: The library uses only double precision arithmetic. Unusual features: The software takes into account bound constraints, in the sense that only feasible points are used to evaluate the derivatives, and given the level of the desired accuracy, the proper formula is automatically employed. Running time: Running time depends on the function's complexity. The test run took 15 ms for the serial distribution, 0.6 s for the OpenMP and 4.2 s for the MPI parallel distribution on 2 processors.
Fiber Optic Sensor Embedment Study for Multi-Parameter Strain Sensing

PubMed Central

Drissi-Habti, Monssef; Raman, Venkadesh; Khadour, Aghiad; Timorian, Safiullah

2017-01-01

The fiber optic sensors (FOSs) are commonly used for large-scale structure monitoring systems for their small size, noise free and low electrical risk characteristics. Embedded fiber optic sensors (FOSs) lead to micro-damage in composite structures. This damage generation threshold is based on the coating material of the FOSs and their diameter. In addition, embedded FOSs are aligned parallel to reinforcement fibers to avoid micro-damage creation. This linear positioning of distributed FOS fails to provide all strain parameters. We suggest novel sinusoidal sensor positioning to overcome this issue. This method tends to provide multi-parameter strains in a large surface area. The effectiveness of sinusoidal FOS positioning over linear FOS positioning is studied under both numerical and experimental methods. This study proves the advantages of the sinusoidal positioning method for FOS in composite material’s bonding. PMID:28333117

A quasi-linear analysis of the impurity effect on turbulent momentum transport and residual stress

DOE Office of Scientific and Technical Information (OSTI.GOV)

Ko, S. H., E-mail: shko@nfri.re.kr; Jhang, Hogun; Singh, R.

2015-08-15

We study the impact of impurities on turbulence driven intrinsic rotation (via residual stress) in the context of the quasi-linear theory. A two-fluid formulation for main and impurity ions is employed to study ion temperature gradient modes in sheared slab geometry modified by the presence of impurities. An effective form of the parallel Reynolds stress is derived in the center of mass frame of a coupled main ion-impurity system. Analyses show that the contents and the radial profile of impurities have a strong influence on the residual stress. In particular, an impurity profile aligned with that of main ions ismore » shown to cause a considerable reduction of the residual stress, which may lead to the reduction of turbulence driven intrinsic rotation.« less
Experimental study of the laminar-turbulent transition of a concave wall in a parallel flow

NASA Technical Reports Server (NTRS)

Bippes, H.

1978-01-01

The instability of the laminar boundary layer flow along a concave wall was studied. Observations of these three-dimensional boundary layer phenomena were made using the hydrogen-bubble visualization technique. With the application of stereo-photogrammetric methods in the air-water system it was possible to investigate the flow processes qualitatively and quantitatively. In the case of a concave wall of sufficient curvature, a primary instability occurs first in the form of Goertler vortices with wave lengths depending upon the boundary layer thickness and the wall curvature. At the onset the amplification rate is in agreement with the linear theory. Later, during the non-linear amplification stage, periodic spanwise vorticity concentrations develop in the low velocity region between the longitudinal vortices. Then a meandering motion of the longitudinal vortex streets subsequently ensues, leading to turbulence.
Integration of a Decentralized Linear-Quadratic-Gaussian Control into GSFC's Universal 3-D Autonomous Formation Flying Algorithm

NASA Technical Reports Server (NTRS)

Folta, David C.; Carpenter, J. Russell

1999-01-01

A decentralized control is investigated for applicability to the autonomous formation flying control algorithm developed by GSFC for the New Millenium Program Earth Observer-1 (EO-1) mission. This decentralized framework has the following characteristics: The approach is non-hierarchical, and coordination by a central supervisor is not required; Detected failures degrade the system performance gracefully; Each node in the decentralized network processes only its own measurement data, in parallel with the other nodes; Although the total computational burden over the entire network is greater than it would be for a single, centralized controller, fewer computations are required locally at each node; Requirements for data transmission between nodes are limited to only the dimension of the control vector, at the cost of maintaining a local additional data vector. The data vector compresses all past measurement history from all the nodes into a single vector of the dimension of the state; and The approach is optimal with respect to standard cost functions. The current approach is valid for linear time-invariant systems only. Similar to the GSFC formation flying algorithm, the extension to linear LQG time-varying systems requires that each node propagate its filter covariance forward (navigation) and controller Riccati matrix backward (guidance) at each time step. Extension of the GSFC algorithm to non-linear systems can also be accomplished via linearization about a reference trajectory in the standard fashion, or linearization about the current state estimate as with the extended Kalman filter. To investigate the feasibility of the decentralized integration with the GSFC algorithm, an existing centralized LQG design for a single spacecraft orbit control problem is adapted to the decentralized framework while using the GSFC algorithm's state transition matrices and framework. The existing GSFC design uses both reference trajectories of each spacecraft in formation and by appropriate choice of coordinates and simplified measurement modeling is formulated as a linear time-invariant system. Results for improvements to the GSFC algorithm and a multiple satellite formation will be addressed. The goal of this investigation is to progressively relax the assumptions that result in linear time-invariance, ultimately to the point of linearization of the non-linear dynamics about the current state estimate as in the extended Kalman filter. An assessment will then be made about the feasibility of the decentralized approach to the realistic formation flying application of the EO-1/Landsat 7 formation flying experiment.
Distributed Memory Parallel Computing with SEAWAT

NASA Astrophysics Data System (ADS)

Verkaik, J.; Huizer, S.; van Engelen, J.; Oude Essink, G.; Ram, R.; Vuik, K.

2017-12-01

Fresh groundwater reserves in coastal aquifers are threatened by sea-level rise, extreme weather conditions, increasing urbanization and associated groundwater extraction rates. To counteract these threats, accurate high-resolution numerical models are required to optimize the management of these precious reserves. The major model drawbacks are long run times and large memory requirements, limiting the predictive power of these models. Distributed memory parallel computing is an efficient technique for reducing run times and memory requirements, where the problem is divided over multiple processor cores. A new Parallel Krylov Solver (PKS) for SEAWAT is presented. PKS has recently been applied to MODFLOW and includes Conjugate Gradient (CG) and Biconjugate Gradient Stabilized (BiCGSTAB) linear accelerators. Both accelerators are preconditioned by an overlapping additive Schwarz preconditioner in a way that: a) subdomains are partitioned using Recursive Coordinate Bisection (RCB) load balancing, b) each subdomain uses local memory only and communicates with other subdomains by Message Passing Interface (MPI) within the linear accelerator, c) it is fully integrated in SEAWAT. Within SEAWAT, the PKS-CG solver replaces the Preconditioned Conjugate Gradient (PCG) solver for solving the variable-density groundwater flow equation and the PKS-BiCGSTAB solver replaces the Generalized Conjugate Gradient (GCG) solver for solving the advection-diffusion equation. PKS supports the third-order Total Variation Diminishing (TVD) scheme for computing advection. Benchmarks were performed on the Dutch national supercomputer (https://userinfo.surfsara.nl/systems/cartesius) using up to 128 cores, for a synthetic 3D Henry model (100 million cells) and the real-life Sand Engine model ( 10 million cells). The Sand Engine model was used to investigate the potential effect of the long-term morphological evolution of a large sand replenishment and climate change on fresh groundwater resources. Speed-ups up to 40 were obtained with the new PKS solver.
A study of the effect of a boundary layer profile on the dynamic response and acoustic radiation of flat panels. Ph.D. Thesis - Virginia Univ.

NASA Technical Reports Server (NTRS)

Mixson, J. S.

1973-01-01

The response of a thin, elastic plate to a harmonic force which drives the plate from below and a compressible air stream with a viscous boundary layer flowing parallel to the upper surface along the length was investigated. Equations governing the forced response of the coupled plate-aerodynamic system are derived along with appropriate boundary conditions. Calculations of basic solution parameters for a linear velocity profile and for a Blasius profile showed that the same system response could be obtained from each profile if appropriate values of boundary layer thickness were chosen for each profile.
Hybrid Structure Multichannel All-Fiber Current Sensor.

PubMed

Jiang, Junzhen; Zhang, Hao; He, Youwu; Qiu, Yishen

2017-08-02

We have experimentally developed a hybrid-structure multi-channel all-fiber current sensor with ordinary silica fiber using fiber loop architecture. According to the rationale of time division multiplexing, the sensor combines parallel and serial structures. The purpose of the hybrid-structure multi-channel all-fiber current sensor is to get more information from the different measured points simultaneously. In addition, the hybrid-structure fiber current sensor exhibited a good linear response for each channel. A three-channel experiment was performed in the study and showed that the system could detect different current positions. Each channel could individually detect the current and needed a separate calibration system. Furthermore, the three channels will not affect each other.
String resistance detector

NASA Technical Reports Server (NTRS)

Hall, A. Daniel (Inventor); Davies, Francis J. (Inventor)

2007-01-01

Method and system are disclosed for determining individual string resistance in a network of strings when the current through a parallel connected string is unknown and when the voltage across a series connected string is unknown. The method/system of the invention involves connecting one or more frequency-varying impedance components with known electrical characteristics to each string and applying a frequency-varying input signal to the network of strings. The frequency-varying impedance components may be one or more capacitors, inductors, or both, and are selected so that each string is uniquely identifiable in the output signal resulting from the frequency-varying input signal. Numerical methods, such as non-linear regression, may then be used to resolve the resistance associated with each string.
Recursive inverse factorization.

PubMed

Rubensson, Emanuel H; Bock, Nicolas; Holmström, Erik; Niklasson, Anders M N

2008-03-14

A recursive algorithm for the inverse factorization S(-1)=ZZ(*) of Hermitian positive definite matrices S is proposed. The inverse factorization is based on iterative refinement [A.M.N. Niklasson, Phys. Rev. B 70, 193102 (2004)] combined with a recursive decomposition of S. As the computational kernel is matrix-matrix multiplication, the algorithm can be parallelized and the computational effort increases linearly with system size for systems with sufficiently sparse matrices. Recent advances in network theory are used to find appropriate recursive decompositions. We show that optimization of the so-called network modularity results in an improved partitioning compared to other approaches. In particular, when the recursive inverse factorization is applied to overlap matrices of irregularly structured three-dimensional molecules.
Spatial proximity effects on the excitation of sheath RF voltages by evanescent slow waves in the ion cyclotron range of frequencies

NASA Astrophysics Data System (ADS)

Colas, Laurent; Lu, Ling-Feng; Křivská, Alena; Jacquot, Jonathan; Hillairet, Julien; Helou, Walid; Goniche, Marc; Heuraux, Stéphane; Faudot, Eric

2017-02-01

We investigate theoretically how sheath radio-frequency (RF) oscillations relate to the spatial structure of the near RF parallel electric field E ∥ emitted by ion cyclotron (IC) wave launchers. We use a simple model of slow wave (SW) evanescence coupled with direct current (DC) plasma biasing via sheath boundary conditions in a 3D parallelepiped filled with homogeneous cold magnetized plasma. Within a ‘wide-sheath’ asymptotic regime, valid for large-amplitude near RF fields, the RF part of this simple RF + DC model becomes linear: the sheath oscillating voltage V RF at open field line boundaries can be re-expressed as a linear combination of individual contributions by every emitting point in the input field map. SW evanescence makes individual contributions all the larger as the wave emission point is located closer to the sheath walls. The decay of |V RF| with the emission point/sheath poloidal distance involves the transverse SW evanescence length and the radial protrusion depth of lateral boundaries. The decay of |V RF| with the emitter/sheath parallel distance is quantified as a function of the parallel SW evanescence length and the parallel connection length of open magnetic field lines. For realistic geometries and target SOL plasmas, poloidal decay occurs over a few centimeters. Typical parallel decay lengths for |V RF| are found to be smaller than IC antenna parallel extension. Oscillating sheath voltages at IC antenna side limiters are therefore mainly sensitive to E ∥ emission by active or passive conducting elements near these limiters, as suggested by recent experimental observations. Parallel proximity effects could also explain why sheath oscillations persist with antisymmetric strap toroidal phasing, despite the parallel antisymmetry of the radiated field map. They could finally justify current attempts at reducing the RF fields induced near antenna boxes to attenuate sheath oscillations in their vicinity.
T-Slide Linear Actuators

NASA Technical Reports Server (NTRS)

Vranish, John

2009-01-01

T-slide linear actuators use gear bearing differential epicyclical transmissions (GBDETs) to directly drive a linear rack, which, in turn, performs the actuation. Conventional systems use a rotary power source in conjunction with a nut and screw to provide linear motion. Non-back-drive properties of GBDETs make the new actuator more direct and simpler. Versions of this approach will serve as a long-stroke, ultra-precision, position actuator for NASA science instruments, and as a rugged, linear actuator for NASA deployment duties. The T slide can operate effectively in the presence of side forces and torques. Versions of the actuator can perform ultra-precision positioning. A basic T-slide actuator is a long-stroke, rack-and-pinion linear actuator that, typically, consists of a T-slide, several idlers, a transmission to drive the slide (powered by an electric motor) and a housing that holds the entire assembly. The actuator is driven by gear action on its top surface, and is guided and constrained by gear-bearing idlers on its other two parallel surfaces. The geometry, implemented with gear-bearing technology, is particularly effective. An electronic motor operating through a GBDET can directly drive the T slide against large loads, as a rack and pinion linear actuator, with no break and no danger of back driving. The actuator drives the slide into position and stops. The slide holes position with power off and no brake, regardless of load. With the T slide configuration, this GBDET has an entire T-gear surface on which to operate. The GB idlers coupling the other two T slide parallel surfaces to their housing counterpart surfaces provide constraints in five degrees-of-freedom and rolling friction in the direction of actuation. Multiple GB idlers provide roller bearing strength sufficient to support efficient, rolling friction movement, even in the presence of large, resisting forces. T-slide actuators can be controlled using the combination of an off-the-shelf, electric servomotor, a motor angle resolution sensor (typically an encoder or resolver), and microprocessor-based intelligent software. In applications requiring precision positioning, it may be necessary to add strain gauges to the T-slide housing. Existing sensory- interactive motion control art will work for T slides. For open-loop positioning, a stepping motor emulation technique can be used.
Hypersonic Boundary Layer Instability Over a Corner

NASA Technical Reports Server (NTRS)

Balakumar, Ponnampalam; Zhao, Hong-Wu; McClinton, Charles (Technical Monitor)

2001-01-01

A boundary-layer transition study over a compression corner was conducted under a hypersonic flow condition. Due to the discontinuities in boundary layer flow, the full Navier-Stokes equations were solved to simulate the development of disturbance in the boundary layer. A linear stability analysis and PSE method were used to get the initial disturbance for parallel and non-parallel flow respectively. A 2-D code was developed to solve the full Navier-stokes by using WENO(weighted essentially non-oscillating) scheme. The given numerical results show the evolution of the linear disturbance for the most amplified disturbance in supersonic and hypersonic flow over a compression ramp. The nonlinear computations also determined the minimal amplitudes necessary to cause transition at a designed location.
Parallel processing and expert systems

NASA Technical Reports Server (NTRS)

Yan, Jerry C.; Lau, Sonie

1991-01-01

Whether it be monitoring the thermal subsystem of Space Station Freedom, or controlling the navigation of the autonomous rover on Mars, NASA missions in the 90's cannot enjoy an increased level of autonomy without the efficient use of expert systems. Merely increasing the computational speed of uniprocessors may not be able to guarantee that real time demands are met for large expert systems. Speed-up via parallel processing must be pursued alongside the optimization of sequential implementations. Prototypes of parallel expert systems have been built at universities and industrial labs in the U.S. and Japan. The state-of-the-art research in progress related to parallel execution of expert systems was surveyed. The survey is divided into three major sections: (1) multiprocessors for parallel expert systems; (2) parallel languages for symbolic computations; and (3) measurements of parallelism of expert system. Results to date indicate that the parallelism achieved for these systems is small. In order to obtain greater speed-ups, data parallelism and application parallelism must be exploited.
Combined effect of external damper and cross-tie on the modal response of hybrid two-cable networks

NASA Astrophysics Data System (ADS)

Ahmad, Javaid; Cheng, Shaohong; Ghrib, Faouzi

2018-03-01

Combining external dampers and cross-ties into a hybrid system to control bridge stay cable vibrations can address deficiencies associated with these two commonly used vibration control solutions while retaining their respective merits. Despite successful implementation of this strategy on a few cable-stayed bridges, behavior of such a structural system is still not fully understood. In the current study, an analytical model of a hybrid system consisting of two parallel taut cables interconnected by a transverse linear flexible cross-tie, with one cable also equipped with a transverse linear viscous damper close to one end support, is developed. The proposed model is validated by an experimental work in the literature and an independent numerical simulation. A parametric study is conducted to comprehend the impact of main design parameters on the performance of a hybrid system in terms of the in-plane frequency, the damping and the degree of mode localization of the system's fundamental mode. In addition, the concept of isoquant curve is applied not only to appreciate the effect of simultaneous variation in main design parameters on the modal behavior of a hybrid system, but also to identify the optimal ranges of these parameters to achieve the required cable vibration control effect.
Incomplete data based parameter identification of nonlinear and time-variant oscillators with fractional derivative elements

NASA Astrophysics Data System (ADS)

Kougioumtzoglou, Ioannis A.; dos Santos, Ketson R. M.; Comerford, Liam

2017-09-01

Various system identification techniques exist in the literature that can handle non-stationary measured time-histories, or cases of incomplete data, or address systems following a fractional calculus modeling. However, there are not many (if any) techniques that can address all three aforementioned challenges simultaneously in a consistent manner. In this paper, a novel multiple-input/single-output (MISO) system identification technique is developed for parameter identification of nonlinear and time-variant oscillators with fractional derivative terms subject to incomplete non-stationary data. The technique utilizes a representation of the nonlinear restoring forces as a set of parallel linear sub-systems. In this regard, the oscillator is transformed into an equivalent MISO system in the wavelet domain. Next, a recently developed L1-norm minimization procedure based on compressive sensing theory is applied for determining the wavelet coefficients of the available incomplete non-stationary input-output (excitation-response) data. Finally, these wavelet coefficients are utilized to determine appropriately defined time- and frequency-dependent wavelet based frequency response functions and related oscillator parameters. Several linear and nonlinear time-variant systems with fractional derivative elements are used as numerical examples to demonstrate the reliability of the technique even in cases of noise corrupted and incomplete data.
Noise fluctuations and drive dependence of the skyrmion Hall effect in disordered systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Reichhardt, Charles; Olson Reichhardt, Cynthia Jane

Using a particle-based simulation model, we show that quenched disorder creates a drive-dependent skyrmion Hall effect as measured by the change in the ratiomore » $$R={V}_{\\perp }/{V}_{| | }$$ of the skyrmion velocity perpendicular (V ⊥) and parallel ($${V}_{| | }$$) to an external drive. R is zero at depinning and increases linearly with increasing drive, in agreement with recent experimental observations. At sufficiently high drives where the skyrmions enter a free flow regime, R saturates to the disorder-free limit. In addition, this behavior is robust for a wide range of disorder strengths and intrinsic Hall angle values, and occurs whenever plastic flow is present. For systems with small intrinsic Hall angles, we find that the Hall angle increases linearly with external drive, as also observed in experiment. In the weak pinning regime where the skyrmion lattice depins elastically, R is nonlinear and the net direction of the skyrmion lattice motion can rotate as a function of external drive. We show that the changes in the skyrmion Hall effect correlate with changes in the power spectrum of the skyrmion velocity noise fluctuations. The plastic flow regime is associated with $1/f$ noise, while in the regime in which R has saturated, the noise is white with a weak narrow band signal, and the noise power drops by several orders of magnitude. Finally, at low drives, the velocity noise in the perpendicular and parallel directions is of the same order of magnitude, while at intermediate drives the perpendicular noise fluctuations are much larger.« less
Noise fluctuations and drive dependence of the skyrmion Hall effect in disordered systems

DOE PAGES

Reichhardt, Charles; Olson Reichhardt, Cynthia Jane

2016-09-29

Using a particle-based simulation model, we show that quenched disorder creates a drive-dependent skyrmion Hall effect as measured by the change in the ratiomore » $$R={V}_{\\perp }/{V}_{| | }$$ of the skyrmion velocity perpendicular (V ⊥) and parallel ($${V}_{| | }$$) to an external drive. R is zero at depinning and increases linearly with increasing drive, in agreement with recent experimental observations. At sufficiently high drives where the skyrmions enter a free flow regime, R saturates to the disorder-free limit. In addition, this behavior is robust for a wide range of disorder strengths and intrinsic Hall angle values, and occurs whenever plastic flow is present. For systems with small intrinsic Hall angles, we find that the Hall angle increases linearly with external drive, as also observed in experiment. In the weak pinning regime where the skyrmion lattice depins elastically, R is nonlinear and the net direction of the skyrmion lattice motion can rotate as a function of external drive. We show that the changes in the skyrmion Hall effect correlate with changes in the power spectrum of the skyrmion velocity noise fluctuations. The plastic flow regime is associated with $1/f$ noise, while in the regime in which R has saturated, the noise is white with a weak narrow band signal, and the noise power drops by several orders of magnitude. Finally, at low drives, the velocity noise in the perpendicular and parallel directions is of the same order of magnitude, while at intermediate drives the perpendicular noise fluctuations are much larger.« less
Multi-thread parallel algorithm for reconstructing 3D large-scale porous structures

NASA Astrophysics Data System (ADS)

Ju, Yang; Huang, Yaohui; Zheng, Jiangtao; Qian, Xu; Xie, Heping; Zhao, Xi

2017-04-01

Geomaterials inherently contain many discontinuous, multi-scale, geometrically irregular pores, forming a complex porous structure that governs their mechanical and transport properties. The development of an efficient reconstruction method for representing porous structures can significantly contribute toward providing a better understanding of the governing effects of porous structures on the properties of porous materials. In order to improve the efficiency of reconstructing large-scale porous structures, a multi-thread parallel scheme was incorporated into the simulated annealing reconstruction method. In the method, four correlation functions, which include the two-point probability function, the linear-path functions for the pore phase and the solid phase, and the fractal system function for the solid phase, were employed for better reproduction of the complex well-connected porous structures. In addition, a random sphere packing method and a self-developed pre-conditioning method were incorporated to cast the initial reconstructed model and select independent interchanging pairs for parallel multi-thread calculation, respectively. The accuracy of the proposed algorithm was evaluated by examining the similarity between the reconstructed structure and a prototype in terms of their geometrical, topological, and mechanical properties. Comparisons of the reconstruction efficiency of porous models with various scales indicated that the parallel multi-thread scheme significantly shortened the execution time for reconstruction of a large-scale well-connected porous model compared to a sequential single-thread procedure.
A communication-avoiding, hybrid-parallel, rank-revealing orthogonalization method.

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hoemmen, Mark

2010-11-01

Orthogonalization consumes much of the run time of many iterative methods for solving sparse linear systems and eigenvalue problems. Commonly used algorithms, such as variants of Gram-Schmidt or Householder QR, have performance dominated by communication. Here, 'communication' includes both data movement between the CPU and memory, and messages between processors in parallel. Our Tall Skinny QR (TSQR) family of algorithms requires asymptotically fewer messages between processors and data movement between CPU and memory than typical orthogonalization methods, yet achieves the same accuracy as Householder QR factorization. Furthermore, in block orthogonalizations, TSQR is faster and more accurate than existing approaches formore » orthogonalizing the vectors within each block ('normalization'). TSQR's rank-revealing capability also makes it useful for detecting deflation in block iterative methods, for which existing approaches sacrifice performance, accuracy, or both. We have implemented a version of TSQR that exploits both distributed-memory and shared-memory parallelism, and supports real and complex arithmetic. Our implementation is optimized for the case of orthogonalizing a small number (5-20) of very long vectors. The shared-memory parallel component uses Intel's Threading Building Blocks, though its modular design supports other shared-memory programming models as well, including computation on the GPU. Our implementation achieves speedups of 2 times or more over competing orthogonalizations. It is available now in the development branch of the Trilinos software package, and will be included in the 10.8 release.« less
PLASMA TURBULENCE AND KINETIC INSTABILITIES AT ION SCALES IN THE EXPANDING SOLAR WIND

DOE Office of Scientific and Technical Information (OSTI.GOV)

Hellinger, Petr; Trávnícek, Pavel M.; Matteini, Lorenzo

The relationship between a decaying strong turbulence and kinetic instabilities in a slowly expanding plasma is investigated using two-dimensional (2D) hybrid expanding box simulations. We impose an initial ambient magnetic field perpendicular to the simulation box, and we start with a spectrum of large-scale, linearly polarized, random-phase Alfvénic fluctuations that have energy equipartition between kinetic and magnetic fluctuations and vanishing correlation between the two fields. A turbulent cascade rapidly develops; magnetic field fluctuations exhibit a power-law spectrum at large scales and a steeper spectrum at ion scales. The turbulent cascade leads to an overall anisotropic proton heating, protons are heatedmore » in the perpendicular direction, and, initially, also in the parallel direction. The imposed expansion leads to generation of a large parallel proton temperature anisotropy which is at later stages partly reduced by turbulence. The turbulent heating is not sufficient to overcome the expansion-driven perpendicular cooling and the system eventually drives the oblique firehose instability in a form of localized nonlinear wave packets which efficiently reduce the parallel temperature anisotropy. This work demonstrates that kinetic instabilities may coexist with strong plasma turbulence even in a constrained 2D regime.« less
Plug-in nanoliter pneumatic liquid dispenser with nozzle design flexibility

PubMed Central

Choi, In Ho; Kim, Hojin; Lee, Sanghyun; Baek, Seungbum; Kim, Joonwon

2015-01-01

This paper presents a novel plug-in nanoliter liquid dispensing system with a plug-and-play interface for simple and reversible, yet robust integration of the dispenser. A plug-in type dispenser was developed to facilitate assembly and disassembly with an actuating part through efficient modularization. The entire process for assembly and operation of the plug-in dispenser is performed via the plug-and-play interface in less than a minute without loss of dispensing quality. The minimum volume of droplets pneumatically dispensed using the plug-in dispenser was 124 nl with a coefficient of variation of 1.6%. The dispensed volume increased linearly with the nozzle size. Utilizing this linear relationship, two types of multinozzle dispensers consisting of six parallel channels (emerging from an inlet) and six nozzles were developed to demonstrate a novel strategy for volume gradient dispensing at a single operating condition. The droplet volume dispensed from each nozzle also increased linearly with nozzle size, demonstrating that nozzle size is a dominant factor on dispensed volume, even for multinozzle dispensing. Therefore, the proposed plug-in dispenser enables flexible design of nozzles and reversible integration to dispense droplets with different volumes, depending on the application. Furthermore, to demonstrate the practicality of the proposed dispensing system, we developed a pencil-type dispensing system as an alternative to a conventional pipette for rapid and reliable dispensing of minute volume droplets. PMID:26594263

Plug-in nanoliter pneumatic liquid dispenser with nozzle design flexibility.

PubMed

Choi, In Ho; Kim, Hojin; Lee, Sanghyun; Baek, Seungbum; Kim, Joonwon

2015-11-01

This paper presents a novel plug-in nanoliter liquid dispensing system with a plug-and-play interface for simple and reversible, yet robust integration of the dispenser. A plug-in type dispenser was developed to facilitate assembly and disassembly with an actuating part through efficient modularization. The entire process for assembly and operation of the plug-in dispenser is performed via the plug-and-play interface in less than a minute without loss of dispensing quality. The minimum volume of droplets pneumatically dispensed using the plug-in dispenser was 124 nl with a coefficient of variation of 1.6%. The dispensed volume increased linearly with the nozzle size. Utilizing this linear relationship, two types of multinozzle dispensers consisting of six parallel channels (emerging from an inlet) and six nozzles were developed to demonstrate a novel strategy for volume gradient dispensing at a single operating condition. The droplet volume dispensed from each nozzle also increased linearly with nozzle size, demonstrating that nozzle size is a dominant factor on dispensed volume, even for multinozzle dispensing. Therefore, the proposed plug-in dispenser enables flexible design of nozzles and reversible integration to dispense droplets with different volumes, depending on the application. Furthermore, to demonstrate the practicality of the proposed dispensing system, we developed a pencil-type dispensing system as an alternative to a conventional pipette for rapid and reliable dispensing of minute volume droplets.
Simultaneously driven linear and nonlinear spatial encoding fields in MRI.

PubMed

Gallichan, Daniel; Cocosco, Chris A; Dewdney, Andrew; Schultz, Gerrit; Welz, Anna; Hennig, Jürgen; Zaitsev, Maxim

2011-03-01

Spatial encoding in MRI is conventionally achieved by the application of switchable linear encoding fields. The general concept of the recently introduced PatLoc (Parallel Imaging Technique using Localized Gradients) encoding is to use nonlinear fields to achieve spatial encoding. Relaxing the requirement that the encoding fields must be linear may lead to improved gradient performance or reduced peripheral nerve stimulation. In this work, a custom-built insert coil capable of generating two independent quadratic encoding fields was driven with high-performance amplifiers within a clinical MR system. In combination with the three linear encoding fields, the combined hardware is capable of independently manipulating five spatial encoding fields. With the linear z-gradient used for slice-selection, there remain four separate channels to encode a 2D-image. To compare trajectories of such multidimensional encoding, the concept of a local k-space is developed. Through simulations, reconstructions using six gradient-encoding strategies were compared, including Cartesian encoding separately or simultaneously on both PatLoc and linear gradients as well as two versions of a radial-based in/out trajectory. Corresponding experiments confirmed that such multidimensional encoding is practically achievable and demonstrated that the new radial-based trajectory offers the PatLoc property of variable spatial resolution while maintaining finite resolution across the entire field-of-view. Copyright © 2010 Wiley-Liss, Inc.
Method of forming oriented block copolymer line patterns, block copolymer line patterns formed thereby, and their use to form patterned articles

DOEpatents

Russell, Thomas P.; Hong, Sung Woo; Lee, Doug Hyun; Park, Soojin; Xu, Ting

2015-10-13

A block copolymer film having a line pattern with a high degree of long-range order is formed by a method that includes forming a block copolymer film on a substrate surface with parallel facets, and annealing the block copolymer film to form an annealed block copolymer film having linear microdomains parallel to the substrate surface and orthogonal to the parallel facets of the substrate. The line-patterned block copolymer films are useful for the fabrication of magnetic storage media, polarizing devices, and arrays of nanowires.
Method of forming oriented block copolymer line patterns, block copolymer line patterns formed thereby, and their use to form patterned articles

DOE Office of Scientific and Technical Information (OSTI.GOV)

Russell, Thomas P.; Hong, Sung Woo; Lee, Dong Hyun

A block copolymer film having a line pattern with a high degree of long-range order is formed by a method that includes forming a block copolymer film on a substrate surface with parallel facets, and annealing the block copolymer film to form an annealed block copolymer film having linear microdomains parallel to the substrate surface and orthogonal to the parallel facets of the substrate. The line-patterned block copolymer films are useful for the fabrication of magnetic storage media, polarizing devices, and arrays of nanowires.
Application of integration algorithms in a parallel processing environment for the simulation of jet engines

NASA Technical Reports Server (NTRS)

Krosel, S. M.; Milner, E. J.

1982-01-01

The application of Predictor corrector integration algorithms developed for the digital parallel processing environment are investigated. The algorithms are implemented and evaluated through the use of a software simulator which provides an approximate representation of the parallel processing hardware. Test cases which focus on the use of the algorithms are presented and a specific application using a linear model of a turbofan engine is considered. Results are presented showing the effects of integration step size and the number of processors on simulation accuracy. Real time performance, interprocessor communication, and algorithm startup are also discussed.
The Programming Language Python In Earth System Simulations

NASA Astrophysics Data System (ADS)

Gross, L.; Imranullah, A.; Mora, P.; Saez, E.; Smillie, J.; Wang, C.

2004-12-01

Mathematical models in earth sciences base on the solution of systems of coupled, non-linear, time-dependent partial differential equations (PDEs). The spatial and time-scale vary from a planetary scale and million years for convection problems to 100km and 10 years for fault systems simulations. Various techniques are in use to deal with the time dependency (e.g. Crank-Nicholson), with the non-linearity (e.g. Newton-Raphson) and weakly coupled equations (e.g. non-linear Gauss-Seidel). Besides these high-level solution algorithms discretization methods (e.g. finite element method (FEM), boundary element method (BEM)) are used to deal with spatial derivatives. Typically, large-scale, three dimensional meshes are required to resolve geometrical complexity (e.g. in the case of fault systems) or features in the solution (e.g. in mantel convection simulations). The modelling environment escript allows the rapid implementation of new physics as required for the development of simulation codes in earth sciences. Its main object is to provide a programming language, where the user can define new models and rapidly develop high-level solution algorithms. The current implementation is linked with the finite element package finley as a PDE solver. However, the design is open and other discretization technologies such as finite differences and boundary element methods could be included. escript is implemented as an extension of the interactive programming environment python (see www.python.org). Key concepts introduced are Data objects, which are holding values on nodes or elements of the finite element mesh, and linearPDE objects, which are defining linear partial differential equations to be solved by the underlying discretization technology. In this paper we will show the basic concepts of escript and will show how escript is used to implement a simulation code for interacting fault systems. We will show some results of large-scale, parallel simulations on an SGI Altix system. Acknowledgements: Project work is supported by Australian Commonwealth Government through the Australian Computational Earth Systems Simulator Major National Research Facility, Queensland State Government Smart State Research Facility Fund, The University of Queensland and SGI.
Amplification of perpendicular and parallel magnetic fields by cosmic ray currents

NASA Astrophysics Data System (ADS)

Matthews, J. H.; Bell, A. R.; Blundell, K. M.; Araudo, A. T.

2017-08-01

Cosmic ray (CR) currents through magnetized plasma drive strong instabilities producing amplification of the magnetic field. This amplification helps explain the CR energy spectrum as well as observations of supernova remnants and radio galaxy hotspots. Using magnetohydrodynamic simulations, we study the behaviour of the non-resonant hybrid (NRH) instability (also known as the Bell instability) in the case of CR currents perpendicular and parallel to the initial magnetic field. We demonstrate that extending simulations of the perpendicular case to 3D reveals a different character to the turbulence from that observed in 2D. Despite these differences, in 3D the perpendicular NRH instability still grows exponentially far into the non-linear regime with a similar growth rate to both the 2D perpendicular and 3D parallel situations. We introduce some simple analytical models to elucidate the physical behaviour, using them to demonstrate that the transition to the non-linear regime is governed by the growth of thermal pressure inside dense filaments at the edges of the expanding loops. We discuss our results in the context of supernova remnants and jets in radio galaxies. Our work shows that the NRH instability can amplify magnetic fields to many times their initial value in parallel and perpendicular shocks.
Using Perturbed QR Factorizations To Solve Linear Least-Squares Problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Avron, Haim; Ng, Esmond G.; Toledo, Sivan

2008-03-21

We propose and analyze a new tool to help solve sparse linear least-squares problems min{sub x} {parallel}Ax-b{parallel}{sub 2}. Our method is based on a sparse QR factorization of a low-rank perturbation {cflx A} of A. More precisely, we show that the R factor of {cflx A} is an effective preconditioner for the least-squares problem min{sub x} {parallel}Ax-b{parallel}{sub 2}, when solved using LSQR. We propose applications for the new technique. When A is rank deficient we can add rows to ensure that the preconditioner is well-conditioned without column pivoting. When A is sparse except for a few dense rows we canmore » drop these dense rows from A to obtain {cflx A}. Another application is solving an updated or downdated problem. If R is a good preconditioner for the original problem A, it is a good preconditioner for the updated/downdated problem {cflx A}. We can also solve what-if scenarios, where we want to find the solution if a column of the original matrix is changed/removed. We present a spectral theory that analyzes the generalized spectrum of the pencil (A*A,R*R) and analyze the applications.« less
Parallel-vector computation for structural analysis and nonlinear unconstrained optimization problems

NASA Technical Reports Server (NTRS)

Nguyen, Duc T.

1990-01-01

Practical engineering application can often be formulated in the form of a constrained optimization problem. There are several solution algorithms for solving a constrained optimization problem. One approach is to convert a constrained problem into a series of unconstrained problems. Furthermore, unconstrained solution algorithms can be used as part of the constrained solution algorithms. Structural optimization is an iterative process where one starts with an initial design, a finite element structure analysis is then performed to calculate the response of the system (such as displacements, stresses, eigenvalues, etc.). Based upon the sensitivity information on the objective and constraint functions, an optimizer such as ADS or IDESIGN, can be used to find the new, improved design. For the structural analysis phase, the equation solver for the system of simultaneous, linear equations plays a key role since it is needed for either static, or eigenvalue, or dynamic analysis. For practical, large-scale structural analysis-synthesis applications, computational time can be excessively large. Thus, it is necessary to have a new structural analysis-synthesis code which employs new solution algorithms to exploit both parallel and vector capabilities offered by modern, high performance computers such as the Convex, Cray-2 and Cray-YMP computers. The objective of this research project is, therefore, to incorporate the latest development in the parallel-vector equation solver, PVSOLVE into the widely popular finite-element production code, such as the SAP-4. Furthermore, several nonlinear unconstrained optimization subroutines have also been developed and tested under a parallel computer environment. The unconstrained optimization subroutines are not only useful in their own right, but they can also be incorporated into a more popular constrained optimization code, such as ADS.
Partitioning in parallel processing of production systems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Oflazer, K.

1987-01-01

This thesis presents research on certain issues related to parallel processing of production systems. It first presents a parallel production system interpreter that has been implemented on a four-processor multiprocessor. This parallel interpreter is based on Forgy's OPS5 interpreter and exploits production-level parallelism in production systems. Runs on the multiprocessor system indicate that it is possible to obtain speed-up of around 1.7 in the match computation for certain production systems when productions are split into three sets that are processed in parallel. The next issue addressed is that of partitioning a set of rules to processors in a parallel interpretermore » with production-level parallelism, and the extent of additional improvement in performance. The partitioning problem is formulated and an algorithm for approximate solutions is presented. The thesis next presents a parallel processing scheme for OPS5 production systems that allows some redundancy in the match computation. This redundancy enables the processing of a production to be divided into units of medium granularity each of which can be processed in parallel. Subsequently, a parallel processor architecture for implementing the parallel processing algorithm is presented.« less
The Galley Parallel File System

NASA Technical Reports Server (NTRS)

Nieuwejaar, Nils; Kotz, David

1996-01-01

As the I/O needs of parallel scientific applications increase, file systems for multiprocessors are being designed to provide applications with parallel access to multiple disks. Many parallel file systems present applications with a conventional Unix-like interface that allows the application to access multiple disks transparently. The interface conceals the parallelism within the file system, which increases the ease of programmability, but makes it difficult or impossible for sophisticated programmers and libraries to use knowledge about their I/O needs to exploit that parallelism. Furthermore, most current parallel file systems are optimized for a different workload than they are being asked to support. We introduce Galley, a new parallel file system that is intended to efficiently support realistic parallel workloads. We discuss Galley's file structure and application interface, as well as an application that has been implemented using that interface.
On some Aitken-like acceleration of the Schwarz method

NASA Astrophysics Data System (ADS)

Garbey, M.; Tromeur-Dervout, D.

2002-12-01

In this paper we present a family of domain decomposition based on Aitken-like acceleration of the Schwarz method seen as an iterative procedure with a linear rate of convergence. We first present the so-called Aitken-Schwarz procedure for linear differential operators. The solver can be a direct solver when applied to the Helmholtz problem with five-point finite difference scheme on regular grids. We then introduce the Steffensen-Schwarz variant which is an iterative domain decomposition solver that can be applied to linear and nonlinear problems. We show that these solvers have reasonable numerical efficiency compared to classical fast solvers for the Poisson problem or multigrids for more general linear and nonlinear elliptic problems. However, the salient feature of our method is that our algorithm has high tolerance to slow network in the context of distributed parallel computing and is attractive, generally speaking, to use with computer architecture for which performance is limited by the memory bandwidth rather than the flop performance of the CPU. This is nowadays the case for most parallel. computer using the RISC processor architecture. We will illustrate this highly desirable property of our algorithm with large-scale computing experiments.
Trainable hardware for dynamical computing using error backpropagation through physical media.

PubMed

Hermans, Michiel; Burm, Michaël; Van Vaerenbergh, Thomas; Dambre, Joni; Bienstman, Peter

2015-03-24

Neural networks are currently implemented on digital Von Neumann machines, which do not fully leverage their intrinsic parallelism. We demonstrate how to use a novel class of reconfigurable dynamical systems for analogue information processing, mitigating this problem. Our generic hardware platform for dynamic, analogue computing consists of a reciprocal linear dynamical system with nonlinear feedback. Thanks to reciprocity, a ubiquitous property of many physical phenomena like the propagation of light and sound, the error backpropagation-a crucial step for tuning such systems towards a specific task-can happen in hardware. This can potentially speed up the optimization process significantly, offering important benefits for the scalability of neuro-inspired hardware. In this paper, we show, using one experimentally validated and one conceptual example, that such systems may provide a straightforward mechanism for constructing highly scalable, fully dynamical analogue computers.
Aspects and Some Results on Passivity and Positivity of Dynamic Systems

NASA Astrophysics Data System (ADS)

De la Sen, M.

2017-12-01

This paper is devoted to discuss certain aspects of passivity results in dynamic systems and the characterization of the regenerative systems counterparts. In particular, the various concepts of passivity as standard passivity, strict input passivity, strict output passivity and very strict passivity (i.e. joint strict input and output passivity) are given and related to the existence of a storage function and a dissipation function. Later on, the obtained results are related to external positivity of systems and positivity or strict positivity of the transfer matrices and transfer functions in the time-invariant case. On the other hand, it is discussed how to achieve or how eventually to increase the passivity effects via linear feedback by the synthesis of the appropriate feed-forward or feedback controllers or, simply, by adding a positive parallel direct input-output matrix interconnection gain.
Trainable hardware for dynamical computing using error backpropagation through physical media

NASA Astrophysics Data System (ADS)

Hermans, Michiel; Burm, Michaël; van Vaerenbergh, Thomas; Dambre, Joni; Bienstman, Peter

2015-03-01

Neural networks are currently implemented on digital Von Neumann machines, which do not fully leverage their intrinsic parallelism. We demonstrate how to use a novel class of reconfigurable dynamical systems for analogue information processing, mitigating this problem. Our generic hardware platform for dynamic, analogue computing consists of a reciprocal linear dynamical system with nonlinear feedback. Thanks to reciprocity, a ubiquitous property of many physical phenomena like the propagation of light and sound, the error backpropagation—a crucial step for tuning such systems towards a specific task—can happen in hardware. This can potentially speed up the optimization process significantly, offering important benefits for the scalability of neuro-inspired hardware. In this paper, we show, using one experimentally validated and one conceptual example, that such systems may provide a straightforward mechanism for constructing highly scalable, fully dynamical analogue computers.
NAS Experiences of Porting CM Fortran Codes to HPF on IBM SP2 and SGI Power Challenge

NASA Technical Reports Server (NTRS)

Saini, Subhash

1995-01-01

Current Connection Machine (CM) Fortran codes developed for the CM-2 and the CM-5 represent an important class of parallel applications. Several users have employed CM Fortran codes in production mode on the CM-2 and the CM-5 for the last five to six years, constituting a heavy investment in terms of cost and time. With Thinking Machines Corporation's decision to withdraw from the hardware business and with the decommissioning of many CM-2 and CM-5 machines, the best way to protect the substantial investment in CM Fortran codes is to port the codes to High Performance Fortran (HPF) on highly parallel systems. HPF is very similar to CM Fortran and thus represents a natural transition. Conversion issues involved in porting CM Fortran codes on the CM-5 to HPF are presented. In particular, the differences between data distribution directives and the CM Fortran Utility Routines Library, as well as the equivalent functionality in the HPF Library are discussed. Several CM Fortran codes (Cannon algorithm for matrix-matrix multiplication, Linear solver Ax=b, 1-D convolution for 2-D datasets, Laplace's Equation solver, and Direct Simulation Monte Carlo (DSMC) codes have been ported to Subset HPF on the IBM SP2 and the SGI Power Challenge. Speedup ratios versus number of processors for the Linear solver and DSMC code are presented.
A new beam emission polarimetry diagnostic for measuring the magnetic field line angle at the plasma edge of ASDEX Upgrade.

PubMed

Viezzer, E; Dux, R; Dunne, M G

2016-11-01

A new edge beam emission polarimetry diagnostic dedicated to the measurement of the magnetic field line angle has been installed on the ASDEX Upgrade tokamak. The new diagnostic relies on the motional Stark effect and is based on the simultaneous measurement of the polarization direction of the linearly polarized π (parallel to the electric field) and σ (perpendicular to the electric field) lines of the Balmer line D α . The technical properties of the system are described. The calibration procedures are discussed and first measurements are presented.
A new beam emission polarimetry diagnostic for measuring the magnetic field line angle at the plasma edge of ASDEX Upgrade

DOE Office of Scientific and Technical Information (OSTI.GOV)

Viezzer, E., E-mail: eleonora.viezzer@ipp.mpg.de, E-mail: eviezzer@us.es; Department of Atomic, Molecular, and Nuclear Physics, University of Seville, Avda. Reina Mercedes, 41012 Seville; Dux, R.

2016-11-15

A new edge beam emission polarimetry diagnostic dedicated to the measurement of the magnetic field line angle has been installed on the ASDEX Upgrade tokamak. The new diagnostic relies on the motional Stark effect and is based on the simultaneous measurement of the polarization direction of the linearly polarized π (parallel to the electric field) and σ (perpendicular to the electric field) lines of the Balmer line D{sub α}. The technical properties of the system are described. The calibration procedures are discussed and first measurements are presented.
LMI-Based Fuzzy Optimal Variance Control of Airfoil Model Subject to Input Constraints

NASA Technical Reports Server (NTRS)

Swei, Sean S.M.; Ayoubi, Mohammad A.

2017-01-01

This paper presents a study of fuzzy optimal variance control problem for dynamical systems subject to actuator amplitude and rate constraints. Using Takagi-Sugeno fuzzy modeling and dynamic Parallel Distributed Compensation technique, the stability and the constraints can be cast as a multi-objective optimization problem in the form of Linear Matrix Inequalities. By utilizing the formulations and solutions for the input and output variance constraint problems, we develop a fuzzy full-state feedback controller. The stability and performance of the proposed controller is demonstrated through its application to the airfoil flutter suppression.
N-MODY: a code for collisionless N-body simulations in modified Newtonian dynamics.

NASA Astrophysics Data System (ADS)

Londrillo, P.; Nipoti, C.

We describe the numerical code N-MODY, a parallel particle-mesh code for collisionless N-body simulations in modified Newtonian dynamics (MOND). N-MODY is based on a numerical potential solver in spherical coordinates that solves the non-linear MOND field equation, and is ideally suited to simulate isolated stellar systems. N-MODY can be used also to compute the MOND potential of arbitrary static density distributions. A few applications of N-MODY indicate that some astrophysically relevant dynamical processes are profoundly different in MOND and in Newtonian gravity with dark matter.

Detection of partial-thickness tears in ligaments and tendons by Stokes-polarimetry imaging

NASA Astrophysics Data System (ADS)

Kim, Jihoon; John, Raheel; Walsh, Joseph T.

2008-02-01

A Stokes polarimetry imaging (SPI) system utilizes an algorithm developed to construct degree of polarization (DoP) image maps from linearly polarized light illumination. Partial-thickness tears of turkey tendons were imaged by the SPI system in order to examine the feasibility of the system to detect partial-thickness rotator cuff tear or general tendon pathology. The rotating incident polarization angle (IPA) for the linearly polarized light provides a way to analyze different tissue types which may be sensitive to IPA variations. Degree of linear polarization (DoLP) images revealed collagen fiber structure, related to partial-thickness tears, better than standard intensity images. DoLP images also revealed structural changes in tears that are related to the tendon load. DoLP images with red-wavelength-filtered incident light may show tears and related organization of collagen fiber structure at a greater depth from the tendon surface. Degree of circular polarization (DoCP) images exhibited well the horizontal fiber orientation that is not parallel to the vertically aligned collagen fibers of the tendon. The SPI system's DOLP images reveal alterations in tendons and ligaments, which have a tissue matrix consisting largely of collagen, better than intensity images. All polarized images showed modulated intensity as the IPA was varied. The optimal detection of the partial-thickness tendon tears at a certain IPA was observed. The SPI system with varying IPA and spectral information can improve the detection of partial-thickness rotator cuff tears by higher visibility of fiber orientations and thereby improve diagnosis and treatment of tendon related injuries.
Massively parallel sparse matrix function calculations with NTPoly

NASA Astrophysics Data System (ADS)

Dawson, William; Nakajima, Takahito

2018-04-01

We present NTPoly, a massively parallel library for computing the functions of sparse, symmetric matrices. The theory of matrix functions is a well developed framework with a wide range of applications including differential equations, graph theory, and electronic structure calculations. One particularly important application area is diagonalization free methods in quantum chemistry. When the input and output of the matrix function are sparse, methods based on polynomial expansions can be used to compute matrix functions in linear time. We present a library based on these methods that can compute a variety of matrix functions. Distributed memory parallelization is based on a communication avoiding sparse matrix multiplication algorithm. OpenMP task parallellization is utilized to implement hybrid parallelization. We describe NTPoly's interface and show how it can be integrated with programs written in many different programming languages. We demonstrate the merits of NTPoly by performing large scale calculations on the K computer.
On the dimensionally correct kinetic theory of turbulence for parallel propagation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Gaelzer, R., E-mail: rudi.gaelzer@ufrgs.br, E-mail: yoonp@umd.edu, E-mail: 007gasun@khu.ac.kr, E-mail: luiz.ziebell@ufrgs.br; Ziebell, L. F., E-mail: rudi.gaelzer@ufrgs.br, E-mail: yoonp@umd.edu, E-mail: 007gasun@khu.ac.kr, E-mail: luiz.ziebell@ufrgs.br; Yoon, P. H., E-mail: rudi.gaelzer@ufrgs.br, E-mail: yoonp@umd.edu, E-mail: 007gasun@khu.ac.kr, E-mail: luiz.ziebell@ufrgs.br

2015-03-15

Yoon and Fang [Phys. Plasmas 15, 122312 (2008)] formulated a second-order nonlinear kinetic theory that describes the turbulence propagating in directions parallel/anti-parallel to the ambient magnetic field. Their theory also includes discrete-particle effects, or the effects due to spontaneously emitted thermal fluctuations. However, terms associated with the spontaneous fluctuations in particle and wave kinetic equations in their theory contain proper dimensionality only for an artificial one-dimensional situation. The present paper extends the analysis and re-derives the dimensionally correct kinetic equations for three-dimensional case. The new formalism properly describes the effects of spontaneous fluctuations emitted in three-dimensional space, while the collectivelymore » emitted turbulence propagates predominantly in directions parallel/anti-parallel to the ambient magnetic field. As a first step, the present investigation focuses on linear wave-particle interaction terms only. A subsequent paper will include the dimensionally correct nonlinear wave-particle interaction terms.« less
Generalized parallel-perspective stereo mosaics from airborne video.

PubMed

Zhu, Zhigang; Hanson, Allen R; Riseman, Edward M

2004-02-01

In this paper, we present a new method for automatically and efficiently generating stereoscopic mosaics by seamless registration of images collected by a video camera mounted on an airborne platform. Using a parallel-perspective representation, a pair of geometrically registered stereo mosaics can be precisely constructed under quite general motion. A novel parallel ray interpolation for stereo mosaicing (PRISM) approach is proposed to make stereo mosaics seamless in the presence of obvious motion parallax and for rather arbitrary scenes. Parallel-perspective stereo mosaics generated with the PRISM method have better depth resolution than perspective stereo due to the adaptive baseline geometry. Moreover, unlike previous results showing that parallel-perspective stereo has a constant depth error, we conclude that the depth estimation error of stereo mosaics is in fact a linear function of the absolute depths of a scene. Experimental results on long video sequences are given.
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE Office of Scientific and Technical Information (OSTI.GOV)

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
A computationally efficient parallel Levenberg-Marquardt algorithm for highly parameterized inverse model analyses

DOE PAGES

Lin, Youzuo; O'Malley, Daniel; Vesselinov, Velimir V.

2016-09-01

Inverse modeling seeks model parameters given a set of observations. However, for practical problems because the number of measurements is often large and the model parameters are also numerous, conventional methods for inverse modeling can be computationally expensive. We have developed a new, computationally-efficient parallel Levenberg-Marquardt method for solving inverse modeling problems with a highly parameterized model space. Levenberg-Marquardt methods require the solution of a linear system of equations which can be prohibitively expensive to compute for moderate to large-scale problems. Our novel method projects the original linear problem down to a Krylov subspace, such that the dimensionality of themore » problem can be significantly reduced. Furthermore, we store the Krylov subspace computed when using the first damping parameter and recycle the subspace for the subsequent damping parameters. The efficiency of our new inverse modeling algorithm is significantly improved using these computational techniques. We apply this new inverse modeling method to invert for random transmissivity fields in 2D and a random hydraulic conductivity field in 3D. Our algorithm is fast enough to solve for the distributed model parameters (transmissivity) in the model domain. The algorithm is coded in Julia and implemented in the MADS computational framework (http://mads.lanl.gov). By comparing with Levenberg-Marquardt methods using standard linear inversion techniques such as QR or SVD methods, our Levenberg-Marquardt method yields a speed-up ratio on the order of ~10 1 to ~10 2 in a multi-core computational environment. Furthermore, our new inverse modeling method is a powerful tool for characterizing subsurface heterogeneity for moderate- to large-scale problems.« less
Enhancing Scalability and Efficiency of the TOUGH2_MP for LinuxClusters

DOE Office of Scientific and Technical Information (OSTI.GOV)

Zhang, Keni; Wu, Yu-Shu

2006-04-17

TOUGH2{_}MP, the parallel version TOUGH2 code, has been enhanced by implementing more efficient communication schemes. This enhancement is achieved through reducing the amount of small-size messages and the volume of large messages. The message exchange speed is further improved by using non-blocking communications for both linear and nonlinear iterations. In addition, we have modified the AZTEC parallel linear-equation solver to nonblocking communication. Through the improvement of code structuring and bug fixing, the new version code is now more stable, while demonstrating similar or even better nonlinear iteration converging speed than the original TOUGH2 code. As a result, the new versionmore » of TOUGH2{_}MP is improved significantly in its efficiency. In this paper, the scalability and efficiency of the parallel code are demonstrated by solving two large-scale problems. The testing results indicate that speedup of the code may depend on both problem size and complexity. In general, the code has excellent scalability in memory requirement as well as computing time.« less
A Kinematic, Flexure-based Mechanism for Precise, Parallel Motion for the Hertz Variable-delay Polarization Modulator (VPM)

NASA Technical Reports Server (NTRS)

Voellmer, G. M.; Chuss, D. T.; Jackson, M.; Krejny, M.; Moseley, S. H.; Novak, G.; Wollack, E. J.

2008-01-01

We describe the design of the linear motion stage for a Variable-delay Polarization Modulator (VPM) and of a grid flattener that has been built and integrated into the Hertz ground-based, submillimeter polarimeter. VPMs allow the modulation of a polarized source by controlling the phase difference between two linear, orthogonal polarizations. The size of the gap between a mirror and a very flat polarizing grid determines the amount of the phase difference. This gap must be parallel to better than 1% of the wavelength. A novel, kinematic, flexure-based mechanism is described that passively maintains the parallelism of the mirror and the grid to 1.5 pm over a 150 mm diameter, with a 400 pm throw. A single piezoceramic actuator is used to modulate the gap, and a capacitive sensor provides position feedback for closed-loop control. A simple device that ensures the planarity of the polarizing grid is also described. Engineering results from the deployment of this device in the Hertz instrument April 2006 at the Submillimeter Telescope Observatory (SMTO) in Arizona are presented.
On the Development of an Efficient Parallel Hybrid Solver with Application to Acoustically Treated Aero-Engine Nacelles

NASA Technical Reports Server (NTRS)

Watson, Willie R.; Nark, Douglas M.; Nguyen, Duc T.; Tungkahotara, Siroj

2006-01-01

A finite element solution to the convected Helmholtz equation in a nonuniform flow is used to model the noise field within 3-D acoustically treated aero-engine nacelles. Options to select linear or cubic Hermite polynomial basis functions and isoparametric elements are included. However, the key feature of the method is a domain decomposition procedure that is based upon the inter-mixing of an iterative and a direct solve strategy for solving the discrete finite element equations. This procedure is optimized to take full advantage of sparsity and exploit the increased memory and parallel processing capability of modern computer architectures. Example computations are presented for the Langley Flow Impedance Test facility and a rectangular mapping of a full scale, generic aero-engine nacelle. The accuracy and parallel performance of this new solver are tested on both model problems using a supercomputer that contains hundreds of central processing units. Results show that the method gives extremely accurate attenuation predictions, achieves super-linear speedup over hundreds of CPUs, and solves upward of 25 million complex equations in a quarter of an hour.
Tunable microwave generation based on frequency quadrupling

NASA Astrophysics Data System (ADS)

Liu, Yu-Lei; Liang, Jun; Li, Xuan; Xiao, Nan; Yuan, Xiao-Gang

2018-07-01

To generate linearly chirped microwave signals with large frequency tunable range, a photonic approach is proposed. A dual-output dual-parallel Mach-Zehnder modulator followed by a polarisation beam combiner and an optical filter are utilised to generate orthogonally polarised ± second-order optical sidebands. A polarisation modulator is employed to achieve phase modulation of the two wavelengths. The balanced detection is applied to suppress the distortion and background noise. The central frequency of the generated signal is four times that of the local oscillator frequency. Simulation results show that a linear pulse is produced with time-bandwidth as well as a compression ratio for the pulse of 11 and 9.3 respectively. Moreover, a peak-to-sidelobe ratio of 7.4 dB is generated. The system has both good reconfigurability and tunability, and its frequency can be continuously adjusted from about 10 GHz to as much as 50 GHz in principle.
Multiple regression for physiological data analysis: the problem of multicollinearity.

PubMed

Slinker, B K; Glantz, S A

1985-07-01

Multiple linear regression, in which several predictor variables are related to a response variable, is a powerful statistical tool for gaining quantitative insight into complex in vivo physiological systems. For these insights to be correct, all predictor variables must be uncorrelated. However, in many physiological experiments the predictor variables cannot be precisely controlled and thus change in parallel (i.e., they are highly correlated). There is a redundancy of information about the response, a situation called multicollinearity, that leads to numerical problems in estimating the parameters in regression equations; the parameters are often of incorrect magnitude or sign or have large standard errors. Although multicollinearity can be avoided with good experimental design, not all interesting physiological questions can be studied without encountering multicollinearity. In these cases various ad hoc procedures have been proposed to mitigate multicollinearity. Although many of these procedures are controversial, they can be helpful in applying multiple linear regression to some physiological problems.
Cooperative storage of shared files in a parallel computing system with dynamic block size

DOEpatents

Bent, John M.; Faibish, Sorin; Grider, Gary

2015-11-10

Improved techniques are provided for parallel writing of data to a shared object in a parallel computing system. A method is provided for storing data generated by a plurality of parallel processes to a shared object in a parallel computing system. The method is performed by at least one of the processes and comprises: dynamically determining a block size for storing the data; exchanging a determined amount of the data with at least one additional process to achieve a block of the data having the dynamically determined block size; and writing the block of the data having the dynamically determined block size to a file system. The determined block size comprises, e.g., a total amount of the data to be stored divided by the number of parallel processes. The file system comprises, for example, a log structured virtual parallel file system, such as a Parallel Log-Structured File System (PLFS).
MUTILS - a set of efficient modeling tools for multi-core CPUs implemented in MEX

NASA Astrophysics Data System (ADS)

Krotkiewski, Marcin; Dabrowski, Marcin

2013-04-01

The need for computational performance is common in scientific applications, and in particular in numerical simulations, where high resolution models require efficient processing of large amounts of data. Especially in the context of geological problems the need to increase the model resolution to resolve physical and geometrical complexities seems to have no limits. Alas, the performance of new generations of CPUs does not improve any longer by simply increasing clock speeds. Current industrial trends are to increase the number of computational cores. As a result, parallel implementations are required in order to fully utilize the potential of new processors, and to study more complex models. We target simulations on small to medium scale shared memory computers: laptops and desktop PCs with ~8 CPU cores and up to tens of GB of memory to high-end servers with ~50 CPU cores and hundereds of GB of memory. In this setting MATLAB is often the environment of choice for scientists that want to implement their own models with little effort. It is a useful general purpose mathematical software package, but due to its versatility some of its functionality is not as efficient as it could be. In particular, the challanges of modern multi-core architectures are not fully addressed. We have developed MILAMIN 2 - an efficient FEM modeling environment written in native MATLAB. Amongst others, MILAMIN provides functions to define model geometry, generate and convert structured and unstructured meshes (also through interfaces to external mesh generators), compute element and system matrices, apply boundary conditions, solve the system of linear equations, address non-linear and transient problems, and perform post-processing. MILAMIN strives to combine the ease of code development and the computational efficiency. Where possible, the code is optimized and/or parallelized within the MATLAB framework. Native MATLAB is augmented with the MUTILS library - a set of MEX functions that implement the computationally intensive, performance critical parts of the code, which we have identified to be bottlenecks. Here, we discuss the functionality and performance of the MUTILS library. Currently, it includes: 1. time and memory efficient assembly of sparse matrices for FEM simulations 2. parallel sparse matrix - vector product with optimizations speficic to symmetric matrices and multiple degrees of freedom per node 3. parallel point in triangle location and point in tetrahedron location for unstructured, adaptive 2D and 3D meshes (useful for 'marker in cell' type of methods) 4. parallel FEM interpolation for 2D and 3D meshes of elements of different types and orders, and for different number of degrees of freedom per node 5. a stand-alone, MEX implementation of the Conjugate Gradients iterative solver 6. interface to METIS graph partitioning and a fast implementation of RCM reordering
Numerical methods on some structured matrix algebra problems

DOE Office of Scientific and Technical Information (OSTI.GOV)

Jessup, E.R.

1996-06-01

This proposal concerned the design, analysis, and implementation of serial and parallel algorithms for certain structured matrix algebra problems. It emphasized large order problems and so focused on methods that can be implemented efficiently on distributed-memory MIMD multiprocessors. Such machines supply the computing power and extensive memory demanded by the large order problems. We proposed to examine three classes of matrix algebra problems: the symmetric and nonsymmetric eigenvalue problems (especially the tridiagonal cases) and the solution of linear systems with specially structured coefficient matrices. As all of these are of practical interest, a major goal of this work was tomore » translate our research in linear algebra into useful tools for use by the computational scientists interested in these and related applications. Thus, in addition to software specific to the linear algebra problems, we proposed to produce a programming paradigm and library to aid in the design and implementation of programs for distributed-memory MIMD computers. We now report on our progress on each of the problems and on the programming tools.« less
Parallel deterministic neutronics with AMR in 3D

DOE Office of Scientific and Technical Information (OSTI.GOV)

Clouse, C.; Ferguson, J.; Hendrickson, C.

1997-12-31

AMTRAN, a three dimensional Sn neutronics code with adaptive mesh refinement (AMR) has been parallelized over spatial domains and energy groups and runs on the Meiko CS-2 with MPI message passing. Block refined AMR is used with linear finite element representations for the fluxes, which allows for a straight forward interpretation of fluxes at block interfaces with zoning differences. The load balancing algorithm assumes 8 spatial domains, which minimizes idle time among processors.
Productive High Performance Parallel Programming with Auto-tuned Domain-Specific Embedded Languages

DTIC Science & Technology

2013-01-02

Compilation JVM Java Virtual Machine KB Kilobyte KDT Knowledge Discovery Toolbox LAPACK Linear Algebra Package LLVM Low-Level Virtual Machine LOC Lines...different starting points. Leo Meyerovich also helped solidify some of the ideas here in discussions during Par Lab retreats. I would also like to thank...multi-timestep computations by blocking in both time and space. 88 Implementation Output Approx DSL Type Language Language Parallelism LoC Graphite
A parallel Monte Carlo code for planar and SPECT imaging: implementation, verification and applications in (131)I SPECT.

PubMed

Dewaraja, Yuni K; Ljungberg, Michael; Majumdar, Amitava; Bose, Abhijit; Koral, Kenneth F

2002-02-01

This paper reports the implementation of the SIMIND Monte Carlo code on an IBM SP2 distributed memory parallel computer. Basic aspects of running Monte Carlo particle transport calculations on parallel architectures are described. Our parallelization is based on equally partitioning photons among the processors and uses the Message Passing Interface (MPI) library for interprocessor communication and the Scalable Parallel Random Number Generator (SPRNG) to generate uncorrelated random number streams. These parallelization techniques are also applicable to other distributed memory architectures. A linear increase in computing speed with the number of processors is demonstrated for up to 32 processors. This speed-up is especially significant in Single Photon Emission Computed Tomography (SPECT) simulations involving higher energy photon emitters, where explicit modeling of the phantom and collimator is required. For (131)I, the accuracy of the parallel code is demonstrated by comparing simulated and experimental SPECT images from a heart/thorax phantom. Clinically realistic SPECT simulations using the voxel-man phantom are carried out to assess scatter and attenuation correction.
Parallel adaptive wavelet collocation method for PDEs

DOE Office of Scientific and Technical Information (OSTI.GOV)

Nejadmalayeri, Alireza, E-mail: Alireza.Nejadmalayeri@gmail.com; Vezolainen, Alexei, E-mail: Alexei.Vezolainen@Colorado.edu; Brown-Dymkoski, Eric, E-mail: Eric.Browndymkoski@Colorado.edu

2015-10-01

A parallel adaptive wavelet collocation method for solving a large class of Partial Differential Equations is presented. The parallelization is achieved by developing an asynchronous parallel wavelet transform, which allows one to perform parallel wavelet transform and derivative calculations with only one data synchronization at the highest level of resolution. The data are stored using tree-like structure with tree roots starting at a priori defined level of resolution. Both static and dynamic domain partitioning approaches are developed. For the dynamic domain partitioning, trees are considered to be the minimum quanta of data to be migrated between the processes. This allowsmore » fully automated and efficient handling of non-simply connected partitioning of a computational domain. Dynamic load balancing is achieved via domain repartitioning during the grid adaptation step and reassigning trees to the appropriate processes to ensure approximately the same number of grid points on each process. The parallel efficiency of the approach is discussed based on parallel adaptive wavelet-based Coherent Vortex Simulations of homogeneous turbulence with linear forcing at effective non-adaptive resolutions up to 2048{sup 3} using as many as 2048 CPU cores.« less
A nonrecursive 'Order N' preconditioned conjugate gradient/range space formulation of MDOF dynamics

NASA Technical Reports Server (NTRS)

Kurdila, A. J.; Menon, R.; Sunkel, John

1991-01-01

This paper addresses the requirements of present-day mechanical system simulations of algorithms that induce parallelism on a fine scale and of transient simulation methods which must be automatically load balancing for a wide collection of system topologies and hardware configurations. To this end, a combination range space/preconditioned conjugage gradient formulation of multidegree-of-freedon dynamics is developed, which, by employing regular ordering of the system connectivity graph, makes it possible to derive an extremely efficient preconditioner from the range space metric (as opposed to the system coefficient matrix). Because of the effectiveness of the preconditioner, the method can achieve performance rates that depend linearly on the number of substructures. The method, termed 'Order N' does not require the assembly of system mass or stiffness matrices, and is therefore amenable to implementation on work stations. Using this method, a 13-substructure model of the Space Station was constructed.
A low-cost and portable realization on fringe projection three-dimensional measurement

NASA Astrophysics Data System (ADS)

Xiao, Suzhi; Tao, Wei; Zhao, Hui

2015-12-01

Fringe projection three-dimensional measurement is widely applied in a wide range of industrial application. The traditional fringe projection system has the disadvantages of high expense, big size, and complicated calibration requirements. In this paper we introduce a low-cost and portable realization on three-dimensional measurement with Pico projector. It has the advantages of low cost, compact physical size, and flexible configuration. For the proposed fringe projection system, there is no restriction to camera and projector's relative alignment on parallelism and perpendicularity for installation. Moreover, plane-based calibration method is adopted in this paper that avoids critical requirements on calibration system such as additional gauge block or precise linear z stage. What is more, error sources existing in the proposed system are introduced in this paper. The experimental results demonstrate the feasibility of the proposed low cost and portable fringe projection system.

Highly-sensitive and large-dynamic diffuse optical tomography system for breast tumor detection

NASA Astrophysics Data System (ADS)

Du, Wenwen; Zhang, Limin; Yin, Guoyan; Zhang, Yanqi; Zhao, Huijuan; Gao, Feng

2018-02-01

Diffuse optical tomography (DOT) as a new functional imaging has important clinical applications in many aspects such as benign and malignant breast tumor detection, tumor staging and so on. For quantitative detection of breast tumor, a three-wavelength continuous-wave DOT prototype system combined the ultra-high sensitivity of the photon-counting detection and the measurement parallelism of the lock-in technique was developed to provide high temporal resolution, high sensitivity, large dynamic detection range and signal-to-noise ratio. Additionally, a CT-analogous scanning mode was proposed to cost-effectively increase the detection data. To evaluate the feasibility of the system, a series of assessments were conducted. The results demonstrate that the system can obtain high linearity, stability and negligible inter-wavelength crosstalk. The preliminary phantom experiments show the absorption coefficient is able to be successfully reconstructed, indicating that the system is one of the ideal platforms for optical breast tumor detection.
Polarization division multiplexing for optical data communications

NASA Astrophysics Data System (ADS)

Ivanovich, Darko; Powell, Samuel B.; Gruev, Viktor; Chamberlain, Roger D.

2018-02-01

Multiple parallel channels are ubiquitous in optical communications, with spatial division multiplexing (separate physical paths) and wavelength division multiplexing (separate optical wavelengths) being the most common forms. Here, we investigate the viability of polarization division multiplexing, the separation of distinct parallel optical communication channels through the polarization properties of light. Two or more linearly polarized optical signals (at different polarization angles) are transmitted through a common medium, filtered using aluminum nanowire optical filters fabricated on-chip, and received using individual silicon photodetectors (one per channel). The entire receiver (including optics) is compatible with standard CMOS fabrication processes. The filter model is based upon an input optical signal formed as the sum of the Stokes vectors for each individual channel, transformed by the Mueller matrix that models the filter proper, resulting in an output optical signal that impinges on each photodiode. The results show that two- and three-channel systems can operate with a fixed-threshold comparator in the receiver circuit, but four-channel systems (and larger) will require channel coding of some form. For example, in the four-channel system, 10 of 16 distinct bit patterns are separable by the receiver. The model supports investigation of the range of variability tolerable in the fabrication of the on-chip polarization filters.
Highly Efficient Parallel Multigrid Solver For Large-Scale Simulation of Grain Growth Using the Structural Phase Field Crystal Model

NASA Astrophysics Data System (ADS)

Guan, Zhen; Pekurovsky, Dmitry; Luce, Jason; Thornton, Katsuyo; Lowengrub, John

The structural phase field crystal (XPFC) model can be used to model grain growth in polycrystalline materials at diffusive time-scales while maintaining atomic scale resolution. However, the governing equation of the XPFC model is an integral-partial-differential-equation (IPDE), which poses challenges in implementation onto high performance computing (HPC) platforms. In collaboration with the XSEDE Extended Collaborative Support Service, we developed a distributed memory HPC solver for the XPFC model, which combines parallel multigrid and P3DFFT. The performance benchmarking on the Stampede supercomputer indicates near linear strong and weak scaling for both multigrid and transfer time between multigrid and FFT modules up to 1024 cores. Scalability of the FFT module begins to decline at 128 cores, but it is sufficient for the type of problem we will be examining. We have demonstrated simulations using 1024 cores, and we expect to achieve 4096 cores and beyond. Ongoing work involves optimization of MPI/OpenMP-based codes for the Intel KNL Many-Core Architecture. This optimizes the code for coming pre-exascale systems, in particular many-core systems such as Stampede 2.0 and Cori 2 at NERSC, without sacrificing efficiency on other general HPC systems.
Multi-Dimensional, Mesoscopic Monte Carlo Simulations of Inhomogeneous Reaction-Drift-Diffusion Systems on Graphics-Processing Units

PubMed Central

Vigelius, Matthias; Meyer, Bernd

2012-01-01

For many biological applications, a macroscopic (deterministic) treatment of reaction-drift-diffusion systems is insufficient. Instead, one has to properly handle the stochastic nature of the problem and generate true sample paths of the underlying probability distribution. Unfortunately, stochastic algorithms are computationally expensive and, in most cases, the large number of participating particles renders the relevant parameter regimes inaccessible. In an attempt to address this problem we present a genuine stochastic, multi-dimensional algorithm that solves the inhomogeneous, non-linear, drift-diffusion problem on a mesoscopic level. Our method improves on existing implementations in being multi-dimensional and handling inhomogeneous drift and diffusion. The algorithm is well suited for an implementation on data-parallel hardware architectures such as general-purpose graphics processing units (GPUs). We integrate the method into an operator-splitting approach that decouples chemical reactions from the spatial evolution. We demonstrate the validity and applicability of our algorithm with a comprehensive suite of standard test problems that also serve to quantify the numerical accuracy of the method. We provide a freely available, fully functional GPU implementation. Integration into Inchman, a user-friendly web service, that allows researchers to perform parallel simulations of reaction-drift-diffusion systems on GPU clusters is underway. PMID:22506001
Automatic Management of Parallel and Distributed System Resources

NASA Technical Reports Server (NTRS)

Yan, Jerry; Ngai, Tin Fook; Lundstrom, Stephen F.

1990-01-01

Viewgraphs on automatic management of parallel and distributed system resources are presented. Topics covered include: parallel applications; intelligent management of multiprocessing systems; performance evaluation of parallel architecture; dynamic concurrent programs; compiler-directed system approach; lattice gaseous cellular automata; and sparse matrix Cholesky factorization.
Integrated electronics for time-resolved array of single-photon avalanche diodes

NASA Astrophysics Data System (ADS)

Acconcia, G.; Crotti, M.; Rech, I.; Ghioni, M.

2013-12-01

The Time Correlated Single Photon Counting (TCSPC) technique has reached a prominent position among analytical methods employed in a great variety of fields, from medicine and biology (fluorescence spectroscopy) to telemetry (laser ranging) and communication (quantum cryptography). Nevertheless the development of TCSPC acquisition systems featuring both a high number of parallel channels and very high performance is still an open challenge: to satisfy the tight requirements set by the applications, a fully parallel acquisition system requires not only high efficiency single photon detectors but also a read-out electronics specifically designed to obtain the highest performance in conjunction with these sensors. To this aim three main blocks have been designed: a gigahertz bandwidth front-end stage to directly read the custom technology SPAD array avalanche current, a reconfigurable logic to route the detectors output signals to the acquisition chain and an array of time measurement circuits capable of recording the photon arrival times with picoseconds time resolution and a very high linearity. An innovative architecture based on these three circuits will feature a very high number of detectors to perform a truly parallel spatial or spectral analysis and a smaller number of high performance time-to-amplitude converter offering very high performance and a very high conversion frequency while limiting the area occupation and power dissipation. The routing logic will make the dynamic connection between the two arrays possible in order to guarantee that no information gets lost.
ColDICE: A parallel Vlasov–Poisson solver using moving adaptive simplicial tessellation

DOE Office of Scientific and Technical Information (OSTI.GOV)

Sousbie, Thierry, E-mail: tsousbie@gmail.com; Department of Physics, The University of Tokyo, Tokyo 113-0033; Research Center for the Early Universe, School of Science, The University of Tokyo, Tokyo 113-0033

2016-09-15

Resolving numerically Vlasov–Poisson equations for initially cold systems can be reduced to following the evolution of a three-dimensional sheet evolving in six-dimensional phase-space. We describe a public parallel numerical algorithm consisting in representing the phase-space sheet with a conforming, self-adaptive simplicial tessellation of which the vertices follow the Lagrangian equations of motion. The algorithm is implemented both in six- and four-dimensional phase-space. Refinement of the tessellation mesh is performed using the bisection method and a local representation of the phase-space sheet at second order relying on additional tracers created when needed at runtime. In order to preserve in the bestmore » way the Hamiltonian nature of the system, refinement is anisotropic and constrained by measurements of local Poincaré invariants. Resolution of Poisson equation is performed using the fast Fourier method on a regular rectangular grid, similarly to particle in cells codes. To compute the density projected onto this grid, the intersection of the tessellation and the grid is calculated using the method of Franklin and Kankanhalli [65–67] generalised to linear order. As preliminary tests of the code, we study in four dimensional phase-space the evolution of an initially small patch in a chaotic potential and the cosmological collapse of a fluctuation composed of two sinusoidal waves. We also perform a “warm” dark matter simulation in six-dimensional phase-space that we use to check the parallel scaling of the code.« less
Physics-Based Preconditioning of a Compressible Flow Solver for Large-Scale Simulations of Additive Manufacturing Processes

NASA Astrophysics Data System (ADS)

Weston, Brian; Nourgaliev, Robert; Delplanque, Jean-Pierre

2017-11-01

We present a new block-based Schur complement preconditioner for simulating all-speed compressible flow with phase change. The conservation equations are discretized with a reconstructed Discontinuous Galerkin method and integrated in time with fully implicit time discretization schemes. The resulting set of non-linear equations is converged using a robust Newton-Krylov framework. Due to the stiffness of the underlying physics associated with stiff acoustic waves and viscous material strength effects, we solve for the primitive-variables (pressure, velocity, and temperature). To enable convergence of the highly ill-conditioned linearized systems, we develop a physics-based preconditioner, utilizing approximate block factorization techniques to reduce the fully-coupled 3×3 system to a pair of reduced 2×2 systems. We demonstrate that our preconditioned Newton-Krylov framework converges on very stiff multi-physics problems, corresponding to large CFL and Fourier numbers, with excellent algorithmic and parallel scalability. Results are shown for the classic lid-driven cavity flow problem as well as for 3D laser-induced phase change. This work was performed under the auspices of the U.S. Department of Energy by Lawrence Livermore National Laboratory under Contract DE-AC52-07NA27344.
Wakefield Computations for the CLIC PETS using the Parallel Finite Element Time-Domain Code T3P

DOE Office of Scientific and Technical Information (OSTI.GOV)

Candel, A; Kabel, A.; Lee, L.

In recent years, SLAC's Advanced Computations Department (ACD) has developed the high-performance parallel 3D electromagnetic time-domain code, T3P, for simulations of wakefields and transients in complex accelerator structures. T3P is based on advanced higher-order Finite Element methods on unstructured grids with quadratic surface approximation. Optimized for large-scale parallel processing on leadership supercomputing facilities, T3P allows simulations of realistic 3D structures with unprecedented accuracy, aiding the design of the next generation of accelerator facilities. Applications to the Compact Linear Collider (CLIC) Power Extraction and Transfer Structure (PETS) are presented.
Application of viscous and Iwan modal damping models to experimental measurements from bolted structures

DOE PAGES

Deaner, Brandon J.; Allen, Matthew S.; Starr, Michael James; ...

2015-01-20

Measurements are presented from a two-beam structure with several bolted interfaces in order to characterize the nonlinear damping introduced by the joints. The measurements (all at force levels below macroslip) reveal that each underlying mode of the structure is well approximated by a single degree-of-freedom (SDOF) system with a nonlinear mechanical joint. At low enough force levels, the measurements show dissipation that scales as the second power of the applied force, agreeing with theory for a linear viscously damped system. This is attributed to linear viscous behavior of the material and/or damping provided by the support structure. At larger forcemore » levels, the damping is observed to behave nonlinearly, suggesting that damping from the mechanical joints is dominant. A model is presented that captures these effects, consisting of a spring and viscous damping element in parallel with a four-parameter Iwan model. As a result, the parameters of this model are identified for each mode of the structure and comparisons suggest that the model captures the stiffness and damping accurately over a range of forcing levels.« less
Design and Analysis of a Compact Precision Positioning Platform Integrating Strain Gauges and the Piezoactuator

PubMed Central

Huang, Hu; Zhao, Hongwei; Yang, Zhaojun; Fan, Zunqiang; Wan, Shunguang; Shi, Chengli; Ma, Zhichao

2012-01-01

Miniaturization precision positioning platforms are needed for in situ nanomechanical test applications. This paper proposes a compact precision positioning platform integrating strain gauges and the piezoactuator. Effects of geometric parameters of two parallel plates on Von Mises stress distribution as well as static and dynamic characteristics of the platform were studied by the finite element method. Results of the calibration experiment indicate that the strain gauge sensor has good linearity and its sensitivity is about 0.0468 mV/μm. A closed-loop control system was established to solve the problem of nonlinearity of the platform. Experimental results demonstrate that for the displacement control process, both the displacement increasing portion and the decreasing portion have good linearity, verifying that the control system is available. The developed platform has a compact structure but can realize displacement measurement with the embedded strain gauges, which is useful for the closed-loop control and structure miniaturization of piezo devices. It has potential applications in nanoindentation and nanoscratch tests, especially in the field of in situ nanomechanical testing which requires compact structures. PMID:23012566
Broadband Venetian-Blind Polarizer With Dual Vanes

NASA Technical Reports Server (NTRS)

Conroy, Bruce L.; Hoppe, Daniel J.

1995-01-01

Improved venetian-blind polarizer features optimized tandem, two-layer vane configuration reducing undesired reflections and deformation of radiation pattern below those of prior single-layer vane configuration. Consists of number of thin, parallel metal strips placed in path of propagating radio-frequency beam. Offers simple way to convert polarization from linear to circular or from circular to linear. Particularly useful for beam-wave-guide applications.
Introducing PROFESS 2.0: A parallelized, fully linear scaling program for orbital-free density functional theory calculations

NASA Astrophysics Data System (ADS)

Hung, Linda; Huang, Chen; Shin, Ilgyou; Ho, Gregory S.; Lignères, Vincent L.; Carter, Emily A.

2010-12-01

Orbital-free density functional theory (OFDFT) is a first principles quantum mechanics method to find the ground-state energy of a system by variationally minimizing with respect to the electron density. No orbitals are used in the evaluation of the kinetic energy (unlike Kohn-Sham DFT), and the method scales nearly linearly with the size of the system. The PRinceton Orbital-Free Electronic Structure Software (PROFESS) uses OFDFT to model materials from the atomic scale to the mesoscale. This new version of PROFESS allows the study of larger systems with two significant changes: PROFESS is now parallelized, and the ion-electron and ion-ion terms scale quasilinearly, instead of quadratically as in PROFESS v1 (L. Hung and E.A. Carter, Chem. Phys. Lett. 475 (2009) 163). At the start of a run, PROFESS reads the various input files that describe the geometry of the system (ion positions and cell dimensions), the type of elements (defined by electron-ion pseudopotentials), the actions you want it to perform (minimize with respect to electron density and/or ion positions and/or cell lattice vectors), and the various options for the computation (such as which functionals you want it to use). Based on these inputs, PROFESS sets up a computation and performs the appropriate optimizations. Energies, forces, stresses, material geometries, and electron density configurations are some of the values that can be output throughout the optimization. New version program summaryProgram Title: PROFESS Catalogue identifier: AEBN_v2_0 Program summary URL:http://cpc.cs.qub.ac.uk/summaries/AEBN_v2_0.html Program obtainable from: CPC Program Library, Queen's University, Belfast, N. Ireland Licensing provisions: Standard CPC licence, http://cpc.cs.qub.ac.uk/licence/licence.html No. of lines in distributed program, including test data, etc.: 68 721 No. of bytes in distributed program, including test data, etc.: 1 708 547 Distribution format: tar.gz Programming language: Fortran 90 Computer: Intel with ifort; AMD Opteron with pathf90 Operating system: Linux Has the code been vectorized or parallelized?: Yes. Parallelization is implemented through domain composition using MPI. RAM: Problem dependent, but 2 GB is sufficient for up to 10,000 ions. Classification: 7.3 External routines: FFTW 2.1.5 ( http://www.fftw.org) Catalogue identifier of previous version: AEBN_v1_0 Journal reference of previous version: Comput. Phys. Comm. 179 (2008) 839 Does the new version supersede the previous version?: Yes Nature of problem: Given a set of coordinates describing the initial ion positions under periodic boundary conditions, recovers the ground state energy, electron density, ion positions, and cell lattice vectors predicted by orbital-free density functional theory. The computation of all terms is effectively linear scaling. Parallelization is implemented through domain decomposition, and up to ˜10,000 ions may be included in the calculation on just a single processor, limited by RAM. For example, when optimizing the geometry of ˜50,000 aluminum ions (plus vacuum) on 48 cores, a single iteration of conjugate gradient ion geometry optimization takes ˜40 minutes wall time. However, each CG geometry step requires two or more electron density optimizations, so step times will vary. Solution method: Computes energies as described in text; minimizes this energy with respect to the electron density, ion positions, and cell lattice vectors. Reasons for new version: To allow much larger systems to be simulated using PROFESS. Restrictions: PROFESS cannot use nonlocal (such as ultrasoft) pseudopotentials. A variety of local pseudopotential files are available at the Carter group website ( http://www.princeton.edu/mae/people/faculty/carter/homepage/research/localpseudopotentials/). Also, due to the current state of the kinetic energy functionals, PROFESS is only reliable for main group metals and some properties of semiconductors. Running time: Problem dependent: the test example provided with the code takes less than a second to run. Timing results for large scale problems are given in the PROFESS paper and Ref. [1].
Thought Leaders during Crises in Massive Social Networks

DOE Office of Scientific and Technical Information (OSTI.GOV)

Corley, Courtney D.; Farber, Robert M.; Reynolds, William

The vast amount of social media data that can be gathered from the internet coupled with workflows that utilize both commodity systems and massively parallel supercomputers, such as the Cray XMT, open new vistas for research to support health, defense, and national security. Computer technology now enables the analysis of graph structures containing more than 4 billion vertices joined by 34 billion edges along with metrics and massively parallel algorithms that exhibit near-linear scalability according to number of processors. The challenge lies in making this massive data and analysis comprehensible to an analyst and end-users that require actionable knowledge tomore » carry out their duties. Simply stated, we have developed language and content agnostic techniques to reduce large graphs built from vast media corpora into forms people can understand. Specifically, our tools and metrics act as a survey tool to identify thought leaders' -- those members that lead or reflect the thoughts and opinions of an online community, independent of the source language.« less
Optimal processor assignment for pipeline computations

NASA Technical Reports Server (NTRS)

Nicol, David M.; Simha, Rahul; Choudhury, Alok N.; Narahari, Bhagirath

1991-01-01

The availability of large scale multitasked parallel architectures introduces the following processor assignment problem for pipelined computations. Given a set of tasks and their precedence constraints, along with their experimentally determined individual responses times for different processor sizes, find an assignment of processor to tasks. Two objectives are of interest: minimal response given a throughput requirement, and maximal throughput given a response time requirement. These assignment problems differ considerably from the classical mapping problem in which several tasks share a processor; instead, it is assumed that a large number of processors are to be assigned to a relatively small number of tasks. Efficient assignment algorithms were developed for different classes of task structures. For a p processor system and a series parallel precedence graph with n constituent tasks, an O(np2) algorithm is provided that finds the optimal assignment for the response time optimization problem; it was found that the assignment optimizing the constrained throughput in O(np2log p) time. Special cases of linear, independent, and tree graphs are also considered.
Spectromicroscopy study of interfacial Co/NiO(001)

DOE Office of Scientific and Technical Information (OSTI.GOV)

van der Laan, Gerrit; Telling, Neil; Potenza, Alberto

2010-09-26

Photoemission electron microscopy (PEEM) with linearly polarized x-rays is used to determine the orientation of antiferromagnetic domains by monitoring the relative peak intensities at the 3d transition metal L{sub 2} absorption edge. In such an analysis the orientations of the x-ray polarization E and magnetization H with respect to the crystalline axes has to be taken into account. We address this problem by presenting a general expression of the angular dependence for both x-ray absorption spectroscopy and x-ray magnetic linear dichroism (XMLD) for arbitrary direction of E and H in the (001) cubic plane. In cubic symmetry the angular dependentmore » XMLD is a linear combination of two spectra with different photon energy dependence, which reduces to one spectrum when E or H is along a high-symmetry axis. The angular dependent XMLD can be separated into an isotropic term, which is symmetric along H, and an anisotropic term, which depends on the orientation of the crystal axes. The anisotropic term has maximal intensity when E and H have equal but opposite angles with respect to the [100] direction. The Ni{sup 2+} L{sub 2} edge has the peculiarity that the isotropic term vanishes, which means that the maximum in the XMLD intensity is observed not only for E {parallel} H {parallel} [100] but also for (E {parallel} [110], H {parallel} [110]). We apply the angular dependent theory to determine the spin orientation near the Co/NiO(100) interface. The PEEM images show that the ferromagnetic Co moments and antiferromagnetic NiO moments are aligned perpendicular to each other. By rotating the sample with respect to the linear x-ray polarization we furthermore find that the perpendicular coupling with the ferromagnetic Co layer at the interface causes a canting of the antiferromagnetic Ni moments. This shows that taking into account the angular dependence of the XMLD in the detailed analysis of PEEM images leads to an accurate retrieval of the spin axes of the antiferromagnetic domains.« less
Development and Evaluation of a Parallel Reaction Monitoring Strategy for Large-Scale Targeted Metabolomics Quantification.

PubMed

Zhou, Juntuo; Liu, Huiying; Liu, Yang; Liu, Jia; Zhao, Xuyang; Yin, Yuxin

2016-04-19

Recent advances in mass spectrometers which have yielded higher resolution and faster scanning speeds have expanded their application in metabolomics of diverse diseases. Using a quadrupole-Orbitrap LC-MS system, we developed an efficient large-scale quantitative method targeting 237 metabolites involved in various metabolic pathways using scheduled, parallel reaction monitoring (PRM). We assessed the dynamic range, linearity, reproducibility, and system suitability of the PRM assay by measuring concentration curves, biological samples, and clinical serum samples. The quantification performances of PRM and MS1-based assays in Q-Exactive were compared, and the MRM assay in QTRAP 6500 was also compared. The PRM assay monitoring 237 polar metabolites showed greater reproducibility and quantitative accuracy than MS1-based quantification and also showed greater flexibility in postacquisition assay refinement than the MRM assay in QTRAP 6500. We present a workflow for convenient PRM data processing using Skyline software which is free of charge. In this study we have established a reliable PRM methodology on a quadrupole-Orbitrap platform for evaluation of large-scale targeted metabolomics, which provides a new choice for basic and clinical metabolomics study.
High-voltage isolation transformer for sub-nanosecond rise time pulses constructed with annular parallel-strip transmission lines.

PubMed

Homma, Akira

2011-07-01

A novel annular parallel-strip transmission line was devised to construct high-voltage high-speed pulse isolation transformers. The transmission lines can easily realize stable high-voltage operation and good impedance matching between primary and secondary circuits. The time constant for the step response of the transformer was calculated by introducing a simple low-frequency equivalent circuit model. Results show that the relation between the time constant and low-cut-off frequency of the transformer conforms to the theory of the general first-order linear time-invariant system. Results also show that the test transformer composed of the new transmission lines can transmit about 600 ps rise time pulses across the dc potential difference of more than 150 kV with insertion loss of -2.5 dB. The measured effective time constant of 12 ns agreed exactly with the theoretically predicted value. For practical applications involving the delivery of synchronized trigger signals to a dc high-voltage electron gun station, the transformer described in this paper exhibited advantages over methods using fiber optic cables for the signal transfer system. This transformer has no jitter or breakdown problems that invariably occur in active circuit components.
File concepts for parallel I/O

NASA Technical Reports Server (NTRS)

Crockett, Thomas W.

1989-01-01

The subject of input/output (I/O) was often neglected in the design of parallel computer systems, although for many problems I/O rates will limit the speedup attainable. The I/O problem is addressed by considering the role of files in parallel systems. The notion of parallel files is introduced. Parallel files provide for concurrent access by multiple processes, and utilize parallelism in the I/O system to improve performance. Parallel files can also be used conventionally by sequential programs. A set of standard parallel file organizations is proposed, organizations are suggested, using multiple storage devices. Problem areas are also identified and discussed.
Chaos, complexity and complicatedness: lessons from rocket science.

PubMed

Norman, Geoff

2011-06-01

Recently several authors have drawn parallels between educational research and some theories of natural science, in particular complexity theory and chaos theory. The central claim is that both the natural science theories are useful metaphors for education research in that they deal with phenomena that involve many variables interacting in complex, non-linear and unstable ways, and leading to effects that are neither reproducible nor comprehensible. This paper presents a counter-argument. I begin by carefully examining the concepts of uncertainty, complexity and chaos, as described in physical science. I distinguish carefully between systems that are, respectively, complex, chaotic and complicated. I demonstrate that complex and chaotic systems have highly specific characteristics that are unlikely to be present in education systems. I then suggest that, in fact, there is ample evidence that human learning can be understood adequately with conventional linear models. The implications of these opposing world views are substantial. If education science has the properties of complex or chaotic systems, we should abandon any attempt at control or understanding. However, as I point out, to do so would ignore a number of recent developments in our understanding of learning that hold promise to yield substantial improvements in effectiveness and efficiency of learning. © Blackwell Publishing Ltd 2011.

Some links on this page may take you to non-federal websites. Their policies may differ from this site.